## Extension Points

The benchmark provides several extension points for researchers:

- **RAG Architectures**: Add novel retrieval strategies (`~/workspace/source/src/rag/`)
- **Model Integration**: Test new LLMs and SLMs (`~/workspace/source/src/common/model_provider.py`)
- **Evaluation Metrics**: Extend RAGAS evaluation (`~/workspace/source/src/evaluation/ragas_evaluator.py`)
- **Data Sources**: Integrate new medical corpora (`~/workspace/source/data/`)
## Adding New Data Sources
The benchmark uses a medical corpus on pregnancy and childbirth. To add new data sources:

### Prepare Document Chunks

Create a JSON file in `data/chunks/` with the following structure. Each chunk should contain:

- `content`: The text content of the chunk
- `source`: Original document filename
- `page_number`: Page number in the source document
- `chunk_id`: Unique identifier for the chunk
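For illustration, a minimal Python sketch that writes chunks in this format to the file the embedding step expects (the example values are hypothetical):

```python
import json

# Hypothetical example chunk; real chunks come from your own
# document-splitting pipeline.
chunks = [
    {
        "content": "Folic acid supplementation is recommended before conception...",
        "source": "prenatal_guidelines.pdf",
        "page_number": 12,
        "chunk_id": "prenatal_guidelines_p12_c01",
    },
]

# Write the chunks to data/chunks/ so the embedding step can pick them up.
with open("data/chunks/chunks_final.json", "w", encoding="utf-8") as f:
    json.dump(chunks, f, ensure_ascii=False, indent=2)
```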
### Generate Embeddings
Run the embedding generation script to create vector representations. This will:

- Load chunks from `data/chunks/chunks_final.json`
- Generate embeddings using OpenAI’s `text-embedding-3-small`
- Store them in ChromaDB at `data/embeddings/chroma_db/`
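As a rough sketch of what this step involves, assuming the `openai` and `chromadb` Python packages and an `OPENAI_API_KEY` in the environment (the collection name below is hypothetical):

```python
import json

import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load the prepared chunks.
with open("data/chunks/chunks_final.json", encoding="utf-8") as f:
    chunks = json.load(f)

# Embed the chunk contents (batch the input for large corpora).
response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[chunk["content"] for chunk in chunks],
)
embeddings = [item.embedding for item in response.data]

# Persist documents, embeddings, and metadata in ChromaDB.
chroma_client = chromadb.PersistentClient(path="data/embeddings/chroma_db")
collection = chroma_client.get_or_create_collection(name="medical_corpus")  # hypothetical name
collection.add(
    ids=[chunk["chunk_id"] for chunk in chunks],
    documents=[chunk["content"] for chunk in chunks],
    embeddings=embeddings,
    metadatas=[
        {"source": chunk["source"], "page_number": chunk["page_number"]}
        for chunk in chunks
    ],
)
```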
### Update Collection Name
If using a different medical domain, update the collection name in RAG implementations:
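Where exactly the name is set varies by implementation; a minimal sketch, assuming the ChromaDB client API used above (both collection names are hypothetical):

```python
import chromadb

chroma_client = chromadb.PersistentClient(path="data/embeddings/chroma_db")

# Hypothetical: point the retriever at your new domain's collection
# instead of the default pregnancy-and-childbirth collection.
collection = chroma_client.get_or_create_collection(name="cardiology_corpus")
```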
## Modifying Retrieval Parameters
Each RAG implementation allows retrieval parameter tuning.

### Adjusting Number of Retrieved Documents
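The parameter name varies by implementation; a minimal sketch using ChromaDB's query API, where `n_results` controls how many chunks are retrieved (client setup and collection name as in the hypothetical examples above):

```python
import chromadb
from openai import OpenAI

openai_client = OpenAI()
collection = chromadb.PersistentClient(path="data/embeddings/chroma_db").get_collection(
    name="medical_corpus"  # hypothetical name
)

# Embed the question with the same model used for the corpus.
question = "What supplements are recommended during pregnancy?"
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small", input=[question]
).data[0].embedding

# n_results controls how many chunks are retrieved per question:
# raise it for broader context, lower it for tighter precision.
results = collection.query(query_embeddings=[query_embedding], n_results=5)
```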
### Tuning Hybrid Retrieval Weights
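A common formulation is a convex combination of normalized dense (embedding) and sparse (e.g. BM25) scores; the sketch below is illustrative, and its parameter names are not necessarily the repo's:

```python
def hybrid_score(dense_score: float, sparse_score: float, alpha: float = 0.7) -> float:
    """Blend dense and sparse relevance scores for one document.

    alpha=1.0 relies only on the dense (embedding) score;
    alpha=0.0 relies only on the sparse (keyword/BM25) score.
    Both scores are assumed to be normalized to [0, 1].
    """
    return alpha * dense_score + (1 - alpha) * sparse_score
```

Sweeping `alpha` between 0 and 1 is a cheap way to see whether your question set benefits more from semantic or keyword matching.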
### Modifying Temperature for Generation
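A sketch assuming an OpenAI-style chat completion call; the benchmark's `model_provider.py` may wrap this differently, and the model name is a placeholder:

```python
from openai import OpenAI

openai_client = OpenAI()
prompt = "Context: <retrieved chunks>\n\nQuestion: What supplements are recommended?"

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use the model you are benchmarking
    temperature=0.0,      # low temperature favors deterministic, grounded answers
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": prompt},
    ],
)
answer = response.choices[0].message.content
```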
## Experimentation Best Practices
### 1. Version Control Your Experiments
Create separate branches for experimental changes, e.g. `git checkout -b experiment/hybrid-weights`.

### 2. Track Configuration Changes

Document experiments in your evaluation metadata.
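For example (the field names and paths below are illustrative, not the repo's actual schema):

```python
import json

experiment_metadata = {
    "experiment_id": "hybrid-alpha-0.5",
    "description": "Lower the dense weight to favor keyword matches",
    "config": {
        "rag_architecture": "hybrid",
        "n_results": 5,
        "alpha": 0.5,
        "temperature": 0.0,
        "embedding_model": "text-embedding-3-small",
    },
    "baseline": "hybrid-alpha-0.7",
}

# Store the metadata alongside the run's results.
with open("results/hybrid-alpha-0.5/metadata.json", "w") as f:
    json.dump(experiment_metadata, f, indent=2)
```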
### 3. Run Comparative Evaluations

Always compare against the baseline configuration.
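A minimal sketch of the comparison step; the scores below are placeholders for what two runs of the RAGAS evaluator in `src/evaluation/ragas_evaluator.py` would produce:

```python
# Placeholder metric dictionaries; in practice these come from two
# evaluation runs (baseline vs. experimental configuration).
baseline = {"faithfulness": 0.91, "answer_relevancy": 0.84,
            "context_precision": 0.78, "context_recall": 0.80}
variant = {"faithfulness": 0.93, "answer_relevancy": 0.83,
           "context_precision": 0.81, "context_recall": 0.85}

for metric, base_score in baseline.items():
    delta = variant[metric] - base_score
    print(f"{metric}: {base_score:.3f} -> {variant[metric]:.3f} ({delta:+.3f})")
```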
### 4. Analyze Results Systematically
Compare metrics across configurations:

- **Faithfulness**: Did more context reduce hallucinations?
- **Answer Relevancy**: Are answers still focused on the question?
- **Context Precision**: Is the retrieved context more relevant?
- **Context Recall**: Are we capturing all necessary information?
## Contributing to the Project

We welcome research contributions that advance RAG techniques for medical Q&A.

### Contribution Areas
- **Novel RAG Architectures**: Implement and evaluate new retrieval strategies
- **Model Integration**: Add domain-specialized medical language models
- **Evaluation Extensions**: Propose additional metrics or analysis methods
- **Results & Analysis**: Contribute comparative insights and visualizations
### Contribution Workflow
1. **Implement Your Changes**: Follow the existing code structure and patterns. Add comprehensive docstrings.
2. **Document Your Methodology**: Include:
   - Hypothesis and motivation
   - Implementation details
   - Experimental setup
   - Results summary and analysis
## Research Guidelines
### Reproducibility
- **Fix Random Seeds**: Set random seeds for reproducible results (see the sketch after this list)
- **Document Dependencies**: Update `requirements.txt` with new packages
- **Save Configurations**: Store all hyperparameters in configuration files
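A minimal sketch, assuming NumPy is among your dependencies; seed any other stochastic library you use the same way:

```python
import random

import numpy as np

SEED = 42

random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy's global RNG
# If you use PyTorch or another framework, seed it here too,
# e.g. torch.manual_seed(SEED).
```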
### Statistical Rigor
- Run multiple trials to account for variance
- Report mean and standard deviation for metrics
- Use appropriate statistical tests for comparisons (see the sketch below)
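For example, aggregating one metric across trials and testing a variant against the baseline (assuming NumPy and SciPy are available; the per-trial scores are placeholders):

```python
import numpy as np
from scipy import stats

# Placeholder per-trial faithfulness scores from repeated evaluation runs.
baseline_trials = np.array([0.90, 0.92, 0.89, 0.91, 0.90])
variant_trials = np.array([0.93, 0.94, 0.92, 0.95, 0.93])

print(f"baseline: {baseline_trials.mean():.3f} +/- {baseline_trials.std(ddof=1):.3f}")
print(f"variant:  {variant_trials.mean():.3f} +/- {variant_trials.std(ddof=1):.3f}")

# Welch's t-test for a difference in means between the two configurations.
t_stat, p_value = stats.ttest_ind(variant_trials, baseline_trials, equal_var=False)
print(f"p-value: {p_value:.4f}")
```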
### Ethical Considerations

When extending to new medical domains:

- Ensure data sources are properly licensed
- Validate medical accuracy with domain experts
- Include appropriate disclaimers about clinical use
- Respect patient privacy in any real-world data
## Next Steps

- **Adding RAG Architectures**: Learn how to implement new RAG strategies
- **Integrating Models**: Add new LLMs to the model registry
- **Customizing Metrics**: Extend evaluation with custom metrics
- **API Reference**: Explore the complete API documentation
