Overview
The benchmark currently includes six RAG architectures:
- Simple Semantic RAG: Direct vector similarity matching
- Hybrid RAG: BM25 + Semantic ensemble retrieval
- Hybrid RAG + RRF: Reciprocal Rank Fusion (see the sketch after this list)
- HyDE RAG: Hypothetical document embeddings
- Query Rewriter RAG: Multi-query reformulation
- PageIndex RAG: Page-aware retrieval
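As an illustration of the fusion step used by Hybrid RAG + RRF, Reciprocal Rank Fusion scores each document by its rank in every retriever's result list and sums the scores. The sketch below is a generic illustration, not the benchmark's actual code; the function name and the k=60 constant follow the common convention.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) for every list it appears in,
    so items ranked highly by multiple retrievers float to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse BM25 and semantic retrieval results
bm25_hits = ["doc_3", "doc_1", "doc_7"]
semantic_hits = ["doc_1", "doc_5", "doc_3"]
print(reciprocal_rank_fusion([bm25_hits, semantic_hits]))  # doc_1 and doc_3 rank first
```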
Implementation Requirements
Required Function Signature
Every RAG implementation must provide a `query_for_evaluation()` function:
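The exact signature isn't reproduced here, so the following is a minimal sketch; the placeholder retrieval and generation logic stands in for your real pipeline:

```python
def query_for_evaluation(question: str) -> dict:
    """Answer a single benchmark question with this RAG architecture.

    The evaluator calls this once per question and consumes the returned
    dictionary (structure described in the next section).
    """
    # Placeholders; a real implementation calls its retriever and LLM here.
    contexts = [f"chunk relevant to: {question}"]
    answer = f"generated answer for: {question}"
    return {"answer": answer, "contexts": contexts}
```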
Return Dictionary Structure
The function must return:
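The original listing of the keys isn't reproduced here; as an assumption based on what a RAGAS-style evaluator and the cost tracking described below would need, the returned dictionary looks roughly like this (key names are illustrative):

```python
{
    "answer": "The generated answer text",           # final LLM response
    "contexts": ["retrieved chunk 1", "chunk 2"],    # contexts given to the LLM
    "metadata": {                                    # assumed: used for cost/latency tracking
        "input_tokens": 512,
        "output_tokens": 128,
        "execution_time_s": 1.4,
    },
}
```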
Step-by-Step Implementation
Create New RAG Module
Create a new file in `src/rag/` for your RAG implementation, then start with imports and basic setup:
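A minimal skeleton, assuming a LangChain-style OpenAI client; the module path `src/rag/my_custom_rag.py`, the model name, and the placeholder retrieval are all illustrative:

```python
# src/rag/my_custom_rag.py  (hypothetical path)
from typing import List

from langchain_openai import ChatOpenAI  # assumed LLM client; adapt to the project's stack

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def retrieve(question: str, top_k: int = 5) -> List[str]:
    """Return the top_k most relevant chunks; replace with real retrieval."""
    return [f"placeholder chunk {i} for: {question}" for i in range(top_k)]

def query_for_evaluation(question: str) -> dict:
    """Entry point the evaluator calls for each benchmark question."""
    contexts = retrieve(question)
    context_block = "\n\n".join(contexts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )
    answer = llm.invoke(prompt).content
    return {"answer": answer, "contexts": contexts}
```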
Integrate with Evaluator
Register your RAG architecture in `src/evaluation/ragas_evaluator.py`, and add an evaluation helper function:
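The evaluator's real registration code isn't shown here; as a sketch, assuming it keeps a mapping from architecture names to query functions, the change might look like this (names are hypothetical):

```python
# src/evaluation/ragas_evaluator.py  (hypothetical shape; adapt to the file's real structure)
from src.rag import my_custom_rag  # your new module

RAG_ARCHITECTURES = {
    # ...existing architectures...
    "my_custom": my_custom_rag.query_for_evaluation,
}

def evaluate_my_custom_rag(questions: list) -> list:
    """Hypothetical helper: run every benchmark question through the new RAG."""
    return [my_custom_rag.query_for_evaluation(item["question"]) for item in questions]
```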
Testing Your Implementation
Unit Testing
Create a test file to verify basic functionality:
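A minimal pytest-style check, assuming the module sketched above (file name and import path are placeholders):

```python
# tests/test_my_custom_rag.py  (hypothetical path)
from src.rag.my_custom_rag import query_for_evaluation

def test_returns_expected_structure():
    result = query_for_evaluation("What is retrieval-augmented generation?")
    assert isinstance(result, dict)
    assert isinstance(result["answer"], str) and result["answer"]
    assert isinstance(result["contexts"], list) and result["contexts"]
```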
Interactive Testing
Run your implementation directly:
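For example, from a Python shell or a short script (module name as assumed above):

```python
from src.rag.my_custom_rag import query_for_evaluation  # hypothetical module

result = query_for_evaluation("What does the benchmark measure?")
print(result["answer"])
print(result["contexts"][:2])
```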
Evaluation Testing
Run a full evaluation:
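The exact command isn't reproduced here; one way to drive a full run is through the evaluator module, for instance via the hypothetical helper registered above:

```python
from src.evaluation import ragas_evaluator  # assumed import path

questions = [{"question": "What does the benchmark measure?"}]
results = ragas_evaluator.evaluate_my_custom_rag(questions)  # hypothetical helper
print(results[0]["answer"])
```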
Evaluation and Analysis
Single Model Evaluation
Multi-Model Comparison
Comprehensive Benchmark
Performance Considerations
Optimization Tips
- Cache Embeddings: Reuse embeddings when possible
- Batch Processing: Process multiple queries together
- Async Operations: Use async/await for parallel API calls (see the sketch after this list)
- Connection Pooling: Reuse HTTP connections
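As an illustration of the async tip above, the sketch below fans out several LLM calls concurrently with `asyncio.gather`, assuming an OpenAI-style async client (not the framework's actual client code):

```python
import asyncio
from openai import AsyncOpenAI  # assumed client

async def answer_one(client: AsyncOpenAI, question: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

async def answer_many(questions: list) -> list:
    client = AsyncOpenAI()
    # All LLM calls are issued concurrently instead of one at a time.
    return await asyncio.gather(*(answer_one(client, q) for q in questions))

# answers = asyncio.run(answer_many(["question 1", "question 2"]))
```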
Monitoring Costs
The framework automatically tracks:
- Input/output tokens per query
- Cost per query and total cost
- Execution time
Example: Semantic Reranking RAG
Here’s a complete example implementing semantic reranking:
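The original listing isn't reproduced here; the sketch below illustrates the idea: over-retrieve candidates, rerank them with a cross-encoder, and generate from the top few. Model names, the placeholder retriever, and the module path are illustrative, not the benchmark's actual code.

```python
# src/rag/semantic_reranking_rag.py  (illustrative sketch)
from typing import List

from sentence_transformers import CrossEncoder  # cross-encoder reranker
from langchain_openai import ChatOpenAI         # assumed LLM client

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def retrieve_candidates(question: str, top_k: int = 20) -> List[str]:
    """Over-retrieve with your base retriever; placeholder implementation."""
    return [f"candidate chunk {i} for: {question}" for i in range(top_k)]

def rerank(question: str, candidates: List[str], keep: int = 5) -> List[str]:
    """Score each (question, chunk) pair with the cross-encoder and keep the best."""
    scores = reranker.predict([(question, chunk) for chunk in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:keep]]

def query_for_evaluation(question: str) -> dict:
    candidates = retrieve_candidates(question)
    contexts = rerank(question, candidates)
    context_block = "\n\n".join(contexts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )
    answer = llm.invoke(prompt).content
    return {"answer": answer, "contexts": contexts}
```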
Next Steps
- Integrating Models: Test your RAG with different LLMs
- Customizing Metrics: Add custom evaluation metrics
- API Reference: Explore the complete API
- Contributing: Contribute your implementation
