Overview
Simple Semantic RAG is the baseline retrieval-augmented generation architecture that uses pure semantic search to find relevant documents in a ChromaDB vector store. It embeds the user’s query and retrieves the most semantically similar documents based on vector similarity. This approach is implemented in src/rag/simple.py and serves as the foundation for comparison with more advanced retrieval strategies.
How It Works
The Simple Semantic RAG pipeline follows these steps (a raw-API sketch follows the list):
- Query Embedding: The user’s question is embedded using OpenAI’s text-embedding-3-small model
- Semantic Retrieval: ChromaDB performs a similarity search to find the top-k most similar document chunks
- Context Formatting: Retrieved documents are formatted with metadata (source, page number)
- Answer Generation: An LLM generates the final answer based on the retrieved context
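Since this page describes the mechanics without showing them, here is a minimal end-to-end sketch of the four steps using the raw openai and chromadb clients. The collection name, persist path, and example question are assumptions, and the actual implementation in src/rag/simple.py may use different abstractions.

```python
# Minimal sketch of the four pipeline steps using raw clients.
# Collection name and persist path are hypothetical, not taken from the repo.
import chromadb
from openai import OpenAI

client = OpenAI()
collection = chromadb.PersistentClient(path="./chroma_db").get_collection("documents")

question = "¿Cuáles son los síntomas de la anemia ferropénica?"  # example query

# 1. Query Embedding
query_vec = client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

# 2. Semantic Retrieval: top-k similarity search in ChromaDB
results = collection.query(query_embeddings=[query_vec], n_results=5)

# 3. Context Formatting: prefix each chunk with its source/page metadata
context = "\n\n".join(
    f"[{meta.get('source', '?')}, p. {meta.get('page', '?')}]\n{doc}"
    for doc, meta in zip(results["documents"][0], results["metadatas"][0])
)

# 4. Answer Generation from the retrieved context
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Contexto:\n{context}\n\nPregunta: {question}"}],
).choices[0].message.content
```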
The retriever is configured to return the top 5 most similar documents by default (k=5).
Key Features
- Pure semantic search: Relies entirely on vector similarity for retrieval
- Simple and fast: No query preprocessing or result fusion steps
- Consistent embeddings: Uses the same embedding model for indexing and retrieval
- Ordered by relevance: Documents are naturally ranked by semantic similarity
Implementation Details
Retriever Configuration
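The original configuration snippet is not reproduced on this page. The following sketch assumes a LangChain-style setup (suggested by the “retriever” terminology and the k=5 default); the collection name and persist directory are hypothetical.

```python
# Hedged sketch of the retriever configuration; names and paths are assumptions.
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# The same embedding model is used for indexing and retrieval.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma(
    collection_name="documents",      # hypothetical collection name
    embedding_function=embeddings,
    persist_directory="./chroma_db",  # hypothetical path
)

# Return the top 5 most similar chunks per query (the k=5 default).
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```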
Core Processing Function
The process_semantic_query() function handles the complete pipeline:
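The function body is not shown here; a hedged sketch consistent with the documented steps (and the retriever sketch above) might look like the following. The real implementation at src/rag/simple.py:105-146 may differ, and ANSWER_PROMPT refers to the illustrative prompt sketched in the next section.

```python
# Sketch only: mirrors the documented pipeline, not the actual source.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def process_semantic_query(question: str) -> str:
    # Query embedding + semantic retrieval happen inside the retriever.
    docs = retriever.invoke(question)

    # Context formatting: prefix each chunk with source and page metadata,
    # preserving the relevance order returned by ChromaDB.
    context = "\n\n".join(
        f"[{d.metadata.get('source', '?')}, p. {d.metadata.get('page', '?')}]\n"
        f"{d.page_content}"
        for d in docs
    )

    # Answer generation based only on the retrieved context (prompt below).
    prompt = ANSWER_PROMPT.format(context=context, question=question)
    return llm.invoke(prompt).content
```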
Answer Prompt Template
The system uses a structured prompt that emphasizes (an illustrative version follows the list):
- Context-only answers: Base responses exclusively on provided medical context
- Relevance ordering: Prioritize the first few documents as most relevant
- Integrated responses: Provide direct, well-written paragraph answers
- Spanish language: All medical answers are in Spanish
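The exact prompt text is not reproduced here; an illustrative version capturing the four emphases might read:

```python
# Illustrative prompt only; the actual wording in src/rag/simple.py is not
# shown on this page. (Spanish: "You are a medical assistant. Answer ONLY
# from the provided medical context; the first documents are the most
# relevant; write a direct, well-integrated paragraph in Spanish.")
ANSWER_PROMPT = """Eres un asistente médico. Responde ÚNICAMENTE con base en el \
contexto médico proporcionado. Los primeros documentos son los más relevantes. \
Redacta una respuesta directa y bien integrada, en un párrafo, en español. \
Si el contexto no contiene la respuesta, indícalo.

Contexto:
{context}

Pregunta: {question}

Respuesta:"""
```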
Usage with query_for_evaluation()
The query_for_evaluation() function provides a standardized interface for benchmark evaluation:
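Again as a hedged sketch (the actual interface at src/rag/simple.py:148-214 may use different field names):

```python
def query_for_evaluation(question: str) -> dict:
    """Standardized interface for benchmark evaluation (sketch only;
    field names are assumptions, not the confirmed schema)."""
    # NOTE: a real implementation would likely retrieve once and reuse
    # the documents; this sketch favors brevity over efficiency.
    docs = retriever.invoke(question)
    return {
        "question": question,
        "answer": process_semantic_query(question),
        "contexts": [d.page_content for d in docs],
        "sources": [d.metadata.get("source") for d in docs],
    }
```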
Return Structure
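The concrete schema is not shown on this page. Under the field-name assumptions of the sketch above, a returned value would look like:

```python
# Hypothetical example of the returned dictionary (values invented for
# illustration; field names follow the sketch above, not a confirmed schema).
{
    "question": "¿Cuáles son los síntomas de la anemia ferropénica?",
    "answer": "La anemia ferropénica se manifiesta con ...",
    "contexts": ["<texto del fragmento 1>", "<texto del fragmento 2>", "..."],
    "sources": ["documento_1.pdf", "documento_2.pdf", "..."],
}
```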
When to Use This Approach
Best For
- Well-structured queries: Questions that naturally align with document content
- Baseline comparison: Establishing performance benchmarks for more complex methods
- Fast prototyping: Quick setup with minimal configuration
- Dense semantic content: Documents where meaning is more important than exact keywords
Limitations
- Keyword mismatches: May miss documents that contain the query’s exact terminology but differ in semantic framing, since no keyword matching is performed
- Query ambiguity: Short or vague queries may not embed well
- Vocabulary gap: Struggles when query vocabulary differs significantly from document vocabulary
- No diversification: Can return very similar documents without variety
Performance Characteristics
Speed
- Fast retrieval: Single embedding + vector search
- Low latency: ~1-3 seconds total for typical queries
- No preprocessing overhead: Direct query-to-embedding conversion
Cost
- Embedding cost: ~$0.00001 per query (text-embedding-3-small)
- LLM cost: Depends on model (gpt-4o: ~$0.002-0.005 per query)
- Total: The most cost-efficient of the architectures compared here, since each query needs only one embedding call and one LLM call
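The ~$0.00001 embedding figure works out as a generous upper bound, assuming OpenAI’s listed price of $0.02 per 1M tokens for text-embedding-3-small (typical queries are shorter than 500 tokens):

```python
# Rough sanity check on the per-query embedding cost
# (assumes $0.02 per 1M tokens for text-embedding-3-small).
price_per_token = 0.02 / 1_000_000
query_tokens = 500                      # generous upper bound for a query
print(price_per_token * query_tokens)   # -> ~1e-05, i.e. ~$0.00001 per query
```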
Quality
- Good for clear queries: High precision when query intent is unambiguous
- Baseline recall: May miss relevant documents with different phrasing
- Context quality: Retrieved documents are semantically similar but may lack diversity
Source Files
- Implementation: ~/workspace/source/src/rag/simple.py:105-146
- Evaluation interface: ~/workspace/source/src/rag/simple.py:148-214
