Overview
The Simple Semantic RAG module implements a basic Retrieval-Augmented Generation (RAG) pipeline. It uses a semantic retriever to find relevant documents in a ChromaDB vector store and then uses a language model to generate an answer based on the retrieved context.
Module: src.rag.simple
Source: src/rag/simple.py
Configuration
Default Models
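A minimal sketch of the default model setup, assuming the module uses LangChain's OpenAI integrations; the variable names and the temperature setting are illustrative, while the model names (gpt-4o, text-embedding-3-small) are the defaults documented in the function reference below.
```python
# Hedged sketch of the default models; variable names and temperature are assumptions.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Default LLM used when no custom model is supplied
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Embedding model used to embed queries against the ChromaDB collection
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```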
Vector Store
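A minimal sketch of the vector store wiring, assuming the langchain-chroma integration; the collection name and persist directory are placeholders, not values taken from src/rag/simple.py.
```python
# Hedged sketch of the ChromaDB vector store; collection name and path are placeholders.
from langchain_chroma import Chroma

vectorstore = Chroma(
    collection_name="documents",      # placeholder collection name
    embedding_function=embeddings,    # text-embedding-3-small from the sketch above
    persist_directory="data/chroma",  # placeholder persist directory
)
```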
Retriever Configuration
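A minimal sketch of the retriever, assuming a plain similarity search; the top-5 setting matches the Pipeline Flow description below, everything else is a LangChain default.
```python
# Hedged sketch of the retriever; k=5 matches the documented top-5 retrieval.
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)
```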
Prompt Template
The module uses a medical-focused prompt template:
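A minimal sketch of such a template, assuming a ChatPromptTemplate with {context} and {question} placeholders; the medical wording is illustrative and the actual template in src/rag/simple.py may differ.
```python
# Hedged sketch of a medical-focused prompt template; the actual wording in
# src/rag/simple.py may differ - only the {context}/{question} structure is assumed.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """You are a medical assistant. Answer the question using only the provided
context. If the context does not contain the answer, say so.

Context:
{context}

Question: {question}

Answer:"""
)
```
Functions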
format_docs
Parameters:
- A list of retrieved LangChain Document objects
Returns:
- A formatted string containing the content of the documents with source and page metadata
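A minimal sketch of what this helper might look like; the separator and the exact metadata keys (source, page) are assumptions based on the description above.
```python
# Hedged sketch of format_docs; the separator and metadata key names are assumptions.
from langchain_core.documents import Document

def format_docs(docs: list[Document]) -> str:
    """Join retrieved documents into one context string with source/page metadata."""
    formatted = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "n/a")
        formatted.append(f"[source: {source}, page: {page}]\n{doc.page_content}")
    return "\n\n".join(formatted)
```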
process_semantic_query
Parameters:
- The user's question
- Custom LLM to use. If None, uses the default llm (gpt-4o)
Returns:
A dictionary containing:
- answer (str): The generated answer
- contexts (List[str]): List of retrieved document contents
- retrieved_documents (List[Document]): Full Document objects
- metrics (dict): Token usage and cost metrics
  - input_tokens (int): Number of input tokens
  - output_tokens (int): Number of output tokens
  - total_tokens (int): Total tokens used
  - usage_source (str): Source of usage data
  - cost (float): Total cost in USD
  - cost_source (str): Source of cost calculation
query_for_evaluation
Parameters:
- The question to process
- Model name to use. If None, uses default "gpt-4o"
- Pre-configured language model. Takes precedence over llm_model
Returns:
A dictionary containing:
- question (str): The original question
- answer (str): The generated answer
- contexts (List[str]): Retrieved document contents
- source_documents (List[Document]): Full retrieved documents
- metadata (dict): Comprehensive metadata including:
  - num_contexts (int): Number of retrieved contexts
  - retrieval_method (str): "semantic_only"
  - llm_model (str): Model name used
  - provider (str): Provider (e.g., "openai")
  - model_id (str): Full model identifier
  - embedding_model (str): "text-embedding-3-small"
  - execution_time (float): Total execution time in seconds
  - input_tokens (int): Input tokens used
  - output_tokens (int): Output tokens generated
  - total_cost (float): Total cost in USD
  - tokens_used (int): Total tokens (input + output)
  - usage_source (str): Source of usage metrics
  - cost_source (str): Source of cost calculation
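Because the return value is structured, it can be collected directly into evaluation records. A minimal sketch; the import path mirrors the module path listed above and the question is illustrative.
```python
# Hedged sketch: collecting query_for_evaluation outputs into evaluation records.
from src.rag.simple import query_for_evaluation

questions = ["What are the contraindications for aspirin?"]  # illustrative question

records = []
for q in questions:
    result = query_for_evaluation(q)  # uses the default gpt-4o model
    records.append({
        "question": result["question"],
        "answer": result["answer"],
        "contexts": result["contexts"],
        "cost_usd": result["metadata"]["total_cost"],
        "latency_s": result["metadata"]["execution_time"],
    })
```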
Usage Example
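A minimal sketch, assuming the import path matches the module path listed above; the custom-LLM keyword name is inferred from the parameter description and may differ.
```python
# Hedged usage sketch for process_semantic_query.
from src.rag.simple import process_semantic_query

result = process_semantic_query("What are the first-line treatments for hypertension?")

print(result["answer"])
print(f"Tokens: {result['metrics']['total_tokens']}, cost: ${result['metrics']['cost']:.4f}")

# Passing a custom LLM (the keyword name is assumed from the description above):
# from langchain_openai import ChatOpenAI
# result = process_semantic_query("...", llm=ChatOpenAI(model="gpt-4o-mini"))
```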
Pipeline Flow
- Retrieve: Uses semantic search to find the top 5 most relevant documents from ChromaDB
- Format: Formats documents with source and page metadata
- Generate: Uses the LLM to generate an answer based on the retrieved context
- Track: Captures token usage and cost metrics
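A hedged end-to-end sketch of how these four steps are commonly wired together with LCEL, reusing the llm, retriever, prompt, and format_docs names from the sketches above; the actual chain construction in src/rag/simple.py may be structured differently.
```python
# Hedged sketch of the retrieve -> format -> generate -> track flow using LCEL.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.callbacks import get_openai_callback

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

with get_openai_callback() as cb:  # captures token usage and cost for the call
    answer = chain.invoke("What are the common symptoms of iron-deficiency anemia?")

print(answer)
print(f"Tokens: {cb.total_tokens}, cost: ${cb.total_cost:.4f}")
```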
Key Features
- Simple and straightforward semantic search
- Automatic cost and token tracking
- Support for custom LLMs
- Medical domain-specific prompting
- Structured output for evaluation frameworks
