
Overview

The Simple Semantic RAG module implements a basic Retrieval-Augmented Generation (RAG) pipeline. It uses a semantic retriever to find relevant documents in a ChromaDB vector store, then uses a language model to generate an answer grounded in the retrieved context.

Module: src.rag.simple
Source: src/rag/simple.py

Configuration

Default Models

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

Vector Store

vectorstore = Chroma(
    persist_directory=str(chroma_db_dir),
    embedding_function=embeddings,
    collection_name="guia_embarazo_parto"
)

Retriever Configuration

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
Retrieves the top 5 most similar documents using semantic search.
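Conceptually, k=5 semantic retrieval embeds the query and ranks the stored document embeddings by similarity to it. Chroma handles this internally; the following is only a minimal plain-Python sketch of the idea, using cosine similarity over hypothetical vectors:

```python
from math import sqrt
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: List[float], doc_vecs: List[List[float]], k: int = 5) -> List[int]:
    # Return the indices of the k document vectors most similar to the query
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]
```

In the actual module, both the query and document vectors come from the text-embedding-3-small model, and Chroma performs this ranking inside `as_retriever`.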

Prompt Template

The module uses a medical-focused prompt template:
qa_template = """
You are a medical expert specializing in pregnancy and childbirth. 
Your task is to analyze the provided medical context and answer the user's question accurately and concisely.

STRICT INSTRUCTIONS:
1.  **Base your answer exclusively on the information within the MEDICAL CONTEXT section.** Do not use any external knowledge.
2.  **The context is ordered by relevance.** Give the highest priority to the first few documents (e.g., Documents 1-2), as they are the most relevant. Use subsequent documents to supplement your answer if needed.
3.  **Provide a direct and integrated answer.** Your response should be a single, well-written paragraph. Start with a direct answer to the question, then seamlessly incorporate specific details, data, and recommendations from the context to support it.
4.  **If the context does not contain enough information to answer the question, state that clearly.** Do not try to invent an answer.
5.  **Always answer in Spanish.**

MEDICAL CONTEXT (ordered by relevance):
{context}

QUESTION: {question}

DETAILED MEDICAL ANSWER:
"""

Functions

format_docs

def format_docs(docs: List[Document]) -> str
Formats the retrieved documents to be included in the final prompt.
Parameters:
  • docs (List[Document], required): A list of retrieved LangChain Document objects

Returns:
  • str: A formatted string containing the content of the documents with source and page metadata
Example Output Format:
--- Document 1 ---
Source: guide.pdf, Page: 42
Content: [document content]

--- Document 2 ---
Source: guide.pdf, Page: 43
Content: [document content]
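A sketch of what format_docs might look like, using a minimal dataclass stand-in for LangChain's Document (with page_content and metadata fields) so the example runs without LangChain installed; the metadata key names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: List[Document]) -> str:
    """Join documents into numbered blocks with source/page headers,
    matching the example output format shown above."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        parts.append(
            f"--- Document {i} ---\n"
            f"Source: {source}, Page: {page}\n"
            f"Content: {doc.page_content}"
        )
    return "\n\n".join(parts)
```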

process_semantic_query

def process_semantic_query(query: str, custom_llm: Optional[ChatOpenAI] = None) -> Dict[str, Any]
Processes a query using the simple semantic RAG pipeline.
Parameters:
  • query (str, required): The user's question
  • custom_llm (ChatOpenAI, default: None): Custom LLM to use; if None, uses the default llm (gpt-4o)

Returns (Dict[str, Any]): A dictionary containing:
  • answer (str): The generated answer
  • contexts (List[str]): List of retrieved document contents
  • retrieved_documents (List[Document]): Full Document objects
  • metrics (dict): Token usage and cost metrics
    • input_tokens (int): Number of input tokens
    • output_tokens (int): Number of output tokens
    • total_tokens (int): Total tokens used
    • usage_source (str): Source of usage data
    • cost (float): Total cost in USD
    • cost_source (str): Source of cost calculation
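The metrics block reports cost in USD derived from the token counts. As an illustration only (the per-million-token prices below are assumptions for this sketch, not values taken from the module), such a cost figure can be computed as:

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    in_price_per_1m: float = 2.50,   # hypothetical input price per 1M tokens
    out_price_per_1m: float = 10.00, # hypothetical output price per 1M tokens
) -> float:
    """Estimate USD cost from token counts and per-million-token prices."""
    return (
        input_tokens / 1_000_000 * in_price_per_1m
        + output_tokens / 1_000_000 * out_price_per_1m
    )
```

The module's actual rates and usage_source/cost_source values depend on how it captures usage from the OpenAI response.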

query_for_evaluation

def query_for_evaluation(
    question: str, 
    llm_model: str = None, 
    custom_llm: Optional[BaseChatModel] = None
) -> dict
A wrapper function for RAG evaluation frameworks like Ragas. This function processes a question and returns a dictionary structured for easy integration with evaluation tools.
Parameters:
  • question (str, required): The question to process
  • llm_model (str, default: None): Model name to use; if None, uses the default "gpt-4o"
  • custom_llm (BaseChatModel, default: None): Pre-configured language model; takes precedence over llm_model

Returns (dict): A dictionary containing:
  • question (str): The original question
  • answer (str): The generated answer
  • contexts (List[str]): Retrieved document contents
  • source_documents (List[Document]): Full retrieved documents
  • metadata (dict): Comprehensive metadata including:
    • num_contexts (int): Number of retrieved contexts
    • retrieval_method (str): “semantic_only”
    • llm_model (str): Model name used
    • provider (str): Provider (e.g., “openai”)
    • model_id (str): Full model identifier
    • embedding_model (str): “text-embedding-3-small”
    • execution_time (float): Total execution time in seconds
    • input_tokens (int): Input tokens used
    • output_tokens (int): Output tokens generated
    • total_cost (float): Total cost in USD
    • tokens_used (int): Total tokens (input + output)
    • usage_source (str): Source of usage metrics
    • cost_source (str): Source of cost calculation

Usage Example

from src.rag.simple import query_for_evaluation

# Basic usage with default model
result = query_for_evaluation(
    question="¿Cuáles son los síntomas del embarazo temprano?"
)

print(result["answer"])
print(f"Cost: ${result['metadata']['total_cost']:.6f}")
print(f"Contexts retrieved: {result['metadata']['num_contexts']}")

# Using a custom model
result = query_for_evaluation(
    question="¿Qué es la preeclampsia?",
    llm_model="gpt-4o-mini"
)

# Using a pre-configured LLM
from langchain_openai import ChatOpenAI

custom_llm = ChatOpenAI(model_name="gpt-4o", temperature=0.3)
result = query_for_evaluation(
    question="¿Cuándo debo ir al hospital durante el parto?",
    custom_llm=custom_llm
)

Pipeline Flow

  1. Retrieve: Uses semantic search to find the top 5 most relevant documents from ChromaDB
  2. Format: Formats documents with source and page metadata
  3. Generate: Uses the LLM to generate an answer based on the retrieved context
  4. Track: Captures token usage and cost metrics
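The four steps above can be sketched in plain Python, with the retriever and LLM passed in as callables (stand-ins for the Chroma retriever and ChatOpenAI; the function name and signature are illustrative, not the module's API):

```python
from typing import Any, Callable, Dict, List

def run_simple_rag(
    question: str,
    retrieve: Callable[[str], List[str]],  # stand-in for the Chroma retriever
    generate: Callable[[str], str],        # stand-in for the LLM call
    prompt_template: str,
) -> Dict[str, Any]:
    # 1. Retrieve: fetch the most relevant document contents
    contexts = retrieve(question)
    # 2. Format: number the documents, most relevant first
    formatted = "\n\n".join(
        f"--- Document {i} ---\n{c}" for i, c in enumerate(contexts, start=1)
    )
    # 3. Generate: fill the template and call the LLM
    prompt = prompt_template.format(context=formatted, question=question)
    answer = generate(prompt)
    # 4. Track: in the real module, token/cost metrics are captured here
    return {"answer": answer, "contexts": contexts}
```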

Key Features

  • Simple and straightforward semantic search
  • Automatic cost and token tracking
  • Support for custom LLMs
  • Medical domain-specific prompting
  • Structured output for evaluation frameworks
