
Overview

The Simple Semantic RAG module implements a basic Retrieval-Augmented Generation (RAG) pipeline. It uses a semantic retriever to find relevant documents in a ChromaDB vector store, then uses a language model to generate an answer grounded in the retrieved context.

Module: src.rag.simple
Source: src/rag/simple.py

Configuration

Default Models

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

Vector Store

vectorstore = Chroma(
    persist_directory=str(chroma_db_dir),
    embedding_function=embeddings,
    collection_name="guia_embarazo_parto"
)

Retriever Configuration

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
Retrieves the top 5 most similar documents using semantic search.
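Conceptually, k=5 semantic retrieval embeds the query and ranks the stored document embeddings by similarity to it. Chroma handles this internally; the following is only a minimal plain-Python sketch of the idea, using cosine similarity over hypothetical vectors:

```python
from math import sqrt
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: List[float], doc_vecs: List[List[float]], k: int = 5) -> List[int]:
    # Return the indices of the k document vectors most similar to the query
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]
```

In the actual module, both the query and document vectors come from the text-embedding-3-small model, and Chroma performs this ranking inside `as_retriever`.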

Prompt Template

The module uses a medical-focused prompt template:
qa_template = """
You are a medical expert specializing in pregnancy and childbirth. 
Your task is to analyze the provided medical context and answer the user's question accurately and concisely.

STRICT INSTRUCTIONS:
1.  **Base your answer exclusively on the information within the MEDICAL CONTEXT section.** Do not use any external knowledge.
2.  **The context is ordered by relevance.** Give the highest priority to the first few documents (e.g., Documents 1-2), as they are the most relevant. Use subsequent documents to supplement your answer if needed.
3.  **Provide a direct and integrated answer.** Your response should be a single, well-written paragraph. Start with a direct answer to the question, then seamlessly incorporate specific details, data, and recommendations from the context to support it.
4.  **If the context does not contain enough information to answer the question, state that clearly.** Do not try to invent an answer.
5.  **Always answer in Spanish.**

MEDICAL CONTEXT (ordered by relevance):
{context}

QUESTION: {question}

DETAILED MEDICAL ANSWER:
"""

Functions

format_docs

def format_docs(docs: List[Document]) -> str
Formats the retrieved documents to be included in the final prompt.
Parameters:
  • docs (List[Document], required): A list of retrieved LangChain Document objects

Returns:
  • str: A formatted string containing the content of the documents with source and page metadata
Example Output Format:
--- Document 1 ---
Source: guide.pdf, Page: 42
Content: [document content]

--- Document 2 ---
Source: guide.pdf, Page: 43
Content: [document content]
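A sketch of what format_docs might look like, using a minimal dataclass stand-in for LangChain's Document (with page_content and metadata fields) so the example runs without LangChain installed; the metadata key names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: List[Document]) -> str:
    """Join documents into numbered blocks with source/page headers,
    matching the example output format shown above."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        parts.append(
            f"--- Document {i} ---\n"
            f"Source: {source}, Page: {page}\n"
            f"Content: {doc.page_content}"
        )
    return "\n\n".join(parts)
```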

process_semantic_query

def process_semantic_query(query: str, custom_llm: Optional[ChatOpenAI] = None) -> Dict[str, Any]
Processes a query using the simple semantic RAG pipeline.
Parameters:
  • query (str, required): The user's question
  • custom_llm (ChatOpenAI, default: None): Custom LLM to use; if None, uses the default llm (gpt-4o)

Returns (Dict[str, Any]): A dictionary containing:
  • answer (str): The generated answer
  • contexts (List[str]): List of retrieved document contents
  • retrieved_documents (List[Document]): Full Document objects
  • metrics (dict): Token usage and cost metrics
    • input_tokens (int): Number of input tokens
    • output_tokens (int): Number of output tokens
    • total_tokens (int): Total tokens used
    • usage_source (str): Source of usage data
    • cost (float): Total cost in USD
    • cost_source (str): Source of cost calculation
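The metrics block reports cost in USD derived from the token counts. As an illustration only (the per-million-token prices below are assumptions for this sketch, not values taken from the module), such a cost figure can be computed as:

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    in_price_per_1m: float = 2.50,   # hypothetical input price per 1M tokens
    out_price_per_1m: float = 10.00, # hypothetical output price per 1M tokens
) -> float:
    """Estimate USD cost from token counts and per-million-token prices."""
    return (
        input_tokens / 1_000_000 * in_price_per_1m
        + output_tokens / 1_000_000 * out_price_per_1m
    )
```

The module's actual rates and usage_source/cost_source values depend on how it captures usage from the OpenAI response.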

query_for_evaluation

def query_for_evaluation(
    question: str, 
    llm_model: str = None, 
    custom_llm: Optional[BaseChatModel] = None
) -> dict
A wrapper function for RAG evaluation frameworks like Ragas. This function processes a question and returns a dictionary structured for easy integration with evaluation tools.
Parameters:
  • question (str, required): The question to process
  • llm_model (str, default: None): Model name to use; if None, uses the default "gpt-4o"
  • custom_llm (BaseChatModel, default: None): Pre-configured language model; takes precedence over llm_model

Returns (dict): A dictionary containing:
  • question (str): The original question
  • answer (str): The generated answer
  • contexts (List[str]): Retrieved document contents
  • source_documents (List[Document]): Full retrieved documents
  • metadata (dict): Comprehensive metadata including:
    • num_contexts (int): Number of retrieved contexts
    • retrieval_method (str): “semantic_only”
    • llm_model (str): Model name used
    • provider (str): Provider (e.g., “openai”)
    • model_id (str): Full model identifier
    • embedding_model (str): “text-embedding-3-small”
    • execution_time (float): Total execution time in seconds
    • input_tokens (int): Input tokens used
    • output_tokens (int): Output tokens generated
    • total_cost (float): Total cost in USD
    • tokens_used (int): Total tokens (input + output)
    • usage_source (str): Source of usage metrics
    • cost_source (str): Source of cost calculation

Usage Example

from src.rag.simple import query_for_evaluation

# Basic usage with default model
result = query_for_evaluation(
    question="¿Cuáles son los síntomas del embarazo temprano?"
)

print(result["answer"])
print(f"Cost: ${result['metadata']['total_cost']:.6f}")
print(f"Contexts retrieved: {result['metadata']['num_contexts']}")

# Using a custom model
result = query_for_evaluation(
    question="¿Qué es la preeclampsia?",
    llm_model="gpt-4o-mini"
)

# Using a pre-configured LLM
from langchain_openai import ChatOpenAI

custom_llm = ChatOpenAI(model_name="gpt-4o", temperature=0.3)
result = query_for_evaluation(
    question="¿Cuándo debo ir al hospital durante el parto?",
    custom_llm=custom_llm
)

Pipeline Flow

  1. Retrieve: Uses semantic search to find the top 5 most relevant documents from ChromaDB
  2. Format: Formats documents with source and page metadata
  3. Generate: Uses the LLM to generate an answer based on the retrieved context
  4. Track: Captures token usage and cost metrics
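The four steps above can be sketched in plain Python, with the retriever and LLM passed in as callables (stand-ins for the Chroma retriever and ChatOpenAI; the function name and signature are illustrative, not the module's API):

```python
from typing import Any, Callable, Dict, List

def run_simple_rag(
    question: str,
    retrieve: Callable[[str], List[str]],  # stand-in for the Chroma retriever
    generate: Callable[[str], str],        # stand-in for the LLM call
    prompt_template: str,
) -> Dict[str, Any]:
    # 1. Retrieve: fetch the most relevant document contents
    contexts = retrieve(question)
    # 2. Format: number the documents, most relevant first
    formatted = "\n\n".join(
        f"--- Document {i} ---\n{c}" for i, c in enumerate(contexts, start=1)
    )
    # 3. Generate: fill the template and call the LLM
    prompt = prompt_template.format(context=formatted, question=question)
    answer = generate(prompt)
    # 4. Track: in the real module, token/cost metrics are captured here
    return {"answer": answer, "contexts": contexts}
```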

Key Features

  • Simple and straightforward semantic search
  • Automatic cost and token tracking
  • Support for custom LLMs
  • Medical domain-specific prompting
  • Structured output for evaluation frameworks
