Overview

The HyDE (Hypothetical Document Embeddings) RAG module implements a two-stage RAG pipeline: it first generates a hypothetical document that would perfectly answer the user’s query, then uses that document for semantic search. Searching with detailed content rather than a short query can improve retrieval accuracy.

Module: src.rag.hyde
Source: src/rag/hyde.py

Configuration

Default Models

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Use a more creative model for HyDE document generation
llm_hyde = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# Use a powerful model for final answer generation
llm_answer = ChatOpenAI(model_name="gpt-4o", temperature=0)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

Vector Store

from langchain_chroma import Chroma  # or: from langchain_community.vectorstores import Chroma

vectorstore = Chroma(
    persist_directory=str(chroma_db_dir),
    embedding_function=embeddings,
    collection_name="guia_embarazo_parto"
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

Prompt Templates

HyDE Document Generation Prompt

hyde_prompt_template = """
You are a medical expert writing a detailed section for a medical guide on pregnancy and childbirth.

Based on this question: {question}

Write a detailed and comprehensive medical document that would perfectly answer this question.
The document should include:
- Accurate medical information on the topic
- Relevant clinical details
- Appropriate medical recommendations
- Important considerations for maternal health
- Practical information and advice

Write the document as if it were part of an official medical guide on pregnancy and childbirth.
Be specific, detailed, and use appropriate medical terminology.

HYPOTHETICAL DOCUMENT:
"""
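The template above is a plain Python format string with a single `{question}` slot. A minimal sketch of rendering it into the prompt sent to the HyDE model (the helper `build_hyde_prompt` and the abridged template copy are illustrative, not part of the module):

```python
# Sketch: fill the HyDE template with the user's question via str.format.
# An abridged copy of the template is inlined so the snippet is self-contained.
hyde_prompt_template = """
You are a medical expert writing a detailed section for a medical guide on pregnancy and childbirth.

Based on this question: {question}

Write a detailed and comprehensive medical document that would perfectly answer this question.

HYPOTHETICAL DOCUMENT:
"""

def build_hyde_prompt(question: str) -> str:
    # The rendered string is what would be sent to llm_hyde.
    return hyde_prompt_template.format(question=question)

prompt = build_hyde_prompt("What are the stages of labor?")
```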

Answer Generation Prompt

Uses the standard medical expert prompt (same as Simple RAG).

Functions

generate_hypothetical_document

def generate_hypothetical_document(query: str) -> Dict[str, Any]
Generates a hypothetical document based on the user’s query.

Parameters:
  • query (str, required): The user’s question

Returns Dict[str, Any] containing:
  • document (str): The generated hypothetical document
  • input_tokens (int): Input tokens used
  • output_tokens (int): Output tokens generated
  • total_tokens (int): Total tokens
  • usage_source (str): Source of usage data
  • cost (float): Cost in USD
  • cost_source (str): Source of cost calculation

format_docs

def format_docs(docs: List[Any]) -> str
Formats the retrieved documents to be included in the final prompt.

Parameters:
  • docs (List[Any], required): A list of retrieved LangChain Document objects

Returns str: a formatted string containing the content of the documents.
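The module’s own implementation is not shown here, but a common pattern is to join each document’s `page_content` with blank lines. A sketch under that assumption (the `Document` stub stands in for the real LangChain class):

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Document:
    # Stub standing in for langchain_core.documents.Document
    page_content: str

def format_docs(docs: List[Any]) -> str:
    # Join retrieved chunks with blank lines so the LLM sees clear boundaries.
    return "\n\n".join(doc.page_content for doc in docs)

docs = [Document("Stage one: early labor."), Document("Stage two: pushing.")]
formatted = format_docs(docs)
```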

process_hyde_query

def process_hyde_query(
    query: str, 
    custom_hyde_llm: ChatOpenAI = None, 
    custom_answer_llm: ChatOpenAI = None
) -> Dict[str, Any]
Processes a query using the full HyDE RAG pipeline.

Parameters:
  • query (str, required): The user’s question
  • custom_hyde_llm (ChatOpenAI, default None): Custom model for hypothetical document generation
  • custom_answer_llm (ChatOpenAI, default None): Custom model for answer generation

Returns Dict[str, Any] containing:
  • answer (str): The final generated answer
  • contexts (List[str]): Retrieved document contents
  • hypothetical_document (str): The generated hypothetical document
  • hyde_metrics (dict): Metrics for HyDE generation
    • input_tokens (int)
    • output_tokens (int)
    • cost (float)
    • usage_source (str)
    • cost_source (str)
  • answer_metrics (dict): Metrics for answer generation
    • input_tokens (int)
    • output_tokens (int)
    • cost (float)
    • usage_source (str)
    • cost_source (str)
  • total_cost (float): Combined cost
  • total_input_tokens (int): Combined input tokens
  • total_output_tokens (int): Combined output tokens
  • usage_sources (List[str]): Sources of usage data
  • cost_sources (List[str]): Sources of cost calculations
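The combined fields can be derived from the two per-stage metric dicts. A sketch of the aggregation implied by the return value (the `combine_metrics` helper and the sample numbers are illustrative, not the module’s code):

```python
from typing import Any, Dict

def combine_metrics(hyde: Dict[str, Any], answer: Dict[str, Any]) -> Dict[str, Any]:
    # Totals are simple sums over the HyDE and answer generation stages;
    # the *_sources fields collect one entry per stage.
    return {
        "total_cost": hyde["cost"] + answer["cost"],
        "total_input_tokens": hyde["input_tokens"] + answer["input_tokens"],
        "total_output_tokens": hyde["output_tokens"] + answer["output_tokens"],
        "usage_sources": [hyde["usage_source"], answer["usage_source"]],
        "cost_sources": [hyde["cost_source"], answer["cost_source"]],
    }

hyde_metrics = {"input_tokens": 120, "output_tokens": 400, "cost": 0.0006,
                "usage_source": "api", "cost_source": "pricing_table"}
answer_metrics = {"input_tokens": 900, "output_tokens": 250, "cost": 0.0070,
                  "usage_source": "api", "cost_source": "pricing_table"}
totals = combine_metrics(hyde_metrics, answer_metrics)
```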

query_for_evaluation

def query_for_evaluation(
    question: str, 
    hyde_model: str = None, 
    answer_model: str = None,
    custom_hyde_llm: Optional[BaseChatModel] = None,
    custom_answer_llm: Optional[BaseChatModel] = None
) -> dict
A wrapper function for RAG evaluation frameworks like Ragas.

Parameters:
  • question (str, required): The question to process
  • hyde_model (str, default None): The name of the LLM model to use for HyDE generation; defaults to "gpt-3.5-turbo"
  • answer_model (str, default None): The name of the LLM model to use for answer generation; defaults to "gpt-4o"
  • custom_hyde_llm (BaseChatModel, default None): Pre-configured LLM for HyDE; takes precedence over hyde_model
  • custom_answer_llm (BaseChatModel, default None): Pre-configured LLM for the answer; takes precedence over answer_model

Returns dict containing:
  • question (str): Original question
  • answer (str): Generated answer
  • contexts (List[str]): Retrieved document contents
  • metadata (dict): Comprehensive metadata including:
    • execution_time (float): Total execution time in seconds
    • input_tokens (int): Total input tokens (HyDE + answer)
    • output_tokens (int): Total output tokens (HyDE + answer)
    • total_cost (float): Total cost in USD
    • retrieval_method (str): "hyde"
    • llm_hyde_model (str): Model used for HyDE generation
    • llm_answer_model (str): Model used for answer generation
    • hyde_provider (str): Provider for the HyDE model
    • answer_provider (str): Provider for the answer model
    • hyde_model_id (str): Full HyDE model ID
    • answer_model_id (str): Full answer model ID
    • hyde_cost (float): Cost for HyDE generation
    • answer_cost (float): Cost for answer generation
    • usage_source (str): Combined usage sources
    • cost_source (str): Cost calculation source

Usage Example

from src.rag.hyde import query_for_evaluation

# Basic usage with default models
result = query_for_evaluation(
    question="¿Cuáles son las etapas del parto?"
)

print(result["answer"])
print(f"HyDE model: {result['metadata']['llm_hyde_model']}")
print(f"Answer model: {result['metadata']['llm_answer_model']}")
print(f"Total cost: ${result['metadata']['total_cost']:.6f}")
print(f"HyDE cost: ${result['metadata']['hyde_cost']:.6f}")
print(f"Answer cost: ${result['metadata']['answer_cost']:.6f}")

# Using custom models
result = query_for_evaluation(
    question="¿Qué es la episiotomía?",
    hyde_model="gpt-4o-mini",
    answer_model="gpt-4o"
)

# Using pre-configured LLMs
from langchain_openai import ChatOpenAI

hyde_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)
answer_llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

result = query_for_evaluation(
    question="¿Cuándo es necesaria una cesárea?",
    custom_hyde_llm=hyde_llm,
    custom_answer_llm=answer_llm
)

Pipeline Flow

  1. Generate HyDE: Uses gpt-3.5-turbo (temperature=0.7) to generate a detailed hypothetical document that would answer the question
  2. Retrieve: Uses the hypothetical document (not the original query) to perform semantic search and retrieve the top 5 most relevant actual documents
  3. Format: Formats retrieved documents with metadata
  4. Generate Answer: Uses gpt-4o (temperature=0) to generate the final answer based on retrieved context
  5. Track: Captures separate metrics for both HyDE and answer generation
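The five steps above can be sketched end to end with stubbed components; everything below is illustrative, with lambdas standing in for the LLM calls and the Chroma retriever that the real module wires in:

```python
from typing import Callable, Dict, List

def run_hyde_pipeline(
    query: str,
    generate_hyde: Callable[[str], str],          # step 1: HyDE LLM call
    retrieve: Callable[[str], List[str]],         # step 2: vector search
    generate_answer: Callable[[str, str], str],   # step 4: answer LLM call
) -> Dict[str, object]:
    hypothetical = generate_hyde(query)           # 1. generate the hypothetical document
    contexts = retrieve(hypothetical)             # 2. search with the HyDE doc, not the query
    formatted = "\n\n".join(contexts)             # 3. format retrieved docs
    answer = generate_answer(query, formatted)    # 4. answer from retrieved context
    # 5. (the real module also tracks per-stage token and cost metrics here)
    return {"answer": answer, "contexts": contexts,
            "hypothetical_document": hypothetical}

# Stub behavior to show the data flow:
result = run_hyde_pipeline(
    "What are the stages of labor?",
    generate_hyde=lambda q: f"A detailed document about: {q}",
    retrieve=lambda doc: ["Stage one: early labor.", "Stage two: pushing."],
    generate_answer=lambda q, ctx: f"Answer grounded in {len(ctx)} characters of context.",
)
```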

Key Features

  • Two-stage retrieval: Generates hypothetical content first, then searches
  • Improved semantic matching: Searches with detailed content vs. short query
  • Dual model tracking: Separate metrics for HyDE and answer generation
  • Creative HyDE generation: Uses higher temperature (0.7) for document generation
  • Precise answer generation: Uses temperature 0 for final answer
  • Comprehensive cost tracking: Tracks costs for both stages

When to Use HyDE

HyDE works best when:
  • User queries are short or ambiguous
  • You need to bridge vocabulary gaps between query and documents
  • Documents use different terminology than typical user queries
  • You want to improve recall for conceptual questions
