Overview

The Multi-Query Rewriter module implements a RAG pipeline that generates multiple variations of the user's question to improve document retrieval. It creates three query reformulations, retrieves documents for each, combines and re-ranks the results, and synthesizes a final answer.

Module: src.rag.rewriter
Source: src/rag/rewriter.py

Configuration

Default Models

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
llm_rewriter = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
llm_answer = ChatOpenAI(model_name="gpt-4o", temperature=0)

Retriever Configuration

base_retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.05}
)
Uses similarity search with a score threshold to filter low-quality results.
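As a rough illustration of the `similarity_score_threshold` behavior, the retriever first drops candidates whose similarity score falls below the threshold, then returns at most `k` of the rest. The scores below are made up:

```python
# Made-up (document, score) candidates standing in for vector-store results.
candidates = [("doc_a", 0.82), ("doc_b", 0.41), ("doc_c", 0.03)]

# similarity_score_threshold semantics: filter by the 0.05 threshold,
# then keep at most k=5 of the surviving documents.
kept = [doc for doc, score in candidates if score >= 0.05][:5]
print(kept)  # doc_c falls below the 0.05 threshold
```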

Query Rewriting Templates

The module uses three different rewriting strategies:

Template 1: Standalone Query

REPHRASE_TEMPLATE_1 = """Rewrite this question to be a standalone, specific query about pregnancy and childbirth.

Original question: {question}

Instructions:
- Maintain the medical/obstetric context if relevant.
- Be specific and clear in medical terms.
- Focus on pregnancy, childbirth, prenatal care, or maternal health.
- Ensure the question is complete and self-contained.

Standalone question:"""

Template 2: Synonym-Based Rephrasing

REPHRASE_TEMPLATE_2 = """Rephrase this question about pregnancy and childbirth using synonyms and alternative medical terms.

Original question: {question}

Instructions:
- Use precise medical terminology.
- Include synonyms and alternative terms.
- Maintain the meaning but change the wording.
- Focus on clinical and obstetric aspects.

Rephrased question:"""

Template 3: Expanded Context

REPHRASE_TEMPLATE_3 = """Expand this question to include related aspects and additional context about pregnancy and childbirth.

Base question: {question}

Instructions:
- Expand the question to include related aspects.
- Add context about complications, prevention, or care.
- Include possible variations or special cases.
- Keep the focus on maternal and perinatal health.

Expanded question:"""
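Each template is a plain format string with a single `{question}` slot; at run time the rewriter fills all three with the same question, producing one prompt per strategy. A minimal sketch with abbreviated stand-ins for the templates (the module may wrap the real templates in LangChain prompt objects; that detail is an assumption):

```python
# Abbreviated stand-ins for the three full templates shown above.
TEMPLATES = [
    "Rewrite this question to be a standalone, specific query.\n"
    "Original question: {question}\nStandalone question:",
    "Rephrase this question using synonyms and alternative medical terms.\n"
    "Original question: {question}\nRephrased question:",
    "Expand this question to include related aspects and context.\n"
    "Base question: {question}\nExpanded question:",
]

question = "What are the signs of labor?"
prompts = [t.format(question=question) for t in TEMPLATES]
print(len(prompts))  # 3 -- one prompt per rewriting strategy
```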

Functions

format_docs

def format_docs(docs: List[Document]) -> str

Formats documents into a context string, labeling each by relevance.

Parameters:
  • docs (List[Document], required): List of documents to format

Returns:
  • str: Formatted string with documents labeled by relevance (High/Medium/Low)
Example Output:
--- DOCUMENT 1 (Relevance: High) ---
Source: guide.pdf
Content: [content]

--- DOCUMENT 2 (Relevance: High) ---
Source: guide.pdf
Content: [content]
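A minimal sketch of how output in this shape can be produced. The `Doc` stand-in and the relevance thresholds (0.8 / 0.5) are assumptions for illustration, not the module's actual code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Doc:
    """Stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def relevance_label(score: float) -> str:
    # Assumed thresholds; the module's real cut-offs may differ.
    if score >= 0.8:
        return "High"
    if score >= 0.5:
        return "Medium"
    return "Low"

def format_docs_sketch(docs: List[Doc]) -> str:
    parts = []
    for i, d in enumerate(docs, 1):
        label = relevance_label(d.metadata.get("score", 0.0))
        parts.append(
            f"--- DOCUMENT {i} (Relevance: {label}) ---\n"
            f"Source: {d.metadata.get('source', 'unknown')}\n"
            f"Content: {d.page_content}"
        )
    return "\n\n".join(parts)

text = format_docs_sketch(
    [Doc("Signs of labor include...", {"source": "guide.pdf", "score": 0.9})]
)
print(text.splitlines()[0])  # --- DOCUMENT 1 (Relevance: High) ---
```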

process_rewriter_query

def process_rewriter_query(
    question: str,
    custom_rewriter_llm: Optional[ChatOpenAI] = None,
    custom_answer_llm: Optional[ChatOpenAI] = None,
    max_final_docs: int = 8
) -> Dict[str, Any]

Processes a query using the multi-query rewriting RAG pipeline.

Parameters:
  • question (str, required): The user's question
  • custom_rewriter_llm (ChatOpenAI, default: None): Custom model for query rewriting. Defaults to gpt-3.5-turbo with temperature=0.3
  • custom_answer_llm (ChatOpenAI, default: None): Custom model for answer generation. Defaults to gpt-4o with temperature=0
  • max_final_docs (int, default: 8): Maximum number of documents to keep after deduplication and re-ranking

Returns a Dict[str, Any] containing:
  • answer (str): The generated answer
  • contexts (List[str]): Retrieved document contents
  • retrieved_documents (List[Document]): Full document objects
  • rewritten_queries (List[str]): The 3 generated query variations
  • metrics (dict):
    • rewrite_input_tokens (int): Tokens used for rewriting
    • rewrite_output_tokens (int): Tokens generated during rewriting
    • rewrite_cost (float): Cost of rewriting
    • answer_input_tokens (int): Tokens used for answer
    • answer_output_tokens (int): Tokens generated for answer
    • answer_cost (float): Cost of answer generation
    • total_input_tokens (int): Total input tokens
    • total_output_tokens (int): Total output tokens
    • total_cost (float): Total cost in USD
    • usage_source (str): Source of usage data
    • cost_source (str): Source of cost calculation
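For illustration, the totals in the metrics dict are the sums of the rewrite-stage and answer-stage figures. The numbers below are made up; real values come from the API usage data:

```python
# Made-up per-stage figures standing in for real API usage data.
metrics = {
    "rewrite_input_tokens": 310, "rewrite_output_tokens": 90, "rewrite_cost": 0.00029,
    "answer_input_tokens": 2400, "answer_output_tokens": 350, "answer_cost": 0.00950,
}

# Totals aggregate the two stages of the pipeline.
metrics["total_input_tokens"] = metrics["rewrite_input_tokens"] + metrics["answer_input_tokens"]
metrics["total_output_tokens"] = metrics["rewrite_output_tokens"] + metrics["answer_output_tokens"]
metrics["total_cost"] = round(metrics["rewrite_cost"] + metrics["answer_cost"], 6)
print(metrics["total_input_tokens"], metrics["total_cost"])  # 2710 0.00979
```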

query_for_evaluation

def query_for_evaluation(
    question: str,
    rewriter_model: Optional[str] = None,
    answer_model: Optional[str] = None,
    custom_rewriter_llm: Optional[BaseChatModel] = None,
    custom_answer_llm: Optional[BaseChatModel] = None
) -> dict

A wrapper around the pipeline for RAG evaluation frameworks such as Ragas.

Parameters:
  • question (str, required): The question to process
  • rewriter_model (str, default: None): Name of the LLM to use for query rewriting. Defaults to "gpt-3.5-turbo"
  • answer_model (str, default: None): Name of the LLM to use for answer generation. Defaults to "gpt-4o"
  • custom_rewriter_llm (BaseChatModel, default: None): Pre-configured LLM for rewriting. Takes precedence over rewriter_model
  • custom_answer_llm (BaseChatModel, default: None): Pre-configured LLM for answer generation. Takes precedence over answer_model

Returns a dict containing:
  • question (str): Original question
  • answer (str): Generated answer
  • contexts (List[str]): Retrieved document contents
  • source_documents (List[Document]): Full retrieved documents
  • metadata (dict): Comprehensive metadata including:
    • num_contexts (int): Number of contexts
    • retrieval_method (str): “multi_query_rewrite”
    • rewrite_count (int): Number of query variations (3)
    • llm_model (str): Answer model name
    • rewriter_model (str): Rewriter model name
    • provider (str): Answer provider
    • model_id (str): Answer model ID
    • rewriter_provider (str): Rewriter provider
    • rewriter_model_id (str): Rewriter model ID
    • execution_time (float): Total execution time
    • input_tokens (int): Total input tokens
    • output_tokens (int): Total output tokens
    • total_cost (float): Total cost in USD
    • tokens_used (int): Total tokens
    • usage_source (str): Usage data source
    • cost_source (str): Cost calculation source

Usage Example

from src.rag.rewriter import query_for_evaluation

# Basic usage (question: "What are the signs of labor?")
result = query_for_evaluation(
    question="¿Cuáles son los signos del parto?"
)

print(result["answer"])
print(f"Retrieval method: {result['metadata']['retrieval_method']}")
print(f"Query variations: {result['metadata']['rewrite_count']}")
print(f"Total cost: ${result['metadata']['total_cost']:.6f}")

# Access the rewritten queries from the detailed result
from src.rag.rewriter import process_rewriter_query

detailed_result = process_rewriter_query(
    question="¿Qué es la preeclampsia?"  # "What is preeclampsia?"
)

print("Rewritten queries:")
for i, query in enumerate(detailed_result['rewritten_queries'], 1):
    print(f"{i}. {query}")

# Using custom models
from langchain_openai import ChatOpenAI

rewriter_llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0.5)
answer_llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

result = query_for_evaluation(
    question="¿Cuándo es necesaria una cesárea?",  # "When is a cesarean necessary?"
    custom_rewriter_llm=rewriter_llm,
    custom_answer_llm=answer_llm
)

Pipeline Flow

  1. Generate Queries: Creates 3 query variations using gpt-3.5-turbo (temp=0.3):
    • Standalone specific query
    • Synonym-based rephrasing
    • Expanded contextual query
  2. Retrieve: Performs semantic search for each query variation (top 5 per query)
  3. Deduplicate: Removes duplicate documents using content-based IDs
  4. Weight & Re-rank: Applies query-based weighting (each later query incurs an additional 5% penalty)
  5. Select: Chooses top 8 documents after re-ranking
  6. Format: Formats documents with relevance indicators
  7. Generate: Uses gpt-4o (temp=0) to generate final answer
  8. Track: Captures separate metrics for rewriting and answer generation

Query Weighting Strategy

# Penalize queries from later, more speculative prompts (query_index is 1-based)
query_weight = 1.0 - (query_index - 1) * 0.05
  • Query 1: weight = 1.0 (100%)
  • Query 2: weight = 0.95 (95%)
  • Query 3: weight = 0.90 (90%)
This prioritizes the standalone query while still considering alternative formulations.

Key Features

  • Multi-perspective retrieval: 3 different query formulations
  • Automatic deduplication: Removes duplicate documents across queries
  • Intelligent weighting: Prioritizes more direct query reformulations
  • High coverage: Up to 15 candidates (3 queries × 5 docs)
  • Relevance labeling: Documents marked as High/Medium/Low relevance
  • Dual cost tracking: Separate metrics for rewriting and answer generation
  • Temperature tuning: 0.3 for rewriting (balanced), 0 for answer (precise)