
Overview

The Hybrid RRF module implements an advanced hybrid RAG pipeline that combines lexical retrieval (BM25) and semantic retrieval (ChromaDB), fuses both ranked lists using Reciprocal Rank Fusion (RRF), and then applies Maximal Marginal Relevance (MMR) for diversity.

Module: src.rag.hybrid_rrf
Source: src/rag/hybrid_rrf.py

Configuration

Retrieval Settings

k_bm25_candidates = 15      # BM25 initial candidates
k_semantic_candidates = 15  # Semantic initial candidates
k_rrf_pool = 10             # Pool size after RRF fusion
k_final = 5                 # Final documents after MMR
rrf_k = 60                  # RRF constant
mmr_lambda = 0.7            # MMR diversity parameter (0.7 = 70% relevance, 30% diversity)

Default Models

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

Retrievers

# 1. Lexical retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = k_bm25_candidates

# 2. Semantic retriever (Chroma)
semantic_retriever = vectorstore.as_retriever(
    search_kwargs={"k": k_semantic_candidates}
)
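
The vectorstore referenced above is a ChromaDB store built over the same document set with the default embeddings. A minimal sketch of how it could be constructed; the import path, persistence settings, and collection name are assumptions not shown in the source:

from langchain_chroma import Chroma

# Hypothetical construction of the Chroma vectorstore referenced above;
# the real module may persist it to disk or load an existing collection.
vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)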

Core Functions

reciprocal_rank_fusion

def reciprocal_rank_fusion(
    rankings: List[List[Document]], 
    k_constant: int = 60, 
    top_k: int = 5
) -> List[Document]

Fuses multiple ranked lists using the Reciprocal Rank Fusion (RRF) algorithm.

Parameters:
  • rankings (List[List[Document]], required): List of ranked document lists from different retrievers
  • k_constant (int, default: 60): RRF constant. Higher values give less weight to rank position
  • top_k (int, default: 5): Number of top documents to return after fusion

Returns:
  • List[Document]: Fused and re-ranked list of documents

RRF Formula: score = sum(1 / (k_constant + rank)) for each document across all rankings.
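
For intuition, with the default k_constant = 60, a document ranked 3rd by BM25 and 7th by the semantic retriever receives 1/(60+3) + 1/(60+7) ≈ 0.031. Below is a minimal sketch of the fusion step, assuming 1-based ranks and deduplication by page content; the helper name rrf_fuse is illustrative, not the module's code:

from collections import defaultdict
from typing import List
from langchain_core.documents import Document

def rrf_fuse(rankings: List[List[Document]], k_constant: int = 60, top_k: int = 5) -> List[Document]:
    # Accumulate RRF scores, keyed by page content so that documents
    # appearing in more than one ranking are deduplicated.
    scores = defaultdict(float)
    doc_by_key = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # assumes 1-based ranks
            key = doc.page_content
            scores[key] += 1.0 / (k_constant + rank)
            doc_by_key.setdefault(key, doc)
    # Sort by fused score (descending) and keep the top_k documents.
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [doc_by_key[key] for key in best]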

mmr_select

def mmr_select(
    query: str, 
    candidate_docs: List[Document], 
    top_k: int, 
    lambda_mult: float = 0.7
) -> List[Document]
Selects top-k documents using Maximal Marginal Relevance (MMR) to balance relevance and diversity.

Parameters:
  • query (str, required): The original user query
  • candidate_docs (List[Document], required): Pool of candidate documents to select from
  • top_k (int, required): Number of documents to select
  • lambda_mult (float, default: 0.7): Lambda parameter controlling the relevance vs diversity tradeoff.
    • 1.0 = pure relevance
    • 0.0 = pure diversity
    • 0.7 = 70% relevance, 30% diversity

Returns:
  • List[Document]: Selected documents balancing relevance and diversity

MMR Formula: MMR = λ * relevance(query, doc) - (1-λ) * max_similarity(doc, selected_docs)
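
A minimal sketch of MMR selection, assuming cosine similarity over the same text-embedding-3-small embeddings; the module's actual scoring and embedding reuse may differ:

import numpy as np
from typing import List
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

def mmr_select_sketch(query: str, candidate_docs: List[Document],
                      top_k: int, lambda_mult: float = 0.7) -> List[Document]:
    emb = OpenAIEmbeddings(model="text-embedding-3-small")
    q = np.array(emb.embed_query(query))
    d = np.array(emb.embed_documents([doc.page_content for doc in candidate_docs]))

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(q, vec) for vec in d]
    selected: List[int] = []
    while len(selected) < min(top_k, len(candidate_docs)):
        best_idx, best_score = None, float("-inf")
        for i in range(len(candidate_docs)):
            if i in selected:
                continue
            # Penalize similarity to documents already selected.
            redundancy = max((cos(d[i], d[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return [candidate_docs[i] for i in selected]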

retrieve_hybrid_rrf

def retrieve_hybrid_rrf(query: str) -> List[Document]
Retrieves candidates from both BM25 and semantic retrievers, fuses with RRF, then applies MMR diversification.

Parameters:
  • query (str, required): The user's query

Returns:
  • List[Document]: Final list of k_final documents (default: 5)
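
Conceptually, retrieval composes the retrievers and helpers defined above. A sketch under the default configuration, assuming the module-level retrievers and constants from the Configuration section; the wiring inside the real function may differ:

def retrieve_hybrid_rrf_sketch(query: str) -> List[Document]:
    # Steps 1-2: gather candidates from both retrievers (15 each by default).
    bm25_docs = bm25_retriever.invoke(query)
    semantic_docs = semantic_retriever.invoke(query)
    # Step 3: fuse the two rankings into a pool of k_rrf_pool documents.
    pool = reciprocal_rank_fusion([bm25_docs, semantic_docs],
                                  k_constant=rrf_k, top_k=k_rrf_pool)
    # Step 4: diversify down to k_final documents with MMR.
    return mmr_select(query, pool, top_k=k_final, lambda_mult=mmr_lambda)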

format_docs

def format_docs(docs: List[Document]) -> str
Formats retrieved documents for the answer prompt.

Parameters:
  • docs (List[Document], required): List of documents to format

Returns:
  • str: Formatted string with document contents and metadata
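
The exact output format is not specified here; a typical sketch that numbers each document and includes its source metadata (the separator and metadata key are assumptions):

from typing import List
from langchain_core.documents import Document

def format_docs_sketch(docs: List[Document]) -> str:
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")  # assumed metadata field
        parts.append(f"[Document {i} | source: {source}]\n{doc.page_content}")
    return "\n\n".join(parts)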

process_hybrid_rrf_query

def process_hybrid_rrf_query(
    query: str, 
    custom_llm: Optional[BaseChatModel] = None
) -> Dict[str, Any]
Processes a query with Hybrid RAG + RRF fusion.

Parameters:
  • query (str, required): The user's question
  • custom_llm (BaseChatModel, default: None): Custom language model for answer generation

Returns:
  • Dict[str, Any]: Dictionary containing:
    • answer (str): Generated answer
    • contexts (List[str]): Document contents
    • retrieved_documents (List[Document]): Full documents
    • metrics (dict): Token usage and cost metrics
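
For example, calling the function directly with the default LLM and reading the documented keys:

from src.rag.hybrid_rrf import process_hybrid_rrf_query

result = process_hybrid_rrf_query("¿Cuáles son los riesgos de la cesárea?")  # "What are the risks of a cesarean section?"
print(result["answer"])
print(len(result["retrieved_documents"]))  # k_final documents (5 by default)
print(result["metrics"])                   # token usage and cost metrics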

query_for_evaluation

def query_for_evaluation(
    question: str, 
    llm_model: str = None, 
    custom_llm: Optional[BaseChatModel] = None
) -> dict
Wrapper function for RAGAS-compatible evaluation.

Parameters:
  • question (str, required): The question to process
  • llm_model (str, default: None): Model name to use. If None, uses the default "gpt-4o"
  • custom_llm (BaseChatModel, default: None): Pre-configured language model. Takes precedence over llm_model

Returns:
  • dict: Dictionary containing:
    • question (str): Original question
    • answer (str): Generated answer
    • contexts (List[str]): Retrieved document contents
    • source_documents (List[Document]): Full retrieved documents
    • metadata (dict): Comprehensive metadata including:
      • num_contexts (int): Number of contexts
      • retrieval_method (str): "hybrid_bm25_semantic_rrf_mmr"
      • rrf_k (int): RRF constant used
      • k_bm25_candidates (int): 15
      • k_semantic_candidates (int): 15
      • k_rrf_pool (int): 10
      • k_final (int): 5
      • mmr_lambda (float): 0.7
      • llm_model (str): Model name
      • provider (str): Provider name
      • model_id (str): Full model ID
      • embedding_model (str): "text-embedding-3-small"
      • execution_time (float): Execution time in seconds
      • input_tokens (int): Input tokens used
      • output_tokens (int): Output tokens generated
      • total_cost (float): Total cost in USD
      • tokens_used (int): Total tokens
      • usage_source (str): Usage data source
      • cost_source (str): Cost calculation source

Usage Example

from src.rag.hybrid_rrf import query_for_evaluation

# Basic usage
result = query_for_evaluation(
    question="¿Cuáles son los riesgos de la cesárea?"
)

print(result["answer"])
print(f"Retrieval method: {result['metadata']['retrieval_method']}")
print(f"RRF constant: {result['metadata']['rrf_k']}")
print(f"MMR lambda: {result['metadata']['mmr_lambda']}")
print(f"Cost: ${result['metadata']['total_cost']:.6f}")

# With custom model
from langchain_openai import ChatOpenAI

custom_llm = ChatOpenAI(model_name="gpt-4o", temperature=0.2)
result = query_for_evaluation(
    question="¿Qué es la epidural?",
    custom_llm=custom_llm
)

Pipeline Flow

  1. BM25 Retrieval: Retrieves the top 15 candidates using lexical search
  2. Semantic Retrieval: Retrieves the top 15 candidates using vector similarity
  3. RRF Fusion: Fuses both ranked lists with Reciprocal Rank Fusion, producing a pool of 10 documents
  4. MMR Selection: Applies Maximal Marginal Relevance to select the final 5 documents, balancing relevance and diversity
  5. Format: Formats the documents with their metadata
  6. Generate: Uses the LLM to generate the answer (see the sketch after this list)
  7. Track: Captures comprehensive metrics
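
Steps 5-6 amount to a standard retrieval-augmented generation chain. A hedged sketch of how the retrieved documents could feed the LLM; the prompt text is an assumption, not the module's actual prompt:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from src.rag.hybrid_rrf import retrieve_hybrid_rrf, format_docs

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

question = "¿Qué es la epidural?"                         # "What is an epidural?"
docs = retrieve_hybrid_rrf(question)                      # steps 1-4
answer = (prompt | llm).invoke(
    {"context": format_docs(docs), "question": question}  # step 5
).content                                                 # step 6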

Key Features

  • Advanced Fusion: Uses Reciprocal Rank Fusion to combine the lexical and semantic rankings without requiring score normalization
  • Diversity: MMR ensures diverse document selection
  • Deduplication: Automatically removes duplicate documents
  • Tunable Parameters: Configurable k values and lambda for fine-tuning
  • High Recall: Retrieves 15 candidates from each method
  • Balanced Results: 70% relevance + 30% diversity by default
