Hybrid RAG combines two complementary retrieval strategies:
- **Lexical search (BM25)**: matches exact keywords and terms
- **Semantic search (ChromaDB)**: matches meaning and context
This architecture uses LangChain’s EnsembleRetriever to merge results from both retrievers with configurable weights, providing more robust retrieval than either method alone.
1. **Parallel Retrieval**: the query is sent to both the BM25 and semantic retrievers simultaneously
2. **Weighted Fusion**: results are combined using configurable weights (default 0.5/0.5)
3. **Result Merging**: the ensemble retriever produces a unified ranked list
4. **Context Formatting**: merged documents are formatted with their metadata
5. **Answer Generation**: the LLM generates the final answer from the combined context
The ensemble weights determine how much influence each retriever has on the final ranking. Equal weights (0.5/0.5) give balanced importance to both keyword matching and semantic similarity.
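To make the fusion step concrete, here is a self-contained sketch of weighted Reciprocal Rank Fusion, the scheme LangChain's `EnsembleRetriever` uses to merge the ranked lists. The document IDs and the smoothing constant `k=60` are illustrative.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document earns weight / (k + rank) per list it appears in;
    scores are summed across lists and documents sorted by total score.
    """
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d3", "d2"]      # lexical results, best first
semantic_ranking = ["d2", "d1", "d4"]  # semantic results, best first

fused = weighted_rrf([bm25_ranking, semantic_ranking], weights=[0.5, 0.5])
# d1 ranks near the top of both lists, so it wins the fused ranking
```

Raising one weight (say, 0.7 for BM25) shifts the fused ranking toward that retriever's ordering without discarding the other's results.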
The `process_hybrid_query()` function handles the complete hybrid pipeline:
```python
def process_hybrid_query(query: str, custom_llm: ChatOpenAI = None) -> Dict[str, Any]:
    """
    Processes a query using the hybrid RAG pipeline.

    Args:
        query (str): The user's question.
        custom_llm (ChatOpenAI, optional): A custom language model to use.

    Returns:
        Dict[str, Any]: A dictionary with the final answer, contexts,
        and detailed metrics.
    """
    # 1. Retrieve similar documents using the ensemble retriever
    retrieved_docs = ensemble_retriever.invoke(query)

    # 2. Format context
    formatted_context = format_docs(retrieved_docs)

    # 3. Generate final answer
    current_llm = custom_llm if custom_llm else llm
    response = current_llm.invoke(qa_prompt.format_messages(
        context=formatted_context,
        question=query
    ))

    # 4. Return response and all metrics
    return {
        'answer': response.content,
        'contexts': [doc.page_content for doc in retrieved_docs],
        'retrieved_documents': retrieved_docs,
        'metrics': {...}
    }
```
```python
from src.rag.hybrid import query_for_evaluation

# Basic usage with default model (gpt-4o)
result = query_for_evaluation(
    question="¿Qué es la diabetes gestacional?"
)

# With custom model name
result = query_for_evaluation(
    question="¿Cuáles son los signos de parto?",
    llm_model="gpt-4o-mini"
)

# With custom LLM instance
from langchain_openai import ChatOpenAI

custom_llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
result = query_for_evaluation(
    question="¿Qué pruebas se hacen en el embarazo?",
    custom_llm=custom_llm
)
```
- **Fixed fusion strategy**: the weighted Reciprocal Rank Fusion used by `EnsembleRetriever` may not combine results optimally for every corpus
- **Fixed weights**: ensemble weights are static, not query-adaptive
- **Potential redundancy**: both retrievers may return very similar documents
- **Higher complexity**: requires maintaining two separate indexes
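The redundancy issue can be mitigated with a deduplication pass over the merged results. Below is a minimal sketch using exact content matching; the `Doc` class is a hypothetical stand-in for LangChain's `Document`, and true near-duplicates would need fuzzy or embedding-based comparison.

```python
def dedupe_by_content(docs):
    """Keep only the first occurrence of each page_content string."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique

class Doc:  # stand-in for langchain_core.documents.Document
    def __init__(self, page_content):
        self.page_content = page_content

merged = [Doc("chunk A"), Doc("chunk B"), Doc("chunk A")]
unique = dedupe_by_content(merged)  # the duplicate "chunk A" is dropped
```

Running this before context formatting keeps the prompt shorter and leaves room for more diverse chunks.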
The default ensemble weights (0.5/0.5) are a reasonable starting point, but you may want to tune them for your specific corpus and query distribution; keyword-heavy technical queries, for example, often benefit from a higher BM25 weight.