This guide reproduces hybrid retrieval across a subset of BEIR v1.0.0 datasets using Reciprocal Rank Fusion (RRF) of sparse BM25 and dense BGE-base-en-v1.5 results. Hybrid retrieval is available for DuckDB and PostgreSQL. RRF combines ranked lists from two or more retrieval systems without requiring score normalization. For each document, its fused score is computed as the sum of reciprocal ranks across the input lists. This allows sparse and dense rankings to be merged in a single query without tuning weighting parameters.
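The reciprocal-rank sum described above can be sketched in a few lines of plain Python. The constant k = 60 is the value commonly used in the RRF literature; QuackIR's default may differ.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of docids: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = {}
    for ranked in rankings:
        for rank, docid in enumerate(ranked, start=1):
            scores[docid] = scores.get(docid, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d1", "d2", "d3"]   # BM25 ranking
dense = ["d2", "d4", "d1"]    # cosine-similarity ranking
print(rrf_fuse([sparse, dense]))  # ['d2', 'd1', 'd4', 'd3']
```

Note that d2 wins despite never ranking first in the sparse list: appearing near the top of both lists outweighs a single first place, which is exactly why no score normalization or weight tuning is needed.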
The sparse corpus uses a flat index with title and text concatenated into contents. The dense corpus uses BGE-base-en-v1.5 embeddings. Hybrid retrieval runs both in a single search call by passing both table names to `quackir.search`.

## Prerequisites
Hybrid retrieval requires both sparse and dense indexes to already be built. Complete the following guides first:

- **Sparse retrieval**: Build BM25 sparse indexes for DuckDB and PostgreSQL.
- **Dense retrieval**: Build BGE-base-en-v1.5 dense indexes for DuckDB and PostgreSQL.
Clone the repository with `--recurse-submodules` so the `tools/topics-and-qrels/` submodule is available.
## Download data

If you have not already downloaded the corpora, get both the raw BEIR corpus and the pre-encoded BGE embeddings.

## Hybrid retrieval corpora
The hybrid experiments cover the following 17 datasets: nfcorpus, fiqa, arguana, the twelve cqadupstack subsets, scidocs, and scifact. Several datasets (trec-covid, quora, hotpotqa) are excluded because their high query latency makes them impractical for hybrid search at this scale.
## Step-by-step hybrid retrieval

### Tokenize corpora and prepare query files
Tokenize the sparse corpus and queries, then combine the tokenized queries with their pre-encoded BGE embeddings into a single file used for hybrid search.

The `combine_contents_vector.py` script merges tokenized query text with pre-encoded query vectors into a single JSONL file. This combined format is required because hybrid search needs both the tokenized text (for sparse BM25) and the embedding vector (for dense cosine similarity) in a single pass.

Alternatively, run the dedicated scripts.
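The combining step amounts to joining two per-query records by query id. The sketch below is illustrative only; the field names (`id`, `contents`, `vector`) are assumptions, and the actual `combine_contents_vector.py` script defines the real schema.

```python
import json

# Hypothetical inputs: tokenized query text and pre-encoded BGE embeddings,
# both keyed by query id (vector truncated for readability).
tokenized = {"q1": "what is reciprocal rank fusion"}
vectors = {"q1": [0.12, -0.03, 0.55]}

# One JSONL line per query, carrying both representations.
combined_lines = [
    json.dumps({"id": qid, "contents": tokenized[qid], "vector": vectors[qid]})
    for qid in tokenized
]
record = json.loads(combined_lines[0])
print(record["id"], len(record["vector"]))  # q1 3
```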
### Build sparse and dense indexes

Index both the sparse and dense representations for each corpus. Note that hybrid retrieval uses distinct table names with `_sparse` and `_dense` suffixes to keep the two indexes separate within the same database file.

Alternatively, run the dedicated scripts.

### Run hybrid retrieval
Pass both the sparse and dense table names to `quackir.search`. QuackIR automatically detects that two table names are provided and performs RRF fusion.

The `--index` flag accepts two table names: the sparse index (used for BM25) and the dense index (used for cosine similarity). The combined query file provides both tokenized text and the embedding vector for each query in a single pass.

Alternatively, run the dedicated script.
## Results

The following nDCG@10 scores are reproducible with the commands above. Datasets not included in the hybrid experiments are marked with a dash (-).
| Corpus | DuckDB | PostgreSQL |
|---|---|---|
| nfcorpus | 0.3621 | 0.3626 |
| fiqa | 0.3683 | 0.2881 |
| arguana | 0.5063 | 0.3449 |
| cqadupstack-android | 0.4653 | 0.4117 |
| cqadupstack-english | 0.4436 | 0.3913 |
| cqadupstack-gaming | 0.5628 | 0.5022 |
| cqadupstack-gis | 0.3683 | 0.3290 |
| cqadupstack-mathematica | 0.2744 | 0.2325 |
| cqadupstack-physics | 0.4138 | 0.3593 |
| cqadupstack-programmers | 0.3732 | 0.3293 |
| cqadupstack-stats | 0.3400 | 0.3130 |
| cqadupstack-tex | 0.2930 | 0.2581 |
| cqadupstack-unix | 0.3620 | 0.3302 |
| cqadupstack-webmasters | 0.3723 | 0.3391 |
| cqadupstack-wordpress | 0.3362 | 0.2805 |
| scidocs | 0.1943 | 0.1750 |
| scifact | 0.7440 | 0.6800 |
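For reference, the metric reported above can be sketched as follows. This is the linear-gain formulation of nDCG@10 (gain divided by log2 of rank + 1); the official evaluation tooling should be used for actual reproduction.

```python
import math

def ndcg_at_10(ranked_docids, qrels):
    """nDCG@10 for one query; qrels maps docid -> graded relevance."""
    dcg = sum(qrels.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_docids[:10]))
    # Ideal DCG: the ten largest relevance grades in descending order.
    ideal = sorted(qrels.values(), reverse=True)[:10]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

qrels = {"a": 2, "b": 1, "c": 1}
print(round(ndcg_at_10(["a", "b", "c"], qrels), 4))  # 1.0: ideal ordering
```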
## Comparison to sparse and dense alone

RRF fusion consistently improves over sparse retrieval and, on some datasets, over dense retrieval as well. The table below shows nDCG@10 for all three methods on the DuckDB backend for the datasets covered by all three experiments:

| Corpus | Sparse (BM25) | Dense (BGE) | Hybrid (RRF) |
|---|---|---|---|
| nfcorpus | 0.3206 | 0.3735 | 0.3621 |
| fiqa | 0.2378 | 0.4065 | 0.3683 |
| arguana | 0.3179 | 0.6361 | 0.5063 |
| cqadupstack-android | 0.3812 | 0.5075 | 0.4653 |
| cqadupstack-english | 0.3441 | 0.4857 | 0.4436 |
| cqadupstack-gaming | 0.4827 | 0.5965 | 0.5628 |
| cqadupstack-gis | 0.2893 | 0.4127 | 0.3683 |
| cqadupstack-mathematica | 0.2036 | 0.3163 | 0.2744 |
| cqadupstack-physics | 0.3213 | 0.4722 | 0.4138 |
| cqadupstack-programmers | 0.2803 | 0.4242 | 0.3732 |
| cqadupstack-stats | 0.2728 | 0.3732 | 0.3400 |
| cqadupstack-tex | 0.2256 | 0.3115 | 0.2930 |
| cqadupstack-unix | 0.2779 | 0.4219 | 0.3620 |
| cqadupstack-webmasters | 0.3070 | 0.4065 | 0.3723 |
| cqadupstack-wordpress | 0.2485 | 0.3547 | 0.3362 |
| scidocs | 0.1502 | 0.2170 | 0.1943 |
| scifact | 0.6795 | 0.7408 | 0.7440 |
RRF consistently outperforms sparse retrieval alone. On most datasets it falls between sparse and dense, or exceeds dense (as with scifact). The degree of improvement depends on how complementary the sparse and dense rankings are for each domain.
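One simple way to quantify that complementarity is the overlap between the two top-k lists for a query. The helper below is a hypothetical diagnostic, not part of QuackIR.

```python
def overlap_at_k(sparse_ranked, dense_ranked, k=10):
    """Fraction of docids shared by the two top-k lists. Lower overlap
    means more complementary rankings, where RRF has more to gain."""
    s, d = set(sparse_ranked[:k]), set(dense_ranked[:k])
    return len(s & d) / k

# 'd1' and 'd2' appear in both top-3 lists, so overlap is 2/3.
print(overlap_at_k(["d1", "d2", "d3"], ["d2", "d4", "d1"], k=3))
```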