QuackIR: IR Toolkit Built on Relational Databases

QuackIR is a Python toolkit for reproducible information retrieval (IR) research built on top of relational database management systems. It supports sparse BM25 retrieval, dense vector search, and hybrid retrieval via Reciprocal Rank Fusion, all without requiring a dedicated search engine or vector database.

Installation

Set up QuackIR with conda and install all dependencies, including DuckDB, PostgreSQL, and Pyserini.

Quickstart

Index a corpus and run your first sparse or dense retrieval query in minutes.

Guides

Learn how to index, search, and analyze text across DuckDB, SQLite, and PostgreSQL.

API Reference

Explore the full Python API for indexers, searchers, and analysis utilities.

What is QuackIR?

QuackIR demonstrates that relational database management systems (RDBMSes) like DuckDB can perform information retrieval with effectiveness comparable to established IR toolkits such as Lucene and Faiss. It is designed for researchers who want reproducible IR experiments and for practitioners who want to add retrieval capabilities to an existing relational database infrastructure.

Sparse Retrieval

BM25 full-text search using DuckDB FTS, SQLite FTS5, or PostgreSQL GIN indexes.
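To make the BM25 ranking function concrete, here is a minimal pure-Python sketch of what the database backends compute internally. The helper name bm25_scores, the whitespace tokenization, and the parameters k1=0.9 and b=0.4 (the Anserini/Pyserini defaults) are illustrative assumptions; each backend applies its own tokenizer, IDF variant, and defaults, so exact scores will differ.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=0.9, b=0.4):
    """Score every document in docs (docid -> token list) against query_terms."""
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for docid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            # Lucene-style IDF, which is always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[docid] = score
    return scores


docs = {
    "d1": "relational databases store tables".split(),
    "d2": "bm25 ranks documents for a query".split(),
    "d3": "duckdb is a relational database".split(),
}
scores = bm25_scores(["relational", "database"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d3', 'd1', 'd2']
```

Document d3 matches both query terms, d1 matches only "relational", and d2 matches neither, so it scores zero.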

Dense Retrieval

Cosine similarity vector search with pre-encoded embeddings in DuckDB or PostgreSQL.
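The core of dense retrieval is a cosine-similarity ranking between a query embedding and each pre-encoded document embedding. The sketch below does an exhaustive scan in plain Python; in QuackIR the equivalent computation runs as SQL inside DuckDB or PostgreSQL. The helper names and the toy 3-dimensional vectors are hypothetical; real embeddings come from an external encoder and have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dense_search(query_vec, doc_vecs, k=2):
    """Return the top-k (docid, similarity) pairs, most similar first."""
    scored = [(docid, cosine(query_vec, vec)) for docid, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy pre-encoded document embeddings (made-up values for illustration).
doc_vecs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.7, 0.7, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
hits = dense_search([1.0, 0.1, 0.0], doc_vecs, k=2)
print(hits)
```

An exhaustive scan is exact but touches every row; that trade-off is acceptable for modest corpora and keeps results reproducible.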

Hybrid Retrieval

Reciprocal Rank Fusion combining sparse and dense results in DuckDB or PostgreSQL.
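Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in; a document missing from a list gets nothing from that list. The sketch below assumes k = 60, the constant proposed in the original RRF paper; whether QuackIR uses the same value is an assumption, and the input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of docids; returns (docid, fused_score), best first."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based, so the top document contributes 1 / (k + 1).
        for rank, docid in enumerate(ranking, start=1):
            fused[docid] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


sparse_ranking = ["d3", "d1", "d2"]  # e.g. BM25 output
dense_ranking = ["d1", "d2", "d4"]   # e.g. cosine-similarity output
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print([docid for docid, _ in fused])  # ['d1', 'd2', 'd3', 'd4']
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which live on very different scales.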

Getting started

Install QuackIR

Clone the repository and install all dependencies using conda and pip.

git clone https://github.com/castorini/quackir.git --recurse-submodules
cd quackir
conda create -n quackir python=3.10
conda activate quackir
pip install -r requirements.txt

Index your corpus

Load a JSONL corpus into DuckDB and build a full-text search index.

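from quackir.index import DuckDBIndexer
from quackir import IndexType

indexer = DuckDBIndexer()
# Create a table named "corpus" for sparse (BM25) retrieval
indexer.init_table("corpus", IndexType.SPARSE)
# Load the documents from the JSONL file into the table
indexer.load_table("corpus", "corpus.jsonl")
# Build DuckDB's full-text search index over the table
indexer.fts_index("corpus")
# Release the database connection
indexer.close()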
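The indexer reads a JSONL corpus, meaning one JSON object per line. A minimal sketch for producing a toy file follows; the "id"/"contents" field names follow the Pyserini JSONL convention and are an assumption about what load_table expects, and the file name toy_corpus.jsonl is hypothetical (pass its path to load_table to try it).

```python
import json

# Hypothetical toy documents; field names assume the Pyserini "id"/"contents" layout.
docs = [
    {"id": "d1", "contents": "Relational databases store data in tables."},
    {"id": "d2", "contents": "BM25 ranks documents by term statistics."},
]

corpus_path = "toy_corpus.jsonl"
with open(corpus_path, "w", encoding="utf-8") as f:
    for doc in docs:
        # One JSON object per line, with no enclosing array.
        f.write(json.dumps(doc) + "\n")

# Read the file back to confirm each line parses as a standalone JSON object.
with open(corpus_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), loaded[0]["id"])  # 2 d1
```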

Run your first search

Query the index using BM25 sparse retrieval.

from quackir.search import DuckDBSearcher
from quackir import SearchType

searcher = DuckDBSearcher()
# Run a BM25 query against the "corpus" table indexed above
results = searcher.search(SearchType.SPARSE, query_string="your query here", table_names=["corpus"])
print(results)
# Release the database connection
searcher.close()

Evaluate results

Use Pyserini’s trec_eval wrapper to score a TREC-format run file against relevance judgments (qrels) with metrics such as nDCG and MAP. The command below reports nDCG@10.

python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 qrels.txt run.txt
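trec_eval expects a run file with six whitespace-separated columns per line: query ID, the literal Q0, document ID, rank, score, and a run tag. The sketch below writes that format; the input shape {qid: [(docid, score), ...]}, the helper name write_trec_run, and the file name toy_run.txt are hypothetical, so adapt the conversion to whatever structure the searcher actually returns.

```python
def write_trec_run(results, path, tag="quackir"):
    """Write {qid: [(docid, score), ...]} as a six-column TREC run file."""
    with open(path, "w", encoding="utf-8") as f:
        for qid, hits in results.items():
            # Sort by descending score so rank 1 is the best-scoring document.
            ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
            for rank, (docid, score) in enumerate(ranked, start=1):
                f.write(f"{qid} Q0 {docid} {rank} {score:.6f} {tag}\n")


# Hypothetical retrieval output for two queries.
results = {"q1": [("d1", 3.2), ("d3", 7.5)], "q2": [("d2", 1.0)]}
write_trec_run(results, "toy_run.txt")
with open("toy_run.txt", encoding="utf-8") as f:
    print(f.read())
```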

Supported databases

Feature	DuckDB	SQLite	PostgreSQL
Sparse (BM25)	Yes	Yes	Yes
Dense (vector)	Yes	No	Yes
Hybrid (RRF)	Yes	No	Yes

DuckDB runs in-process with no server to set up and supports all three retrieval modes, which makes it the recommended starting point for most use cases and the fastest way to get QuackIR running.
