QuackIR is a Python toolkit for reproducible information retrieval (IR) research built on top of relational database management systems. It supports sparse BM25 retrieval, dense vector search, and hybrid retrieval via Reciprocal Rank Fusion, all without requiring a dedicated search engine or vector database.
Installation
Set up QuackIR with conda and install all dependencies including DuckDB, PostgreSQL, and Pyserini.
Quickstart
Index a corpus and run your first sparse or dense retrieval query in minutes.
Guides
Learn how to index, search, and analyze text across DuckDB, SQLite, and PostgreSQL.
API Reference
Explore the full Python API for indexers, searchers, and analysis utilities.
What is QuackIR?
QuackIR demonstrates that relational database management systems (RDBMSes) like DuckDB can perform information retrieval with effectiveness comparable to established IR toolkits such as Lucene and Faiss. It is designed for researchers who want reproducible IR experiments and for practitioners who want to add retrieval capabilities to an existing relational database infrastructure.
Sparse Retrieval
BM25 full-text search using DuckDB FTS, SQLite FTS5, or PostgreSQL GIN indexes.
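To give a feel for BM25 ranking inside a relational database, here is a minimal sketch that uses Python's built-in `sqlite3` module with an FTS5 virtual table. It does not use QuackIR's own API: the table name `docs`, the `search` helper, and the toy corpus are all hypothetical, and it assumes the SQLite build bundled with your Python has FTS5 enabled.

```python
import sqlite3

# Toy corpus (hypothetical data, for illustration only).
corpus = {
    "d1": "the duck swims in the pond",
    "d2": "relational databases store rows in tables",
    "d3": "a duck is not a database",
}

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; assumes the bundled SQLite was compiled with FTS5.
# docid is stored but not indexed for full-text matching.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(docid UNINDEXED, contents)")
conn.executemany("INSERT INTO docs (docid, contents) VALUES (?, ?)", corpus.items())


def search(query, k=10):
    """Return up to k (docid, score) pairs, best match first.

    FTS5's bm25() returns lower (more negative) values for better matches,
    so the sign is flipped to make higher scores better.
    """
    rows = conn.execute(
        "SELECT docid, -bm25(docs) AS score FROM docs "
        "WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
        (query, k),
    )
    return rows.fetchall()


hits = search("duck OR database")
for docid, score in hits:
    print(docid, round(score, 4))
```

The default FTS5 tokenizer does no stemming, so `databases` in `d2` does not match the query term `database`. A real pipeline would typically normalize or stem text before indexing.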
Dense Retrieval
Cosine similarity vector search with pre-encoded embeddings in DuckDB or PostgreSQL.
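The scoring behind dense retrieval can be sketched in a few lines of plain Python. This is only an illustration of cosine-similarity ranking over pre-encoded vectors, not QuackIR's implementation: the function names and the three-dimensional embeddings are made up, and real embeddings would come from an encoder model and have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dense_search(query_vec, doc_vecs, k=10):
    """Score every document against the query and return the top k,
    highest similarity first (brute-force scan, no index)."""
    scored = [(docid, cosine(query_vec, vec)) for docid, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-encoded document embeddings.
doc_vecs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [0.7, 0.7, 0.0],
}

results = dense_search([1.0, 0.1, 0.0], doc_vecs, k=2)
for docid, score in results:
    print(docid, round(score, 3))
```

In a database-backed setup, the same computation would be expressed as a SQL query that orders rows by a similarity function over a vector column.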
Hybrid Retrieval
Reciprocal Rank Fusion combining sparse and dense results in DuckDB or PostgreSQL.
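Reciprocal Rank Fusion (RRF) scores each document by summing `1 / (k + rank)` over every ranked list it appears in. The sketch below shows that formula in plain Python. It is not QuackIR's implementation: the function name and the example runs are hypothetical, and `k=60` is the constant commonly used in the RRF literature, which may differ from QuackIR's default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse several ranked lists of docids into one list of (docid, score).

    Each list contributes 1 / (k + rank) for every document it contains,
    with ranks starting at 1. Documents missing from a list get nothing
    from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, docid in enumerate(ranking, start=1):
            scores[docid] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return fused[:top_n]


# Hypothetical sparse and dense results, best first.
sparse_run = ["d3", "d1", "d4"]
dense_run = ["d1", "d2", "d3"]

fused = reciprocal_rank_fusion([sparse_run, dense_run])
for docid, score in fused:
    print(docid, round(score, 5))
```

Because RRF uses only ranks, the sparse and dense scores never need to be normalized onto a common scale before fusion.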
Getting started
Supported databases
| Feature | DuckDB | SQLite | PostgreSQL |
|---|---|---|---|
| Sparse (BM25) | Yes | Yes | Yes |
| Dense (vector) | Yes | No | Yes |
| Hybrid (RRF) | Yes | No | Yes |