QuackIR’s search module queries a previously built index and writes ranked results in TREC run-file format. Three retrieval methods are supported: sparse BM25 (via full-text search), dense cosine similarity (via vector search), and hybrid using reciprocal rank fusion (RRF) over one sparse and one dense index. The search method can be specified explicitly or inferred automatically from the column names in the target table.
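The column-based inference rule (spelled out under --search-method below) can be sketched as follows. This is a hypothetical illustration, not QuackIR's implementation. The function name infer_search_method, checking for an embedding column before a contents column, and raising an error for two tables of the same type are all assumptions.

```python
def infer_search_method(table_columns):
    """Infer sparse/dense/hybrid from the column names of one or two index tables.

    Hypothetical sketch: table_columns is a list holding one iterable of
    column names per index table.
    """
    kinds = []
    for columns in table_columns:
        columns = set(columns)
        if "embedding" in columns:      # vector column -> dense index
            kinds.append("dense")
        elif "contents" in columns:     # text column -> sparse (BM25) index
            kinds.append("sparse")
        else:
            raise ValueError(f"cannot infer index type from columns {sorted(columns)}")
    if len(kinds) == 1:
        return kinds[0]
    if len(kinds) == 2 and kinds[0] != kinds[1]:
        return "hybrid"                 # one sparse table plus one dense table
    raise ValueError(f"expected one table, or one sparse and one dense table; got {kinds}")
```

For example, infer_search_method([["id", "contents"], ["id", "embedding"]]) returns "hybrid".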

Query file formats

In JSONL files, each line must be a JSON object with id and contents fields:
{"id": "q1", "contents": "what is a lobster roll"}
{"id": "q2", "contents": "in-process database systems"}
The file may be compressed with gzip (.jsonl.gz).
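A minimal sketch of reading such a query file in Python, for example to inspect queries before a run. The helper load_jsonl_queries is hypothetical and not part of QuackIR. It assumes one JSON object per line and chooses gzip decompression from the .gz suffix.

```python
import gzip
import json


def load_jsonl_queries(path):
    """Return {query_id: query_text} from a .jsonl or .jsonl.gz query file."""
    # Hypothetical helper: pick the opener from the file suffix.
    opener = gzip.open if path.endswith(".gz") else open
    queries = {}
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            record = json.loads(line)
            # Each record must carry the "id" and "contents" fields.
            queries[record["id"]] = record["contents"]
    return queries
```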

Output format

Results are written one result per line in standard TREC run-file format:
q1 Q0 doc3 1 0.8421 sparse_duckdb
q1 Q0 doc1 2 0.7103 sparse_duckdb
q2 Q0 doc2 1 0.9012 sparse_duckdb
The fields are, in order: query_id, the literal Q0, doc_id, rank, score, and run_tag. This format can be passed directly to trec_eval.
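For post-processing, the hypothetical helpers below (not part of QuackIR) format and parse a single run-file line. Writing scores with four decimal places mirrors the sample output above and is an assumption; QuackIR may write a different precision.

```python
def format_run_line(query_id, doc_id, rank, score, run_tag):
    """Render one hit as: query_id Q0 doc_id rank score run_tag."""
    # Four decimal places mirrors the sample output; the precision is an assumption.
    return f"{query_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}"


def parse_run_line(line):
    """Split a run-file line back into (query_id, doc_id, rank, score, run_tag)."""
    query_id, _q0, doc_id, rank, score, run_tag = line.split()
    return query_id, doc_id, int(rank), float(score), run_tag
```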

Python API

CLI usage

python -m quackir.search \
  --db-type <duckdb|sqlite|postgres> \
  --topics <path> \
  --output <path> \
  [options]
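To script several runs from Python, one option is to build the argument list and pass it to subprocess. This sketch uses only the flags documented on this page. The helper build_search_command is hypothetical, and treating --pretokenized as a bare switch is an assumption based on its "when set" description.

```python
import subprocess
import sys


def build_search_command(db_type, topics, output, **options):
    """Build an argv list for `python -m quackir.search` (hypothetical helper).

    Keyword names map to flags by replacing underscores with dashes,
    e.g. search_method="dense" -> --search-method dense. True emits a bare
    flag, False/None omits it, and lists expand to several values.
    """
    cmd = [sys.executable, "-m", "quackir.search",
           "--db-type", db_type, "--topics", topics, "--output", output]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)                  # assumed store-true switch
        elif value is False or value is None:
            continue                          # leave the flag at its default
        elif isinstance(value, (list, tuple)):
            cmd.append(flag)
            cmd.extend(str(v) for v in value)  # e.g. --index corpus corpus_dense
        else:
            cmd.extend([flag, str(value)])
    return cmd


# Usage (requires QuackIR and an existing index):
# subprocess.run(build_search_command("duckdb", "queries.jsonl", "run.txt",
#                                     search_method="sparse"), check=True)
```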

Required arguments

--db-type (string, required)
Database backend to use. Accepted values: duckdb, sqlite, postgres.

--topics (string, required)
Path to the query file. Accepts JSONL or TSV format; gzip-compressed files are supported. See Query file formats above.

--output (string, required)
Path to write the search results, in TREC run-file format: query_id Q0 doc_id rank score run_tag.

Database connection arguments

--db-path (string, default: database.db)
Path to the database file. Used by DuckDB and SQLite; ignored for PostgreSQL.

--db-name (string, default: quackir)
PostgreSQL database name. Ignored for DuckDB and SQLite.

--db-user (string, default: postgres)
PostgreSQL username. Ignored for DuckDB and SQLite.

Optional arguments

--search-method (string)
Retrieval method. Accepted values: sparse, dense, hybrid. If omitted, the method is inferred from the column names of the index table: a contents column implies sparse, an embedding column implies dense. If two indexes are provided and their column types differ, hybrid is used.

--index (string, default: corpus)
Name of the table to search. Pass one value for sparse or dense search, or two values for hybrid search (one sparse table and one dense table). Dashes in table names are replaced with underscores.

--pretokenized (boolean, default: false)
When set, skips query tokenization. Use this when your query file has already been processed by quackir.analysis. Has no effect on dense indexes.

--hits (integer, default: 1000)
Number of top-ranked results to return per query.

--rrf-k (integer, default: 60)
The k parameter for reciprocal rank fusion; applies only to hybrid search. Each document's fused score is the sum of 1/(k + rank) over the result lists it appears in, so larger values of k shrink the score gap between adjacent ranks and reduce the advantage of top-ranked positions.

--run-tag (string)
Tag written in the last column of the output file. Defaults to {search_method}_{db_type} (e.g., sparse_duckdb).
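Reciprocal rank fusion itself is short enough to sketch. The function below is a generic RRF implementation, not QuackIR's code: it takes best-first lists of document IDs (one per index), adds 1/(k + rank) for every list a document appears in, and keeps the top hits. The function name and breaking ties by document ID are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, hits=1000):
    """Fuse best-first lists of doc IDs for one query (generic RRF sketch).

    Returns [(doc_id, fused_score), ...] sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks start at 1, so the top document in a list contributes 1/(k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc ID (an assumption).
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:hits]
```

With the default k = 60, a document ranked first in one list and second in the other scores 1/61 + 1/62 ≈ 0.0325, ahead of a document ranked first in only one list (1/61 ≈ 0.0164).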

Examples

# Sparse BM25 search with DuckDB
python -m quackir.search \
  --db-type duckdb \
  --db-path database.db \
  --topics queries.jsonl \
  --search-method sparse \
  --output run.txt

# Dense cosine similarity search with DuckDB
python -m quackir.search \
  --db-type duckdb \
  --db-path database.db \
  --topics queries_dense.jsonl \
  --search-method dense \
  --index corpus_dense \
  --output run_dense.txt

# Hybrid RRF search with DuckDB (sparse + dense indexes)
python -m quackir.search \
  --db-type duckdb \
  --db-path database.db \
  --topics queries_hybrid.jsonl \
  --search-method hybrid \
  --index corpus corpus_dense \
  --rrf-k 60 \
  --output run_hybrid.txt

# Sparse search with PostgreSQL, TSV query file
python -m quackir.search \
  --db-type postgres \
  --db-name quackir \
  --db-user postgres \
  --topics queries.tsv \
  --search-method sparse \
  --output run_pg.txt

# Sparse search with SQLite
python -m quackir.search \
  --db-type sqlite \
  --db-path sqlite.db \
  --topics queries.jsonl \
  --search-method sparse \
  --output run_sqlite.txt
SQLite supports sparse search only. Requesting dense or hybrid retrieval with SQLite exits with an error message.
To evaluate results with trec_eval, pass the output file directly:
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 qrels.txt run.txt
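For a quick sanity check without installing Pyserini, nDCG@10 can be approximated in plain Python. This sketch is not trec_eval: it uses linear gains with a log2(rank + 1) discount, averages over every query in the qrels (roughly what -c does), and keeps run-file order for tied scores, so its numbers can differ slightly from trec_eval's. The helper names are hypothetical; use trec_eval for reported results.

```python
import math
from collections import defaultdict


def read_qrels(lines):
    """Parse TREC qrels lines: query_id iteration doc_id relevance."""
    qrels = defaultdict(dict)
    for line in lines:
        qid, _iteration, doc_id, rel = line.split()
        qrels[qid][doc_id] = int(rel)
    return qrels


def read_run(lines):
    """Parse run-file lines into {query_id: [(score, doc_id), ...]}."""
    run = defaultdict(list)
    for line in lines:
        qid, _q0, doc_id, _rank, score, _tag = line.split()
        run[qid].append((float(score), doc_id))
    return run


def ndcg_at_k(qrels, run, k=10):
    """Mean nDCG@k over all queries in qrels (approximation of trec_eval)."""
    per_query = []
    for qid, judgments in qrels.items():
        # Rank by score descending; sorted() is stable, so ties keep file order.
        ranked = sorted(run.get(qid, []), key=lambda x: -x[0])[:k]
        dcg = sum(max(judgments.get(doc, 0), 0) / math.log2(i + 2)
                  for i, (_, doc) in enumerate(ranked))
        ideal = sorted((max(r, 0) for r in judgments.values()), reverse=True)[:k]
        idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
        per_query.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(per_query) / len(per_query) if per_query else 0.0
```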