QuackIR’s search module queries a previously built index and writes ranked results in TREC run-file format. Three retrieval methods are supported: sparse BM25 (via full-text search), dense cosine similarity (via vector search), and hybrid retrieval using reciprocal rank fusion (RRF) over one sparse and one dense index. The search method can be specified explicitly or inferred automatically from the column names in the target table.
Query file formats
- Sparse (JSONL)
- Dense (JSONL)
- Hybrid (JSONL)
- TSV (sparse only)
Each line must be a JSON object with id and contents fields. The file may be compressed with gzip (.jsonl.gz).
Output format
Results are written one per line in standard TREC run-file format: query_id Q0 doc_id rank score run_tag. This format is directly compatible with trec_eval.
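To make the two formats above concrete, the following is a minimal sketch of reading a sparse JSONL query file (plain or gzip-compressed) and formatting a result as a TREC run-file line. The helper names read_queries and format_run_line are hypothetical and are not part of QuackIR; the six-decimal score formatting is also an assumption made for illustration.

```python
import gzip
import json
from pathlib import Path


def read_queries(path):
    """Yield (query_id, text) pairs from a JSONL query file.

    Hypothetical helper, not a QuackIR function. Each line is expected to be
    a JSON object with "id" and "contents" fields; a ".gz" suffix switches
    to gzip decompression.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield str(record["id"]), record["contents"]


def format_run_line(query_id, doc_id, rank, score, run_tag):
    """Format one result as: query_id Q0 doc_id rank score run_tag.

    The fixed six-decimal score is an assumption, not QuackIR's behavior.
    """
    return f"{query_id} Q0 {doc_id} {rank} {score:.6f} {run_tag}"
```

A run file is then one such line per retrieved document, grouped by query, which is the layout trec_eval reads.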
Python API
- Sparse search
- Dense search
- Hybrid search (RRF)
Sparse retrieval uses BM25 via the database’s full-text search index. It works with DuckDB, SQLite, and PostgreSQL.
For SQLite, replace DuckDBSearcher with SQLiteSearcher(db_path="sqlite.db"). For PostgreSQL, use PostgresSearcher(db_name="quackir", user="postgres").
By default, the query is tokenized using Pyserini’s default Lucene analyzer before being passed to the index. Pass tokenize_query=False to skip tokenization if your queries are already preprocessed.
CLI usage
Required arguments
Database backend to use. Accepted values: duckdb, sqlite, postgres.
Path to the query file. Accepts JSONL or TSV format (gzip-compressed files are supported). See query file formats above.
Path to write the search results. Results are written in TREC run-file format: query_id Q0 doc_id rank score run_tag.
Database connection arguments
Path to the database file. Used by DuckDB and SQLite. Ignored for PostgreSQL.
PostgreSQL database name. Ignored for DuckDB and SQLite.
PostgreSQL username. Ignored for DuckDB and SQLite.
Optional arguments
Retrieval method. Accepted values: sparse, dense, hybrid. If omitted, the method is inferred from the column names in the index table: a contents column implies sparse, and an embedding column implies dense. If two indexes are provided and they have different column types, hybrid is used.
Name of the table to search. Accepts one value for sparse or dense search, or two values for hybrid search (one sparse table and one dense table). Dashes in table names are replaced with underscores.
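The inference rule and table-name normalization described above can be sketched roughly as follows. This is an illustration of the stated rules only, not QuackIR's actual code: the function names are hypothetical, and the handling of a table that has both a contents and an embedding column (treated here as dense) is an assumption the documentation does not settle.

```python
def normalize_table_name(name):
    """Replace dashes with underscores, as described for table names."""
    return name.replace("-", "_")


def infer_search_method(columns_per_table):
    """Infer "sparse", "dense", or "hybrid" from table column names.

    columns_per_table is a list with one collection of column names per
    table. Hypothetical helper written for illustration.
    """

    def table_kind(columns):
        columns = set(columns)
        # Assumption: an embedding column takes precedence if both exist.
        if "embedding" in columns:
            return "dense"
        if "contents" in columns:
            return "sparse"
        raise ValueError(f"cannot infer a search method from columns {sorted(columns)}")

    kinds = [table_kind(columns) for columns in columns_per_table]
    if len(kinds) == 1:
        return kinds[0]
    if len(kinds) == 2 and kinds[0] != kinds[1]:
        return "hybrid"
    raise ValueError("expected one table, or one sparse and one dense table")
```

Passing the method explicitly avoids any ambiguity when a table's columns do not clearly match one of these cases.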
When set, skips query tokenization. Use this when your query file has already been processed by quackir.analysis. Has no effect for dense indexes.
Number of top results to return per query.
The k parameter for reciprocal rank fusion. Only applies to hybrid search. Higher values reduce the impact of rank differences between the two result lists.
Tag written in the last column of the output file. Defaults to {search_method}_{db_type} (e.g., sparse_duckdb).
Examples
SQLite only supports sparse search. Attempting dense or hybrid retrieval with SQLite will exit with an error message.
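For readers unfamiliar with reciprocal rank fusion, the following is a minimal sketch of the standard RRF formula, in which each document scores the sum of 1 / (k + rank) over the lists it appears in. It is not QuackIR's implementation: the function name is hypothetical, and the default k=60 is the value commonly used in the RRF literature, assumed here rather than taken from QuackIR.

```python
def reciprocal_rank_fusion(sparse_ranking, dense_ranking, k=60, top_n=10):
    """Fuse two ranked lists of document ids with reciprocal rank fusion.

    Each input is a list of doc ids ordered best first. Returns up to top_n
    (doc_id, fused_score) pairs, best first. Hypothetical helper; k=60 is an
    assumed default.
    """
    scores = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for deterministic output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


if __name__ == "__main__":
    sparse = ["d1", "d2", "d3"]
    dense = ["d3", "d1", "d4"]
    for doc_id, score in reciprocal_rank_fusion(sparse, dense):
        print(doc_id, round(score, 6))
```

Because k is added to every rank in the denominator, a larger k makes the contributions of rank 1 and rank 2 nearly equal, which is why higher values flatten the effect of rank differences between the two lists.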