QuackIR exposes a modular Python API for building and querying information retrieval indexes on top of relational databases. The top-level quackir package exports the three enums that control indexing and retrieval behavior. Backend-specific indexers and searchers live in quackir.index and quackir.search, and text analysis utilities are available in quackir.analysis.

Public exports

The following names are importable directly from quackir:
Name        Kind    Description
IndexType   Enum    Selects sparse (BM25) or dense (vector) indexing.
SearchType  Enum    Selects sparse, dense, or hybrid retrieval.
SearchDB    Enum    Identifies the target database backend.
from quackir import IndexType, SearchType, SearchDB
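To make the roles of these enums concrete, here is a minimal, self-contained sketch of how they might be defined. This is an illustration only, not QuackIR's actual source: only the SPARSE and DENSE members (used in the quick start below) are confirmed by this page; HYBRID and the string values are assumptions.

```python
from enum import Enum

# Hypothetical reconstruction for illustration; member values are assumptions.
class IndexType(Enum):
    SPARSE = "sparse"  # BM25 inverted index
    DENSE = "dense"    # vector index

class SearchType(Enum):
    SPARSE = "sparse"
    DENSE = "dense"
    HYBRID = "hybrid"  # combines sparse and dense scores

# Enums make configuration explicit and typo-safe:
print(IndexType.SPARSE.name)  # -> SPARSE
```

Passing an enum member rather than a bare string lets the library validate the choice at call time instead of failing deep inside a query.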

Module structure

# Enums
from quackir import IndexType, SearchType, SearchDB

# Indexers
from quackir.index import DuckDBIndexer, SQLiteIndexer, PostgresIndexer

# Searchers
from quackir.search import DuckDBSearcher, SQLiteSearcher, PostgresSearcher

# Text analysis
from quackir.analysis import tokenize

Quick start

The following example indexes a JSONL corpus with DuckDB and runs a BM25 search against it.
from quackir import IndexType, SearchType
from quackir.index import DuckDBIndexer
from quackir.search import DuckDBSearcher

# --- Indexing ---
indexer = DuckDBIndexer(db_path="my_index.db")
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl")
indexer.fts_index("corpus")
indexer.close()

# --- Searching ---
searcher = DuckDBSearcher(db_path="my_index.db")
results = searcher.search(
    method=SearchType.SPARSE,
    query_string="information retrieval",
    top_n=10,
)

for doc_id, score in results:
    print(doc_id, score)

searcher.close()
Pass pretokenized=True to load_table if your JSONL data has already been tokenized with Pyserini’s Lucene analyzer, skipping the automatic tokenization step.
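For reference, a JSONL corpus is one JSON object per line. The sketch below writes a two-document file in that shape; the "id" and "contents" field names are an assumption about the schema load_table expects, not confirmed by this page.

```python
import json

# Sketch: build a minimal JSONL corpus for load_table.
# Field names "id" and "contents" are assumed, following common IR corpus conventions.
docs = [
    {"id": "doc1", "contents": "DuckDB is an in-process analytical database."},
    {"id": "doc2", "contents": "BM25 is a classic sparse retrieval model."},
]
with open("corpus.jsonl", "w") as f:
    for doc in docs:
        # One JSON object per line == one document per line.
        f.write(json.dumps(doc) + "\n")
```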

API sections

Enums

IndexType and SearchDB — select the index kind (sparse or dense) and the target database backend.

SearchType

SearchType — choose sparse, dense, or hybrid retrieval.

Indexers

DuckDB, SQLite, and PostgreSQL indexers for building BM25 and vector indexes.

Searchers

DuckDB, SQLite, and PostgreSQL searchers for running retrieval queries.

Analysis

tokenize() — Pyserini Lucene Analyzer wrapper for sparse preprocessing.
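Since Pyserini's Lucene analyzer is Java-backed, here is a rough pure-Python stand-in to illustrate what sparse preprocessing produces. This is not QuackIR's tokenize(): the real analyzer also applies stopword removal and stemming, which this sketch omits.

```python
import re

def simple_tokenize(text: str) -> list[str]:
    # Illustrative stand-in only: lowercase, then split on non-alphanumerics.
    # The actual Lucene analyzer additionally removes stopwords and stems terms.
    return re.findall(r"[a-z0-9]+", text.lower())

print(simple_tokenize("Information Retrieval, with DuckDB!"))
# -> ['information', 'retrieval', 'with', 'duckdb']
```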
