DuckDBSearcher searches indexes stored in a DuckDB database file using BM25 (sparse), cosine similarity (dense), or Reciprocal Rank Fusion of both (hybrid). It implements the abstract Searcher base class.
Constructor
Path to the DuckDB database file produced by DuckDBIndexer.
Methods
search
Dispatches to fts_search, embedding_search, or rrf_search based on method, then filters the query_id document out of the results (useful when the query is itself a document in the index).
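The routing-and-filtering behavior of search can be sketched in plain Python. This is a hypothetical mini re-implementation, not QuackIR's code: `dispatch_search` and the fake backends are illustrative names, and the `SearchType` enum here is a local stand-in whose member values are assumed.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

Results = List[Tuple[str, float]]


class SearchType(Enum):
    # Local stand-in for QuackIR's SearchType; the member values are assumed.
    SPARSE = "sparse"
    DENSE = "dense"
    HYBRID = "hybrid"


def dispatch_search(
    method: SearchType,
    backends: Dict[SearchType, Callable[[], Results]],
    query_id: Optional[str] = None,
) -> Results:
    """Hypothetical sketch: run the backend chosen by `method`, then drop `query_id`."""
    results = backends[method]()
    if query_id is not None:
        # Remove the query's own document so it cannot match itself.
        results = [(doc_id, score) for doc_id, score in results if doc_id != query_id]
    return results


# Fake backends standing in for fts_search / embedding_search / rrf_search.
fake_backends = {
    SearchType.SPARSE: lambda: [("d1", 3.2), ("d7", 2.5), ("d3", 1.1)],
    SearchType.DENSE: lambda: [("d3", 0.91), ("d1", 0.88)],
    SearchType.HYBRID: lambda: [("d1", 0.033), ("d3", 0.032)],
}
hits = dispatch_search(SearchType.SPARSE, fake_backends, query_id="d1")
print(hits)  # [('d7', 2.5), ('d3', 1.1)]
```

Keeping the filter as a separate step after dispatch means every search type gets the same self-match removal without each backend having to know about query_id.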
Search method: SearchType.SPARSE, SearchType.DENSE, or SearchType.HYBRID.
Document ID to exclude from results. Pass the query document’s own ID to avoid self-matches.
Text query for sparse or hybrid search. Required when method is SPARSE or HYBRID.
Query vector (list of floats) for dense or hybrid search. Required when method is DENSE or HYBRID.
Maximum number of results to return.
When True and method is SPARSE or HYBRID, the query_string is tokenized with Pyserini’s Lucene Analyzer before querying.
Table(s) to search. For HYBRID, provide two names: [sparse_table, dense_table].
RRF rank smoothing constant. Only used when method is SearchType.HYBRID.
Returns: list of (doc_id, score) tuples ordered by descending score. The query_id document is excluded if provided.
fts_search
Runs BM25 full-text search over a sparse table with k=0.9 and b=0.4.
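To make the roles of k and b concrete, here is a self-contained BM25 scorer over pre-tokenized toy documents. It uses the classic BM25 formula with a Lucene-style idf, which is an assumption: DuckDB's full-text search extension may compute idf and length normalization slightly differently, so treat the scores as illustrative. `bm25_rank` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_rank(
    query_terms: List[str],
    docs: Dict[str, List[str]],
    k: float = 0.9,
    b: float = 0.4,
) -> List[Tuple[str, float]]:
    """Rank pre-tokenized docs by classic BM25 (illustrative, not QuackIR's SQL)."""
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores: Dict[str, float] = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k controls term-frequency saturation; b controls length normalization.
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k + 1) / (tf[term] + k * length_norm)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; documents matching no query term are omitted.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": ["duck", "db", "search"],
    "d2": ["duck", "pond"],
    "d3": ["vector", "search"],
}
print(bm25_rank(["duck", "search"], docs))
```

Here d1 ranks first because it contains both query terms. Compared with the common defaults of k=1.2 and b=0.75, k=0.9 saturates term frequency sooner and b=0.4 applies milder length normalization; 0.9/0.4 are also Pyserini's default BM25 settings.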
Pre-processed query string (tokenized if called via search()).
Maximum number of results to return.
Name of the sparse table to search.
Returns: list of (id, score) tuples.
embedding_search
Runs dense retrieval, scoring each document against the query vector with DuckDB's array_cosine_similarity.
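A pure-Python sketch of the same ranking: compute the cosine similarity between the query vector and each stored embedding, sort descending, and keep the top results. In DuckDB the scoring step corresponds to ordering by array_cosine_similarity over the embedding column; the `cosine_rank` helper and toy vectors below are hypothetical and only mirror that logic.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; this sketch returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_rank(
    query_vec: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the top_n best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([doc_id for doc_id, _ in cosine_rank([1.0, 0.1], embeddings, top_n=2)])  # ['a', 'c']
```

This is an exhaustive scan over every stored vector, which is what a plain ORDER BY on a similarity expression amounts to as well.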
Query vector as a list of floats.
Maximum number of results to return.
Name of the dense table to search.
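Returns: list of (id, score) tuples ordered by descending cosine similarity.
rrf_search
Runs BM25 and cosine-similarity retrieval, then fuses the two rankings with Reciprocal Rank Fusion. The sparse and dense tables are identified from table_names using get_search_type.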
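Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranking it appears in, so documents placed high by both sub-rankers rise to the top regardless of how their raw BM25 and cosine scores compare. The sketch below assumes 1-based ranks and the commonly used constant k = 60; whether QuackIR uses the same rank origin or default is not stated here, and `rrf_fuse` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Ranking = List[Tuple[str, float]]


def rrf_fuse(rankings: List[Ranking], k: int = 60, top_n: Optional[int] = None) -> Ranking:
    """Fuse ranked lists with RRF: score(d) = sum over lists of 1 / (k + rank_d)."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Only the position matters; the original scores are ignored.
        for rank, (doc_id, _score) in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    ordered = sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
    return ordered if top_n is None else ordered[:top_n]


bm25_hits = [("d1", 5.0), ("d2", 3.0), ("d3", 1.0)]
dense_hits = [("d2", 0.9), ("d4", 0.8), ("d1", 0.1)]
print([doc_id for doc_id, _ in rrf_fuse([bm25_hits, dense_hits])])  # ['d2', 'd1', 'd4', 'd3']
```

d2 edges out d1 because its worse rank (2 in BM25) beats d1's worse rank (3 in dense). In rrf_search, each input list would presumably be limited to the per-sub-ranker candidate count before fusion.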
Query string for BM25 retrieval. Should already be tokenized.
Query vector for cosine similarity retrieval.
Number of candidates fetched from each sub-ranker before fusion.
RRF rank smoothing constant.
Two table names. The method detects which is sparse and which is dense automatically.
Returns: list of (id, rrf_score) tuples ordered by descending RRF score.
get_search_type
Table to inspect.
Returns SearchType.SPARSE if a contents column exists, or SearchType.DENSE if an embedding column exists.
Raises ValueError if neither column is found.
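The column-based detection can be sketched as follows. To stay runnable with only the standard library, this stand-in uses sqlite3's PRAGMA table_info rather than DuckDB; against DuckDB the column list could instead be read from information_schema.columns. `detect_search_type` and the local `SearchType` enum are hypothetical, and checking contents before embedding is an assumed precedence for tables that have both columns.

```python
import sqlite3
from enum import Enum


class SearchType(Enum):
    # Local stand-in; QuackIR's own enum may define other values.
    SPARSE = "sparse"
    DENSE = "dense"


def detect_search_type(conn: sqlite3.Connection, table: str) -> SearchType:
    """Classify a table by its columns: contents -> SPARSE, embedding -> DENSE."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "contents" in columns:
        return SearchType.SPARSE
    if "embedding" in columns:
        return SearchType.DENSE
    raise ValueError(f"table {table!r} has neither a contents nor an embedding column")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corpus (id TEXT, contents TEXT)")
conn.execute("CREATE TABLE vectors (id TEXT, embedding BLOB)")
print(detect_search_type(conn, "corpus"), detect_search_type(conn, "vectors"))
```

Detecting the type from the schema is what lets a hybrid search accept its two table names without a separate flag saying which is which.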
filter_id
Removes the entry whose id matches query_id from a results list. Called automatically by search().
List of (id, score) tuples.
Document ID to remove.
Returns: filtered list with the matching entry removed.