PostgresSearcher runs full-text (sparse), pgvector (dense), or Reciprocal Rank Fusion (hybrid) retrieval over tables stored in a PostgreSQL database. It implements the abstract Searcher base class.
from quackir.search import PostgresSearcher
Dense and hybrid retrieval require the pgvector extension to be installed in your PostgreSQL instance.

Constructor

PostgresSearcher(db_name="quackir", user="postgres")
Opens a psycopg2 connection to the specified PostgreSQL database.
db_name
string
default:"quackir"
Name of the PostgreSQL database to connect to.
user
string
default:"postgres"
PostgreSQL username.

Methods

searcher.search(
    method,
    query_id=None,
    query_string=None,
    query_embedding=None,
    top_n=5,
    tokenize_query=True,
    table_names=["corpus"],
    rrf_k=60,
)
Main entry point for retrieval. Dispatches to fts_search, embedding_search, or rrf_search based on method, then removes the document whose ID matches query_id from the results.
method
SearchType
required
SearchType.SPARSE, SearchType.DENSE, or SearchType.HYBRID.
query_id
string
default:"None"
Document ID to exclude from results.
query_string
string
default:"None"
Text query for sparse or hybrid search.
query_embedding
number[]
default:"None"
Query vector for dense or hybrid search.
top_n
number
default:"5"
Maximum number of results to return.
tokenize_query
boolean
default:"true"
When True and method is SPARSE or HYBRID, the query_string is tokenized with Pyserini’s Lucene Analyzer before querying.
table_names
string[]
default:"[\"corpus\"]"
Table(s) to search. For HYBRID, provide [sparse_table, dense_table].
rrf_k
number
default:"60"
RRF rank smoothing constant. Only used when method is SearchType.HYBRID.
return
list
List of (doc_id, score) tuples ordered by descending score.

searcher.fts_search(query_string, top_n=5, table_name="corpus")
Executes a PostgreSQL full-text search using to_tsquery('simple', …) and ranks results with ts_rank. The query string is sanitized before use: non-word characters are stripped and remaining terms are joined with | (OR).
query_string
string
required
Query string. Cleaned and converted to a tsquery expression internally.
top_n
number
default:"5"
Maximum number of results to return.
table_name
string
default:"corpus"
Name of the sparse table with a GIN index on contents.
return
list
List of (id, score) tuples.
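
The query cleaning described for fts_search can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper name to_or_tsquery is hypothetical, and replacing non-word characters with spaces (rather than deleting them outright) is an assumption about how the stripping behaves.

```python
import re


def to_or_tsquery(query_string: str) -> str:
    """Hypothetical helper: drop non-word characters and OR the remaining terms."""
    # Assumption: every character that is neither a word character nor
    # whitespace becomes a space, so "a-b" yields two terms rather than "ab".
    cleaned = re.sub(r"[^\w\s]", " ", query_string)
    # Split on whitespace and join the surviving terms with the tsquery OR operator.
    return " | ".join(cleaned.split())


print(to_or_tsquery("information-retrieval (benchmarks)!"))
```

Because the terms are joined with OR, a document matching any single term is a candidate; ts_rank then orders the candidates.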

searcher.embedding_search(query_embedding, top_n=5, table_name="corpus")
Computes cosine similarity using pgvector’s <=> distance operator. The score returned is 1 - cosine_distance, so higher values indicate greater similarity.
query_embedding
number[]
required
Query vector. Passed directly as the ::vector cast argument.
top_n
number
default:"5"
Maximum number of results to return.
table_name
string
default:"corpus"
Name of the dense table with a vector column.
return
list
List of (id, score) tuples ordered by descending similarity.
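
To make the 1 - cosine_distance score concrete, the sketch below reproduces the arithmetic in pure Python. It only illustrates the formula; in PostgresSearcher the distance is computed inside PostgreSQL by pgvector, and the function name cosine_score is hypothetical.

```python
import math


def cosine_score(a, b):
    """Hypothetical helper: similarity score as 1 - cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Cosine distance is 1 minus the cosine of the angle between the vectors.
    cosine_distance = 1.0 - dot / (norm_a * norm_b)
    # The returned score flips the distance back into a similarity.
    return 1.0 - cosine_distance


print(cosine_score([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(cosine_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
```

Scores therefore range from -1 (opposite direction) through 0 (orthogonal) to 1 (same direction).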

searcher.rrf_search(query_string, query_embedding, top_n=5, k=60, table_names=["sparse", "dense"])
Combines full-text and semantic ranking using Reciprocal Rank Fusion. Each result’s RRF score is:
rrf_score = 1 / (k + keyword_rank) + 1 / (k + semantic_rank)
Sparse and dense tables are auto-detected from table_names using get_search_type.
query_string
string
required
Query string for keyword retrieval. Cleaned to a tsquery expression internally.
query_embedding
number[]
required
Query vector for semantic retrieval.
top_n
number
default:"5"
Number of candidates fetched from each sub-ranker before fusion.
k
number
default:"60"
RRF rank smoothing constant.
table_names
string[]
default:"[\"sparse\", \"dense\"]"
Two table names. The method auto-detects which is sparse and which is dense.
return
list
List of (id, rrf_score) tuples ordered by descending RRF score.
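
The RRF formula above can be sketched outside the database as follows. This is an illustrative approximation under stated assumptions: ranks start at 1, and a document that appears in only one of the two rankings contributes only that ranking's term. The source does not specify how rrf_search handles either case, and the function name rrf_fuse is hypothetical.

```python
def rrf_fuse(keyword_ids, semantic_ids, k=60, top_n=5):
    """Hypothetical helper: fuse two ranked ID lists with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (keyword_ids, semantic_ids):
        # Assumption: ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Order by descending fused score and keep the top_n entries.
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n]


for doc_id, score in rrf_fuse(["a", "b"], ["b", "c"]):
    print(doc_id, round(score, 5))
```

A larger k flattens the difference between adjacent ranks, so documents that appear in both rankings are favored over documents that rank first in only one.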

get_search_type

searcher.get_search_type(table_name)
Queries information_schema.columns to detect the table type.
table_name
string
required
Table to inspect.
return
SearchType
SearchType.SPARSE if a contents column exists; SearchType.DENSE if an embedding column exists.
Raises ValueError if neither column is found.
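
The detection rule can be sketched as a pure function over a table's column names. The SearchType enum here is a local stand-in for illustration, detect_search_type is a hypothetical name, and checking contents before embedding is an assumption; the real method reads the column names from information_schema.columns.

```python
from enum import Enum


class SearchType(Enum):
    """Local stand-in for quackir's SearchType, for illustration only."""
    SPARSE = "sparse"
    DENSE = "dense"


def detect_search_type(column_names):
    """Hypothetical helper: classify a table by the columns it exposes."""
    columns = set(column_names)
    # Assumption: contents is checked first if a table has both columns.
    if "contents" in columns:
        return SearchType.SPARSE
    if "embedding" in columns:
        return SearchType.DENSE
    raise ValueError("table has neither a contents nor an embedding column")


print(detect_search_type(["id", "contents"]))
print(detect_search_type(["id", "embedding"]))
```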

filter_id

PostgresSearcher.filter_id(results, query_id)
Static method. Removes the entry whose id matches query_id. Called automatically by search().
results
list
required
List of (id, score) tuples.
query_id
string
default:"None"
Document ID to remove.
return
list
Filtered results list.
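
A minimal sketch of the filtering step, assuming that a query_id of None leaves the results unchanged. The source does not state how None is handled, and the function name filter_id_sketch is hypothetical.

```python
def filter_id_sketch(results, query_id=None):
    """Hypothetical re-implementation: drop the (id, score) pair matching query_id."""
    # Assumption: with no query_id there is nothing to filter.
    if query_id is None:
        return list(results)
    return [(doc_id, score) for doc_id, score in results if doc_id != query_id]


hits = [("doc1", 0.9), ("doc2", 0.7), ("doc3", 0.4)]
print(filter_id_sketch(hits, query_id="doc2"))
```

This kind of filtering is useful when the query is itself a document in the corpus, where the query document would otherwise appear among its own results.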

close

searcher.close()
Closes the underlying psycopg2 connection.

Examples

# Sparse (full-text) search
from quackir import SearchType
from quackir.search import PostgresSearcher

searcher = PostgresSearcher(db_name="mydb", user="myuser")

results = searcher.search(
    method=SearchType.SPARSE,
    query_string="information retrieval benchmarks",
    top_n=10,
)

for doc_id, score in results:
    print(doc_id, score)

searcher.close()

# Dense (pgvector) search
from quackir import SearchType
from quackir.search import PostgresSearcher

query_vector = [0.12, -0.34, 0.56]  # replace with a real embedding

searcher = PostgresSearcher(db_name="mydb", user="myuser")

results = searcher.search(
    method=SearchType.DENSE,
    query_embedding=query_vector,
    top_n=10,
    table_names=["dense_corpus"],
)

for doc_id, score in results:
    print(doc_id, score)

searcher.close()

# Hybrid (Reciprocal Rank Fusion) search
from quackir import SearchType
from quackir.search import PostgresSearcher

searcher = PostgresSearcher(db_name="mydb", user="myuser")

results = searcher.search(
    method=SearchType.HYBRID,
    query_string="neural retrieval",
    query_embedding=[0.12, -0.34, 0.56],  # replace with a real embedding
    top_n=10,
    table_names=["sparse_corpus", "dense_corpus"],
    rrf_k=60,
)

for doc_id, score in results:
    print(doc_id, score)

searcher.close()
