PostgresIndexer creates and populates BM25-style full-text (sparse) or pgvector (dense) indexes in a PostgreSQL database. It implements the abstract Indexer base class and supports loading data from both JSONL and Parquet files.
Dense indexing requires the pgvector extension to be installed in your PostgreSQL instance. The embedding column is created as a vector(embedding_dim) type.
Constructor
Opens a psycopg2 connection to the specified PostgreSQL database.
Name of the PostgreSQL database to connect to.
PostgreSQL username.
Methods
init_table
Creates the table with a schema that depends on the index type:
- Sparse schema: (id TEXT PRIMARY KEY, contents TEXT)
- Dense schema: (id TEXT PRIMARY KEY, embedding vector(embedding_dim))
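The two schemas can be sketched as a small statement builder. This is an illustration only, not the library's code: the IndexType enum here is a local stand-in, create_table_sql is a hypothetical helper, and the default dimension of 768 is an assumption.

```python
from enum import Enum


class IndexType(Enum):
    """Local stand-in for the library's IndexType so the sketch runs alone."""
    SPARSE = "sparse"
    DENSE = "dense"


def create_table_sql(table_name: str, index_type: IndexType, embedding_dim: int = 768) -> str:
    """Build a CREATE TABLE statement matching the documented schemas.

    Hypothetical helper. The table name is interpolated directly, so it
    must come from trusted input.
    """
    if index_type is IndexType.SPARSE:
        columns = "id TEXT PRIMARY KEY, contents TEXT"
    elif index_type is IndexType.DENSE:
        # vector(n) is the pgvector column type; the extension must be installed.
        columns = f"id TEXT PRIMARY KEY, embedding vector({int(embedding_dim)})"
    else:
        raise ValueError(f"Unknown index type: {index_type}")
    return f"CREATE TABLE {table_name} ({columns})"


sparse_sql = create_table_sql("docs_sparse", IndexType.SPARSE)
dense_sql = create_table_sql("docs_dense", IndexType.DENSE, embedding_dim=384)
print(sparse_sql)
print(dense_sql)
```

The embedding dimension is fixed when the table is created, so it should match the output size of the encoder that produced the vectors.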
Name of the table to create.
IndexType.SPARSE or IndexType.DENSE.
Dimension of the embedding vectors. Only used when index_type is IndexType.DENSE.
load_table
Dispatches to load_jsonl_table or load_parquet_table based on the file extension. If index_type is None, it is detected via get_index_type.
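A minimal sketch of extension-based dispatch, assuming a hypothetical choose_loader helper and a local stand-in for IndexType. It also encodes the restriction that Parquet input is accepted only for dense indexes. Case-insensitive suffix matching and the error for unknown extensions are assumptions, not documented behavior.

```python
from enum import Enum
from pathlib import Path


class IndexType(Enum):
    """Local stand-in for the library's IndexType so the sketch runs alone."""
    SPARSE = "sparse"
    DENSE = "dense"


def choose_loader(file_path: str, index_type: IndexType) -> str:
    """Return the name of the loader that would handle file_path (hypothetical helper)."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".jsonl":
        return "load_jsonl_table"
    if suffix == ".parquet":
        # Parquet input is only accepted for dense indexes.
        if index_type is not IndexType.DENSE:
            raise ValueError("Parquet loading is only supported for IndexType.DENSE")
        return "load_parquet_table"
    # Assumption: any other extension is rejected.
    raise ValueError(f"Unsupported file extension: {suffix!r}")


print(choose_loader("corpus.jsonl", IndexType.SPARSE))
print(choose_loader("embeddings.parquet", IndexType.DENSE))
```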
Parquet loading is currently restricted to IndexType.DENSE. Passing a .parquet file with IndexType.SPARSE raises a ValueError.
Target table name.
Path to a .jsonl or .parquet source file.
Override the index type. When None, get_index_type(table_name) is called.
When False, contents values are tokenized with Pyserini’s Lucene Analyzer before insertion.
load_jsonl_table
Loads a JSONL file and inserts its rows using psycopg2.extras.execute_values. Each line must be a JSON object with id and either contents (sparse) or vector (dense).
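The row preparation can be sketched with the standard library alone. The jsonl_to_rows helper is hypothetical; it includes the null-byte replacement this method applies to contents. Skipping blank lines and encoding dense vectors as a pgvector text literal are assumptions made for the sketch.

```python
import json
from typing import Iterable, List, Tuple


def jsonl_to_rows(lines: Iterable[str], dense: bool) -> List[Tuple[str, str]]:
    """Turn JSONL lines into (id, value) tuples ready for a bulk insert."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        record = json.loads(line)
        if dense:
            # Assumption: the vector is sent as a pgvector text literal, e.g. "[0.5,1.0]".
            value = "[" + ",".join(str(float(x)) for x in record["vector"]) + "]"
        else:
            # PostgreSQL text columns reject NUL, so swap it for U+FFFD.
            value = record["contents"].replace("\x00", "\ufffd")
        rows.append((str(record["id"]), value))
    return rows


sparse_rows = jsonl_to_rows(['{"id": "d1", "contents": "a\\u0000b"}', ""], dense=False)
dense_rows = jsonl_to_rows(['{"id": "d2", "vector": [0.5, 1]}'], dense=True)
print(sparse_rows)
print(dense_rows)
```

Tuples in this shape can be passed as the argument list of psycopg2.extras.execute_values(cur, sql, rows), where sql is an INSERT statement containing a single VALUES %s placeholder.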
Null bytes (\x00) in contents fields are replaced with the Unicode replacement character (\uFFFD) before insertion because PostgreSQL does not allow null characters in text columns.
Target table name.
Path to the .jsonl file.
IndexType.SPARSE or IndexType.DENSE.
Skip tokenization when True. Applies to sparse indexing only.
load_parquet_table
Reads the Parquet file, formats the vector column as a pgvector-compatible string [f1, f2, ...], and bulk-copies rows using COPY … FROM STDIN WITH CSV.
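The formatting and CSV staging steps might look like the sketch below. It works on already-loaded (id, vector) pairs rather than reading Parquet, so it needs only the standard library. format_vector and rows_to_csv_buffer are hypothetical helpers, not part of the library.

```python
import csv
import io
from typing import Iterable, Sequence, Tuple


def format_vector(values: Sequence[float]) -> str:
    """Render a vector as a pgvector text literal such as "[0.25,0.5]"."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def rows_to_csv_buffer(rows: Iterable[Tuple[str, Sequence[float]]]) -> io.StringIO:
    """Write (id, vector) rows to an in-memory CSV file suitable for COPY."""
    buffer = io.StringIO()
    # csv.writer quotes the vector field because it contains commas.
    writer = csv.writer(buffer, lineterminator="\n")
    for doc_id, vector in rows:
        writer.writerow([doc_id, format_vector(vector)])
    buffer.seek(0)  # rewind so a consumer reads from the start
    return buffer


buffer = rows_to_csv_buffer([("d1", [0.25, 0.5]), ("d2", [1, 2])])
csv_text = buffer.getvalue()
print(csv_text)
```

With psycopg2, a buffer like this can be streamed through cursor.copy_expert with a COPY ... FROM STDIN WITH CSV statement. COPY avoids per-row statement overhead, which is why it suits large embedding files.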
Target table name.
Path to the .parquet file.
Must be IndexType.DENSE.
Unused; present for interface consistency.
fts_index
Creates a full-text index on the contents column using to_tsvector('simple', contents), enabling efficient full-text search with to_tsquery.
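As a rough illustration, the statements involved could be built as follows. The GIN index method and the index naming scheme are assumptions, since only the to_tsvector expression is documented here. Both helpers are hypothetical and interpolate the table name directly, so it must be trusted input.

```python
def fts_index_sql(table_name: str) -> str:
    """Build an expression-index statement over the contents column."""
    index_name = f"{table_name}_fts_idx"  # hypothetical naming scheme
    return (
        f"CREATE INDEX {index_name} ON {table_name} "
        "USING GIN (to_tsvector('simple', contents))"  # GIN is an assumption
    )


def fts_query_sql(table_name: str) -> str:
    """Build a search query whose expression matches the indexed one."""
    return (
        f"SELECT id FROM {table_name} "
        "WHERE to_tsvector('simple', contents) @@ to_tsquery('simple', %s)"
    )


index_sql = fts_index_sql("docs_sparse")
query_sql = fts_query_sql("docs_sparse")
print(index_sql)
print(query_sql)
```

PostgreSQL can only use an expression index when the query repeats the same expression, including the 'simple' configuration name, which is why both statements spell it out.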
Name of the sparse table to index.
get_index_type
Queries information_schema.columns to detect index type from column names.
Table to inspect.
IndexType.SPARSE if a contents column exists; IndexType.DENSE if an embedding column exists.
Raises ValueError if neither column is found.
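The detection rule can be sketched as a pure function over column names. IndexType is a local stand-in and detect_index_type is a hypothetical helper. Checking contents before embedding when both are present is an assumption based on the order in which the rule is stated.

```python
from enum import Enum
from typing import Iterable


class IndexType(Enum):
    """Local stand-in for the library's IndexType so the sketch runs alone."""
    SPARSE = "sparse"
    DENSE = "dense"


# A query of this shape returns the column names for one table.
COLUMNS_QUERY = "SELECT column_name FROM information_schema.columns WHERE table_name = %s"


def detect_index_type(column_names: Iterable[str]) -> IndexType:
    """Map a table's column names to an index type."""
    columns = set(column_names)
    if "contents" in columns:  # assumption: contents is checked first
        return IndexType.SPARSE
    if "embedding" in columns:
        return IndexType.DENSE
    raise ValueError("Table has neither a contents nor an embedding column")


print(detect_index_type(["id", "contents"]))
print(detect_index_type(["id", "embedding"]))
```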
get_num_rows
Table to count.
Row count as an integer.
close
Closes the psycopg2 connection.