SQLiteIndexer creates and populates BM25 (sparse) indexes stored in a SQLite database file. It implements the abstract Indexer base class and loads data from JSONL files. Dense indexing is not supported.
Constructor
Path to the SQLite database file.
Methods
init_table
Creates a table with the schema (id TEXT PRIMARY KEY, contents TEXT). Raises ValueError if index_type is not IndexType.SPARSE.
Name of the table to create.
Must be IndexType.SPARSE. Any other value raises ValueError.
Accepted for interface consistency but unused.
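The behavior described for init_table can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not QuackIR's implementation: the IndexType enum below is a local stand-in, init_sparse_table is a hypothetical helper name, and the sketch assumes the table does not already exist.

```python
import sqlite3
from enum import Enum


class IndexType(Enum):
    """Local stand-in for QuackIR's IndexType (assumption, for illustration)."""
    SPARSE = "sparse"
    DENSE = "dense"


def init_sparse_table(conn: sqlite3.Connection, table_name: str,
                      index_type: IndexType = IndexType.SPARSE) -> None:
    """Hypothetical helper: create an empty (id, contents) table."""
    if index_type is not IndexType.SPARSE:
        raise ValueError("Only sparse indexes are supported in this sketch")
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    conn.execute(f"CREATE TABLE {quoted} (id TEXT PRIMARY KEY, contents TEXT)")
    conn.commit()


conn = sqlite3.connect(":memory:")
init_sparse_table(conn, "corpus")

# Inspect the resulting schema: each row is (cid, name, type, notnull, default, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(corpus)")]

# A non-sparse index type is rejected before any SQL runs.
try:
    init_sparse_table(conn, "vectors", IndexType.DENSE)
    dense_rejected = False
except ValueError:
    dense_rejected = True
```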
load_table
Dispatches to load_jsonl_table based on the file extension. Parquet loading is not implemented.
Target table name.
Path to a .jsonl source file.
Override the index type. When None, get_index_type(table_name) is called.
When False, contents values are tokenized with Pyserini’s Lucene Analyzer before insertion.
load_jsonl_table
Inserts the records of a JSONL file as (id, contents) rows. Each line must be a JSON object with id and contents fields. Raises ValueError if index_type is not IndexType.SPARSE.
Target table name.
Path to the .jsonl file.
Must be IndexType.SPARSE.
Skip tokenization when True.
fts_index
Creates an FTS5 virtual table named fts_{table_name} backed by the base table, using the porter tokenizer. The virtual table is populated immediately from the base table.
The porter stemmer is applied here at the SQLite FTS5 level. QuackIR’s tokenization pipeline (Pyserini Lucene Analyzer) already stems tokens before they are stored, so the FTS5 porter stemmer acts as a second stemming pass. For best retrieval consistency, use the default pretokenized=False so that index and query terms go through the same pipeline.
Name of the base table to build the FTS5 virtual table from.
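To make the FTS5 step concrete, here is a minimal sketch using Python's sqlite3 module. It assumes the bundled SQLite was compiled with FTS5, and it copies rows into a plain FTS5 table; the exact statements QuackIR issues (for example, whether it uses an external-content table) are not shown in this page, so treat the SQL as an illustration only. The table name corpus and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table in the (id, contents) layout described above.
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, contents TEXT)")
conn.executemany(
    "INSERT INTO corpus (id, contents) VALUES (?, ?)",
    [("d1", "indexing documents quickly"), ("d2", "ducks swim in ponds")],
)

# FTS5 virtual table named fts_<table>, tokenized with the porter stemmer.
# The id column is stored but not indexed for full-text search.
conn.execute(
    "CREATE VIRTUAL TABLE fts_corpus "
    "USING fts5(id UNINDEXED, contents, tokenize='porter')"
)

# Populate the virtual table immediately from the base table.
conn.execute("INSERT INTO fts_corpus (id, contents) SELECT id, contents FROM corpus")
conn.commit()

# The porter stemmer reduces both "indexing" and the query "index" to one stem.
hits = [row[0] for row in conn.execute(
    "SELECT id FROM fts_corpus WHERE fts_corpus MATCH ?", ("index",)
)]
```

Because stemming happens inside FTS5 at both index and query time, the query term does not need to match the stored surface form exactly.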
get_index_type
Returns IndexType.SPARSE if a contents column is present; otherwise raises ValueError.
Table to inspect.
Always IndexType.SPARSE for valid SQLite tables.
get_num_rows
Table to count.
Row count as an integer.
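The loading and inspection steps described on this page (one JSON object per line with id and contents fields, a check for the contents column, and a row count) can be sketched with the standard library alone. This is an assumption-laden illustration rather than QuackIR's code: it performs no tokenization, which corresponds to skipping the Pyserini analyzer, and the table name and sample records are made up.

```python
import io
import json
import sqlite3

# Stand-in for a .jsonl file: one JSON object per line.
jsonl = io.StringIO(
    '{"id": "d1", "contents": "first document"}\n'
    '{"id": "d2", "contents": "second document"}\n'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, contents TEXT)")

# Parse each non-empty line and collect (id, contents) pairs.
rows = []
for line in jsonl:
    line = line.strip()
    if not line:
        continue
    record = json.loads(line)
    rows.append((record["id"], record["contents"]))

conn.executemany("INSERT INTO corpus (id, contents) VALUES (?, ?)", rows)
conn.commit()

# A table counts as sparse here if it has a contents column.
column_names = {row[1] for row in conn.execute("PRAGMA table_info(corpus)")}
has_contents = "contents" in column_names

# Row count as an integer.
num_rows = conn.execute("SELECT COUNT(*) FROM corpus").fetchone()[0]
```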