SQLiteIndexer API reference

SQLiteIndexer creates and populates BM25 (sparse) indexes stored in a SQLite database file. It implements the abstract Indexer base class and loads data from JSONL files. Dense indexing is not supported.

from quackir.index import SQLiteIndexer

SQLiteIndexer only supports IndexType.SPARSE. Attempting to create a dense table or load non-sparse data raises a ValueError.

Constructor

SQLiteIndexer(db_path="sqlite.db")

Opens a SQLite connection to the specified file. The file is created if it does not already exist.

db_path

string

default:"sqlite.db"

Path to the SQLite database file.
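
The create-if-missing behavior matches Python's standard sqlite3 module. The sketch below assumes SQLiteIndexer opens its connection with something like sqlite3.connect(db_path); this page does not confirm that, so treat it as an illustration rather than the library's implementation.

```python
import os
import sqlite3
import tempfile

# Assumption: SQLiteIndexer wraps a plain sqlite3 connection.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "sqlite.db")

existed_before = os.path.exists(db_path)  # nothing on disk yet

conn = sqlite3.connect(db_path)           # opens the file, creating it if missing
conn.execute("CREATE TABLE probe (x)")    # force a write so the file is non-empty
conn.commit()
conn.close()

existed_after = os.path.exists(db_path)
print(existed_before, existed_after)
```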

Methods

init_table

indexer.init_table(table_name, index_type, embedding_dim=768)

Drops any existing table with the given name and creates a new (id TEXT PRIMARY KEY, contents TEXT) table. Raises ValueError if index_type is not IndexType.SPARSE.

table_name

string

required

Name of the table to create.

index_type

IndexType

required

Must be IndexType.SPARSE. Any other value raises ValueError.

embedding_dim

number

default:"768"

Accepted for interface consistency but unused.
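
The drop-and-recreate behavior can be reproduced with the standard sqlite3 module. This is a minimal sketch, not the library's code: init_sparse_table is a hypothetical helper, and the identifier quoting is an assumption about how a table name would be handled safely.

```python
import sqlite3

def init_sparse_table(conn, table_name):
    """Drop and recreate an (id, contents) table. Hypothetical helper."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    conn.execute(f"DROP TABLE IF EXISTS {quoted}")
    conn.execute(f"CREATE TABLE {quoted} (id TEXT PRIMARY KEY, contents TEXT)")
    conn.commit()

conn = sqlite3.connect(":memory:")
init_sparse_table(conn, "corpus")
conn.execute("INSERT INTO corpus (id, contents) VALUES ('d1', 'old row')")

# Calling it again discards the existing table and its rows.
init_sparse_table(conn, "corpus")
remaining = conn.execute("SELECT COUNT(*) FROM corpus").fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(corpus)")]
print(remaining, columns)
```

Because the existing table is dropped, calling init_table on a populated table discards its rows.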

load_table

indexer.load_table(table_name, file_path, index_type=None, pretokenized=False)

Selects a loader from the file extension. Only .jsonl files are supported, and they are dispatched to load_jsonl_table. Parquet loading is not implemented.

table_name

string

required

Target table name.

file_path

string

required

Path to a .jsonl source file.

index_type

IndexType

default:"None"

Overrides the index type. When None, the type is inferred by calling get_index_type(table_name).

pretokenized

boolean

default:"False"

When False, contents values are tokenized with Pyserini’s Lucene Analyzer before insertion.
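
Extension-based dispatch of this kind might look like the sketch below. pick_loader is a hypothetical helper, and raising ValueError for unsupported extensions is an assumption; this page does not say what load_table does with a non-.jsonl path.

```python
from pathlib import Path

def pick_loader(file_path):
    """Return the name of the loader for file_path. Hypothetical helper."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".jsonl":
        return "load_jsonl_table"
    # Assumption: unsupported extensions are rejected; the real behavior may differ.
    raise ValueError(f"Unsupported file extension: {suffix!r}")

loader = pick_loader("corpus.jsonl")

try:
    pick_loader("corpus.parquet")
    parquet_rejected = False
except ValueError:
    parquet_rejected = True

print(loader, parquet_rejected)
```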

load_jsonl_table

indexer.load_jsonl_table(table_name, file_path, index_type, pretokenized=False)

Reads a JSONL file and inserts (id, contents) rows. Each line must be a JSON object with id and contents fields. Raises ValueError if index_type is not IndexType.SPARSE.

table_name

string

required

Target table name.

file_path

string

required

Path to the .jsonl file.

index_type

IndexType

required

Must be IndexType.SPARSE.

pretokenized

boolean

default:"False"

When True, tokenization is skipped and contents values are treated as already tokenized.
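
The expected input format and the insert step can be sketched with the standard library alone. load_jsonl is a hypothetical helper that mirrors the pretokenized=True path (no tokenization); skipping blank lines is an assumption, and the sample documents are made up for illustration.

```python
import json
import os
import sqlite3
import tempfile

def load_jsonl(conn, table_name, file_path):
    """Insert (id, contents) rows from a JSONL file. Hypothetical helper."""
    quoted = '"' + table_name.replace('"', '""') + '"'
    rows = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # assumption: blank lines are tolerated
            doc = json.loads(line)  # each line is one JSON object
            rows.append((str(doc["id"]), doc["contents"]))
    conn.executemany(f"INSERT INTO {quoted} (id, contents) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# Write a two-document sample file in the expected shape.
path = os.path.join(tempfile.mkdtemp(), "corpus.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"id": "d1", "contents": "the cat runs fast"}) + "\n")
    f.write(json.dumps({"id": "d2", "contents": "dogs sleep all day"}) + "\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, contents TEXT)")
inserted = load_jsonl(conn, "corpus", path)
stored = conn.execute("SELECT id, contents FROM corpus ORDER BY id").fetchall()
print(inserted, stored)
```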

fts_index

indexer.fts_index(table_name="corpus")

Creates an FTS5 virtual table named fts_{table_name} backed by the base table, using the porter tokenizer. The virtual table is populated immediately from the base table.

CREATE VIRTUAL TABLE fts_corpus USING fts5(
    id, contents,
    content='corpus', content_rowid='rowid', tokenize = 'porter'
)

The FTS5 virtual table applies SQLite's porter stemmer. QuackIR’s tokenization pipeline (Pyserini’s Lucene Analyzer) already stems tokens before they are stored, so with tokenized input the porter stemmer acts as a second stemming pass. For consistent retrieval, keep the default pretokenized=False so that index terms and query terms go through the same pipeline.

table_name

string

default:"corpus"

Name of the base table to build the FTS5 virtual table from.
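
The following sketch builds the same kind of external-content FTS5 table with the standard sqlite3 module and runs one BM25-ranked query against it. It requires a SQLite build with FTS5 enabled. Populating the index with the FTS5 'rebuild' command is an assumption; this page only says the table is populated immediately, not how. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, contents TEXT)")
conn.executemany(
    "INSERT INTO corpus (id, contents) VALUES (?, ?)",
    [("d1", "the cat runs fast"), ("d2", "dogs sleep all day")],
)

# External-content table: FTS5 stores the term index, the text stays in corpus.
conn.execute(
    "CREATE VIRTUAL TABLE fts_corpus USING fts5("
    "id, contents, content='corpus', content_rowid='rowid', tokenize='porter')"
)

# Assumption: the index is filled with the FTS5 'rebuild' command.
conn.execute("INSERT INTO fts_corpus(fts_corpus) VALUES ('rebuild')")

# bm25() returns lower values for better matches, so sort ascending.
hits = conn.execute(
    "SELECT id FROM fts_corpus WHERE fts_corpus MATCH ? ORDER BY bm25(fts_corpus)",
    ("running",),
).fetchall()
print(hits)
```

The query term "running" matches the stored word "runs" because the porter tokenizer reduces both to the same stem.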

get_index_type

indexer.get_index_type(table_name)

Inspects column names. Returns IndexType.SPARSE if a contents column is present, otherwise raises ValueError.

table_name

string

required

Table to inspect.

return

IndexType

Always IndexType.SPARSE for valid SQLite tables.
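
Column inspection of this kind can be done with PRAGMA table_info. The sketch below is an assumption about the approach, not the library's code: detect_index_type is a hypothetical helper, and the string "SPARSE" stands in for IndexType.SPARSE so the example runs without QuackIR installed.

```python
import sqlite3

def detect_index_type(conn, table_name):
    """Infer the index type from column names. Hypothetical helper."""
    quoted = '"' + table_name.replace('"', '""') + '"'
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")}
    if "contents" in columns:
        return "SPARSE"  # stand-in for IndexType.SPARSE
    raise ValueError(f"Cannot determine index type for table {table_name!r}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, contents TEXT)")
conn.execute("CREATE TABLE other (id TEXT PRIMARY KEY, payload TEXT)")

detected = detect_index_type(conn, "corpus")

try:
    detect_index_type(conn, "other")
    other_rejected = False
except ValueError:
    other_rejected = True

print(detected, other_rejected)
```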

get_num_rows

indexer.get_num_rows(table_name)

Returns the number of rows in the given table.

table_name

string

required

Table to count.

return

number

Row count as an integer.

close

indexer.close()

Closes the underlying SQLite connection.

Example

from quackir import IndexType
from quackir.index import SQLiteIndexer

indexer = SQLiteIndexer(db_path="sparse.db")

indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl")  # auto-tokenizes contents
indexer.fts_index("corpus")

print(indexer.get_num_rows("corpus"), "documents indexed")
indexer.close()

Pre-tokenized JSONL

from quackir import IndexType
from quackir.index import SQLiteIndexer

indexer = SQLiteIndexer(db_path="pretok.db")
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "pretokenized.jsonl", pretokenized=True)
indexer.fts_index("corpus")
indexer.close()
