QuackIR supports three database backends: DuckDB, SQLite, and PostgreSQL. DuckDB and SQLite are file-based and need no server, so they are ready to use immediately after installation. PostgreSQL requires a running server and additional initialization. DuckDB and PostgreSQL support dense and hybrid retrieval in addition to sparse; SQLite supports sparse only. Every quackir.index and quackir.search command takes a backend and its connection parameters from CLI flags or environment variables.
Backend comparison
| Feature | DuckDB | SQLite | PostgreSQL |
|---|---|---|---|
| Sparse (BM25) | Yes | Yes | Yes |
| Dense (vector) | Yes | No | Yes |
| Hybrid (RRF) | Yes | No | Yes |
| Server required | No | No | Yes |
| Parquet input | Yes | No | Yes |
| Default db path / name | database.db | database.db | quackir |
SQLite only supports sparse indexing and search. Attempting dense or hybrid operations with SQLite will raise an error.
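The comparison table can be encoded as a small pre-flight check that fails before any indexing starts. This is a minimal sketch: `SUPPORTED_MODES` and `check_backend_supports` are hypothetical names for illustration and are not part of QuackIR, and the error QuackIR itself raises may differ.

```python
# Capability matrix copied from the backend comparison table.
SUPPORTED_MODES = {
    "duckdb": {"sparse", "dense", "hybrid"},
    "sqlite": {"sparse"},
    "postgres": {"sparse", "dense", "hybrid"},
}


def check_backend_supports(db_type: str, mode: str) -> None:
    """Hypothetical helper: raise ValueError if db_type cannot serve mode."""
    try:
        supported = SUPPORTED_MODES[db_type]
    except KeyError:
        raise ValueError(f"unknown backend: {db_type!r}") from None
    if mode not in supported:
        raise ValueError(
            f"{db_type} does not support {mode} retrieval; "
            f"supported modes: {sorted(supported)}"
        )
```

Calling `check_backend_supports("sqlite", "dense")` raises `ValueError`, while `check_backend_supports("duckdb", "dense")` returns without error.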
DuckDB
DuckDB stores the entire database in a single file. No server or configuration is needed beyond specifying the file path.
--db-path (string, default: "database.db"): Path to the DuckDB database file. Created automatically if it does not exist.
```bash
python -m quackir.index \
  --db-type duckdb \
  --db-path database.db \
  --input corpus.jsonl \
  --index-type sparse
```
Python API:
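```python
from quackir.index import DuckDBIndexer
from quackir import IndexType

# Open (or create) the DuckDB database file.
indexer = DuckDBIndexer(db_path="database.db")

# Create the table, load the corpus into it, then build the full-text index.
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl", IndexType.SPARSE)
indexer.fts_index("corpus")

# Release the database connection.
indexer.close()
```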
SQLite
SQLite, like DuckDB, is file-based and requires no server setup.
--db-path (string, default: "database.db"): Path to the SQLite database file. Created automatically if it does not exist.
```bash
python -m quackir.index \
  --db-type sqlite \
  --db-path sqlite.db \
  --input corpus.jsonl \
  --index-type sparse
```
Python API:
```python
from quackir.index import SQLiteIndexer
from quackir import IndexType

indexer = SQLiteIndexer(db_path="sqlite.db")
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl", IndexType.SPARSE)
indexer.fts_index("corpus")
indexer.close()
```
If you use both DuckDB and SQLite, point --db-path to different file paths. Both backends use database.db as the default, so using the same path for both will corrupt the other backend’s data.
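One way to catch a path mix-up before it does damage is to inspect the file header. SQLite database files begin with the 16-byte string `SQLite format 3\0`, so a file that starts with those bytes should not be handed to the DuckDB backend. The sketch below uses only the standard library; `looks_like_sqlite` is a hypothetical helper, not a QuackIR function.

```python
# First 16 bytes of every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path: str) -> bool:
    """Hypothetical helper: True if the file at path has a SQLite header."""
    try:
        with open(path, "rb") as handle:
            return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
    except FileNotFoundError:
        # A missing file is safe: either backend would create it fresh.
        return False
```

A script that is about to run with `--db-type duckdb` could call `looks_like_sqlite(db_path)` first and abort if it returns `True`.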
PostgreSQL
PostgreSQL requires a running server with the quackir database created and — for dense retrieval — the pgvector extension installed.
--db-name (string, default: "quackir"): Name of the PostgreSQL database.
Server setup
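Initialize a data directory
Create the data directory with PostgreSQL's initdb tool, using the same mydb path that pg_ctl receives in the next step.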
Start the server
```bash
pg_ctl -D mydb -l logfile start &
```
Create the database
```bash
createdb quackir
psql quackir
```
Create user and enable pgvector
Inside the psql shell:
```sql
create user postgres superuser;
create extension vector;
\q
```
The create extension vector command requires the pgvector conda package. Without it, dense and hybrid retrieval will not be available on the PostgreSQL backend. See installation for the full setup.
Once the server is running, use the backend like this:
```bash
python -m quackir.index \
  --db-type postgres \
  --db-name quackir \
  --db-user postgres \
  --input corpus.jsonl \
  --index-type sparse
```
Python API:
```python
from quackir.index import PostgresIndexer
from quackir import IndexType

indexer = PostgresIndexer(db_name="quackir", user="postgres")
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl", IndexType.SPARSE)
indexer.fts_index("corpus")
indexer.close()
```
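All three Python API examples follow the same sequence: initialize the table, load the corpus, build the full-text index, close. A backend-agnostic wrapper can capture that sequence and guarantee the indexer is closed even if a step fails. This is a sketch, not part of QuackIR: `build_sparse_index` is a hypothetical name, and it assumes only the four methods shown in the examples above.

```python
from contextlib import closing


def build_sparse_index(indexer, table_name, corpus_path, index_type):
    """Hypothetical helper: run the init/load/index steps, then close.

    `indexer` is any object exposing init_table, load_table, fts_index,
    and close, as the indexers in the examples above do.
    """
    # closing() calls indexer.close() on exit, including on exceptions.
    with closing(indexer):
        indexer.init_table(table_name, index_type)
        indexer.load_table(table_name, corpus_path, index_type)
        indexer.fts_index(table_name)


# Example call, assuming quackir is installed:
#   from quackir.index import DuckDBIndexer
#   from quackir import IndexType
#   build_sparse_index(DuckDBIndexer(db_path="database.db"),
#                      "corpus", "corpus.jsonl", IndexType.SPARSE)
```

Passing the indexer in, rather than constructing it inside the helper, keeps the backend choice and its connection parameters at the call site.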
Dotenv configuration
All database connection parameters can be set via environment variables in a .env file in your working directory. When a .env file is present, its values override any CLI arguments.
```bash
# .env
DB_TYPE=duckdb
DB_PATH=database.db
DB_NAME=quackir
DB_USER=postgres
```
| Variable | Overrides | Description |
|---|---|---|
| DB_TYPE | --db-type | Backend type: duckdb, sqlite, or postgres |
| DB_PATH | --db-path | Database file path (DuckDB and SQLite) |
| DB_NAME | --db-name | Database name (PostgreSQL) |
| DB_USER | --db-user | Database username (PostgreSQL) |
Dotenv values override command-line arguments. If a .env file is present in your working directory, its values take precedence over any --db-* flags you pass.
With a .env file configured, you can omit connection flags from the command line entirely:
```bash
# With a .env file present, --db-type and --db-path are read from the environment
python -m quackir.index \
  --input corpus.jsonl \
  --index-type sparse
```
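The precedence rule described in this section (values from .env win over --db-* flags) can be illustrated with a short sketch. This is not QuackIR's actual loader: `parse_dotenv` and `resolve_connection` are hypothetical helpers, and the parser handles only simple KEY=VALUE lines rather than the full dotenv syntax.

```python
# Mapping from .env variable to the CLI argument it overrides,
# taken from the table in this section.
ENV_TO_ARG = {
    "DB_TYPE": "db_type",
    "DB_PATH": "db_path",
    "DB_NAME": "db_name",
    "DB_USER": "db_user",
}


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_connection(cli_args: dict, dotenv_text: str) -> dict:
    """Start from the CLI arguments, then let .env values override them."""
    resolved = dict(cli_args)
    for env_key, value in parse_dotenv(dotenv_text).items():
        arg_name = ENV_TO_ARG.get(env_key)
        if arg_name is not None:
            resolved[arg_name] = value
    return resolved
```

With a .env containing `DB_TYPE=duckdb`, passing `--db-type sqlite` on the command line would still resolve to `duckdb` under this rule, which is why a stale .env file is worth checking when a command targets an unexpected backend.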