QuackIR supports three database backends: DuckDB, SQLite, and PostgreSQL. DuckDB and SQLite are file-based and need no server, so they are ready to use immediately after installation. PostgreSQL requires a running server and additional initialization. All three backends support sparse (BM25) retrieval; DuckDB and PostgreSQL also support dense and hybrid retrieval. Every quackir.index and quackir.search command requires a backend and its connection parameters, supplied via CLI flags or environment variables.

Backend comparison

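| Feature | DuckDB | SQLite | PostgreSQL |
| --- | --- | --- | --- |
| Sparse (BM25) | Yes | Yes | Yes |
| Dense (vector) | Yes | No | Yes |
| Hybrid (RRF) | Yes | No | Yes |
| Server required | No | No | Yes |
| Parquet input | Yes | No | Yes |
| Default db path / name | database.db | database.db | quackir |
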
SQLite only supports sparse indexing and search. Attempting dense or hybrid operations with SQLite will raise an error.
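
The comparison table can be encoded as a small lookup that rejects unsupported combinations before any indexing starts. This is only a sketch: `SUPPORTED` and `check_supported` are hypothetical names for illustration and are not part of the QuackIR API, and the error QuackIR itself raises may differ.

```python
# Capability matrix copied from the backend comparison table.
SUPPORTED = {
    "duckdb": {"sparse", "dense", "hybrid"},
    "sqlite": {"sparse"},
    "postgres": {"sparse", "dense", "hybrid"},
}


def check_supported(db_type: str, method: str) -> None:
    """Raise ValueError if `db_type` cannot handle the retrieval `method`.

    Hypothetical helper for illustration; not part of QuackIR.
    """
    if db_type not in SUPPORTED:
        raise ValueError(f"unknown backend: {db_type!r}")
    if method not in SUPPORTED[db_type]:
        raise ValueError(f"{db_type} does not support {method} retrieval")


check_supported("duckdb", "hybrid")  # supported, returns None
try:
    check_supported("sqlite", "dense")
except ValueError as err:
    print(err)  # sqlite does not support dense retrieval
```

A check like this is cheap to run at the start of a script, so a misconfigured backend fails before a long indexing job begins.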

DuckDB

DuckDB stores the entire database in a single file. No server or configuration is needed beyond specifying the file path.
--db-type (string, required): Must be duckdb.
--db-path (string, default: "database.db"): Path to the DuckDB database file. Created automatically if it does not exist.
python -m quackir.index \
  --db-type duckdb \
  --db-path database.db \
  --input corpus.jsonl \
  --index-type sparse
Python API:
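from quackir.index import DuckDBIndexer
from quackir import IndexType

# Open the DuckDB database file (created if it does not exist).
indexer = DuckDBIndexer(db_path="database.db")
# Set up the "corpus" table for sparse indexing and load the documents.
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl", IndexType.SPARSE)
# Build the full-text search index, then release the connection.
indexer.fts_index("corpus")
indexer.close()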
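
The examples above take a `corpus.jsonl` file as input. The sketch below writes one with the standard library. It assumes one JSON object per line with `id` and `contents` fields, following the Anserini/Pyserini convention; that field layout is an assumption not stated on this page, so check it against the corpus you actually index. `write_corpus` is a hypothetical helper, not a QuackIR function.

```python
import json
import os
import tempfile


def write_corpus(path, docs):
    """Write `docs` (a list of dicts) to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")


# ASSUMPTION: "id" and "contents" are the expected field names.
docs = [
    {"id": "doc1", "contents": "DuckDB is an in-process analytical database."},
    {"id": "doc2", "contents": "SQLite is a small embedded SQL engine."},
]

corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.jsonl")
write_corpus(corpus_path, docs)
print(corpus_path)
```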

SQLite

SQLite, like DuckDB, is file-based and requires no server setup.
--db-type (string, required): Must be sqlite.
--db-path (string, default: "database.db"): Path to the SQLite database file. Created automatically if it does not exist.
python -m quackir.index \
  --db-type sqlite \
  --db-path sqlite.db \
  --input corpus.jsonl \
  --index-type sparse
Python API:
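from quackir.index import SQLiteIndexer
from quackir import IndexType

# Open the SQLite database file (created if it does not exist).
indexer = SQLiteIndexer(db_path="sqlite.db")
# SQLite supports sparse indexing only.
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl", IndexType.SPARSE)
# Build the full-text search index, then release the connection.
indexer.fts_index("corpus")
indexer.close()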
If you use both DuckDB and SQLite, give each backend its own --db-path. Both default to database.db, and the two file formats are incompatible, so pointing both backends at the same file can corrupt the data already stored there.
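
One way to guard against mixing the two backends is to inspect a file before handing it to DuckDB. Every SQLite 3 database begins with the 16-byte header `SQLite format 3\0`, so a standard-library check is enough. The helpers below are hypothetical and not part of QuackIR. A freshly created SQLite file with no tables is empty and will not be detected.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a SQLite 3 file


def is_sqlite_file(path: str) -> bool:
    """Return True if `path` exists and starts with the SQLite 3 header."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC


def make_demo_sqlite(path: str) -> None:
    """Create a small SQLite database so the header is written to disk."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.commit()
    conn.close()


demo_path = os.path.join(tempfile.mkdtemp(), "sqlite.db")
make_demo_sqlite(demo_path)
if is_sqlite_file(demo_path):
    print("refusing to open a SQLite file with --db-type duckdb")
```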

PostgreSQL

PostgreSQL requires a running server with the quackir database created. Dense and hybrid retrieval additionally require the pgvector extension.
--db-type (string, required): Must be postgres.
--db-name (string, default: "quackir"): Name of the PostgreSQL database.
--db-user (string, default: "postgres"): PostgreSQL username.

Server setup

1. Initialize a data directory

initdb -D mydb

2. Start the server

pg_ctl -D mydb -l logfile start &

3. Create the database

createdb quackir
psql quackir

4. Create user and enable pgvector

Inside the psql shell:
create user postgres superuser;
create extension vector;
\q
The create extension vector command requires the pgvector conda package. Without it, dense and hybrid retrieval will not be available on the PostgreSQL backend. See installation for the full setup.
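
To confirm that the setup worked, you can ask PostgreSQL whether the `vector` extension is registered in the database. The sketch below shells out to `psql` and queries the `pg_extension` catalog. The helper names are hypothetical and not part of QuackIR, and it assumes `psql` is on your PATH and that the server accepts the connection without a password prompt.

```python
import shutil
import subprocess
from typing import List


def pgvector_check_command(db_name: str = "quackir", user: str = "postgres") -> List[str]:
    """Build a psql command that prints 1 if the vector extension exists."""
    query = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"
    # -w: never prompt for a password; -t/-A: bare, unaligned output; -c: run query
    return ["psql", "-w", "-d", db_name, "-U", user, "-tAc", query]


def has_pgvector(db_name: str = "quackir", user: str = "postgres") -> bool:
    """Return True only if psql runs successfully and reports the extension."""
    if shutil.which("psql") is None:
        return False
    result = subprocess.run(
        pgvector_check_command(db_name, user),
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.returncode == 0 and result.stdout.strip() == "1"


# Show the command without running it; call has_pgvector() on a live server.
print(" ".join(pgvector_check_command()))
```

If `has_pgvector()` returns False on a running server, dense and hybrid indexing on the PostgreSQL backend should be expected to fail until `create extension vector;` succeeds.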
Once the server is running, use the backend like this:
python -m quackir.index \
  --db-type postgres \
  --db-name quackir \
  --db-user postgres \
  --input corpus.jsonl \
  --index-type sparse
Python API:
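from quackir.index import PostgresIndexer
from quackir import IndexType

# Connect to the running PostgreSQL server.
indexer = PostgresIndexer(db_name="quackir", user="postgres")
# Set up the "corpus" table for sparse indexing and load the documents.
indexer.init_table("corpus", IndexType.SPARSE)
indexer.load_table("corpus", "corpus.jsonl", IndexType.SPARSE)
# Build the full-text search index, then release the connection.
indexer.fts_index("corpus")
indexer.close()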

Dotenv configuration

All database connection parameters can be set via environment variables in a .env file in your working directory. When a .env file is present, its values override any CLI arguments.
# .env
DB_TYPE=duckdb
DB_PATH=database.db
DB_NAME=quackir
DB_USER=postgres
| Variable | Overrides | Description |
| --- | --- | --- |
| DB_TYPE | --db-type | Backend type: duckdb, sqlite, or postgres |
| DB_PATH | --db-path | Database file path (DuckDB and SQLite) |
| DB_NAME | --db-name | Database name (PostgreSQL) |
| DB_USER | --db-user | Database username (PostgreSQL) |

Dotenv values override command-line arguments. To use different connection settings for a single run, edit or remove the .env file rather than relying on --db-* flags.
With a .env file configured, you can omit connection flags from the command line entirely:
# With a .env file present, --db-type and --db-path are read from the environment
python -m quackir.index \
  --input corpus.jsonl \
  --index-type sparse
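
The precedence rule described in this section (values from .env win over CLI flags) can be illustrated with a short standard-library sketch. This is not QuackIR's actual loading code; `read_dotenv` and `resolve_db_config` are hypothetical helpers, and the parser handles only simple `KEY=value` lines.

```python
import os
import tempfile

# Mapping taken from the variable table: environment name -> CLI option name.
ENV_TO_OPTION = {
    "DB_TYPE": "db_type",
    "DB_PATH": "db_path",
    "DB_NAME": "db_name",
    "DB_USER": "db_user",
}


def read_dotenv(path):
    """Parse simple KEY=value lines; skip blanks, comments, malformed lines."""
    values = {}
    if not os.path.isfile(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_db_config(cli_args, dotenv_path=".env"):
    """Start from the CLI values, then let .env values overwrite them."""
    config = dict(cli_args)
    for env_key, value in read_dotenv(dotenv_path).items():
        if env_key in ENV_TO_OPTION:
            config[ENV_TO_OPTION[env_key]] = value
    return config


env_path = os.path.join(tempfile.mkdtemp(), ".env")
with open(env_path, "w", encoding="utf-8") as fh:
    fh.write("# .env\nDB_TYPE=duckdb\nDB_PATH=database.db\n")

cli = {"db_type": "sqlite", "db_path": "sqlite.db"}
print(resolve_db_config(cli, env_path))
# {'db_type': 'duckdb', 'db_path': 'database.db'}
```

Here the CLI asked for SQLite, but the resolved configuration uses DuckDB because the .env file defines both variables.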
