QuackIR requires Python 3.10, a set of conda-managed system packages (including Java for Pyserini’s Lucene analyzer), and several Python packages. DuckDB and SQLite work out of the box with no server setup. PostgreSQL requires additional initialization if you want to use the PostgreSQL backend.
1. Clone the repository

Clone the repository and its submodules. The --recurse-submodules flag is required to pull in dependencies bundled as git submodules.
git clone https://github.com/castorini/quackir.git --recurse-submodules
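If the repository was cloned without the flag, the submodule directories will be empty. The following is a minimal sketch of a check for that, assuming the standard `git submodule status` output format, in which a leading `-` marks a submodule that has not been initialized. The function names and the sample paths are hypothetical and are not part of QuackIR.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return the paths that `git submodule status` marks with a leading '-'."""
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Uninitialized entries look like "-<sha1> <path>".
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


def check_checkout(repo_dir: str = ".") -> list[str]:
    """Run `git submodule status` in repo_dir and list uninitialized submodules."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=False,
    )
    return uninitialized_submodules(result.stdout)


# Hypothetical sample output, used here only to illustrate the parsing.
sample = (
    "-1111111111111111111111111111111111111111 vendor/example\n"
    " 2222222222222222222222222222222222222222 vendor/other (heads/main)\n"
)
print(uninitialized_submodules(sample))
```

If `check_checkout("quackir")` reports any paths, running `git submodule update --init --recursive` inside the clone should fetch them.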
2. Create and activate a conda environment

QuackIR requires Python 3.10. Create a dedicated environment to avoid conflicts with other projects.
conda create -n quackir python=3.10
conda activate quackir
3. Install conda dependencies

Install system-level dependencies via conda-forge. This includes PostgreSQL, the pgvector extension, OpenJDK 21 (required by Pyserini’s Lucene analyzer), and Maven.
conda install -c conda-forge postgresql pgvector openjdk=21 maven -y
PostgreSQL is installed here so that the psycopg2 Python package can build against its client library at install time; pgvector is only used if you enable dense retrieval on the PostgreSQL backend. You do not need to run a PostgreSQL server unless you plan to use the PostgreSQL backend.
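Before moving on, it can help to confirm that the activated environment exposes what the later steps rely on. The sketch below is one possible check, not part of QuackIR: it assumes the conda packages place `java`, `mvn`, and `pg_config` on the `PATH`, and the function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the interpreter and key executables are available."""
    return {
        # QuackIR requires Python 3.10.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # OpenJDK, needed by Pyserini's Lucene analyzer.
        "java": shutil.which("java") is not None,
        # Maven, installed alongside OpenJDK.
        "mvn": shutil.which("mvn") is not None,
        # Assumption: pg_config ships with the conda postgresql package.
        "pg_config": shutil.which("pg_config") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A `MISSING` entry usually means the `quackir` environment is not active in the current shell, or that the conda install above did not complete.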
4. Install Python dependencies

From the repository root, install all Python dependencies:
pip install -r requirements.txt
The requirements.txt file installs the following packages:
| Package | Version | Purpose |
| --- | --- | --- |
| duckdb | 1.1.1 | In-process analytical database for sparse, dense, and hybrid retrieval |
| tqdm | 4.66.5 | Progress bars during indexing |
| pyserini | 1.0.0 | Lucene Porter tokenizer for BM25 text analysis |
| psycopg2 | 2.9.10 | PostgreSQL adapter for the PostgreSQL backend |
| dotenv | latest | Load database configuration from .env files |
| pyarrow | latest | Parquet file support for loading dense embedding tables |
| fastparquet | latest | Parquet file support (alternative engine) |
| faiss-cpu | latest | Approximate nearest-neighbor search used by Pyserini |
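To confirm that the pinned versions were actually installed, one option is to compare them against the package metadata in the active environment. This is a sketch using the standard library's `importlib.metadata`. The `PINNED` mapping simply copies the pinned rows listed above, and it assumes each package is registered under the same distribution name (a differently named build, such as a binary variant, would show up as missing).

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the package list above.
PINNED = {
    "duckdb": "1.1.1",
    "tqdm": "4.66.5",
    "pyserini": "1.0.0",
    "psycopg2": "2.9.10",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(pinned=PINNED):
    """Return one human-readable status line per pinned package."""
    lines = []
    for name, have in installed_versions(pinned).items():
        want = pinned[name]
        if have is None:
            status = "missing"
        elif have == want:
            status = "ok"
        else:
            status = f"mismatch (found {have})"
        lines.append(f"{name}=={want}: {status}")
    return lines


print("\n".join(report()))
```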
5. Set up PostgreSQL (optional)

This step is only needed if you intend to use the PostgreSQL backend (--db-type postgres). DuckDB and SQLite require no server setup and are ready to use immediately after the previous step.

Initialize a local PostgreSQL data directory, start the server, create the database, and enable the pgvector extension for dense retrieval:
initdb -D mydb
pg_ctl -D mydb -l logfile start &
createdb quackir
psql quackir
Inside the psql shell, run:
create user postgres superuser;
create extension vector;
\q
The create extension vector command requires the pgvector conda package installed in step 3. If you skip that package, dense and hybrid retrieval will not be available on the PostgreSQL backend.
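To check from Python that the extension was enabled, one approach is to query PostgreSQL's `pg_extension` catalog through psycopg2. The sketch below assumes the `quackir` database created above and default local connection settings (local server, current OS user). The function name `pgvector_enabled` is hypothetical and is not part of QuackIR.

```python
def pgvector_enabled(dbname: str = "quackir") -> bool:
    """Return True if the 'vector' extension is installed in the given database.

    Returns False when psycopg2 is not importable or the server is unreachable.
    """
    try:
        import psycopg2
    except ImportError:
        return False

    try:
        # Assumption: default host and user; adjust to match your setup.
        conn = psycopg2.connect(dbname=dbname, connect_timeout=3)
    except psycopg2.OperationalError:
        return False

    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 FROM pg_extension WHERE extname = 'vector'")
            return cur.fetchone() is not None
    finally:
        conn.close()


print("pgvector enabled:", pgvector_enabled())
```

A result of `False` does not distinguish between a stopped server and a missing extension, so if it is unexpected, check first that `pg_ctl -D mydb status` reports a running server.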

Verify the installation

After completing the steps above, verify that QuackIR imports correctly:
from quackir import IndexType, SearchType, SearchDB

print(list(IndexType))   # [<IndexType.SPARSE: 'sparse'>, <IndexType.DENSE: 'dense'>]
print(list(SearchType))  # [<SearchType.SPARSE: 'sparse'>, <SearchType.DENSE: 'dense'>, <SearchType.HYBRID: 'hybrid'>]
print(list(SearchDB))    # [<SearchDB.DUCKDB: 'duckdb'>, <SearchDB.SQLITE: 'sqlite'>, <SearchDB.POSTGRES: 'postgres'>]
If the imports succeed without errors, your environment is ready.
