QuackIR requires Python 3.10, a set of conda-managed system packages (including Java for Pyserini’s Lucene analyzer), and several Python packages. DuckDB and SQLite work out of the box with no server setup. PostgreSQL requires additional initialization if you want to use the PostgreSQL backend.
Clone the repository
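A minimal sketch of this step. The repository URL is an assumption inferred from the project's documentation path (castorini/quackir) and is not stated on this page; substitute the actual URL if it differs.

```shell
# Assumed repository location (inferred, not confirmed by this page).
git clone --recurse-submodules https://github.com/castorini/quackir.git
cd quackir

# If the repository was already cloned without submodules,
# this fetches them after the fact.
git submodule update --init --recursive
```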
Clone the repository and its submodules. The --recurse-submodules flag is required to pull in dependencies bundled as git submodules.
Create and activate a conda environment
QuackIR requires Python 3.10. Create a dedicated environment to avoid conflicts with other projects.
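One way to do this is sketched below. The environment name quackir is a hypothetical choice for illustration; any name works.

```shell
# Create an isolated environment pinned to Python 3.10.
# "quackir" is an illustrative environment name, not one mandated by the project.
conda create -n quackir python=3.10 -y

# Activate it so later conda and pip installs land in this environment.
conda activate quackir
```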
Install conda dependencies
Install system-level dependencies via conda-forge. This includes PostgreSQL, the pgvector extension, OpenJDK 21 (required by Pyserini’s Lucene analyzer), and Maven.
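A sketch of what this install might look like. The package names (postgresql, pgvector, openjdk, maven) are assumed to match their conda-forge names; check the project's own instructions for the exact command and any version pins beyond OpenJDK 21.

```shell
# Install the system-level dependencies from the conda-forge channel.
# Package names are assumptions based on the list above.
conda install -c conda-forge postgresql pgvector openjdk=21 maven -y

# Optional sanity check: confirm the Java runtime is on PATH.
java -version
```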
PostgreSQL and pgvector are installed here so that the psycopg2 Python package can link against them at install time. You do not need to run a PostgreSQL server unless you plan to use the PostgreSQL backend.
Install Python dependencies
From the repository root, install all Python dependencies:
The requirements.txt file installs the following packages:
| Package | Version | Purpose |
|---|---|---|
| duckdb | 1.1.1 | In-process analytical database for sparse, dense, and hybrid retrieval |
| tqdm | 4.66.5 | Progress bars during indexing |
| pyserini | 1.0.0 | Lucene Porter tokenizer for BM25 text analysis |
| psycopg2 | 2.9.10 | PostgreSQL adapter for the PostgreSQL backend |
| dotenv | latest | Load database configuration from .env files |
| pyarrow | latest | Parquet file support for loading dense embedding tables |
| fastparquet | latest | Parquet file support (alternative engine) |
| faiss-cpu | latest | Approximate nearest-neighbor search used by Pyserini |
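The packages listed above are typically installed with a single pip command. A minimal sketch, assuming requirements.txt sits at the repository root and the conda environment is active:

```shell
# Run from the repository root, inside the activated conda environment.
pip install -r requirements.txt

# Optional sanity check: confirm the DuckDB package imports and report its version.
python -c "import duckdb; print(duckdb.__version__)"
```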
Set up PostgreSQL (optional)
This step is only needed if you intend to use the PostgreSQL backend (--db-type postgres). DuckDB and SQLite require no server setup and are ready to use immediately after the previous step.
Initialize a local PostgreSQL data directory, start the server, create the database, and enable the pgvector extension for dense retrieval:
Inside the psql shell, run:
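As a hedged sketch of the whole sequence: the data directory path and the database name quackir below are hypothetical, so use whatever your QuackIR configuration expects. The last command passes the SQL statement through psql -c; the same CREATE EXTENSION statement can instead be typed at the interactive psql prompt.

```shell
# Hypothetical locations and names; adjust to match your configuration.
PGDATA_DIR="$HOME/quackir_pgdata"
DB_NAME="quackir"

# 1. Initialize a new local data directory.
initdb -D "$PGDATA_DIR"

# 2. Start the server in the background, logging to a file in the data directory.
pg_ctl -D "$PGDATA_DIR" -l "$PGDATA_DIR/server.log" start

# 3. Create the database.
createdb "$DB_NAME"

# 4. Enable pgvector (the extension is registered under the name "vector").
psql -d "$DB_NAME" -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

When you are done, pg_ctl -D "$PGDATA_DIR" stop shuts the server down.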