
corpora-py is a standard Python package published to PyPI. It requires Python 3.13 or later and ships a CLI entry point (cf-mcp) that you can run immediately after installation. This page covers system requirements, installation methods, environment variable configuration, and how to install from source for development.

Requirements

Before installing, confirm your environment meets the following prerequisites:
  • Python 3.13+ — required by corpora-py and its dependencies.
  • uv >= 0.9 — recommended for dependency management and running scripts. Not required if you use pip directly.
  • git — required only if you plan to fetch corpus datasets from GitHub using exegia.corpus.fetch_from_git.
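Before installing, you can check these prerequisites from a Python shell. The following is a minimal sketch that uses only the standard library. check_prerequisites is a hypothetical helper and is not part of corpora-py. It checks the interpreter version and whether uv and git are on PATH. It does not verify the uv >= 0.9 version constraint.

```python
import shutil
import sys

# Minimum interpreter version required by corpora-py (from the list above).
MIN_PYTHON = (3, 13)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a dict describing which prerequisites are satisfied.

    Hypothetical helper: corpora-py does not ship this function.
    `which` is injectable so the check can be exercised without touching PATH.
    """
    return {
        # Compare only (major, minor) against the required minimum.
        "python>=3.13": tuple(version_info[:2]) >= MIN_PYTHON,
        # uv is recommended but optional when installing with pip.
        "uv": which("uv") is not None,
        # git is only needed for exegia.corpus.fetch_from_git.
        "git": which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:14} {'ok' if ok else 'missing'}")
```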

Install with uv

uv add corpora-py
uv add records corpora-py as a dependency in your project's pyproject.toml, pins the resolved version in uv.lock, and installs it into the project's virtual environment.

Install with pip

pip install corpora-py

Verify Installation

After installing, confirm that the cf-mcp CLI entry point is available:
uv run cf-mcp --help
Expected output:
usage: cf-mcp [-h] [--corpus PATH] [--name NAME] [--features FEAT [FEAT ...]]
              [--sse PORT] [--http PORT] [--host HOST] [--verbose]
If you installed with pip into an active virtual environment, run cf-mcp --help directly (without uv run).

Dependencies

corpora-py installs the following key dependencies automatically:
Package             Version     Purpose
fastmcp             >=2.0       MCP server framework
context-fabric      >=0.5.0     Graph corpus engine (cfabric)
text-fabric         >=13.0.0    Text-Fabric annotation layer
httpx               >=0.27.0    HTTP client
pydantic            >=2.0.0     Data validation and schemas
pydantic-settings   >=2.0.0     Environment-based configuration
ebooklib            >=0.18      EPUB parsing for book import
beautifulsoup4      >=4.12.0    HTML parsing for book import
python-dotenv       >=1.0.0     .env file loading
supabase            >=2.31.0    Auth and storage backend

Environment Setup

Some features — particularly exegia.auth — read configuration from a .env file. Copy the provided example and fill in any values you need:
cp .env.example .env
Key environment variables:
Variable                    Required    Description
SUPABASE_URL                Auth only   Your Supabase project URL
SUPABASE_ANON_KEY           Auth only   Supabase anonymous (public) API key
SUPABASE_SERVICE_ROLE_KEY   Auth only   Supabase service role key (bypasses RLS)
DATASETS_BASE_PATH          Optional    Override the default ~/.exegia/datasets/ base path
ENVIRONMENT                 Optional    development, staging, or production (default: development)
Configuration is loaded at import time by pydantic-settings from .env.{ENVIRONMENT}. For example, if ENVIRONMENT=production, the library reads .env.production.
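The resolution rules described above can be restated in plain Python: ENVIRONMENT selects the env file, and DATASETS_BASE_PATH overrides the default datasets directory. This is not corpora-py's pydantic-settings code. It is a stdlib-only sketch in which resolve_env_file and resolve_datasets_base_path are hypothetical names. Raising an error for an unrecognised ENVIRONMENT value is an assumption.

```python
import os
from collections.abc import Mapping
from pathlib import Path

VALID_ENVIRONMENTS = ("development", "staging", "production")


def resolve_env_file(environ: Mapping[str, str] = os.environ) -> str:
    """Return the .env file name selected by ENVIRONMENT (default: development)."""
    env = environ.get("ENVIRONMENT", "development")
    if env not in VALID_ENVIRONMENTS:
        # Assumption: reject unknown values rather than silently falling back.
        raise ValueError(f"ENVIRONMENT must be one of {VALID_ENVIRONMENTS}, got {env!r}")
    return f".env.{env}"


def resolve_datasets_base_path(environ: Mapping[str, str] = os.environ) -> Path:
    """Return DATASETS_BASE_PATH if set, else the default ~/.exegia/datasets/."""
    override = environ.get("DATASETS_BASE_PATH")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".exegia" / "datasets"
```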
The SUPABASE_* environment variables are only required if you use the exegia.auth module. The MCP server (cf-mcp) and all 11 corpus tools work without any environment variables configured — they only need a corpus path passed via --corpus.

Development Install (From Source)

To install from source and run tests or contribute changes:
git clone https://github.com/exegia/corpora-py
cd corpora-py
uv run scripts/setup.py
scripts/setup.py installs all dependencies (including dotenvx for encrypted .env support) and prepares the development environment. After setup, run the full test suite with:
uv run pytest
To build a distributable wheel:
uv build --out-dir dist/
