

This guide walks you through installing the corpora-py package, downloading a Text-Fabric corpus from a public git repository, and running your first query — either through the cf-mcp server or directly from Python. By the end you will have a working MCP server ready to connect to Claude Desktop or any other MCP-compatible AI client.
1

Prerequisites

Context Fabric requires Python 3.13 or later and uv 0.9 or later. Verify that both are available before continuing:
python --version   # should print Python 3.13.x or higher
uv --version       # should print uv 0.9.x or higher
You will also need git on your PATH if you plan to fetch corpus datasets directly from GitHub.
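If you prefer to check the prerequisites from a script, the following is a minimal sketch using only the standard library. The helper names python_ok and missing_tools are hypothetical and are not part of corpora-py. The sketch checks the interpreter version and whether uv and git are on your PATH; it does not check the uv version number.

```python
import shutil
import sys

MIN_PYTHON = (3, 13)  # minimum version stated in this guide


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the interpreter meets the minimum major.minor version."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=("uv", "git")) -> list[str]:
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```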
2

Install corpora-py

Install the package into your project with uv (recommended) or pip:
uv add corpora-py        # recommended
pip install corpora-py   # alternative, inside an activated virtual environment
This installs the cf-mcp CLI entry point along with all dependencies, including fastmcp, context-fabric, and text-fabric.
3

Obtain a corpus

Text-Fabric datasets are stored locally under ~/.exegia/datasets/. Use fetch_datasets_from_git to shallow-clone a public corpus repository and locate its TF dataset directories automatically:
from exegia.corpus.fetch_from_git import fetch_datasets_from_git

paths = fetch_datasets_from_git("https://github.com/ETCBC/bhsa")
print(paths)  # [PosixPath('/home/user/.exegia/datasets/.../BHSA/tf/2021')]
The function returns a list of Path objects, each pointing to a directory that contains both otext.tf and otype.tf — the two files required by every valid TF dataset.
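The two-file rule can be checked by hand if you want to validate a directory yourself. The sketch below is an illustration of that rule, not the library's own implementation; is_tf_dataset and find_tf_datasets are hypothetical names.

```python
from pathlib import Path

# The two files this guide says every valid TF dataset must contain
REQUIRED_FILES = ("otext.tf", "otype.tf")


def is_tf_dataset(directory: Path) -> bool:
    """Return True if the directory contains both required TF files."""
    return all((directory / name).is_file() for name in REQUIRED_FILES)


def find_tf_datasets(root: Path) -> list[Path]:
    """Recursively collect directories under root that hold both required files."""
    return sorted(
        candidate.parent
        for candidate in root.rglob("otype.tf")
        if is_tf_dataset(candidate.parent)
    )
```

For example, you could run is_tf_dataset on each entry of the list returned by fetch_datasets_from_git as a sanity check before starting the server.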
Corpus repositories can be several hundred megabytes. The fetch is a shallow clone, so only the latest commit is downloaded. Subsequent calls for the same repository skip the download if the directory already exists.
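The shallow-clone-and-skip behavior described above can be sketched with plain git. This is an illustration under assumptions, not the code inside fetch_datasets_from_git: the function names and the destination layout are hypothetical, and the real function may handle errors or partial clones differently.

```python
import subprocess
from pathlib import Path


def shallow_clone_command(url: str, dest: Path) -> list[str]:
    """Build a git command that downloads only the latest commit."""
    return ["git", "clone", "--depth", "1", url, str(dest)]


def fetch_once(url: str, dest: Path) -> bool:
    """Clone url into dest unless dest already exists.

    Returns True if a clone was performed, False if it was skipped.
    """
    if dest.exists():
        return False  # assume a previous fetch already populated this directory
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(shallow_clone_command(url, dest), check=True)
    return True
```

One consequence of this existence check in the sketch: an interrupted clone leaves a directory behind, so you would need to delete it before retrying.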
4

Start the MCP server

Pass the corpus path to cf-mcp with --corpus. The server defaults to stdio transport, which is what Claude Desktop and most MCP clients expect:
uv run cf-mcp --corpus ~/.exegia/datasets/bibles/BHSA
To start with SSE transport on port 8000 instead (useful for remote or browser-based clients):
uv run cf-mcp --corpus ~/.exegia/datasets/bibles/BHSA --sse 8000
You can load multiple corpora in a single server process by repeating --corpus and --name:
uv run cf-mcp \
  --corpus ~/.exegia/datasets/bibles/BHSA --name BHSA \
  --corpus ~/.exegia/datasets/bibles/GNT  --name GNT
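If you launch the server from a script, the repeated flag pairs can be generated from a mapping. A minimal sketch, assuming only the --corpus and --name flags shown above; cf_mcp_args is a hypothetical helper, not part of the package.

```python
import os


def cf_mcp_args(corpora: dict[str, str]) -> list[str]:
    """Build the cf-mcp command line for a mapping of corpus name to path."""
    args = ["uv", "run", "cf-mcp"]
    for name, path in corpora.items():
        # Expand "~" here because no shell is involved when passing a list
        args += ["--corpus", os.path.expanduser(path), "--name", name]
    return args
```

You could then start the server with subprocess.run(cf_mcp_args({...}), check=True) from the project directory.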
For Claude Desktop, use the stdio transport (no --sse flag) and add the cf-mcp command to your Claude Desktop MCP server configuration. The AI assistant will then have access to all 11 corpus tools automatically.
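Claude Desktop reads its servers from the mcpServers section of claude_desktop_config.json. The sketch below generates one such entry. The server label "context-fabric", the helper name, and the example paths are assumptions for illustration; the --directory option is uv's flag for choosing the project directory, used here because Claude Desktop does not start the command from your project folder.

```python
import json
import os


def claude_desktop_entry(
    corpus_path: str,
    project_dir: str,
    server_name: str = "context-fabric",  # hypothetical label, pick any name
) -> dict:
    """Build an mcpServers entry that launches cf-mcp over stdio."""
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                "args": [
                    "--directory", os.path.expanduser(project_dir),
                    "run", "cf-mcp",
                    # No shell expands "~" for Claude Desktop, so expand it here
                    "--corpus", os.path.expanduser(corpus_path),
                ],
            }
        }
    }


if __name__ == "__main__":
    entry = claude_desktop_entry("~/.exegia/datasets/bibles/BHSA", "~/my-project")
    print(json.dumps(entry, indent=2))
```

Merge the printed entry into your existing configuration file rather than overwriting it, then restart Claude Desktop.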
5

Run your first query

You can also start the server and manage corpora entirely from Python:
from exegia.mcp import mcp, corpus_manager

# Load a corpus by path; if name is omitted, it defaults to the directory name
corpus_manager.load("~/.exegia/datasets/bibles/BHSA", name="BHSA")

# Start the server with SSE transport
mcp.run(transport="sse", host="localhost", port=8000)
Once the server is running, any MCP client can call tools such as search, get_passages, or describe_corpus against the loaded corpus.
When an AI assistant connects to the Context Fabric MCP server, the following tool sequence gives it a solid understanding of the corpus before issuing expensive queries:
describe_corpus()           → understand what node types exist
list_features()             → see what annotations are available
search_syntax_guide()       → learn the query language
search(template, "count")   → check scale before fetching results
search(template, "results") → get paginated result set
get_passages(references)    → read the matched text
Start with describe_corpus() to see the section hierarchy (e.g. book > chapter > verse) and the count of each node type. Then call list_features() to discover what annotation columns are available — lex, pos, gloss, and so on. Once you understand the corpus structure, use search_syntax_guide() to review the template query syntax before writing your first pattern. Always call search with return_type="count" first to check how many results a template will produce before fetching the full result set.
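The count-before-fetch habit can be written as a small client-side helper. This is a sketch under assumptions: call_tool stands for whatever coroutine your MCP client exposes for invoking a tool, the argument names template and return_type follow the listing above, and the count response is assumed to be convertible with int (pass your own parse_count if the server returns a richer structure).

```python
import asyncio
from typing import Any, Awaitable, Callable

# A coroutine that invokes one MCP tool by name with a dict of arguments
ToolCaller = Callable[[str, dict[str, Any]], Awaitable[Any]]


async def count_then_fetch(
    call_tool: ToolCaller,
    template: str,
    max_results: int = 500,
    parse_count: Callable[[Any], int] = int,
) -> Any:
    """Run a count query first; fetch results only if the count is small enough.

    Returns the search results, or None when the count exceeds max_results.
    """
    raw = await call_tool("search", {"template": template, "return_type": "count"})
    if parse_count(raw) > max_results:
        return None  # too many hits: refine the template instead of fetching
    return await call_tool("search", {"template": template, "return_type": "results"})
```

Passing the tool caller as a parameter keeps the helper independent of any particular client library and makes it easy to exercise with a stub before pointing it at a running server.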
