Corpora in Context Fabric are Text-Fabric dataset directories stored on your local filesystem. The MCP server loads one or more of these directories at startup from the path(s) you provide via the --corpus flag or the programmatic corpus_manager.load() call. Each dataset is a self-contained folder of annotated .tf feature files that the graph engine indexes at load time.
Dataset Format
A Text-Fabric dataset directory is any folder that contains both otext.tf and otype.tf side by side. These two files are mandatory:
- otext.tf: defines corpus-level metadata (name, version, description, section hierarchy, text formats).
- otype.tf: maps every node in the graph to its type (e.g. word, verse, chapter, book).
All other .tf files in the directory are feature files that store annotations on nodes (morphology, lemma, gloss, etc.).
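The format rule above can be checked before a folder is handed to the server. The following is a minimal sketch and not part of Context Fabric: the helper names is_tf_dataset and feature_files are hypothetical, and the code relies only on the two mandatory file names stated above.

```python
from pathlib import Path

# The two files every Text-Fabric dataset directory must contain.
REQUIRED_FILES = ("otext.tf", "otype.tf")


def is_tf_dataset(directory: Path) -> bool:
    """Return True if `directory` holds both mandatory .tf files."""
    return directory.is_dir() and all(
        (directory / name).is_file() for name in REQUIRED_FILES
    )


def feature_files(directory: Path) -> list[str]:
    """List every .tf file except the two mandatory ones, sorted by name."""
    return sorted(
        p.name for p in directory.glob("*.tf") if p.name not in REQUIRED_FILES
    )
```

Only the top level of the folder is inspected, which matches the "side by side" requirement: both files must sit in the same directory.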
Recommended Directory Layout
Organise datasets by category under ~/.exegia/datasets/ for clarity. Context Fabric does not enforce this structure, but it makes multi-corpus setups easy to manage.
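One way to picture such a layout is a root folder with one subfolder per category and one dataset folder inside each. The sketch below assumes that two-level shape (root/category/dataset), which is an illustration rather than something Context Fabric requires; the function name datasets_by_category is hypothetical.

```python
from pathlib import Path


def datasets_by_category(root: Path) -> dict[str, list[str]]:
    """Map each category folder under `root` to the dataset folders inside it.

    Assumes a hypothetical <root>/<category>/<dataset>/ layout. A folder counts
    as a dataset only if it contains both otext.tf and otype.tf.
    """
    result: dict[str, list[str]] = {}
    if not root.is_dir():
        return result
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        names = sorted(
            d.name
            for d in category.iterdir()
            if d.is_dir()
            and (d / "otext.tf").is_file()
            and (d / "otype.tf").is_file()
        )
        if names:  # skip category folders that hold no datasets
            result[category.name] = names
    return result
```

Calling datasets_by_category(Path.home() / ".exegia" / "datasets") would then give a quick inventory of what is installed, grouped by category folder name.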
Supported Corpus Categories
The BookCategory enum defines the recognised corpus types. You can use these values when tagging or filtering corpora in your application:
| Category | Value | Description |
|---|---|---|
| Bible | bible | Old/New Testament texts |
| Quran | quran | Arabic or translated Quran |
| Tanakh | tanakh | Hebrew Bible |
| Commentary | commentary | Rabbinical, patristic, etc. |
| Lexicon | lexicon | Lexical databases (BDB, BDAG) |
| Dictionary | dictionary | Theological dictionaries |
| Devotional | devotional | Devotional literature |
| Theology | theology | Systematic theology |
| History | history | Historical texts |
| Philosophy | philosophy | Philosophical works |
| Fiction | fiction | Literary texts |
| Other | other | Catch-all |
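For tagging or filtering, the table maps naturally onto a string-valued enum. The class below is a stand-in reconstructed from the table, not the library's own definition; real code should use the BookCategory that the package exposes. The parse_category helper and its fallback to the catch-all value are assumptions made for illustration.

```python
from enum import Enum


class BookCategory(str, Enum):
    """Illustrative stand-in mirroring the table above (not the library's class)."""

    BIBLE = "bible"
    QURAN = "quran"
    TANAKH = "tanakh"
    COMMENTARY = "commentary"
    LEXICON = "lexicon"
    DICTIONARY = "dictionary"
    DEVOTIONAL = "devotional"
    THEOLOGY = "theology"
    HISTORY = "history"
    PHILOSOPHY = "philosophy"
    FICTION = "fiction"
    OTHER = "other"


def parse_category(value: str) -> BookCategory:
    """Convert a raw tag to a category; unknown tags fall back to OTHER (assumed policy)."""
    try:
        return BookCategory(value.strip().lower())
    except ValueError:
        return BookCategory.OTHER
```

Normalising case and whitespace before the lookup keeps user-supplied tags such as "Tanakh" from being rejected.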
Loading a Dataset
Pass a dataset path directly to the cf-mcp entrypoint, or load it programmatically with corpus_manager.
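As a rough illustration of the command-line route, the sketch below validates a path and builds the argument list for the server. Only the cf-mcp entrypoint name and the --corpus flag come from this page; the helper name build_server_command is hypothetical, and the example is limited to a single path because how several paths are passed is not shown here. The programmatic corpus_manager.load() call is left out for the same reason: its signature is not documented here.

```python
from pathlib import Path


def build_server_command(dataset: str) -> list[str]:
    """Return the argv list for starting cf-mcp on one dataset directory.

    Raises FileNotFoundError if either mandatory file is absent, so a bad
    path is caught before the server is launched.
    """
    path = Path(dataset).expanduser()
    missing = [n for n in ("otext.tf", "otype.tf") if not (path / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{path} is missing {', '.join(missing)}")
    # Hypothetical invocation shape: entrypoint, flag, then the dataset path.
    return ["cf-mcp", "--corpus", str(path)]
```

The returned list can be handed to subprocess.run(cmd, check=True) once the path has passed the check.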
Where to Obtain Datasets
Well-known public Text-Fabric datasets you can fetch directly from git:
- BHSA (BHS Hebrew Bible with Annotations): github.com/ETCBC/bhsa
- GNT (Greek New Testament): various Text-Fabric repositories on GitHub
- Custom books: use the EPUB or HTML converters to create your own datasets
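Fetching one of these repositories by hand comes down to a shallow clone followed by a search for folders that contain both mandatory files. The sketch below is an illustration under stated assumptions, not the project's own fetch tool: the function names shallow_clone and locate_datasets are hypothetical, and git must be available on the PATH.

```python
import subprocess
from pathlib import Path


def shallow_clone(url: str, dest: Path) -> None:
    """Clone only the latest commit of `url` into `dest` (requires git)."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)


def locate_datasets(repo: Path) -> list[Path]:
    """Return every folder under `repo` that contains both otext.tf and otype.tf."""
    return sorted(
        otext.parent
        for otext in repo.rglob("otext.tf")
        if (otext.parent / "otype.tf").is_file()
    )
```

For example, shallow_clone("https://github.com/ETCBC/bhsa", Path("bhsa")) followed by locate_datasets(Path("bhsa")) would list the dataset folders in that repository. The search is recursive because a repository may keep its .tf files several levels below the root.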
Fetch from Git
Shallow-clone a git repository and locate all Text-Fabric datasets inside it automatically.
Convert EPUB
Turn any EPUB ebook into a queryable Text-Fabric dataset with a full node hierarchy.
Convert HTML
Convert a directory of HTML files into a Text-Fabric dataset with document and element nodes.
Package as .exg
Bundle a Text-Fabric dataset into a single distributable .exg archive with manifest metadata.