TrinaxAI is a three-tier local stack where every component runs on your own device. The React PWA on port 3334 talks to the FastAPI RAG engine on port 3333, which in turn drives Ollama on port 11434. No request ever leaves your machine or trusted LAN — there is no cloud dependency at any tier.

System Diagram

┌──────────────────────────────────────────┐
│              Your Device                 │
│  ┌──────────┐  ┌─────────────────────┐   │
│  │PWA(React)│  │ VSCode (Continue)   │   │
│  │  :3334   │  │ continue-config.yaml│   │
│  └────┬─────┘  └──────────┬──────────┘   │
│       │                   │              │
│  ┌────┴───────────────────┴──────────┐   │
│  │    RAG API (FastAPI) :3333        │   │
│  │ LlamaIndex · bge-m3 · BM25        │   │
│  └────┬──────────────────────────────┘   │
│       │                                  │
│  ┌────┴───────┐                          │
│  │   Ollama   │  qwen2.5 · llama3.2      │
│  │   :11434   │  bge-m3 · moondream      │
│  └────────────┘                          │
└──────────────────────────────────────────┘
The three tiers are:
  1. PWA Frontend — React 19 + TypeScript + Vite, served on port 3334
  2. RAG API — FastAPI + LlamaIndex, running on port 3333
  3. Ollama — Local model runtime, on port 11434
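
One quick way to confirm that all three tiers are running is to probe their ports. The following is a minimal sketch using only the Python standard library. The `TIERS` mapping and the `port_open` and `check_tiers` helpers are illustrative names rather than part of TrinaxAI, and the host is assumed to be localhost.

```python
import socket

# Ports taken from the tier list above; the keys are illustrative labels.
TIERS = {"pwa": 3334, "rag_api": 3333, "ollama": 11434}


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tiers(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe each tier's port and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in TIERS.items()}


if __name__ == "__main__":
    for name, up in check_tiers().items():
        print(f"{name:8s} :{TIERS[name]:<6d} {'up' if up else 'down'}")
```

A port that accepts connections only shows that something is listening there; it does not verify that the service behind it is healthy.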

Components

The following table lists every major file and package in the repository and its role in the system.
| Component | Role |
| --- | --- |
| config.py | Central configuration hub: models, hardware profiles, embedding presets, factory functions (make_llm(), make_embed(), make_reranker()), and the heuristic route_model() auto-router |
| rag_api.py | FastAPI backend (2000+ lines): hybrid retrieval, memory, collections, project detection, deep research, file watcher, rate limiting, and usage stats |
| index.py | Document indexer: AST-aware chunking, aggressive directory pruning, incremental mode with manifest, collection tagging |
| trinaxai_cli/ | Modular CLI package with subcommands including chat, index, browse, research, memory, collections, watch, export, obsidian, and doctor |
| service_manager.py | Cross-platform service supervisor: systemd on Linux, launchctl on macOS, subprocess supervisor on Windows |
| chat-pwa/ | React 19 + TypeScript + Vite PWA frontend: 18 components, Tailwind CSS, framer-motion, bilingual i18n |

config.py — Central Configuration Hub

config.py is the single source of truth imported by rag_api.py, index.py, and the trinaxai_cli/ package. It exposes:
  • Model fleet: MODEL_GENERAL, MODEL_CODE, MODEL_DEEP, MODEL_FAST, each overridable via environment variable (e.g. TRINAXAI_MODEL_CODE)
  • Hardware profiles: selected by TRINAXAI_PROFILE (8gb, 16gb, max, ultra); the active profile tunes chunk sizes, embedding workers, context windows, and model selection
  • Embedding presets: balanced (bge-m3, multilingual, 1024 dims), lite (nomic-embed-text, 768 dims), fast (all-minilm, 384 dims)
  • Factory functions: make_llm(), make_embed(), make_reranker(), which construct LlamaIndex-compatible objects wired to the active profile
  • Auto-router: route_model(text), a heuristic classifier that returns the right model name instantly without an LLM call
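
The override and preset mechanics described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of config.py: the default model names are placeholders, the `TRINAXAI_MODEL_<ROLE>` pattern is extrapolated from the single `TRINAXAI_MODEL_CODE` example, the fallback profile is a guess, and `model_for`, `EmbedPreset`, and `active_profile` are hypothetical helper names.

```python
import os
from dataclasses import dataclass

# Placeholder defaults: the real model names are defined in config.py.
DEFAULT_MODELS = {
    "GENERAL": "general-model",
    "CODE": "code-model",
    "DEEP": "deep-model",
    "FAST": "fast-model",
}


def model_for(role: str) -> str:
    """Resolve a model name, letting TRINAXAI_MODEL_<ROLE> override the default."""
    return os.environ.get(f"TRINAXAI_MODEL_{role}", DEFAULT_MODELS[role])


@dataclass(frozen=True)
class EmbedPreset:
    model: str
    dims: int


# Preset names, models, and dimensions as listed above.
EMBED_PRESETS = {
    "balanced": EmbedPreset("bge-m3", 1024),
    "lite": EmbedPreset("nomic-embed-text", 768),
    "fast": EmbedPreset("all-minilm", 384),
}

VALID_PROFILES = ("8gb", "16gb", "max", "ultra")


def active_profile(default: str = "16gb") -> str:
    """Read TRINAXAI_PROFILE; fall back to an assumed default if unset or invalid."""
    profile = os.environ.get("TRINAXAI_PROFILE", default).lower()
    return profile if profile in VALID_PROFILES else default
```

Reading the environment at call time, rather than at import time, keeps overrides easy to exercise in tests; whether config.py does the same is not stated in this page.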

rag_api.py — FastAPI Backend

The heart of the system. Key subsystems:
| Feature | Implementation |
| --- | --- |
| Hybrid retrieval | Vector (bge-m3) + BM25 (keyword) → reciprocal rank fusion |
| Reranking | Cross-encoder (bge-reranker-v2-m3) reorders candidates after fusion |
| Collections | Separate namespaces within the same vector store |
| Project detection | Heuristic from file paths and user query |
| Memory | Explicit “remember that” facts stored and auto-summarized |
| Deep research | Multi-pass decomposition with sub-question RAG |
| File watcher | watchdog monitors the filesystem for auto-reindexing |
| Rate limiting | Token bucket, 30 req/min per IP, thread-safe |
| Usage stats | JSONL-based local analytics written to storage/usage.jsonl |
| App state sync | Cross-device shared key-value store via storage/app_state.json |
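
To make the rate-limiting row concrete, here is a minimal sketch of a thread-safe, per-IP token bucket with the 30 requests per minute budget from the table. The `TokenBucket` class and its `allow` method are hypothetical names; the actual implementation in rag_api.py may be structured differently.

```python
import threading
import time


class TokenBucket:
    """Per-key token bucket: `capacity` requests, refilled evenly over `period` seconds."""

    def __init__(self, capacity: int = 30, period: float = 60.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = capacity / period  # tokens added per second
        self.clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._state: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str) -> bool:
        """Consume one token for `key`; return False if the bucket is empty."""
        now = self.clock()
        with self._lock:
            tokens, last = self._state.get(key, (float(self.capacity), now))
            # Refill according to elapsed time, capped at capacity.
            tokens = min(self.capacity, tokens + (now - last) * self.rate)
            if tokens < 1.0:
                self._state[key] = (tokens, now)
                return False
            self._state[key] = (tokens - 1.0, now)
            return True
```

In a FastAPI handler the key would typically be the client IP, with a rejected request answered by HTTP 429. A single lock around the shared dictionary is the simplest way to keep concurrent requests from double-spending tokens.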

chat-pwa/ — React PWA Frontend

18 TypeScript components with Tailwind CSS and framer-motion, including:
| Component | Purpose |
| --- | --- |
| ChatInterface | Main chat UI: streaming, markdown, voice, slash commands |
| ChatSidebar | Session history, search, export (Markdown/PDF/Word) |
| Settings | 5-section config panel (general, indexing, prompts, memory, stats) |
| KnowledgeBrowser | Explore indexed chunks by collection → file → chunk |
| Sources | Citation cards with file, project, snippet, and relevance score |
| OnboardingWizard | 7-step first-time setup |
| Docs | 11-section in-app documentation |
Tech stack: React 19, Vite 6, TypeScript, Tailwind CSS, vite-plugin-pwa, react-markdown

Chat Data Flow

When a user sends a message, the system routes it through this pipeline:
User types query in PWA

  ├─ Slash command? → built-in handler (e.g., /index, /memory)
  ├─ Image attached? → routeVisionModel() → streamOllamaVision()
  ├─ Docs attached? → extractDocumentText() → inject into prompt

  └─ Normal text:

       ├─ RAG engine:
       │    POST /v1/chat/completions → FastAPI
       │    │
       │    ├─ route_model(query) → picks best Ollama model (heuristic)
       │    ├─ prepare_query() → enriches with previous user turn
       │    ├─ _fusion_retriever.retrieve() → hybrid vector+BM25 search
       │    ├─ detect_project() → filters by mentioned project
       │    ├─ collections filter → narrows to active collections
       │    ├─ reranker → reorders by cross-encoder relevance
       │    ├─ get_response_synthesizer().synthesize() → LLM with context
       │    └─ SSE stream + source citations → back to PWA

       └─ Ollama engine:
            routeOllamaModel() → Ollama /api/chat (JSON lines)
            → model unload (keep_alive=0)
The auto-router (route_model()) is a pure heuristic — it scans the query text for code keywords and complexity signals. No LLM is called for routing, making it instant and free.
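
A keyword router of this kind can be sketched in a few lines. The hint lists, model names, word-count threshold, and the choice to check code hints before complexity hints are all assumptions made for illustration; the real `_CODE_HINTS`, `_DEEP_HINTS`, and routing order are defined in config.py.

```python
# Placeholder model names and hint lists; the real values live in config.py.
MODEL_GENERAL, MODEL_CODE, MODEL_DEEP = "general-model", "code-model", "deep-model"
_CODE_HINTS = ("def ", "class ", "traceback", "stack trace", "function", "bug", "```")
_DEEP_HINTS = ("compare", "trade-off", "step by step", "analyze", "architecture")


def route_model(text: str) -> str:
    """Pick a model name from keyword hints alone, with no LLM call."""
    lowered = text.lower()
    if any(hint in lowered for hint in _CODE_HINTS):
        return MODEL_CODE
    # Long queries or explicit reasoning cues go to the deeper model (assumed rule).
    if len(lowered.split()) > 80 or any(hint in lowered for hint in _DEEP_HINTS):
        return MODEL_DEEP
    return MODEL_GENERAL
```

Because the function only performs substring checks, its cost is negligible next to a model call, which is the property the paragraph above relies on.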

Indexing Flow

index.py runs in stages, only touching files that have changed since the last run:
index.py starts

  ├─ collect_files(root) → os.walk with aggressive directory pruning
  │   (skips node_modules, .git, venv, dist, __pycache__, etc.)

  ├─ current_state(paths) → {source_key: mtime}

  ├─ read_manifest() → canonicalized key map (collection:path → mtime)

  ├─ Diff: new_files, changed_files, deleted_files

  ├─ load_docs(paths) → Document objects with metadata

  ├─ build_nodes(docs) → CodeSplitter (AST) or SentenceSplitter (prose)

  ├─ Embed nodes (bge-m3 via Ollama — no LLM call during this phase)

  └─ persist to storage/ + write_manifest()
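
The first two stages of the flow above can be sketched with the standard library. This is an approximation under assumptions: the pruned directory set contains only the names shown in the flow (the real list is longer), and the signatures of `collect_files` and `current_state` are guesses based on how they are named there.

```python
import os
from pathlib import Path

# Directory names from the flow above; the real list in index.py is longer.
PRUNE_DIRS = {"node_modules", ".git", "venv", "dist", "__pycache__"}


def collect_files(root: str) -> list[Path]:
    """Walk `root`, pruning ignored directories before os.walk descends into them."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place slice assignment is what tells os.walk to skip these subtrees.
        dirnames[:] = sorted(d for d in dirnames if d not in PRUNE_DIRS)
        found.extend(Path(dirpath) / name for name in sorted(filenames))
    return found


def current_state(paths: list[Path]) -> dict[str, float]:
    """Map each file path to its modification time."""
    return {str(p): p.stat().st_mtime for p in paths}
```

Pruning `dirnames` in place matters for speed: filtering the results afterwards would still pay the cost of walking every file under node_modules or .git.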

Key Design Decisions

Only embedding models run during indexing. This saves RAM, keeps indexing fast, and means you can re-index a large codebase without blocking the LLM for chat.
CodeSplitter uses tree-sitter to chunk at function and class boundaries for 15+ languages. This keeps logical units intact so retrieved chunks contain whole functions rather than arbitrary slices.
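
The idea of chunking at syntactic boundaries can be shown without tree-sitter. The sketch below swaps in Python's built-in `ast` module, so it handles Python source only and is not how CodeSplitter is implemented; `chunk_python_source` is a hypothetical helper name.

```python
import ast


def chunk_python_source(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above the node's own lineno, so include them.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    return chunks
```

Each chunk is a complete definition, so a retrieved result never starts or ends halfway through a function body. Top-level statements outside any definition are skipped in this simplified version.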
Each query is run through both a vector (semantic) retriever and a BM25 (keyword) retriever. Results are merged with reciprocal rank fusion. This catches both conceptually similar passages and exact keyword matches.
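
Reciprocal rank fusion itself is compact enough to write out. This sketch operates on plain lists of document IDs and uses the conventional smoothing constant k = 60; whether TrinaxAI uses that value, and the `reciprocal_rank_fusion` name, are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties break on doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, with vector hits `["a", "b", "c"]` and BM25 hits `["c", "a", "d"]`, document `a` ranks first because it sits near the top of both lists, followed by `c`. Only ranks are used, so the two retrievers' raw scores never need to be on the same scale.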
route_model() in config.py inspects the query for code hints (_CODE_HINTS) and complexity hints (_DEEP_HINTS). The right model is picked in microseconds with zero API calls.
Every indexed chunk carries a collection_id metadata tag. The API, CLI, and PWA all expose collection-scoped queries, so you can keep project knowledge bases cleanly separated.
Serving the frontend as a PWA avoids the Electron binary, Chromium bundling, and native toolchain requirements. The app is installable on iOS, Android, and desktop from the browser — no app store needed.
storage/manifest.json maps collection:path → mtime. On each run, index.py diffs the current filesystem state against the manifest and only re-processes new or changed files. Re-indexing a large repository after a small edit takes seconds.
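
The diff step reduces to three set comparisons over the two mtime maps. A minimal sketch, assuming both maps use the same `collection:path` keys; `diff_manifest` is a hypothetical name, not necessarily the function index.py defines.

```python
def diff_manifest(
    manifest: dict[str, float], current: dict[str, float]
) -> tuple[list[str], list[str], list[str]]:
    """Compare stored mtimes with the current filesystem state.

    Returns (new_files, changed_files, deleted_files), each sorted by key.
    """
    new_files = sorted(k for k in current if k not in manifest)
    changed_files = sorted(
        k for k in current if k in manifest and current[k] != manifest[k]
    )
    deleted_files = sorted(k for k in manifest if k not in current)
    return new_files, changed_files, deleted_files
```

Only the new and changed keys need to be loaded, chunked, and embedded again, which is why a small edit in a large repository results in a short run.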

Storage Layout

All persistent state lives under storage/:
storage/
├── docstore.json          # LlamaIndex document store
├── index_store.json       # FAISS/vector index metadata
├── manifest.json          # File → mtime for incremental indexing
├── collections.json       # Collection metadata
├── usage.jsonl            # Usage statistics (JSON lines)
└── app_state.json         # Cross-device shared state
To force a full reindex, delete docstore.json, index_store.json, and manifest.json, then run python index.py.
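
The deletion step can be scripted so that only the three index files are touched. This is a small sketch, not a TrinaxAI command: `clear_index` is a hypothetical helper, and the storage directory is assumed to be `storage/` relative to the working directory.

```python
from pathlib import Path

# File names from the storage layout above.
INDEX_FILES = ("docstore.json", "index_store.json", "manifest.json")


def clear_index(storage_dir: str = "storage") -> list[str]:
    """Delete the index files if present and return the names actually removed."""
    removed = []
    for name in INDEX_FILES:
        path = Path(storage_dir) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

After calling `clear_index()`, run `python index.py` as described above. Collections metadata, usage statistics, and app state are left in place.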

Security Model

TrinaxAI is local-first by design. Here is a brief summary of the security layers:
| Layer | Mechanism |
| --- | --- |
| Network | Localhost + private LAN only (CORS filtered by IP + port) |
| System endpoints | Require localhost/LAN or TRINAXAI_ADMIN_TOKEN |
| LAN control | TRINAXAI_ALLOW_LAN_SYSTEM=0 disables LAN system access (default) |
| TLS | HTTPS with self-signed certs; TRINAXAI_TLS_VERIFY controls verification |
| Sudoers | setup_trinaxai.sh creates /etc/sudoers.d/trinaxai for service control |
| Data | All data stays on device: no cloud uploads, no telemetry |
For the full threat model and reporting process, see the Security Model page.
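
The system-endpoint rule from the table (localhost, optionally LAN, or an admin token) could be expressed roughly as follows. This is one reading of the table, not the code in rag_api.py: `may_call_system_endpoint` and its parameters are hypothetical, with `allow_lan` standing in for TRINAXAI_ALLOW_LAN_SYSTEM and `admin_token` for TRINAXAI_ADMIN_TOKEN.

```python
import hmac
import ipaddress
from typing import Optional


def may_call_system_endpoint(
    client_ip: str,
    token: Optional[str],
    admin_token: Optional[str],
    allow_lan: bool = False,
) -> bool:
    """Allow loopback, LAN when enabled, or any caller with the admin token."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable address: reject
    if addr.is_loopback:
        return True
    if allow_lan and addr.is_private:
        return True
    if not (token and admin_token):
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.encode(), admin_token.encode())
```

With the default `allow_lan=False`, a request from 192.168.1.20 without a token is rejected, matching the table's statement that LAN system access is disabled by default.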
