TrinaxAI is configured entirely through environment variables loaded from a .env file in the repo root. Every variable is optional — the system auto-detects your hardware and applies profile-based defaults. Copy .env.example to .env and override only what you need.
cp .env.example .env
All defaults shown below are for the 16gb profile (the factory default). Actual runtime defaults vary by the active TRINAXAI_PROFILE. Run trinaxai doctor to inspect what values are in effect.
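
The tables on this page give each variable a type (string, int, bool, float). As a rough illustration of how such values could be read with a fallback default, here is a minimal Python sketch. The helper names (`env_str`, `env_int`, `env_bool`) are hypothetical and not part of TrinaxAI, and treating a blank value as unset is an assumption.

```python
import os

def env_str(name, default, env=None):
    """Return the variable's value, or the default when unset or blank (assumed behavior)."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    return value if value else default

def env_int(name, default, env=None):
    """Read an integer variable such as TRINAXAI_PORT."""
    return int(env_str(name, str(default), env))

def env_bool(name, default, env=None):
    """Treat "1" as true and anything else as false, matching the 0/1 values on this page."""
    return env_str(name, "1" if default else "0", env) == "1"

# Hypothetical overrides, as they might look after .env has been loaded.
overrides = {"TRINAXAI_PORT": "4444", "TRINAXAI_RAG_HTTPS": "0"}

port = env_int("TRINAXAI_PORT", 3333, overrides)
https = env_bool("TRINAXAI_RAG_HTTPS", True, overrides)
host = env_str("TRINAXAI_HOST", "0.0.0.0", overrides)  # not overridden, falls back
print(port, https, host)
```

Only the two overridden variables change; TRINAXAI_HOST keeps its documented default.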

Profile & Performance

These two variables are the highest-level controls. They cascade into dozens of downstream defaults — model sizes, context windows, embed workers, chunk sizes, and retrieval depth.
| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_PROFILE | string | 16gb | Hardware profile. Valid values: 4gb, 8gb, 16gb, max, ultra. Auto-detected by the installer; override manually when needed. |
| TRINAXAI_PERFORMANCE_MODE | string | fast | Trade-off between speed and retrieval quality. fast: smaller chunks, lower TOP_K, snappier responses. balanced: middle ground. quality: larger context, more retrieval candidates, slower. |
TRINAXAI_PROFILE=16gb
TRINAXAI_PERFORMANCE_MODE=fast
Pass --profile ultra at install time, or set TRINAXAI_PROFILE=ultra in .env later to switch profiles without reinstalling. See Hardware Profiles for the full per-profile breakdown.

Network

Controls where the RAG API binds and how it communicates with Ollama.
| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_HOST | string | 0.0.0.0 | Bind address for the RAG API. Use 127.0.0.1 for localhost-only (no LAN access). |
| TRINAXAI_PORT | int | 3333 | TCP port for the RAG API (FastAPI/uvicorn). |
| TRINAXAI_RAG_HTTPS | bool | 1 | Set to 1 to serve the RAG API over HTTPS with a self-signed certificate. Set to 0 for plain HTTP (not recommended in LAN environments). |
| OLLAMA_BASE_URL | string | http://localhost:11434 | URL of the Ollama API. Change if Ollama runs on a different host or port. |
TRINAXAI_HOST=0.0.0.0
TRINAXAI_PORT=3333
TRINAXAI_RAG_HTTPS=1
OLLAMA_BASE_URL=http://localhost:11434
Leaving TRINAXAI_HOST=0.0.0.0 exposes the RAG API on your local network. This is intentional for LAN PWA/phone access, but make sure TRINAXAI_ALLOW_LAN_SYSTEM is 0 (the default) unless you understand the implications.

Model Fleet

TrinaxAI maintains a fleet of four models and automatically routes each query to the most appropriate one. You can override individual fleet slots or disable the router entirely.
| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_MODEL_GENERAL | string | llama3.2:3b | General-purpose chat model for non-code questions. Low-resource profiles default to llama3.2:1b. |
| TRINAXAI_MODEL_CODE | string | qwen2.5-coder:3b | Code-focused model for regular coding tasks. Low-resource profiles default to qwen2.5-coder:1.5b. |
| TRINAXAI_MODEL_DEEP | string | qwen2.5-coder:3b | Deep-analysis model for complex refactoring, architecture, and multi-file queries. max/32gb profiles default to qwen2.5-coder:7b; ultra defaults to qwen2.5-coder:14b. Low-resource profiles use qwen2.5-coder:1.5b. |
| TRINAXAI_MODEL_FAST | string | llama3.2:3b | Lightweight model for short, trivial queries (greetings, simple lookups). Defaults to MODEL_GENERAL. |
| TRINAXAI_LLM | string | qwen2.5-coder:3b | Fallback model when TRINAXAI_AUTO_ROUTE=0. Defaults to MODEL_CODE. |
| TRINAXAI_LLM_HEAVY | string | qwen2.5-coder:3b | Heavy fallback model when TRINAXAI_AUTO_ROUTE=0. Defaults to MODEL_DEEP (profile-dependent: 3b on 16gb, 7b on max, 14b on ultra). |
| TRINAXAI_AUTO_ROUTE | bool | 1 | Set to 0 to disable the heuristic model router and always use TRINAXAI_LLM. |
# Fleet overrides — only set what you want to change
TRINAXAI_MODEL_GENERAL=llama3.2:3b
TRINAXAI_MODEL_CODE=qwen2.5-coder:3b
TRINAXAI_MODEL_DEEP=qwen2.5-coder:7b
TRINAXAI_MODEL_FAST=llama3.2:3b

# Disable router and always use a single model
# TRINAXAI_AUTO_ROUTE=0
# TRINAXAI_LLM=qwen2.5-coder:3b
The router classifies each query using an offline heuristic — no LLM call required. It checks for:
  • Deep hints (query length > 600 chars, words like refactor, architecture, debug, security) → routes to MODEL_DEEP
  • Code hints (backtick present, keywords like function, import, .py, .ts) → routes to MODEL_CODE
  • Trivial queries (< 25 chars, greetings) → routes to MODEL_FAST
  • Everything else → routes to MODEL_GENERAL
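
The routing rules above can be sketched in a few lines of Python. This is an illustration only, not TrinaxAI's actual router: the check order (deep, then code, then trivial), the greeting list, and the naive substring matching are all assumptions, and only the keywords named in the list are included.

```python
DEEP_WORDS = ("refactor", "architecture", "debug", "security")
CODE_WORDS = ("function", "import", ".py", ".ts")
GREETINGS = ("hi", "hello", "hey", "thanks")  # assumed greeting list

def route(query: str) -> str:
    """Pick a fleet slot for a query using only string checks (no LLM call)."""
    text = query.strip()
    lowered = text.lower()
    # Deep hints: very long queries or analysis-style keywords.
    if len(text) > 600 or any(word in lowered for word in DEEP_WORDS):
        return "MODEL_DEEP"
    # Code hints: a backtick or code-related keywords (naive substring match).
    if "`" in text or any(word in lowered for word in CODE_WORDS):
        return "MODEL_CODE"
    # Trivial queries: very short text or a bare greeting.
    if len(text) < 25 or lowered.rstrip("!.?") in GREETINGS:
        return "MODEL_FAST"
    return "MODEL_GENERAL"

print(route("hello"))
print(route("Refactor the payment module into smaller services"))
print(route("How do I write a function that reverses a list?"))
print(route("What are the main causes of the French Revolution?"))
```

Substring matching is deliberately simple here; for example, it would treat "important" as containing the keyword import, which a real router would likely guard against.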

Embeddings

Controls which embedding model is used for indexing and retrieval. Three presets are available; you can also override the model and dimensions individually.
| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_EMBED_PRESET | string | balanced | Embedding preset. balanced = bge-m3 (multilingual, 1024 dims). lite = nomic-embed-text (768 dims, English-leaning). fast = all-minilm (384 dims, smallest). Low-resource profiles default to lite. |
| TRINAXAI_EMBED | string | bge-m3 | Override the embedding model name directly, bypassing the preset. |
| TRINAXAI_EMBED_DIMS | int | 1024 | Embedding vector dimensions. Must match the model. Set automatically by preset; override only if using a custom model. |
# Use the balanced preset (default for 16gb+)
# TRINAXAI_EMBED_PRESET=balanced   # bge-m3, 1024 dims, multilingual
# TRINAXAI_EMBED_PRESET=lite       # nomic-embed-text, 768 dims, faster
# TRINAXAI_EMBED_PRESET=fast       # all-minilm, 384 dims, smallest

# Or override model + dims manually
# TRINAXAI_EMBED=bge-m3
# TRINAXAI_EMBED_DIMS=1024
| Preset | Model | Dims | Context | Best for |
| --- | --- | --- | --- | --- |
| balanced | bge-m3 | 1024 | 8192 | Multilingual, highest quality |
| lite | nomic-embed-text | 768 | 2048 | Faster, English-leaning |
| fast | all-minilm | 384 | 512 | Smallest, English-only |
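
To make the relationship between the preset and the two override variables concrete, here is a small Python sketch. It assumes the overrides simply take precedence over whatever the preset selects; the function name `resolve_embedding` and the custom model name in the usage example are hypothetical.

```python
# Preset table as documented on this page: preset -> (model, dimensions).
PRESETS = {
    "balanced": ("bge-m3", 1024),
    "lite": ("nomic-embed-text", 768),
    "fast": ("all-minilm", 384),
}

def resolve_embedding(env: dict) -> tuple[str, int]:
    """Resolve the embedding model and dimensions from a dict of env values."""
    preset = env.get("TRINAXAI_EMBED_PRESET", "balanced")
    model, dims = PRESETS[preset]
    # Assumed precedence: explicit overrides win over the preset.
    model = env.get("TRINAXAI_EMBED", model)
    dims = int(env.get("TRINAXAI_EMBED_DIMS", dims))
    return model, dims

print(resolve_embedding({}))
print(resolve_embedding({"TRINAXAI_EMBED_PRESET": "lite"}))
# "my-embedder" is a made-up model name used only for illustration.
print(resolve_embedding({"TRINAXAI_EMBED": "my-embedder", "TRINAXAI_EMBED_DIMS": "512"}))
```

When overriding the model, the dimensions need to be overridden as well, since TRINAXAI_EMBED_DIMS must match the model.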

Context & Threads

| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_NUM_CTX | int | 4096 | Token context window passed to Ollama. Must fit: system prompt + retrieved chunks + response. Auto-scaled by profile: 2048 (low-resource), 4096 (16gb), 8192 (max), 16384 (ultra). |
| TRINAXAI_NUM_THREAD | int | 8 | CPU threads allocated per Ollama request. 8 threads avoids over-subscription when several embedding workers run concurrently. Raise on high-core-count machines. |
# TRINAXAI_NUM_CTX=4096
# TRINAXAI_NUM_THREAD=8
Setting TRINAXAI_NUM_THREAD too high (e.g., equal to your total core count) can make concurrent embeddings slower, not faster, due to thread contention. The default of 8 is intentionally conservative.
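
The requirement that TRINAXAI_NUM_CTX must fit the system prompt, the retrieved chunks, and the response is simple arithmetic. The Python sketch below gives a conservative worst-case estimate that assumes every retrieved chunk is at its maximum size; real chunks are often shorter. The function name and the system-prompt and response token counts are assumptions for illustration.

```python
def max_chunks_that_fit(num_ctx, chunk_tokens, system_tokens, response_tokens):
    """Worst-case number of full-size chunks that fit beside the prompt and the reply."""
    available = num_ctx - system_tokens - response_tokens
    return max(available // chunk_tokens, 0)

# 8192 (max-profile context) and 1024 (prose chunk size) appear on this page.
# The 300-token system prompt and 1024-token response allowance are assumed.
print(max_chunks_that_fit(8192, 1024, 300, 1024))  # 6
```

If the result is lower than the number of chunks you retrieve, raising TRINAXAI_NUM_CTX or lowering the chunk size or retrieval depth are the options this page offers.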

Embedding Concurrency

| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_EMBED_WORKERS | int | 2 | Concurrent workers sending embedding requests to Ollama. Profile defaults: 1 (low-resource), 2 (16gb), 4 (max), 6 (ultra). Maximum value: 16. |
| TRINAXAI_EMBED_BATCH | int | 8 | Number of text chunks sent to Ollama per embedding call. Larger batches reduce per-request HTTP overhead. Profile defaults: 1 (low-resource), 8 (16gb/max), 16 (ultra). |
# TRINAXAI_EMBED_WORKERS=2
# TRINAXAI_EMBED_BATCH=8
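
The interaction between workers and batch size can be pictured as follows: chunks are grouped into batches, and a fixed pool of workers processes those batches concurrently. This Python sketch uses a stand-in `fake_embed` function instead of a real Ollama call, so it shows only the scheduling pattern and is not TrinaxAI's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fake_embed(batch):
    """Stand-in for one embedding request; returns one tiny vector per chunk."""
    return [[float(len(text))] for text in batch]

def embed_all(chunks, workers=2, batch_size=8):
    """Embed every chunk using `workers` threads and `batch_size` chunks per call."""
    batches = list(batched(chunks, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fake_embed, batches)  # map keeps batch order
    return [vector for batch in results for vector in batch]

chunks = [f"chunk {i}" for i in range(20)]
vectors = embed_all(chunks)
print(len(vectors))  # 20 chunks -> 3 batches (8, 8, 4) -> 20 vectors
```

With the 16gb defaults, 20 chunks become three requests, at most two of which are in flight at once.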

Keep-Alive

Controls how long Ollama keeps models loaded in RAM between requests. A longer keep-alive gives faster subsequent responses at the cost of higher memory use.
| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_KEEP_ALIVE | string | 10m | Keep-alive duration for LLM (chat/code) models. Profile defaults: 0s (low-resource), 10m (16gb fast mode), 30m (max), 60m (ultra). Ollama duration format: 0s, 15m, 1h. |
| TRINAXAI_EMBED_KEEP_ALIVE | string | 15m | Keep-alive for the embedding model. Kept warm separately from chat models to avoid sawtooth reload cost during indexing. Profile defaults: 10m (low-resource), 15m (16gb), 30m (max/ultra). |
| TRINAXAI_TIMEOUT | float | 300 | Request timeout in seconds for Ollama API calls. Increase for very long generations on slow hardware. |
# TRINAXAI_KEEP_ALIVE=10m
# TRINAXAI_EMBED_KEEP_ALIVE=15m
# TRINAXAI_TIMEOUT=300
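
For comparing keep-alive values or validating them before writing .env, a duration string can be converted to seconds. The Python sketch below handles only the single-unit forms shown on this page (such as 0s, 15m, 1h); it is not Ollama's own parser, and the function name is hypothetical.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def keep_alive_seconds(value: str) -> int:
    """Convert a single-unit duration such as "10m" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]

print(keep_alive_seconds("10m"))  # 600
print(keep_alive_seconds("1h"))   # 3600
```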

Chunking

Controls how documents are split before embedding. Code files use AST-aware chunking that respects function and class boundaries.
| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_CHUNK_SIZE | int | 1024 | Token chunk size for prose (Markdown, PDF, TXT, config files). Profile/mode defaults: 896 (fast mode), 1024 (balanced/quality), 1536 (ultra). |
| TRINAXAI_CHUNK_OVERLAP | int | 150 | Token overlap between adjacent prose chunks. Mode defaults: 96 (fast mode), 150 (balanced/quality), 220 (ultra). |
| TRINAXAI_CODE_CHUNK_LINES | int | 60 | Lines per code chunk (AST-based). Chunks respect function/class boundaries. |
| TRINAXAI_CODE_CHUNK_LINES_OVERLAP | int | 12 | Line overlap between code chunks. Mode default: 8 (fast mode), 12 (balanced/quality). |
| TRINAXAI_CODE_MAX_CHARS | int | 2000 | Maximum character length of a code chunk. Prevents oversized single-function chunks. |
# TRINAXAI_CHUNK_SIZE=1024
# TRINAXAI_CHUNK_OVERLAP=150
# TRINAXAI_CODE_CHUNK_LINES=60
# TRINAXAI_CODE_CHUNK_LINES_OVERLAP=12
# TRINAXAI_CODE_MAX_CHARS=2000
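
To show what chunk size and overlap mean for prose, here is a minimal sliding-window sketch in Python. It works on an already tokenized list and ignores the AST-aware logic used for code files; the function name is hypothetical and the real chunker may split differently (for example, on sentence boundaries).

```python
def chunk_tokens(tokens, size=1024, overlap=150):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

tokens = list(range(2000))  # stand-in for 2000 real tokens
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [1024, 1024, 252]
```

Each window starts 874 tokens after the previous one, so the last 150 tokens of one chunk reappear at the start of the next.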

Retrieval

Controls how many chunks are retrieved and how they are ranked before being passed to the LLM.
| Variable | Type | Default (16gb) | Description |
| --- | --- | --- | --- |
| TRINAXAI_SIMILARITY_TOP_K | int | 4 | Final number of chunks injected into the LLM prompt. Profile/mode defaults: 3 (low-resource fast), 4 (16gb fast), 5 (16gb quality / max), 6 (ultra), 8 (ultra quality). Raise for broader answers; lower to reduce latency. |
| TRINAXAI_FUSION_CANDIDATES | int | 8 | Candidates each retriever (vector search and BM25 keyword search) contributes before fusion. Profile defaults: 6 (low-resource fast), 8 (16gb fast), 12 (max), 20 (ultra), 32 (ultra quality). |
| TRINAXAI_RETRIEVAL_CACHE_SECONDS | int | 20 | TTL (seconds) for cached RAG retrieval results. Identical repeated queries skip re-embedding and re-retrieval. Set to 0 to disable. |
| TRINAXAI_SOURCES_CACHE_SECONDS | int | 30 | TTL (seconds) for cached Knowledge Browser source/chunk listings. |
# TRINAXAI_SIMILARITY_TOP_K=4
# TRINAXAI_FUSION_CANDIDATES=8
# TRINAXAI_RETRIEVAL_CACHE_SECONDS=20
# TRINAXAI_SOURCES_CACHE_SECONDS=30
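
This page does not say which fusion algorithm combines the vector and BM25 results. As an illustration of how FUSION_CANDIDATES and SIMILARITY_TOP_K relate, the Python sketch below uses reciprocal rank fusion (RRF), a common choice that is assumed here rather than confirmed. The function name, the chunk IDs, and the constant k=60 are illustrative.

```python
def fuse(vector_hits, keyword_hits, candidates=8, top_k=4, k=60):
    """Merge two ranked ID lists with reciprocal rank fusion and keep the top_k."""
    scores = {}
    # Each retriever contributes at most `candidates` results.
    for hits in (vector_hits[:candidates], keyword_hits[:candidates]):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ranked[:top_k]

vector_hits = ["a", "b", "c", "d", "e"]   # made-up chunk IDs, best first
keyword_hits = ["c", "a", "f", "g"]
print(fuse(vector_hits, keyword_hits))  # ['a', 'c', 'b', 'f']
```

Chunks found by both retrievers ("a" and "c") rise to the top, and only the first four fused results would reach the prompt.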

Reranking

Cross-encoder reranking improves retrieval precision by re-scoring each candidate chunk against the query itself. It is disabled by default and requires additional Python dependencies.
| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_RERANK | bool | 0 | Set to 1 to enable cross-encoder reranking. Requires pip install -r requirements-rerank.txt. Loads ~2 GB model into RAM. |
| TRINAXAI_RERANK_MODEL | string | BAAI/bge-reranker-v2-m3 | HuggingFace model ID for the cross-encoder. bge-reranker-v2-m3 is multilingual and state-of-the-art. |
| TRINAXAI_RERANK_TOP_N | int | (matches TOP_K) | Number of candidates to keep after reranking. Defaults to TRINAXAI_SIMILARITY_TOP_K. |
# TRINAXAI_RERANK=0
# TRINAXAI_RERANK_MODEL=BAAI/bge-reranker-v2-m3
# TRINAXAI_RERANK_TOP_N=5
Consider enabling reranking on max or ultra profiles, where the ~2 GB RAM cost is easiest to absorb. The reranker narrows a large set of FUSION_CANDIDATES down to the RERANK_TOP_N most relevant chunks.

Indexing

Controls what gets indexed, where it’s stored, and how uploads are handled.
| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_INDEX_DIR | string | (parent of repo) | Directory to index recursively. Set to ~/Documents, ~/Projects, or any path. Supports ~ expansion. |
| TRINAXAI_MAX_FILE_BYTES | int | 3145728 | Maximum file size to index, in bytes (default: 3 MB). Files larger than this are skipped to avoid enormous chunks from generated files. |
| TRINAXAI_UPLOAD_MAX_FILES | int | 2500 | Maximum number of files accepted in a single upload batch. |
| TRINAXAI_UPLOAD_MAX_BYTES | int | 536870912 | Maximum total upload size per batch, in bytes (default: 512 MB). |
| TRINAXAI_COLLECTION_ID | string | default | Active collection ID for indexing and querying. |
| TRINAXAI_COLLECTION_NAME | string | General | Human-readable name for the default collection. |
| TRINAXAI_INDEX_APPEND | bool | 0 | Set to 1 for append-only mode: skip files already in the manifest and only add new ones. Useful for large codebases where re-indexing is slow. |
| TRINAXAI_INDEX_BATCH_SIZE | int | 100 | Files processed per indexing batch. Lower values use less RAM; higher values can be faster on fast storage. |
# TRINAXAI_INDEX_DIR=~/Documents
# TRINAXAI_MAX_FILE_BYTES=3145728
# TRINAXAI_UPLOAD_MAX_FILES=2500
# TRINAXAI_UPLOAD_MAX_BYTES=536870912
# TRINAXAI_COLLECTION_ID=default
# TRINAXAI_COLLECTION_NAME=General
# TRINAXAI_INDEX_APPEND=0
# TRINAXAI_INDEX_BATCH_SIZE=100
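
The effect of TRINAXAI_MAX_FILE_BYTES and TRINAXAI_INDEX_APPEND on file selection can be sketched as a directory walk with two filters. This Python example is an assumption-laden illustration: the function name is hypothetical, and the manifest is modeled as a plain set of file paths, which may not match the real manifest format.

```python
import os
import tempfile

def files_to_index(root, max_bytes=3145728, append=False, manifest=()):
    """List files under root, skipping oversized files and, in append mode, known ones."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > max_bytes:
                continue  # larger than the size limit
            if append and path in manifest:
                continue  # already indexed, append-only mode
            selected.append(path)
    return selected

# Small demonstration in a throwaway directory, with a tiny limit for readability.
with tempfile.TemporaryDirectory() as root:
    for name, content in (("notes.txt", "hello"), ("dump.log", "x" * 200)):
        with open(os.path.join(root, name), "w") as handle:
            handle.write(content)
    picked = files_to_index(root, max_bytes=100)
    print([os.path.basename(p) for p in picked])  # ['notes.txt']
```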

Security

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_CORS_ORIGINS | string | https://localhost:3334,http://localhost:3334 | Comma-separated list of allowed CORS origins. Set * to allow all (not recommended). Add your LAN IP when accessing from other devices. |
| TRINAXAI_ALLOW_LAN_SYSTEM | bool | 0 | Allow /system/* endpoints from LAN origins. Disabled by default. Enable only on trusted networks after setting TRINAXAI_ADMIN_TOKEN. |
| TRINAXAI_ADMIN_TOKEN | string | (empty) | Bearer token required for protected /system/* endpoints when called from non-localhost origins. Auto-generated when installing with --lan-system. |
TRINAXAI_CORS_ORIGINS=https://localhost:3334,http://localhost:3334
# TRINAXAI_ALLOW_LAN_SYSTEM=0
# TRINAXAI_ADMIN_TOKEN=
See Security Model for the full access control and CORS documentation.
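
One way to read the interaction between TRINAXAI_ALLOW_LAN_SYSTEM and TRINAXAI_ADMIN_TOKEN is as a small decision function for /system/* requests. The Python sketch below is a guess at the intended policy, not the actual implementation: it assumes localhost callers are always allowed and that LAN callers are rejected whenever the admin token is empty. The function name and addresses are illustrative.

```python
import hmac

LOCAL_HOSTS = {"127.0.0.1", "::1", "localhost"}

def system_request_allowed(client_host, bearer_token, allow_lan=False, admin_token=""):
    """Decide whether a /system/* request may proceed (assumed policy)."""
    if client_host in LOCAL_HOSTS:
        return True  # assumption: localhost needs no token
    if not allow_lan or not admin_token:
        return False  # LAN access disabled, or no token configured
    # Constant-time comparison avoids leaking token contents through timing.
    supplied = (bearer_token or "").encode()
    return hmac.compare_digest(supplied, admin_token.encode())

print(system_request_allowed("127.0.0.1", None))
print(system_request_allowed("192.168.1.20", "secret"))
print(system_request_allowed("192.168.1.20", "secret", allow_lan=True, admin_token="secret"))
```

Under these assumptions, a LAN device is admitted only when both settings are in place and the supplied bearer token matches.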

GPU & Quantization

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| OLLAMA_NUM_GPU | string | (empty / auto) | Number of model layers to offload to the GPU (Ollama native). Leave empty for Ollama auto-detection, or set explicitly (e.g., 35) to control VRAM usage. |
| TRINAXAI_AGGRESSIVE_QUANT | bool | 0 | Set to 1 to enable an aggressive Q4_K_M-style quantization profile. Reduces RAM usage at some quality cost. Exposes the setting to the /health endpoint and configures Ollama hints accordingly. |
# OLLAMA_NUM_GPU=
# TRINAXAI_AGGRESSIVE_QUANT=0

OCR

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_OCR | bool | 0 | Set to 1 to enable Tesseract OCR for scanned PDFs. Requires the tesseract system package (apt install tesseract-ocr / brew install tesseract). |
# TRINAXAI_OCR=0

PWA / Frontend

These variables configure the Vite build and the PWA frontend. Most are set at build time and baked into the frontend bundle.
| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| TRINAXAI_FRONTEND_URL | string | https://localhost:3334 | Public URL of the PWA frontend. Used for CORS and service worker scope. |
| VITE_TRINAXAI_RAG_TARGET | string | https://127.0.0.1:3333 | URL the Vite dev proxy forwards /api/rag requests to. Should point to the RAG API. |
| VITE_TRINAXAI_VISION_MODEL | string | qwen2.5vl:3b | Vision model used for image analysis in the PWA. |
| VITE_TRINAXAI_VISION_QUALITY_MODEL | string | qwen2.5vl:7b | High-quality vision model used when the user selects quality mode. |

| Variable | Default | Description |
| --- | --- | --- |
| TRINAXAI_FRONTEND_MODE | preview | Vite mode (development or preview). |
| TRINAXAI_RAG_TARGET | https://127.0.0.1:3333 | Server-side RAG proxy target. |
| VITE_TRINAXAI_RAG_BASE | /api/rag | Base path for RAG API requests from the PWA. |
| VITE_TRINAXAI_OLLAMA_BASE | /api/ollama | Base path for Ollama API requests from the PWA. |
| VITE_TRINAXAI_DEV_RAG_BASE | /api/rag | RAG base path in dev mode. |
| VITE_TRINAXAI_DEV_OLLAMA_BASE | /api/ollama | Ollama base path in dev mode. |
| VITE_TRINAXAI_INDEX_DIR | ~/Documents | Default index directory shown in the PWA settings. |
| VITE_TRINAXAI_REPO_URL | https://github.com/TrinaxCode/TrinaxAI | Repository link shown in the PWA. |
| VITE_TRINAXAI_DOCS_URL | https://github.com/TrinaxCode/TrinaxAI#readme | Docs link shown in the PWA. |
TRINAXAI_FRONTEND_URL=https://localhost:3334
VITE_TRINAXAI_RAG_TARGET=https://127.0.0.1:3333
VITE_TRINAXAI_VISION_MODEL=qwen2.5vl:3b
VITE_TRINAXAI_VISION_QUALITY_MODEL=qwen2.5vl:7b
