TrinaxAI ships five hardware profiles that automatically tune every performance-sensitive setting to your machine’s capabilities. The installer detects your total RAM and selects the appropriate profile, so most users need no manual tuning. The profile is the master switch: it sets the model fleet, context window size, embedding worker count, chunk sizes, retrieval depth, and keep-alive durations in a single step.
How Profiles Work
When you run ./install.sh, the installer reads your total system RAM and writes TRINAXAI_PROFILE=<profile> into your .env file. From that point on, every restart reads the profile and applies its defaults.
You can override the auto-detected profile at any time:
- At install time
- Via .env
- Temporary override
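
The detection and override flow described above can be sketched roughly as follows. This is an illustration only, not the installer's actual logic: the RAM cut-offs, the function names `detect_profile` and `resolve_profile`, and the rule that an explicit `TRINAXAI_PROFILE` value wins over detection are all assumptions. The `ultra` aliases are the ones this page lists, and `32gb` as an alias for `max` is inferred from the "max / 32gb" column heading.

```python
from typing import Mapping

PROFILES = ("4gb", "8gb", "16gb", "max", "ultra")

# ultra aliases come from this page; "32gb" -> "max" is inferred from the
# "max / 32gb" table heading and may not match the real alias list.
ALIASES = {
    "32gb": "max",
    "gpu": "ultra", "64gb": "ultra", "64g": "ultra",
    "4090": "ultra", "rtx": "ultra", "workstation": "ultra",
}


def detect_profile(total_ram_gb: float) -> str:
    """Map total RAM to a profile. Cut-offs are assumed, not read from install.sh."""
    if total_ram_gb < 6:
        return "4gb"
    if total_ram_gb < 12:
        return "8gb"
    if total_ram_gb < 24:
        return "16gb"
    if total_ram_gb < 48:
        return "max"
    return "ultra"


def resolve_profile(env: Mapping[str, str], total_ram_gb: float) -> str:
    """Prefer an explicit TRINAXAI_PROFILE value, else fall back to detection."""
    raw = env.get("TRINAXAI_PROFILE", "").strip().lower()
    if not raw:
        return detect_profile(total_ram_gb)
    name = ALIASES.get(raw, raw)  # normalise aliases such as "rtx"
    if name not in PROFILES:
        raise ValueError(f"unknown profile: {raw!r}")
    return name
```

For example, `resolve_profile({"TRINAXAI_PROFILE": "rtx"}, 16)` returns `"ultra"` even on a 16 GB machine, while `resolve_profile({}, 16)` falls back to detection and returns `"16gb"`.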
Profile Reference
The table below shows the exact defaults applied by each profile, drawn directly from config.py.
| Setting | 4gb / 8gb | 16gb (default) | max / 32gb | ultra / 64gb |
|---|---|---|---|---|
| RAM target | ~4–8 GB | ~16 GB | ~32 GB | 64 GB+ or GPU workstation |
| MODEL_GENERAL | llama3.2:1b | llama3.2:3b | llama3.2:3b | llama3.2:3b |
| MODEL_CODE | qwen2.5-coder:1.5b | qwen2.5-coder:3b | qwen2.5-coder:3b | qwen2.5-coder:3b |
| MODEL_DEEP | qwen2.5-coder:1.5b | qwen2.5-coder:3b | qwen2.5-coder:7b | qwen2.5-coder:14b |
| MODEL_FAST | llama3.2:1b | llama3.2:3b | llama3.2:3b | llama3.2:3b |
| Embed preset | lite | balanced | balanced | balanced |
| Embed model | nomic-embed-text | bge-m3 | bge-m3 | bge-m3 |
| Embed dims | 768 | 1024 | 1024 | 1024 |
| NUM_CTX | 2048 | 4096 | 8192 | 16384 |
| EMBED_WORKERS | 1 | 2 | 4 | 6 |
| EMBED_BATCH | 1 | 8 | 8 | 16 |
| KEEP_ALIVE (fast mode) | 0s | 10m | 30m | 60m |
| EMBED_KEEP_ALIVE | 10m | 15m | 30m | 30m |
| CHUNK_SIZE (balanced) | 1024 | 1024 | 1024 | 1536 |
| CHUNK_OVERLAP (balanced) | 150 | 150 | 150 | 220 |
| SIMILARITY_TOP_K (fast) | 3 | 4 | 5 | 6 |
| FUSION_CANDIDATES (fast) | 6 | 8 | 12 | 20 |
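
To make the table easier to reason about, here is one way a subset of these defaults could be laid out in code. It is a sketch under assumptions: the values are copied from the table above, but the `ProfileDefaults` class, the `PROFILE_DEFAULTS` mapping, and the `changed_settings` helper are hypothetical names, and the real config.py may be organised quite differently.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProfileDefaults:
    model_general: str
    model_code: str
    model_deep: str
    num_ctx: int
    embed_workers: int
    embed_batch: int
    keep_alive: str  # LLM keep-alive in fast mode


# 4gb and 8gb share one column in the table, so they share one object here.
_LOW = ProfileDefaults("llama3.2:1b", "qwen2.5-coder:1.5b",
                       "qwen2.5-coder:1.5b", 2048, 1, 1, "0s")

PROFILE_DEFAULTS = {
    "4gb": _LOW,
    "8gb": _LOW,
    "16gb": ProfileDefaults("llama3.2:3b", "qwen2.5-coder:3b",
                            "qwen2.5-coder:3b", 4096, 2, 8, "10m"),
    "max": ProfileDefaults("llama3.2:3b", "qwen2.5-coder:3b",
                           "qwen2.5-coder:7b", 8192, 4, 8, "30m"),
    "ultra": ProfileDefaults("llama3.2:3b", "qwen2.5-coder:3b",
                             "qwen2.5-coder:14b", 16384, 6, 16, "60m"),
}


def changed_settings(a: str, b: str) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = asdict(PROFILE_DEFAULTS[a]), asdict(PROFILE_DEFAULTS[b])
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}
```

Calling `changed_settings("16gb", "max")` reports differences in `model_deep`, `num_ctx`, `embed_workers`, and `keep_alive` only, which makes it easy to see what a profile switch actually changes.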
MODEL_GENERAL, MODEL_CODE, and MODEL_FAST share the same defaults across 16gb, max, and ultra. The key differentiator between these profiles is MODEL_DEEP: complex queries (refactoring, architecture, multi-file analysis) escalate to a progressively larger model.

Profile Deep Dives
4gb / 8gb — Low-resource (laptops, Windows with ~8 GB RAM)
This profile prioritises staying within RAM. Everything is sized for machines where the OS itself consumes 3–4 GB.
- Models: llama3.2:1b for chat, qwen2.5-coder:1.5b for code and deep tasks. These are the smallest models in any profile.
- Embeddings: nomic-embed-text (768 dims, lite preset), which is faster to load and query than bge-m3.
- Context: 2048 tokens, enough for focused queries without competing with the model’s weights for memory.
- Concurrency: 1 embed worker, batch size 1. Embedding is serialised to avoid OOM during indexing.
- Keep-alive: 0s for the LLM. The model is unloaded after each response to free RAM for embeddings and the OS.
16gb — Balanced default (~16 GB RAM)
The factory default. Balanced for everyday coding, documentation search, and RAG queries.
- Models: llama3.2:3b for chat, qwen2.5-coder:3b for code and deep analysis (the max/32gb profile upgrades deep analysis to qwen2.5-coder:7b).
- Embeddings: bge-m3 (1024 dims, balanced preset), a multilingual, high-quality model with an 8K-token context.
- Context: 4096 tokens, sized to fit a system prompt, 4 source chunks, and a response.
- Concurrency: 2 embed workers, batch size 8, for parallel embedding without crowding the LLM.
- Keep-alive: 10m in fast mode, so models stay warm for several minutes between queries.
max / 32gb — High-quality (32 GB RAM, no dedicated GPU needed)
Unlocks the 7b deep model and wider context. Suitable for large codebases and multi-file analysis.
- Models: Same general/code fleet as 16gb, but MODEL_DEEP is upgraded to qwen2.5-coder:7b, making this the first profile to unlock the larger deep-analysis model.
- Context: 8192 tokens, which fits longer documents and more retrieved chunks.
- Concurrency: 4 embed workers, batch size 8, for faster bulk indexing.
- Keep-alive: 30m, so models stay warm for extended work sessions.
- Retrieval: TOP_K=5, FUSION_CANDIDATES=12, for a broader retrieval sweep.
ultra / 64gb — GPU workstation (64 GB RAM or VRAM-rich GPU)
Maximum quality. Designed for ML workstations, server-grade machines, and high-end consumer GPUs.
- Models: qwen2.5-coder:14b for deep tasks, the largest deep-analysis model in any profile.
- Context: 16384 tokens, enough to fit entire files, multiple retrieved documents, and long conversation history.
- Concurrency: 6 embed workers, batch size 16, for rapid bulk indexing of large repositories.
- Keep-alive: 60m, so models stay loaded for up to an hour between queries.
- Chunking: Larger chunks (CHUNK_SIZE=1536, CHUNK_OVERLAP=220) capture more context per embedding.
- Retrieval: TOP_K=6 (up to 8 in quality mode), FUSION_CANDIDATES=20–32, the deepest retrieval of any profile.

Valid aliases for ultra: gpu, 64gb, 64g, 4090, rtx, workstation.

Performance Modes
Performance mode is a secondary dial that fine-tunes chunk sizes and retrieval depth within a profile. It is chosen independently of the profile: any profile can run in any mode.

| Mode | Chunk size | Overlap | TOP_K | FUSION_CANDIDATES | Best for |
|---|---|---|---|---|---|
| fast (default) | 896 tokens | 96 tokens | 3–6 (by profile) | 6–20 (by profile) | Snappy responses, frequent queries |
| balanced | 1024 tokens (1536 on ultra) | 150 tokens (220 on ultra) | 4–6 (by profile) | 8–20 (by profile) | General daily use |
| quality | 1024 tokens | 150 tokens | 5–8 (by profile) | 12–32 (by profile) | Deep research, broad retrieval |
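
To show what chunk size and overlap mean in practice, the sketch below splits a token sequence into overlapping windows using the per-mode sizes from the table. It is a simplified illustration, not TrinaxAI's splitter: `MODE_CHUNKING` and `chunk_tokens` are hypothetical names, and a real splitter would probably respect sentence or code boundaries rather than cut at fixed offsets.

```python
from typing import List, Sequence

# (chunk_size, chunk_overlap) in tokens, taken from the performance-mode table.
MODE_CHUNKING = {
    "fast": (896, 96),
    "balanced": (1024, 150),
    "quality": (1024, 150),
}


def chunk_tokens(tokens: Sequence, size: int, overlap: int) -> List[Sequence]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    stride = size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
        start += stride
    return chunks


size, overlap = MODE_CHUNKING["fast"]
chunks = chunk_tokens(list(range(2000)), size, overlap)
```

With 2000 tokens in fast mode, the windows start at tokens 0, 800, and 1600, and the last 96 tokens of each window reappear at the start of the next one. A smaller overlap means fewer duplicated tokens are stored, at the price of less shared context between neighbouring chunks.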
fast mode also uses smaller chunk overlaps (96 vs 150 tokens) and shorter code overlaps (8 vs 12 lines). This reduces index size and speeds up retrieval at a minor cost to cross-chunk context continuity.

Embedding Presets by Profile
The embedding preset is selected automatically from the profile but can be overridden independently.

| Preset | Model | Dims | Token ctx | Profile default |
|---|---|---|---|---|
| balanced | bge-m3 | 1024 | 8192 | 16gb, max, ultra |
| lite | nomic-embed-text | 768 | 2048 | 4gb, 8gb |
| fast | all-minilm | 384 | 512 | (manual override only) |
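
The relationship between profiles and presets in the table above could be modelled along these lines. Treat it as a sketch: the preset data is copied from the table, but `EmbedPreset`, `DEFAULT_PRESET`, and `embed_preset` are hypothetical names, and the way an override is actually supplied (for example, which .env key is used) is not shown here because this page does not specify it.

```python
from typing import NamedTuple, Optional


class EmbedPreset(NamedTuple):
    model: str
    dims: int
    token_ctx: int


EMBED_PRESETS = {
    "balanced": EmbedPreset("bge-m3", 1024, 8192),
    "lite": EmbedPreset("nomic-embed-text", 768, 2048),
    "fast": EmbedPreset("all-minilm", 384, 512),  # manual override only
}

# Preset chosen automatically for each profile.
DEFAULT_PRESET = {
    "4gb": "lite", "8gb": "lite",
    "16gb": "balanced", "max": "balanced", "ultra": "balanced",
}


def embed_preset(profile: str, override: Optional[str] = None) -> EmbedPreset:
    """Return the override preset if given, else the profile's default."""
    name = override or DEFAULT_PRESET[profile]
    if name not in EMBED_PRESETS:
        raise ValueError(f"unknown embedding preset: {name!r}")
    return EMBED_PRESETS[name]
```

For instance, `embed_preset("16gb")` yields the bge-m3 preset, while `embed_preset("16gb", override="lite")` yields nomic-embed-text with 768 dimensions.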
16gb machine (e.g., to reserve RAM for larger LLMs):