TrinaxAI: 100% Local AI Assistant with RAG and Voice

TrinaxAI is an open-source, local-first AI assistant that runs entirely on your own machine. It pairs a ChatGPT-like Progressive Web App with a developer CLI, semantic code search powered by Retrieval-Augmented Generation (RAG), voice conversations, and local image analysis — all without a cloud account, API key, or subscription. Your code, documents, and chat history never leave your network.

Architecture overview

TrinaxAI is a three-tier local stack where every service runs on your device:

┌──────────────────────────────────────────┐
│              Your Device                 │
│  ┌──────────┐  ┌─────────────────────┐   │
│  │PWA(React)│  │  CLI (trinaxai)     │   │
│  │  :3334   │  │  pip install -e .   │   │
│  └─────┬────┘  └──────────┬──────────┘   │
│        │                  │              │
│  ┌─────┴──────────────────┴──────────┐   │
│  │    RAG API (FastAPI) :3333        │   │
│  │  LlamaIndex · bge-m3 · BM25       │   │
│  └─────┬─────────────────────────────┘   │
│        │                                 │
│  ┌─────┴──────┐                          │
│  │   Ollama   │  qwen2.5 · llama3.2      │
│  │   :11434   │  bge-m3 · moondream      │
│  └────────────┘                          │
└──────────────────────────────────────────┘

Tier	Component	Port	Role
Frontend	React 19 PWA (TypeScript + Vite)	`:3334`	Chat UI, voice, vision, PWA install
Backend	FastAPI + LlamaIndex RAG API	`:3333`	Hybrid retrieval, memory, streaming
Models	Ollama model runtime	`:11434`	LLM inference, embeddings
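As a quick way to see the three tiers in practice, the sketch below probes the ports listed in the table using only the Python standard library. It is an illustration, not part of TrinaxAI: the function names `port_open` and `stack_status` are hypothetical, and it assumes the services listen on `127.0.0.1` at their default ports.

```python
import socket

# Ports taken from the table above; the keys are just display labels.
SERVICES = {"PWA": 3334, "RAG API": 3333, "Ollama": 11434}


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def stack_status(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each tier label to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in stack_status().items():
        print(f"{name:8} {'up' if up else 'down'}")
```

A port that accepts connections only shows that something is listening there; it does not confirm that the process behind it is the expected service.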

Your data stays on your machine. TrinaxAI never uploads your chats, code, or documents to any cloud service. The only external request the PWA makes is to load Google Fonts.

Key features

RAG & Code Indexing

Index your projects for semantic search with citations. Indexing uses AST-aware chunking for 15+ languages, retrieval combines vector search with BM25, and incremental re-indexing only touches changed files.
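Hybrid retrieval needs some way to merge the vector ranking and the BM25 ranking into one result list. This page does not say how TrinaxAI does it, so the sketch below uses reciprocal rank fusion (RRF), one common technique, purely as an illustration. The function name, the chunk IDs, and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one ranked list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores sort first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by chunk ID for stable output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for the same query from the two retrievers.
vector_hits = ["auth.py:12", "db.py:40", "utils.py:7"]
bm25_hits = ["db.py:40", "cli.py:3", "auth.py:12"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

In this example `db.py:40` ends up first because both retrievers place it near the top, while chunks found by only one retriever fall to the end. RRF works on ranks alone, so the two retrievers' raw scores never need to be put on a common scale.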

Chat & Models

Chat via the PWA or CLI with an auto-routing heuristic that picks the best Ollama model for each query. Deep research mode decomposes questions across multiple RAG passes.

Voice & Vision

Full voice conversations with speech-to-text and text-to-speech, plus local image and screenshot analysis powered by qwen2.5vl:3b — no third-party API required.

Memory & Collections

Persistent memory stores “remember that…” facts locally and syncs across devices. Knowledge Collections let you create separate RAG namespaces and query one or many at once.

Progressive Web App

Install TrinaxAI like a native app on iOS, Android, or desktop. The PWA is served over HTTPS with a self-signed certificate for LAN access, and it supports dark/light mode and Spanish/English auto-detection.

Security Model

LAN system control is disabled by default. The RAG API, Ollama, and the PWA all bind to localhost unless you explicitly enable LAN access. A full threat model and hardening guide are included.

Supported platforms

OS	Installer	Service Manager
Linux (Ubuntu, Debian, Fedora, Arch)	`install.sh`	user systemd
macOS (Intel + Apple Silicon)	`install.sh`	launchctl
Windows (10/11, PowerShell)	`install.ps1`	subprocess supervisor

Platform-specific guides: Linux · macOS · Windows

Hardware profiles

The installer detects your available RAM and selects a matching model profile. You can override the choice with --profile <name> during install or by setting TRINAXAI_PROFILE in .env.

Profile	RAM Target	General model	Code model	Deep model	Embedding
`8gb`	~8 GB	`llama3.2:1b`	`qwen2.5-coder:1.5b`	`qwen2.5-coder:1.5b`	`nomic-embed-text`
`16gb`	~16 GB	`llama3.2:3b`	`qwen2.5-coder:3b`	`qwen2.5-coder:3b`	`bge-m3`
`max`	32 GB+	`llama3.2:3b`	`qwen2.5-coder:3b`	`qwen2.5-coder:7b`	`bge-m3`
`ultra`	64 GB+	`llama3.2:3b`	`qwen2.5-coder:3b`	`qwen2.5-coder:14b`	`bge-m3`
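The profile selection described above can be sketched as a simple threshold lookup. This is an illustration only: the cut-offs are read off the "RAM Target" column and may differ from what the installer actually uses, the function names are hypothetical, and treating TRINAXAI_PROFILE as an environment-style mapping is an assumption about how the .env value is consumed.

```python
import os
from typing import Mapping, Optional

PROFILES = ("8gb", "16gb", "max", "ultra")


def pick_profile(ram_gb: float) -> str:
    """Map available RAM in GB to a profile name from the table above.

    The thresholds are assumptions; the real installer may use others.
    """
    if ram_gb >= 64:
        return "ultra"
    if ram_gb >= 32:
        return "max"
    if ram_gb >= 16:
        return "16gb"
    return "8gb"


def resolve_profile(ram_gb: float, env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer a valid TRINAXAI_PROFILE override, else fall back to RAM."""
    env = os.environ if env is None else env
    override = env.get("TRINAXAI_PROFILE", "").strip()
    if override in PROFILES:
        return override
    return pick_profile(ram_gb)
```

For example, `resolve_profile(8, {"TRINAXAI_PROFILE": "max"})` returns `"max"`, while an unrecognized override is ignored and the RAM-based choice is used instead.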

Three embedding models are available: bge-m3 (balanced, multilingual), nomic-embed-text (lighter and faster), and all-minilm (smallest and fastest). Each profile sets a default embedding preset, which you can override with TRINAXAI_EMBED_PRESET in .env. At query time, the auto-router selects among MODEL_GENERAL, MODEL_CODE, MODEL_DEEP, and MODEL_FAST using a heuristic classifier, so no extra LLM call is needed.
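To make the idea of a heuristic router concrete, here is a minimal sketch that picks one of the four model slots using only regular expressions and word counts. The rules, keyword lists, and the `route` function are hypothetical; this page does not document the signals TrinaxAI's classifier actually uses.

```python
import re

# Hypothetical code signals: fenced code, common keywords, source file suffixes.
CODE_PATTERN = re.compile(
    r"```|\bdef\s|\bclass\s|\btraceback\b|\.(py|ts|js|go|rs)\b",
    re.IGNORECASE,
)
# Hypothetical cues (matched as substrings) that a question needs more reasoning.
DEEP_WORDS = ("why", "compare", "explain", "architecture", "trade-off", "design")


def route(query: str) -> str:
    """Return the name of the model slot a query would be sent to."""
    text = query.strip()
    lowered = text.lower()
    words = text.split()
    if CODE_PATTERN.search(text):
        return "MODEL_CODE"
    if any(word in lowered for word in DEEP_WORDS) or len(words) > 40:
        return "MODEL_DEEP"
    if len(words) <= 4:
        return "MODEL_FAST"
    return "MODEL_GENERAL"
```

Under these rules, `route("Fix the traceback in auth.py")` returns `"MODEL_CODE"` and `route("hi there")` returns `"MODEL_FAST"`. Because the checks are plain string operations, routing adds negligible latency compared with asking a model to classify the query.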

What’s included

Developer CLI: trinaxai ask, trinaxai chat, trinaxai index, trinaxai browse, trinaxai doctor, trinaxai research, trinaxai memory, trinaxai collections, trinaxai watch, trinaxai export, trinaxai obsidian
PWA: 18 TypeScript components, session history, search, export to Markdown/PDF/Word, in-app docs, 7-step onboarding wizard
Bilingual UI: Spanish and English, auto-detected from browser locale
One-command installers: install.sh (Linux/macOS) and install.ps1 (Windows), with guided update and uninstall scripts
Continue.dev integration: VS Code config included for IDE-native AI completions

Ready to install? Head to the Quickstart for a step-by-step walkthrough.
