TrinaxAI runs voice and vision capabilities entirely on your device. Speech recognition uses the browser’s Web Speech API, text-to-speech uses browser TTS, and image analysis runs through a local qwen2.5vl model via Ollama. No audio, no images, and no transcripts are ever sent to a cloud API.
Voice Mode
Voice mode lets you speak your queries and hear responses read aloud, for a natural, hands-free conversation with your local AI.
How It Works
Speech Recognition
TrinaxAI uses the browser’s Web Speech API (SpeechRecognition) for speech-to-text. The browser streams audio to its built-in recognition engine (on Chrome/Edge, this is on-device when offline or via Google’s speech service when online; see the privacy note below). The recognised text is inserted into the chat input field.
Response Synthesis
When a response is received, TrinaxAI uses the browser’s Web Speech Synthesis API (SpeechSynthesisUtterance) to read it aloud. Responses are split at sentence boundaries so playback begins as soon as the first sentence is available, without waiting for the full reply.
Privacy note: The Web Speech API’s speech-to-text engine may use an online service (Chrome uses Google’s servers when connected). Text-to-speech synthesis is always local to your browser. If full offline voice is required, use the PWA in a browser that supports on-device speech recognition (e.g., Firefox with a local engine).
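
The sentence-by-sentence playback described under Response Synthesis can be sketched roughly as follows. This is a minimal illustration, not TrinaxAI's actual code: the `SentenceBuffer` class and the `speak` helper are hypothetical names, and the boundary rule (`.`, `!` or `?` followed by whitespace) is an assumption about what counts as a sentence end.

```typescript
// Buffers streamed response text and releases complete sentences as they end.
// Hypothetical helper for illustration; not part of TrinaxAI's public API.
class SentenceBuffer {
  private pending = "";

  // Append a chunk of text; return any sentences that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // Assumed boundary: ., ! or ? followed by at least one whitespace character.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Return whatever text is left once the response has finished.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}

// Queue one sentence for playback; does nothing where speech synthesis is unavailable.
function speak(sentence: string): void {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return;
  g.speechSynthesis.speak(new g.SpeechSynthesisUtterance(sentence));
}
```

A caller would pass each streamed chunk to `push`, hand every returned sentence to `speak`, and call `flush` when the reply ends. A simple rule like this will also split after abbreviations such as "e.g. ", so a production splitter would likely need extra cases.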
Activating Voice Mode in the PWA
- Open TrinaxAI at https://localhost:3334
- Click the microphone icon (🎤) in the chat input bar
- Grant microphone permission when prompted — this permission is remembered for the origin
- Speak your question; the transcript appears in the input field as you talk
- TrinaxAI sends the message automatically when you pause, or press Enter to send immediately
- Toggle the speaker icon (🔊) to enable or disable text-to-speech responses
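
For readers curious how the live transcript and send-on-pause behaviour in the steps above could be wired up, here is a minimal sketch against the Web Speech API. It is an assumption about one possible approach, not TrinaxAI's implementation: `startDictation`, `onDraft`, and `onSend` are hypothetical names, and the local interfaces describe only the parts of the API the sketch touches.

```typescript
// Minimal shapes of the Web Speech API objects used below.
interface RecognitionAlternative { transcript: string }
interface RecognitionResult { isFinal: boolean; 0: RecognitionAlternative }
interface RecognitionEvent { results: ArrayLike<RecognitionResult> }
interface RecognitionLike {
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
}

// Returns a recognizer if the browser exposes one (Chrome uses the webkit prefix), else null.
function createRecognition(): RecognitionLike | null {
  const g = globalThis as any;
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  return Ctor ? (new Ctor() as RecognitionLike) : null;
}

// Wire a recognizer to two hypothetical callbacks:
// onDraft updates the chat input field, onSend submits the message.
function startDictation(
  recognition: RecognitionLike,
  onDraft: (text: string) => void,
  onSend: (text: string) => void,
): void {
  recognition.continuous = false;    // recognition ends after the speaker pauses
  recognition.interimResults = true; // deliver partial transcripts while talking
  recognition.onresult = (event) => {
    let text = "";
    let lastIsFinal = false;
    for (const result of Array.from(event.results)) {
      text += result[0].transcript;
      lastIsFinal = result.isFinal;
    }
    onDraft(text);                        // show the transcript as it grows
    if (lastIsFinal) onSend(text.trim()); // the pause produced a final result
  };
  recognition.start();
}
```

Passing the recognizer in as a parameter keeps the wiring testable without a microphone; in a browser it would be called as `startDictation(createRecognition()!, ...)` after checking that `createRecognition()` did not return null.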
Vision: Image Analysis
TrinaxAI can analyse images and screenshots you attach to the chat. The entire analysis runs locally on your machine through a qwen2.5vl vision-language model served by Ollama.
Vision Models
Two model sizes are available, configured as Vite build-time environment variables:

| Variable | Default | Use Case |
|---|---|---|
| VITE_TRINAXAI_VISION_MODEL | qwen2.5vl:3b | Default: fast, good quality, runs on 8 GB RAM |
| VITE_TRINAXAI_VISION_QUALITY_MODEL | qwen2.5vl:7b | Quality mode: better detail analysis, needs 16 GB+ RAM |

TrinaxAI switches to VITE_TRINAXAI_VISION_QUALITY_MODEL when you enable Quality vision mode in Settings. Both are standard Ollama pull targets.
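
As a rough illustration of how the two variables could be resolved at runtime, the sketch below picks a model tag from an environment object. The function name `pickVisionModel` and the `qualityMode` flag are hypothetical; only the variable names and default tags come from the table above.

```typescript
// Defaults taken from the Vision Models table.
const DEFAULT_VISION_MODEL = "qwen2.5vl:3b";
const DEFAULT_QUALITY_MODEL = "qwen2.5vl:7b";

type VisionEnv = {
  VITE_TRINAXAI_VISION_MODEL?: string;
  VITE_TRINAXAI_VISION_QUALITY_MODEL?: string;
};

// Pick the Ollama model tag for a vision request.
// `qualityMode` stands in for the Quality vision toggle in Settings (assumed shape).
function pickVisionModel(env: VisionEnv, qualityMode: boolean): string {
  return qualityMode
    ? env.VITE_TRINAXAI_VISION_QUALITY_MODEL || DEFAULT_QUALITY_MODEL
    : env.VITE_TRINAXAI_VISION_MODEL || DEFAULT_VISION_MODEL;
}
```

In a Vite app the `env` argument would typically be `import.meta.env`. Because these are build-time variables, changing them requires rebuilding the front end.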
Attaching an Image
Choose an Image
Select any image file from your device (JPEG, PNG, WebP, GIF). You can also paste an image directly from the clipboard.
Ask Your Question
Type your question about the image — or leave the text field empty to get a general description. Examples: “What does this error message mean?”, “Describe the UI layout”, “What’s wrong with this chart?”
Image Preprocessing
Before sending large images to the vision model, TrinaxAI preprocesses them in the browser to prevent out-of-memory (OOM) errors on the local Ollama process:
- Images are downscaled if their dimensions exceed a safe threshold
- Files are re-encoded as JPEG at reduced quality when the file size is large
- The processed image is base64-encoded and sent in the Ollama API payload
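
The preprocessing steps above can be sketched as a few small helpers. Everything here is illustrative: the 1280-pixel limit, the helper names, and the fallback prompt are assumptions, since the actual thresholds are not stated. The request body follows the shape Ollama's `/api/chat` endpoint accepts, where each message may carry an `images` array of base64 strings.

```typescript
// Hypothetical limit; the real "safe threshold" is not documented here.
const MAX_DIMENSION = 1280;

// Scale (width, height) down so the longer side fits maxDim, keeping the aspect ratio.
function fitWithin(
  width: number,
  height: number,
  maxDim: number = MAX_DIMENSION,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxDim) return { width, height };
  const scale = maxDim / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

// Strip the "data:image/jpeg;base64," prefix that canvas.toDataURL() adds;
// Ollama expects the bare base64 string.
function dataUrlToBase64(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return comma === -1 ? dataUrl : dataUrl.slice(comma + 1);
}

// Build a request body for Ollama's /api/chat endpoint.
// The fallback prompt for an empty question is an assumed default.
function buildVisionRequest(model: string, question: string, imageBase64: string) {
  return {
    model,
    stream: false,
    messages: [
      {
        role: "user",
        content: question.trim() || "Describe this image.",
        images: [imageBase64],
      },
    ],
  };
}
```

In the browser, the resized image would usually be drawn onto a canvas at the dimensions returned by `fitWithin` and exported with `canvas.toDataURL("image/jpeg", quality)` before being passed through `dataUrlToBase64`.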
Skipping Vision During Install
If your machine doesn’t have enough RAM for a vision model, skip the download during installation:
Privacy Guarantee
Everything stays local. Vision requests are sent to Ollama at http://localhost:11434, the same endpoint used for text models. No image, screenshot, or analysis result is transmitted to any external server.
Configuration Reference
| Variable | Default | Description |
|---|---|---|
| VITE_TRINAXAI_VISION_MODEL | qwen2.5vl:3b | Ollama model for standard vision analysis |
| VITE_TRINAXAI_VISION_QUALITY_MODEL | qwen2.5vl:7b | Ollama model for high-quality vision analysis |