Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/TrinaxCode/TrinaxAI/llms.txt

Use this file to discover all available pages before exploring further.

The TrinaxAI RAG API is a FastAPI server that powers the PWA, CLI, and any third-party integrations. It runs locally on port 3333 and exposes an OpenAI-compatible chat endpoint alongside dedicated endpoints for RAG collections, persistent memory, file indexing, file watching, and system control. By default the server binds to 0.0.0.0:3333 so the PWA is reachable from phones and tablets on the same network; you can restrict it to 127.0.0.1 via the TRINAXAI_HOST environment variable. Base URL
SchemeBase URLNotes
HTTPS (default)https://localhost:3333Self-signed certificate generated at install time
HTTPhttp://localhost:3333Available when TRINAXAI_RAG_HTTPS=0
For LAN access from another device on the same network substitute your machine’s local IP, e.g. https://192.168.1.42:3333.
TrinaxAI uses a self-signed TLS certificate by default. Browsers and curl will reject it unless you either trust the generated cert or pass the --insecure / -k flag. Set TRINAXAI_TLS_VERIFY=0 in .env (already the default) to let internal Python clients skip certificate verification for localhost traffic.
For authentication details — including the two-tier model that separates open chat endpoints from protected system endpoints — see the Authentication page.

Endpoint Summary

All endpoints return application/json unless stated otherwise. File upload endpoints accept multipart/form-data.

Chat & Retrieval

MethodPathDescription
POST/v1/chat/completionsOpenAI-compatible RAG chat. Supports SSE streaming via "stream": true.
POST/v1/researchMulti-pass deep research with sub-question decomposition and LLM synthesis.

Knowledge Browser

MethodPathDescription
GET/v1/sourcesList indexed source files with chunk counts and a preview snippet.
GET/v1/sources/{collection}/{file:path}/chunksList individual chunks for a specific file within a collection.

Usage & Stats

MethodPathDescription
POST/v1/usageRecord a usage event from frontend-only flows (e.g. direct Ollama chat).
GET/v1/statsAggregate local usage statistics from storage/usage.jsonl.

Memory

MethodPathDescription
GET/v1/memoryList all persistent memory entries.
POST/v1/memoryAppend a new memory entry.
DELETE/v1/memory/{memory_id}Delete a memory entry by ID.
POST/v1/memory/refreshRe-summarise all memory entries into a short context-injectable note.
GET/v1/memory/summaryRead the current LLM-generated memory summary.

File Watcher

MethodPathDescription
POST/v1/watch/startStart the filesystem watcher. Re-runs index.py when files change. Requires watchdog.
POST/v1/watch/stopStop the running watcher.
GET/v1/watch/statusReport watcher state: running, watched paths, event count.

Health & Telemetry

MethodPathDescription
GET/healthSystem overview: index status, available models, active collections, features.
GET/resourcesLocal RAM / VRAM telemetry. Requires psutil.

App State

MethodPathDescription
GET/app-stateRead shared configuration from storage/app_state.json.
PUT/app-stateSave settings, preferences, and chat state.
DELETE/app-stateFactory-reset shared state to host defaults.

Document Extraction

MethodPathDescription
POST/documents/extractExtract text from PDF, DOCX, or plain-text files for temporary analysis (not indexed). Accepts multipart/form-data.

Collections

MethodPathDescription
GET/collectionsList all RAG collections.
POST/collectionsCreate a new collection.
PATCH/collections/{collection_id}Rename a collection.
DELETE/collections/{collection_id}Delete a collection and all its indexed chunks.

Indexing

MethodPathDescription
POST/system/index-uploadUpload a folder for indexing via browser file picker. Accepts multipart/form-data.
GET/system/index-jobs/{job_id}Poll an index job’s status, progress, and ETA.
POST/system/index-jobs/{job_id}/cancelCancel a running index job.

System Control

MethodPathDescription
POST/system/reloadHot-reload the RAG index from storage/ without restarting the process.
POST/system/shutdownShut down Ollama and the RAG API.
POST/system/startupStart Ollama and the RAG API.
POST/system/stop-allStop all TrinaxAI services immediately.
POST/system/self-testRun automated health checks: Ollama, embeddings, RAG query.

SSE Streaming

Chat completions support server-sent events (SSE) when "stream": true is included in the request body. The response Content-Type is text/event-stream. Each event is a JSON-encoded data: line; the stream ends with data: [DONE].
data: {"trinaxai":{"model":"qwen2.5-coder:3b","project":"MyApp"}}
data: {"choices":[{"delta":{"content":"The auth module"}}]}
data: {"choices":[{"delta":{"content":" handles JWT validation..."}}]}
data: {"trinaxai_sources":[{"file":"app/auth.py","snippet":"...","score":0.89}]}
data: [DONE]
Non-streaming requests (default: "stream": false) return a single JSON object conforming to the OpenAI chat completion schema, with an extra trinaxai key containing the resolved model, project, and source citations.

Content Types

SituationContent-Type
All JSON endpointsapplication/json
File uploads (/system/index-upload, /documents/extract)multipart/form-data
SSE streaming responsestext/event-stream

Rate Limiting

The API enforces a rolling rate limit to protect the local Ollama process.
ParameterValue
Limit30 requests per minute per IP address
Window60-second rolling
ScopeChat completions and system endpoints
Response when exceededHTTP 429 Too Many Requests
You can adjust the defaults via environment variables:
TRINAXAI_RATE_LIMIT_PER_MINUTE=30   # requests per window
TRINAXAI_RATE_LIMIT_WINDOW_SECONDS=60

Error Codes

HTTP StatusMeaning
200Success
400Bad request — missing parameters or invalid data
403Forbidden — system endpoint called without a valid token or from an untrusted origin
404Not found — collection, job, or memory entry does not exist
429Rate limited — 30 req/min per IP exceeded
500Internal server error
501Not implemented — optional dependency missing (e.g. watchdog, pypdf)
503Service unavailable — Ollama is unreachable or the index is not loaded
If you receive a 503 on /v1/chat/completions, check that Ollama is running (ollama ps) and that the index has been built (python index.py, then POST /system/reload).

Build docs developers (and LLMs) love