TrinaxAI exposes two primary query endpoints.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/TrinaxCode/TrinaxAI/llms.txt
Use this file to discover all available pages before exploring further.
/v1/chat/completions is an OpenAI-compatible interface that retrieves context from your indexed documents and streams a grounded response via SSE. /v1/research performs deep multi-pass retrieval with sub-question decomposition for complex analytical queries. Both endpoints are rate-limited to 30 requests per minute per IP and are open to trusted CORS origins by default.
POST /v1/chat/completions
OpenAI-compatible RAG chat with hybrid retrieval (vector + BM25 fusion) and optional SSE streaming. The auto-router selects the best local model for each query — a coder model for code questions, a general model for prose, and the largest available model for complex multi-part questions.This endpoint is open to trusted CORS origins (localhost ports 3334/3335 and private LAN IPs) without a token. System-level authorization is not required for chat.
Request body
Ollama model name to use, or
"auto" to let the built-in router choose the best model for the query. The router maps trivial queries → fast model, code queries → qwen2.5-coder, complex queries → the deepest available model (up to 14b on ultra profile).Array of conversation turns. Each element must have
role ("user", "assistant", or "system") and content (string). Up to four previous turns are injected into the retrieval prompt for follow-up understanding.Set to
true for Server-Sent Events (SSE) streaming. Set to false to receive a single JSON response after the full answer is generated.Optional list of collection IDs to restrict retrieval to. When omitted, the retriever searches across all indexed collections. Collection IDs are slugified strings (e.g.
"my-project", "default").Streaming response (SSE)
Whenstream: true, the server returns Content-Type: text/event-stream. Each data: line is a JSON object. The stream follows this sequence:
choices events stream individual tokens. The trinaxai_sources event is emitted once, just before [DONE], containing the grounding sources used for the response.
trinaxai_sources fields
Array of source objects used to ground the answer.
Non-streaming response
Whenstream: false, the server returns a single JSON object after the complete answer is generated.
Unique completion ID (e.g.
"chatcmpl-1718123456").Always
"chat.completion".Unix timestamp of when the response was generated.
The model that was actually used (resolved from
"auto" if applicable).Array with a single element containing the assistant message.
TrinaxAI-specific metadata appended to the standard OpenAI response shape.
Examples
Non-streaming response
POST /v1/research
Deep multi-pass research with sub-question decomposition. The LLM first breaks your query into 2–4 focused sub-questions, runs a separate retrieval pass for each, deduplicates the collected chunks, and synthesizes a comprehensive grounded answer with inline citations.This endpoint requires authorization (localhost/LAN or
X-Admin-Token header). It is not rate-limited the same way as chat, but it is significantly more compute-intensive.Request body
The full research question. Longer, more specific questions produce better sub-question decomposition.
Research depth.
1 skips sub-question decomposition and runs a single retrieval pass. 2 (default) decomposes into 2–4 sub-questions. 3 adds an extra cross-pass using the original query to fill any remaining gaps. Clamped to [1, 3].Optional list of collection IDs to restrict retrieval to. When omitted, all collections are searched.
Override the model used for both decomposition and synthesis. Defaults to
TRINAXAI_LLM from config (the code model for the current profile).Response
Comprehensive synthesized answer with inline citations in
[n] format referencing the source list.The sub-questions that were generated and used to drive retrieval passes. Array of strings.
All unique source chunks collected across all retrieval passes.
Number of retrieval passes executed (equals
len(sub_questions) plus one for depth >= 3).Model used for decomposition and synthesis.
Example
Response
GET /v1/sources
List all indexed files in a collection, with chunk counts, byte size, last-modified time, and a short preview snippet. Results are cached for performance (default: 30 seconds in fast mode).This endpoint requires authorization (localhost/LAN or
X-Admin-Token).Query parameters
Collection ID to list sources for. Defaults to
"default" when omitted.Response
The resolved collection ID that was queried.
List of source file entries, sorted by descending chunk count then filename.
Example
Response
GET /v1/sources/{collection}/{file:path}/chunks
Retrieve individual indexed chunks for a specific file within a collection. Supports pagination and optional text search filtering.This endpoint requires authorization (localhost/LAN or
X-Admin-Token).Path parameters
The collection ID containing the file.
The relative file path within the collection (URL-encoded). For example:
app/auth.py or docs/notes/architecture.md.Query parameters
Maximum number of chunks to return. Clamped to
[1, 500].Number of chunks to skip for pagination.
Optional case-insensitive substring filter. When provided, only chunks whose text contains this string are returned, and
total reflects the filtered count.Response
Collection ID.
The file path that was queried.
Total number of matching chunks (after optional
q filter).The
q filter value used, or empty string.The paginated list of chunks.
Example
Response