TrinaxAI exposes two primary query endpoints. /v1/chat/completions is an OpenAI-compatible interface that retrieves context from your indexed documents and returns a grounded response, streamed via SSE by default. /v1/research performs deep multi-pass retrieval with sub-question decomposition for complex analytical queries. The chat endpoint is rate-limited to 30 requests per minute per IP and is open to trusted CORS origins by default; the research endpoint requires authorization, as described in its own section.
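
A client that sends many chat requests may want to stay under the 30 requests per minute limit on its own side. The following is a minimal sketch of a sliding-window throttle. The class name MinuteThrottle and its methods are hypothetical, and it assumes the server counts requests over a rolling 60-second window, which this page does not confirm.

```python
import time
from collections import deque


class MinuteThrottle:
    """Client-side sliding-window throttle (illustrative, not part of TrinaxAI)."""

    def __init__(self, max_calls=30, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls  # documented chat limit: 30 per minute per IP
        self.window = window        # assumed rolling window, in seconds
        self.clock = clock          # injectable so the logic can be tested
        self.calls = deque()        # timestamps of recent requests

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Note that a request was just sent."""
        self.calls.append(self.clock())

    def acquire(self, sleep=time.sleep):
        """Block until a request slot is free, then claim it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()
```

Call acquire() immediately before each request; the throttle only sleeps once 30 requests have been recorded inside the window.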

POST /v1/chat/completions

OpenAI-compatible RAG chat with hybrid retrieval (vector + BM25 fusion) and optional SSE streaming. The auto-router selects the best local model for each query — a coder model for code questions, a general model for prose, and the largest available model for complex multi-part questions.
This endpoint is open to trusted CORS origins (localhost ports 3334/3335 and private LAN IPs) without a token. System-level authorization is not required for chat.

Request body

model
string
default:"auto"
Ollama model name to use, or "auto" to let the built-in router choose the best model for the query. The router maps trivial queries → fast model, code queries → qwen2.5-coder, complex queries → the deepest available model (up to 14b on ultra profile).
messages
array
required
Array of conversation turns. Each element must have role ("user", "assistant", or "system") and content (string). Up to four previous turns are injected into the retrieval prompt for follow-up understanding.
stream
boolean
default:"true"
Set to true for Server-Sent Events (SSE) streaming. Set to false to receive a single JSON response after the full answer is generated.
collections
array
Optional list of collection IDs to restrict retrieval to. When omitted, the retriever searches across all indexed collections. Collection IDs are slugified strings (e.g. "my-project", "default").

Streaming response (SSE)

When stream: true, the server returns Content-Type: text/event-stream. Each data: line carries a JSON object, except the final [DONE] sentinel, which is a literal string. The stream follows this sequence:
data: {"trinaxai":{"model":"qwen2.5-coder:7b","project":"Insider"}}
data: {"choices":[{"delta":{"content":"The auth module..."}}]}
data: {"choices":[{"delta":{"content":" uses JWT tokens"}}]}
data: {"trinaxai_sources":[{"file":"auth.py","snippet":"def verify_token(tok...","score":0.89}]}
data: [DONE]
The first event carries model-routing metadata. Subsequent choices events stream the answer incrementally in delta.content. The trinaxai_sources event is emitted once, just before [DONE], and contains the grounding sources used for the response.
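
The event sequence above can be consumed with a small line-oriented parser. This is a minimal sketch rather than an official client: the function name parse_chat_stream and the shape of its return value are illustrative, and the sample lines are copied from the stream shown above.

```python
import json
from typing import Iterable


def parse_chat_stream(lines: Iterable[str]) -> dict:
    """Fold SSE data: lines into routing metadata, answer text, and sources."""
    result = {"meta": None, "text": "", "sources": [], "done": False}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # terminal sentinel, not JSON
            result["done"] = True
            break
        event = json.loads(payload)
        if "trinaxai" in event:
            result["meta"] = event["trinaxai"]
        elif "trinaxai_sources" in event:
            result["sources"] = event["trinaxai_sources"]
        else:
            # OpenAI-style chunk: append each content delta to the answer.
            for choice in event.get("choices", []):
                result["text"] += choice.get("delta", {}).get("content") or ""
    return result


sample = [
    'data: {"trinaxai":{"model":"qwen2.5-coder:7b","project":"Insider"}}',
    'data: {"choices":[{"delta":{"content":"The auth module..."}}]}',
    'data: {"choices":[{"delta":{"content":" uses JWT tokens"}}]}',
    'data: {"trinaxai_sources":[{"file":"auth.py","snippet":"def verify_token(tok...","score":0.89}]}',
    'data: [DONE]',
]
parsed = parse_chat_stream(sample)
print(parsed["text"])  # The auth module... uses JWT tokens
```

In a real client the lines would come from the HTTP response body, read line by line, instead of a list.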

trinaxai_sources fields

trinaxai_sources
array
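Array of source objects used to ground the answer. In the examples on this page, each object carries file, snippet, and score; the non-streaming example also shows project, collection_id, collection, and page.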

Non-streaming response

When stream: false, the server returns a single JSON object after the complete answer is generated.
id
string
Unique completion ID (e.g. "chatcmpl-1718123456").
object
string
Always "chat.completion".
created
integer
Unix timestamp of when the response was generated.
model
string
The model that was actually used (resolved from "auto" if applicable).
choices
array
Array with a single element containing the assistant message.
trinaxai
object
TrinaxAI-specific metadata appended to the standard OpenAI response shape.

Examples

curl -N -X POST http://localhost:3333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "How does the auth module work?"}
    ],
    "stream": true,
    "collections": ["my-project"]
  }'
Non-streaming response
{
  "id": "chatcmpl-1718123456",
  "object": "chat.completion",
  "created": 1718123456,
  "model": "qwen2.5-coder:7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The auth module uses JWT tokens for stateless session management..."
      },
      "finish_reason": "stop"
    }
  ],
  "trinaxai": {
    "model": "qwen2.5-coder:7b",
    "project": "Insider",
    "sources": [
      {
        "file": "app/auth.py",
        "project": "Insider",
        "collection_id": "my-project",
        "collection": "My Project",
        "page": null,
        "snippet": "def verify_token(tok): ...",
        "score": 0.89
      }
    ]
  },
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}
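
For callers that prefer a single JSON response, the request and the response shape above can be handled with the Python standard library. This is a sketch under stated assumptions: the base URL is taken from the curl example, and the helper names build_chat_request, summarize_completion, and ask are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3333"  # host and port from the curl example


def build_chat_request(question, collections=None, model="auto"):
    """Build a non-streaming POST request for /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for one JSON object instead of SSE
    }
    if collections:
        body["collections"] = list(collections)
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_completion(response):
    """Return the answer text and the files listed under trinaxai.sources."""
    answer = response["choices"][0]["message"]["content"]
    sources = response.get("trinaxai", {}).get("sources", [])
    return answer, [source["file"] for source in sources]


def ask(question, collections=None):
    """Send the request to a running server and summarize the reply."""
    request = build_chat_request(question, collections)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return summarize_completion(json.load(resp))
```

With a server running locally, ask("How does the auth module work?", ["my-project"]) would return the answer text together with the list of source files.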

POST /v1/research

Deep multi-pass research with sub-question decomposition. The LLM first breaks your query into 2–4 focused sub-questions, runs a separate retrieval pass for each, deduplicates the collected chunks, and synthesizes a comprehensive grounded answer with inline citations.
Use /v1/research when you need a thorough comparative or analytical answer — e.g. “Compare authentication patterns across the project” or “Explain how data flows from the frontend to the database.” For simple factual look-ups, /v1/chat/completions is faster.
This endpoint requires authorization: the request must come from localhost or the LAN, or carry an X-Admin-Token header. It is not rate-limited the same way as chat, but it is significantly more compute-intensive.

Request body

query
string
required
The full research question. Longer, more specific questions produce better sub-question decomposition.
depth
integer
default:"2"
Research depth. 1 skips sub-question decomposition and runs a single retrieval pass. 2 (default) decomposes into 2–4 sub-questions. 3 adds an extra cross-pass using the original query to fill any remaining gaps. Clamped to [1, 3].
collections
array
Optional list of collection IDs to restrict retrieval to. When omitted, all collections are searched.
model
string
Override the model used for both decomposition and synthesis. Defaults to TRINAXAI_LLM from config (the code model for the current profile).
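
The request fields above can be assembled with a small helper. This is a minimal sketch: the function name build_research_payload is hypothetical, and the client-side clamp simply mirrors the documented [1, 3] range so the value sent matches what the server will use.

```python
import json


def build_research_payload(query, depth=2, collections=None, model=None):
    """Build the JSON body for POST /v1/research from the documented fields."""
    if not query.strip():
        raise ValueError("query is required")
    payload = {
        "query": query,
        # The server clamps depth to [1, 3]; do the same here for clarity.
        "depth": max(1, min(3, int(depth))),
    }
    if collections:
        payload["collections"] = list(collections)  # omit to search everything
    if model:
        payload["model"] = model  # omit to use the configured default
    return payload


body = build_research_payload(
    "Compare authentication patterns across the project",
    depth=5,
    collections=["default"],
)
print(json.dumps(body))
```

Optional fields are left out entirely when unset, so the server-side defaults described above still apply.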

Response

answer
string
Comprehensive synthesized answer with inline citations in [n] format referencing the source list.
sub_questions
array
The sub-questions that were generated and used to drive retrieval passes. Array of strings.
sources
array
All unique source chunks collected across all retrieval passes.
passes
integer
Number of retrieval passes executed (equals the number of entries in sub_questions, plus one when depth is 3).
model
string
Model used for decomposition and synthesis.

Example

curl -X POST http://localhost:3333/v1/research \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Compare authentication patterns across the project",
    "depth": 2,
    "collections": ["default"]
  }'
Response
{
  "answer": "The project uses two authentication patterns: [1] JWT-based stateless auth in `auth.py` and [2] session cookies in `legacy_auth.py`...",
  "sub_questions": [
    "What authentication methods are used in the codebase?",
    "How are tokens validated and sessions managed?"
  ],
  "sources": [
    {
      "file": "app/auth.py",
      "project": "Insider",
      "collection_id": "default",
      "collection": "General",
      "page": null,
      "snippet": "def verify_token(tok): payload = jwt.decode(tok, SECRET_KEY, ...",
      "score": 0.923
    }
  ],
  "passes": 2,
  "model": "qwen2.5-coder:7b"
}
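
The inline [n] citations in answer can be mapped back to entries in sources. The sketch below assumes that [n] is a 1-based index into the sources array, which this page does not state explicitly. The function name cited_sources is hypothetical. Markers that point past the end of the list map to None.

```python
import re


def cited_sources(answer, sources):
    """Map each [n] marker in the answer to a source file, or None if out of range."""
    cited = {}
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if 1 <= n <= len(sources):
            cited[n] = sources[n - 1]["file"]  # assumed 1-based indexing
        else:
            cited[n] = None  # marker with no matching entry in sources
    return cited


answer = (
    "The project uses two authentication patterns: [1] JWT-based stateless "
    "auth in `auth.py` and [2] session cookies in `legacy_auth.py`..."
)
sources = [{"file": "app/auth.py", "score": 0.923}]
print(cited_sources(answer, sources))  # {1: 'app/auth.py', 2: None}
```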

GET /v1/sources

List all indexed files in a collection, with chunk counts, byte size, last-modified time, and a short preview snippet. Results are cached for performance (default: 30 seconds in fast mode).
This endpoint requires authorization (localhost/LAN or X-Admin-Token).

Query parameters

collection
string
Collection ID to list sources for. Defaults to "default" when omitted.

Response

collection
string
The resolved collection ID that was queried.
sources
array
List of source file entries, sorted by descending chunk count then filename.

Example

curl "http://localhost:3333/v1/sources?collection=default" \
  -H "X-Admin-Token: your-token"
Response
{
  "collection": "default",
  "sources": [
    {
      "file": "app/auth.py",
      "chunks": 45,
      "size": 38420,
      "mtime": 1718000000.0,
      "preview": "import jwt\nfrom functools import wraps\n\nSECRET_KEY = os.getenv('SECRET_KEY')"
    },
    {
      "file": "docs/architecture.md",
      "chunks": 12,
      "size": 9800,
      "mtime": 1717900000.0,
      "preview": "# Architecture Overview\n\nTrinaxAI is a local-first AI assistant..."
    }
  ]
}
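
A listing like the one above is easy to summarize on the client. This sketch uses a hypothetical helper, summarize_sources, and re-applies the documented ordering (descending chunk count, then filename) rather than relying on the order received.

```python
def summarize_sources(listing):
    """Total the files, chunks, and bytes in a /v1/sources response."""
    sources = listing.get("sources", [])
    # Same ordering the endpoint documents: most chunks first, then filename.
    ordered = sorted(sources, key=lambda s: (-s["chunks"], s["file"]))
    return {
        "collection": listing.get("collection"),
        "files": len(ordered),
        "chunks": sum(s["chunks"] for s in ordered),
        "bytes": sum(s["size"] for s in ordered),
        "largest": ordered[0]["file"] if ordered else None,
    }


listing = {
    "collection": "default",
    "sources": [
        {"file": "app/auth.py", "chunks": 45, "size": 38420, "mtime": 1718000000.0},
        {"file": "docs/architecture.md", "chunks": 12, "size": 9800, "mtime": 1717900000.0},
    ],
}
print(summarize_sources(listing))
```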

GET /v1/sources/{collection}/{file:path}/chunks

Retrieve individual indexed chunks for a specific file within a collection. Supports pagination and optional text search filtering.
This endpoint requires authorization (localhost/LAN or X-Admin-Token).

Path parameters

collection
string
required
The collection ID containing the file.
file
string
required
The relative file path within the collection (URL-encoded). For example: app/auth.py or docs/notes/architecture.md.

Query parameters

limit
integer
default:"50"
Maximum number of chunks to return. Clamped to [1, 500].
offset
integer
default:"0"
Number of chunks to skip for pagination.
q
string
Optional case-insensitive substring filter. When provided, only chunks whose text contains this string are returned, and total reflects the filtered count.

Response

collection
string
Collection ID.
file
string
The file path that was queried.
total
integer
Total number of matching chunks (after optional q filter).
query
string
The q filter value that was applied, or an empty string when no filter was given.
chunks
array
The paginated list of chunks.

Example

curl "http://localhost:3333/v1/sources/default/app%2Fauth.py/chunks?limit=5&q=jwt" \
  -H "X-Admin-Token: your-token"
Response
{
  "collection": "default",
  "file": "app/auth.py",
  "total": 3,
  "query": "jwt",
  "chunks": [
    {
      "id": "a1b2c3d4e5f6",
      "text": "import jwt\nfrom functools import wraps\n\ndef verify_token(tok):\n    payload = jwt.decode(tok, SECRET_KEY, algorithms=['HS256'])\n    return payload",
      "metadata": {
        "rel_path": "app/auth.py",
        "project": "Insider",
        "collection_id": "default",
        "collection_name": "General",
        "page_label": null,
        "page": null
      },
      "score": null
    }
  ]
}
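
To read every chunk of a large file, a client can page through the results with limit and offset until total is reached. The following is a minimal sketch. The helper names chunks_url, fetch_page, and iter_chunks are hypothetical, the base URL comes from the curl example, and the file path is percent-encoded in full as in that example.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:3333"  # host and port from the curl example


def chunks_url(collection, file_path, limit=50, offset=0, q=""):
    """Build the chunks URL, encoding the path and clamping limit to [1, 500]."""
    params = {"limit": max(1, min(500, limit)), "offset": max(0, offset)}
    if q:
        params["q"] = q
    path = f"/v1/sources/{quote(collection, safe='')}/{quote(file_path, safe='')}/chunks"
    return f"{BASE_URL}{path}?{urlencode(params)}"


def fetch_page(url, token):
    """GET one page of chunks, sending the admin token header."""
    request = urllib.request.Request(url, headers={"X-Admin-Token": token})
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def iter_chunks(collection, file_path, token, q="", page_size=50, fetch=fetch_page):
    """Yield every matching chunk, advancing offset until total is reached."""
    offset = 0
    while True:
        page = fetch(chunks_url(collection, file_path, page_size, offset, q), token)
        chunks = page.get("chunks", [])
        yield from chunks
        offset += len(chunks)
        # Stop on an empty page or once all matching chunks have been seen.
        if not chunks or offset >= page.get("total", 0):
            break


print(chunks_url("default", "app/auth.py", limit=5, q="jwt"))
```

The fetch parameter lets the paging logic be exercised without a running server; by default it performs a real HTTP request.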
