TrinaxAI exposes two lightweight monitoring endpoints that do not require authorization.
/health provides a comprehensive snapshot of the system state, including whether the index is loaded, which models are available, the current hardware profile, and which optional features are active. /resources reports local RAM telemetry for the PWA’s resource meter; its VRAM field is reserved for a future release.
Both /health and /resources are open to all trusted CORS origins (localhost ports 3334/3335 and private LAN IPs) without an admin token. They are safe to poll frequently from the PWA.
GET /health
Returns a comprehensive system status snapshot. Intended to be polled by the PWA on startup and periodically to detect index readiness, Ollama availability, and active feature flags.
Response
Always true when the RAG API is running. (The API returning any response at all means it is healthy.)
true if the hybrid retriever (vector + BM25) is loaded and ready to answer queries.
true if Ollama is reachable at the configured OLLAMA_BASE_URL. Cached for 5 seconds to avoid hammering the Ollama process.
Sorted list of project names detected in the indexed content. Empty if the index is not loaded.
Full list of collection objects (same shape as GET /collections).
The active model fleet list, ordered by preference. Derived from the current profile and TRINAXAI_MODEL_* environment variables. Typical 16gb fleet: ["qwen2.5-coder:3b", "qwen2.5-coder:7b", "llama3.2:3b"].
Active hardware profile. One of: "8gb", "16gb", "max", "ultra" (or custom). Controls model selection, chunk sizes, context window, and retrieval depth.
Context window size in tokens for the active profile. 2048 for low-resource, 4096 default, 8192 for max, 16384 for ultra.
Number of concurrent embedding workers. Higher values index faster but increase RAM pressure.
Number of chunks per embedding batch request.
How long Ollama keeps the embedding model loaded in memory between requests (e.g. "15m").
One of "fast", "balanced", "quality". Controls retrieval depth and cache TTLs.
Number of candidates each retriever (vector + BM25) contributes before reciprocal rank fusion.
Final number of chunks injected into the LLM context after fusion (and optional reranking).
TTL for the retrieval result cache. 0 disables caching.
true if the cross-encoder reranker (BAAI/bge-reranker-v2-m3) is enabled via TRINAXAI_RERANK=1.
Active feature flags.
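The retrieval fields above mention reciprocal rank fusion of vector and BM25 candidates. As a rough illustration of that general technique (not TrinaxAI's actual implementation), the sketch below fuses two ranked lists of chunk IDs. The function name, the `top_n` parameter, and the smoothing constant `k=60` (a commonly used default) are assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of IDs into a single ranking.

    Each ID scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the scores are summed across lists. k=60 is an assumed default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]


# Hypothetical candidate lists from the two retrievers.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
final_chunks = reciprocal_rank_fusion([vector_hits, bm25_hits], top_n=2)
```

Here the length of each input list plays the role of the per-retriever candidate count, and `top_n` plays the role of the final number of chunks passed to the LLM.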
Example
Response (16gb profile, index loaded)
Response (no index yet)
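Separately from the example responses, the polling pattern described for /health could be sketched on the client side roughly as follows. This is a minimal sketch under stated assumptions: the helper names are invented for illustration, the base URL must be supplied by the caller, and the readiness key `index_loaded` used in the usage line is hypothetical, so check an actual /health response for the real field name.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until(fetch, is_ready, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch() until is_ready(payload) is true or the timeout elapses.

    Returns the first payload that satisfies is_ready, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            payload = fetch()
            if is_ready(payload):
                return payload
        except OSError:
            pass  # API not reachable yet (URLError subclasses OSError); retry.
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


# Usage (hypothetical key name; base_url is whatever address serves the API):
# status = wait_until(lambda: fetch_health(base_url),
#                     lambda p: p.get("index_loaded"))
```

Passing `fetch` and `sleep` as arguments keeps the loop easy to exercise without a running server.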
GET /resources
Basic local RAM telemetry for the PWA’s resource usage panel. Uses psutil when available for precise used/available figures, with a graceful fallback to OS syscalls (sysconf) that only reports total RAM. VRAM reporting is reserved for a future release (vram: null).
This endpoint requires psutil for full RAM metrics (used_gb, percent). Without it, only total is reported. Install with pip install psutil for complete telemetry.
Response
Always true.
RAM usage object. null if memory information could not be read from the OS.
Reserved for future GPU VRAM telemetry. Always null in the current release.
Examples
With psutil installed
Without psutil (sysconf fallback)
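For readers curious how a psutil-first reading with a sysconf fallback might look, here is a minimal sketch. It is not the endpoint's actual code: the function name `read_ram` and the `total_gb` key are assumptions, while `used_gb` and `percent` follow the metric names mentioned above.

```python
import os


def read_ram():
    """Return a RAM usage dict, or None if memory info cannot be read."""
    gib = 1024 ** 3
    try:
        import psutil  # optional dependency
    except ImportError:
        psutil = None

    if psutil is not None:
        vm = psutil.virtual_memory()
        return {
            "total_gb": round(vm.total / gib, 2),  # assumed key name
            "used_gb": round(vm.used / gib, 2),
            "percent": vm.percent,
        }

    # Fallback: total physical RAM only, via POSIX sysconf.
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. platforms without os.sysconf
    return {"total_gb": round(total / gib, 2)}


ram = read_ram()
```

With psutil installed the dict carries used and percent figures; without it only the total is present, and `None` signals that the OS could not be queried.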