Archestra exposes Prometheus metrics and OpenTelemetry traces for monitoring system health, tracking HTTP requests, and analyzing LLM API performance. The web process exposes metrics at http://localhost:9050/metrics. When the separate worker deployment is enabled (the default for Helm deployments), the worker exposes its own metrics endpoint at http://<worker-host>:9000/metrics. Production scrape configurations should collect both endpoints, because task queue metrics and Knowledge Base pipeline metrics (connector syncs, embedding batches) are emitted from the worker process.
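
A minimal sketch of a Prometheus scrape configuration that collects from both processes. The hostnames archestra-web and archestra-worker are placeholders for your own service addresses, and the job names are arbitrary; add an authorization block if you protect the endpoints with a metrics secret.

```yaml
# Sketch: scrape both Archestra processes.
# "archestra-web" and "archestra-worker" are placeholder hostnames.
scrape_configs:
  - job_name: archestra-web
    metrics_path: /metrics          # Prometheus default, shown for clarity
    static_configs:
      - targets: ["archestra-web:9050"]

  - job_name: archestra-worker
    metrics_path: /metrics
    static_configs:
      # Task queue and Knowledge Base pipeline metrics are emitted here
      - targets: ["archestra-worker:9000"]
```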

Prometheus Metrics

LLM Metrics

The following metrics track LLM API usage, cost, and performance across all agents, proxies, and gateways:
| Metric | Description |
|---|---|
| llm_request_duration_seconds | LLM API request duration by provider, model, agent_id, agent_name, agent_type, external_agent_id, source, and status code |
| llm_tokens_total | Token consumption by provider, model, agent, source, and type (input/output) |
| llm_cost_total | Estimated cost in USD by provider, model, and agent. Requires token pricing to be configured in Archestra. |
| llm_blocked_tools_total | Counter of tool calls blocked by tool invocation policies, grouped by provider, model, and agent |
| llm_time_to_first_token_seconds | Time to first token (TTFT) for streaming requests, by provider, agent, source, and model. Helps compare models by initial response latency. |
| llm_tokens_per_second | Output tokens per second throughput, by provider, agent, source, and model. Useful for comparing response speeds in latency-sensitive applications. |
agent_id and agent_name are the internal Archestra identifiers. external_agent_id contains the value passed via the X-Archestra-Agent-Id header, allowing clients to associate metrics with their own agent identifiers. Knowledge Base embedding and reranking calls emit these same metrics with agent_name="Knowledge Base" and source="knowledge:embedding" or source="knowledge:reranker".
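
The following Prometheus recording rules are one way to precompute common views over these metrics. This is a sketch under assumptions: it assumes llm_request_duration_seconds and llm_time_to_first_token_seconds are exported as histograms (with _bucket series), and the rule names are arbitrary. The last rule isolates Knowledge Base embedding traffic through the source label described above.

```yaml
# Sketch: recording rules over the LLM metrics.
# Assumes the *_seconds metrics below are histograms with _bucket series.
groups:
  - name: archestra-llm
    rules:
      # P95 LLM request latency per provider and model over 5 minutes
      - record: provider_model:llm_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (le, provider, model) (rate(llm_request_duration_seconds_bucket[5m])))

      # P95 time to first token for streaming requests
      - record: provider_model:llm_time_to_first_token_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (le, provider, model) (rate(llm_time_to_first_token_seconds_bucket[5m])))

      # Input vs. output token rate per model
      - record: model_type:llm_tokens:rate5m
        expr: sum by (model, type) (rate(llm_tokens_total[5m]))

      # Estimated spend in USD over the last 24 hours per provider and model
      - record: provider_model:llm_cost:increase1d
        expr: sum by (provider, model) (increase(llm_cost_total[1d]))

      # Token rate attributable to Knowledge Base embedding calls
      - record: knowledge:llm_tokens_embedding:rate5m
        expr: sum(rate(llm_tokens_total{source="knowledge:embedding"}[5m]))
```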

MCP Metrics

| Metric | Description |
|---|---|
| mcp_tool_calls_total | Total MCP tool calls by agent, mcp_server_name, tool_name, and status (success/error) |
| mcp_tool_call_duration_seconds | MCP tool call execution duration by agent, mcp_server_name, tool_name, and status |
| mcp_server_deployment_status | Current deployment state of self-hosted MCP servers by server_name and state (not_created/pending/running/failed/succeeded). Value is 1 for the active state. Use count(mcp_server_deployment_status{state="running"} == 1) to count running deployments. |
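
As an illustration, these MCP metrics can drive Prometheus alerting rules such as the ones below. The 5% error threshold, the durations, and the severity labels are placeholders to tune for your environment, not recommended values.

```yaml
# Sketch: alerting rules over the MCP metrics. Thresholds are placeholders.
groups:
  - name: archestra-mcp
    rules:
      # More than 5% of calls to a given MCP server failing for 10 minutes
      - alert: ArchestraMcpToolErrorRateHigh
        expr: |
          sum by (mcp_server_name) (rate(mcp_tool_calls_total{status="error"}[5m]))
            /
          sum by (mcp_server_name) (rate(mcp_tool_calls_total[5m]))
            > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MCP server {{ $labels.mcp_server_name }} error rate above 5%"

      # A self-hosted MCP server deployment has been in the failed state for 5 minutes
      - alert: ArchestraMcpServerDeploymentFailed
        expr: mcp_server_deployment_status{state="failed"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MCP server {{ $labels.server_name }} deployment failed"
```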

RAG & Knowledge Base Metrics

| Metric | Description |
|---|---|
| rag_connector_syncs_total | Total connector syncs by connector_type and status (success/failed/partial) |
| rag_connector_sync_duration_seconds | Connector sync duration by connector_type and status |
| rag_documents_processed_total | Total documents processed during syncs by connector_type |
| rag_documents_ingested_total | Total documents ingested (new or updated) by connector_type |
| rag_chunks_created_total | Total chunks created during document ingestion by connector_type |
| rag_embedding_batches_total | Total embedding batches processed by status (success/error) |
| rag_embedding_documents_total | Total documents embedded by status |
| rag_queries_total | Total RAG queries by search_type (vector/hybrid) |
| rag_query_duration_seconds | RAG query end-to-end duration (embedding, search, rerank) by search_type |
| rag_query_results_count | Number of results returned per RAG query by search_type |
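
A possible set of rules for the Knowledge Base pipeline follows. Connector sync and embedding batch metrics come from the worker, so these only fire if Prometheus scrapes the worker endpoint. The latency rule assumes rag_query_duration_seconds is a histogram; the thresholds and windows are illustrative.

```yaml
# Sketch: Knowledge Base alerts and a query latency recording rule.
groups:
  - name: archestra-rag
    rules:
      # Any connector sync reported as failed in the last hour
      - alert: ArchestraConnectorSyncFailed
        expr: sum by (connector_type) (increase(rag_connector_syncs_total{status="failed"}[1h])) > 0
        labels:
          severity: warning

      # Embedding batches erroring continuously for 15 minutes
      - alert: ArchestraEmbeddingBatchErrors
        expr: sum(rate(rag_embedding_batches_total{status="error"}[15m])) > 0
        for: 15m
        labels:
          severity: warning

      # P95 RAG query latency by search type (assumes a histogram)
      - record: search_type:rag_query_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (le, search_type) (rate(rag_query_duration_seconds_bucket[5m])))
```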

Task Queue Metrics

| Metric | Description |
|---|---|
| task_queue_tasks_enqueued_total | Total tasks enqueued by task_type |
| task_queue_tasks_completed_total | Total tasks completed successfully by task_type |
| task_queue_tasks_failed_total | Total task processing failures (may be retried) by task_type |
| task_queue_tasks_dead_total | Total tasks moved to dead-letter (max retries exceeded) by task_type |
| task_queue_task_duration_seconds | Task processing duration by task_type |
| task_queue_active_tasks | Currently active (in-flight) tasks by task_type |
| task_queue_stuck_tasks_reset_total | Total stuck tasks reset back to pending |
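
Because a failure may still be retried while a dead-lettered task will not be, the two counters call for different handling. The sketch below alerts on dead-lettered tasks and records a failure ratio for dashboards; windows and severities are illustrative choices.

```yaml
# Sketch: task queue rules (worker metrics).
groups:
  - name: archestra-task-queue
    rules:
      # Tasks exhausted their retries and were moved to dead-letter
      - alert: ArchestraTasksDeadLettered
        expr: sum by (task_type) (increase(task_queue_tasks_dead_total[15m])) > 0
        labels:
          severity: warning

      # Failed attempts relative to enqueued tasks (failures may still be retried)
      - record: task_type:task_queue_failure_ratio:rate15m
        expr: |
          sum by (task_type) (rate(task_queue_tasks_failed_total[15m]))
            /
          sum by (task_type) (rate(task_queue_tasks_enqueued_total[15m]))

      # Tasks were found stuck and reset back to pending during the last hour
      - alert: ArchestraStuckTasksReset
        expr: increase(task_queue_stuck_tasks_reset_total[1h]) > 0
        labels:
          severity: info
```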

HTTP & Runtime Metrics

| Metric | Description |
|---|---|
| http_request_duration_seconds_count | Total HTTP requests by method, route, and status |
| http_request_duration_seconds_bucket | Request duration histogram buckets |
| http_request_summary_seconds | Request duration summary with quantiles |
| process_cpu_user_seconds_total | CPU time in user mode |
| process_resident_memory_bytes | Physical memory usage |
| nodejs_eventloop_lag_seconds | Event loop lag (latency indicator) |
| nodejs_heap_size_used_bytes | V8 heap memory usage |
| nodejs_gc_duration_seconds | Garbage collection timing by type |
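
For example, the HTTP histogram and runtime gauges can be combined into rules like these. The 100 ms event loop lag threshold is an arbitrary example, and the 5xx filter relies on the status label listed above.

```yaml
# Sketch: HTTP and Node.js runtime rules. Thresholds are illustrative.
groups:
  - name: archestra-runtime
    rules:
      # Share of HTTP requests answered with a 5xx status
      - record: archestra:http_5xx_ratio:rate5m
        expr: |
          sum(rate(http_request_duration_seconds_count{status=~"5.."}[5m]))
            /
          sum(rate(http_request_duration_seconds_count[5m]))

      # P95 HTTP latency per route from the histogram buckets
      - record: route:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (le, route) (rate(http_request_duration_seconds_bucket[5m])))

      # Sustained event loop lag above 100 ms
      - alert: ArchestraEventLoopLagHigh
        expr: nodejs_eventloop_lag_seconds > 0.1
        for: 5m
        labels:
          severity: warning
```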

Securing the Metrics Endpoint

The metrics endpoint supports bearer token authentication. Set ARCHESTRA_METRICS_SECRET to require an Authorization: Bearer <token> header from all scrapers:
ARCHESTRA_METRICS_PORT=9050
ARCHESTRA_METRICS_SECRET=your-secure-metrics-token
Configure your Prometheus scrape job accordingly:
scrape_configs:
  - job_name: archestra
    static_configs:
      - targets: ["archestra-host:9050"]
    authorization:
      credentials: your-secure-metrics-token
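
To keep the token out of prometheus.yml, Prometheus's authorization block also accepts credentials_file (mutually exclusive with credentials). The file path below is a placeholder, for example a mounted Kubernetes Secret containing only the value of ARCHESTRA_METRICS_SECRET.

```yaml
scrape_configs:
  - job_name: archestra
    static_configs:
      - targets: ["archestra-host:9050"]
    authorization:
      type: Bearer   # the default type, stated explicitly
      # Placeholder path to a file holding only the metrics token
      credentials_file: /etc/prometheus/secrets/archestra-metrics-token
```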

OpenTelemetry Tracing

Archestra exports OpenTelemetry traces to any OTLP-compatible backend — Jaeger, Tempo, Honeycomb, Grafana Cloud, and others.

Configuration

# OTLP endpoint (base URL for /v1/traces and /v1/logs)
ARCHESTRA_OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318
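
If you run an OpenTelemetry Collector between Archestra and your tracing backend, a minimal receiving configuration might look like this sketch. It accepts OTLP over HTTP on port 4318 (matching the endpoint above), forwards traces to a placeholder Tempo address over gRPC, and prints logs with the debug exporter until you wire in a log backend.

```yaml
# Sketch: Collector accepting OTLP/HTTP from Archestra on :4318.
# "tempo:4317" is a placeholder backend address.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true   # plaintext gRPC; only inside a trusted network
  debug: {}            # prints received logs; replace with your log backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```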

Authentication

ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_BEARER=your-bearer-token
Bearer token authentication takes precedence over basic authentication when both are configured.
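
On the receiving side, a Collector can enforce the same bearer token. This sketch assumes a Collector distribution that includes the bearertokenauth extension (for example otelcol-contrib); the token value is a placeholder that must match ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_BEARER.

```yaml
# Sketch: require a bearer token on the Collector's OTLP/HTTP receiver.
extensions:
  bearertokenauth:
    token: "your-bearer-token"   # same value as ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_BEARER

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        auth:
          authenticator: bearertokenauth

exporters:
  debug: {}

service:
  extensions: [bearertokenauth]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```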

Content Capture

Archestra can capture prompt/completion content and tool call arguments/results as span events for a full audit trail. This is enabled by default.
# Disable content capture (for privacy or to reduce span sizes)
ARCHESTRA_OTEL_CAPTURE_CONTENT=false

# Maximum characters per captured event (default: 10000)
ARCHESTRA_OTEL_CONTENT_MAX_LENGTH=10000
When enabled, traces include:
  • LLM spans: gen_ai.content.prompt event with request messages, and gen_ai.content.completion event with response text
  • MCP spans: gen_ai.content.input event with tool call arguments, and gen_ai.content.output event with tool call results
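
If content should be captured for some destinations but stripped before others, one option is to leave capture enabled and drop these span events in an OpenTelemetry Collector, which keeps the rest of each span intact. This sketch uses the Collector's filter processor; the processor name suffix is arbitrary, and the pipeline assumes an otlp receiver, batch processor, and otlp exporter defined elsewhere in the same config.

```yaml
# Sketch: strip captured prompt/completion and tool I/O events in the Collector.
processors:
  filter/drop-genai-content:
    error_mode: ignore
    traces:
      spanevent:
        # Span events matching any condition are dropped
        - 'name == "gen_ai.content.prompt"'
        - 'name == "gen_ai.content.completion"'
        - 'name == "gen_ai.content.input"'
        - 'name == "gen_ai.content.output"'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop-genai-content, batch]
      exporters: [otlp]
```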

What’s Traced

Archestra automatically traces these categories:

LLM API Calls

Every call to an LLM provider, with model, token counts, cost, and response time. Span names follow the GenAI semconv format {operation} {model} — e.g., chat gpt-4o-mini or generate_content gemini-2.0-flash.

MCP Tool Calls

Every tool execution through the MCP Gateway, with tool name, server, duration, and whether the call was blocked by a policy.

Knowledge Base Operations

Embedding and reranking LLM calls made by the Knowledge Base system, with cost and token tracking. Identified by archestra.trigger.source=knowledge:embedding or knowledge:reranker.

Skill Sandbox Execution

The native Rust sandbox exports its own spans (service.name=archestra-sandbox-rs) — command runs, artifact reads, and container materialization — with exit_code, duration_ms, and output size as span fields.

Chat Trace Structure

Each chat turn produces a unified trace grouping LLM calls and tool executions under a single parent span:
chat {agentName}                       ← parent span (SpanKind.SERVER)
├── chat {model}                       ← LLM call via proxy (SpanKind.CLIENT)
├── execute_tool {tool_name}           ← MCP tool execution
└── chat {model}                       ← follow-up LLM call after tool result
The same structure applies across all invocation paths:
| Invocation Path | route.category | archestra.trigger.source |
|---|---|---|
| Chat UI | chat | |
| A2A Protocol | a2a | |
| MS Teams | chatops | ms-teams |
| Email | email | email |
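
Because these paths are distinguishable by archestra.trigger.source, a Collector tail_sampling processor could, for instance, keep every MS Teams and email trace while sampling the rest. This is an illustrative policy set, not a recommendation; tail sampling also requires all spans of a trace to reach the same Collector instance.

```yaml
# Sketch: keep all ms-teams/email traces, sample 10% of everything else.
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Sampled if any span in the trace carries one of these trigger sources
      - name: keep-chatops-and-email
        type: string_attribute
        string_attribute:
          key: archestra.trigger.source
          values: [ms-teams, email]
      # Baseline sampling applied to all traces
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```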

Verbose Tracing

By default, traces include only GenAI-specific spans. To also capture infrastructure spans (HTTP routes, outgoing HTTP calls, Node.js fetch):
ARCHESTRA_OTEL_VERBOSE_TRACING=true
Verbose tracing produces significantly more spans. Use it for debugging only, not as a permanent production setting.

Metric-to-Trace Exemplars

All LLM and MCP metrics include trace exemplars. In Grafana, clicking on a data point jumps directly to the corresponding trace in Tempo. This requires:
  • Prometheus configured with --enable-feature=exemplar-storage
  • Grafana Prometheus datasource configured with exemplarTraceIdDestinations pointing to your Tempo datasource
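
Both requirements can be met with a Grafana datasource provisioning file along these lines. The URLs and the uid are placeholders, and the exemplar label name trace_id is an assumption: set it to whichever exemplar label carries the trace ID in your Archestra metrics.

```yaml
# Sketch: Grafana datasource provisioning linking Prometheus exemplars to Tempo.
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo                    # referenced below
    access: proxy
    url: http://tempo:3200        # placeholder

  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # placeholder; run with --enable-feature=exemplar-storage
    jsonData:
      exemplarTraceIdDestinations:
        # Assumed exemplar label holding the trace ID
        - name: trace_id
          datasourceUid: tempo
```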

Custom Agent Labels

Labels are key-value pairs configured on agents in the Archestra UI. Once added, they appear in:
  • Metrics — as additional label dimensions on all LLM and MCP metrics (kebab-case labels are converted to snake_case for Prometheus naming compatibility)
  • Traces — as archestra.label.<key> span attributes
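
For example, if agents carried a hypothetical label cost-center, it would appear in Prometheus as cost_center, and daily spend could be grouped by it with a recording rule such as this sketch:

```yaml
# Sketch: spend per hypothetical "cost-center" agent label (cost_center in Prometheus).
groups:
  - name: archestra-cost-by-label
    rules:
      - record: cost_center:llm_cost:increase1d
        expr: sum by (cost_center) (increase(llm_cost_total[1d]))
```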

Grafana Dashboards

Archestra provides five pre-built Grafana dashboards:

GenAI Observability

LLM request metrics, token usage, cost analysis, latency, and traces

MCP Monitoring

MCP tool call metrics, error rates, duration, and traces

Agent Sessions

Session-level agent audit trail with drill-down into LLM calls, MCP tool calls, and correlated logs

Application Metrics

HTTP traffic, Node.js runtime health, task queue processing, and PostgreSQL database monitoring

RAG & Knowledge Base

Connector sync monitoring, embedding pipeline, and RAG query performance

Importing All Dashboards

Create a Grafana Service Account token with the Editor role and run:
GRAFANA_URL=https://your-grafana-instance GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh)
This creates an Archestra folder and imports all five dashboards. The script is idempotent — safe to re-run after updates.

PostgreSQL Metrics Provider

The Application Metrics dashboard supports multiple PostgreSQL metrics sources. Use the --postgres-provider flag to select the right one:
# OTel Collector PostgreSQL Receiver (RDS, Cloud SQL, Azure, or any PostgreSQL)
GRAFANA_URL=https://example.grafana.net GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh) \
  --postgres-provider otel

# GCP Cloud SQL via Stackdriver Exporter
GRAFANA_URL=https://example.grafana.net GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh) \
  --postgres-provider cloudsql

# Azure Database for PostgreSQL via Azure Monitor
GRAFANA_URL=https://example.grafana.net GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh) \
  --postgres-provider azure
| Provider | Metric Prefix | Use When |
|---|---|---|
| helm (default) | pg_* | Using the Bitnami PostgreSQL Helm subchart with metrics sidecar |
| otel | postgresql_* | Using OTel Collector PostgreSQL Receiver against any PostgreSQL instance |
| cloudsql | stackdriver_cloudsql_* | Scraping GCP Cloud Monitoring via the Stackdriver Exporter |
| azure | azure_* | Scraping Azure Monitor metrics for Azure Database for PostgreSQL |
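
For the otel provider, a Collector along the lines of this sketch would scrape PostgreSQL with the postgresql receiver and expose the results as postgresql_* series for Prometheus. Host, user, database, and port values are placeholders, and the TLS setting depends on whether your server (for example RDS) enforces SSL.

```yaml
# Sketch: Collector scraping PostgreSQL and exposing postgresql_* metrics.
# Host, user, and database names are placeholders.
receivers:
  postgresql:
    endpoint: archestra-db:5432
    transport: tcp
    username: otel_monitor
    password: ${env:POSTGRESQL_PASSWORD}
    databases: [archestra]
    collection_interval: 30s
    tls:
      insecure: true   # configure TLS instead if the server enforces SSL

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # add this address as a Prometheus scrape target

service:
  pipelines:
    metrics:
      receivers: [postgresql]
      exporters: [prometheus]
```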

Performance Benchmarks

Archestra adds approximately 30-50 ms of latency per request while providing enterprise-grade security and policy enforcement for LLM applications.
| Metric | Value |
|---|---|
| Backend processing | 20-23 ms |
| End-to-end P50 | 25 ms |
| End-to-end P95 | 31 ms |
| End-to-end P99 | 41 ms |
| Database overhead | <0.5 ms |
| Throughput @ concurrency=10 | 155 req/s |
| Throughput @ concurrency=500 | 272 req/s |
| CPU utilization (single process) | 0.44% |
| Memory (single process) | 222 MB RAM |
Test configuration: Single-threaded Node.js process on GCP e2-standard-2 (2 vCPU, 8 GB RAM) with Cloud SQL PostgreSQL 16 (8 vCPU, 32 GB RAM). LLM calls run in mock mode to isolate platform overhead.
| Tier | Requests/Day | Requests/Second | Platform Resources | Database Resources |
|---|---|---|---|---|
| Small | under 100K | 1-100 | 1 instance: 2 vCPU, 4 GB RAM | 2 vCPU, 4 GB RAM |
| Medium | 100K-1M | 100-500 | 2-4 instances: 4 vCPU, 8 GB RAM each | 4 vCPU, 8 GB RAM, read replicas |
| Large | 1M-10M | 500-2K | 4-8 instances: 4 vCPU, 16 GB RAM each | 8 vCPU, 16 GB RAM, connection pooling |
| Enterprise | 10M+ | 2K+ | 8+ instances: 8 vCPU, 16 GB RAM each | 8+ vCPU, 32 GB RAM, sharding |
