Observability: Prometheus Metrics, Tracing, and Dashboards

Archestra exposes Prometheus metrics and OpenTelemetry traces for monitoring system health, tracking HTTP requests, and analyzing LLM API performance. The web process serves metrics at http://localhost:9050/metrics. When the separate worker deployment is enabled (the default for Helm deployments), the worker serves its own metrics endpoint at http://<worker-host>:9000/metrics. Production scrape configurations should collect both endpoints, because task queue metrics and Knowledge Base pipeline metrics (connector syncs, embedding batches) are emitted from the worker process.

Prometheus Metrics

LLM Metrics

The following metrics track LLM API usage, cost, and performance across all agents, proxies, and gateways:

Metric	Description
`llm_request_duration_seconds`	LLM API request duration by provider, model, agent_id, agent_name, agent_type, external_agent_id, source, and status code
`llm_tokens_total`	Token consumption by provider, model, agent, source, and type (`input`/`output`)
`llm_cost_total`	Estimated cost in USD by provider, model, and agent. Requires token pricing to be configured in Archestra.
`llm_blocked_tools_total`	Counter of tool calls blocked by tool invocation policies, grouped by provider, model, and agent
`llm_time_to_first_token_seconds`	Time to first token (TTFT) for streaming requests, by provider, agent, source, and model. Helps compare models by initial response latency.
`llm_tokens_per_second`	Output tokens per second throughput, by provider, agent, source, and model. Useful for comparing response speeds in latency-sensitive applications.

agent_id and agent_name are the internal Archestra identifiers. external_agent_id contains the value passed via the X-Archestra-Agent-Id header, allowing clients to associate metrics with their own agent identifiers. Knowledge Base embedding and reranking calls emit these same metrics with agent_name="Knowledge Base" and source="knowledge:embedding" or source="knowledge:reranker".
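As a rough illustration of how a client might populate external_agent_id, the sketch below builds (but does not send) a request carrying the X-Archestra-Agent-Id header. The proxy URL, the agent identifier, and the payload are hypothetical placeholders rather than documented Archestra values.

```python
import json
import urllib.request

# Hypothetical address; substitute the LLM proxy URL of your own deployment.
PROXY_URL = "http://archestra.example.internal/v1/chat/completions"


def build_llm_request(agent_id: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a request tagged with a client-side agent identifier."""
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Per the text above, this value surfaces as external_agent_id.
            "X-Archestra-Agent-Id": agent_id,
        },
        method="POST",
    )


# "billing-bot-7" is an illustrative identifier chosen by the client.
req = build_llm_request("billing-bot-7", {"model": "gpt-4o-mini", "messages": []})
print(req.header_items())
```

Note that urllib normalizes header capitalization internally; HTTP header names are case-insensitive, so the receiving side is unaffected.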

MCP Metrics

Metric	Description
`mcp_tool_calls_total`	Total MCP tool calls by agent, mcp_server_name, tool_name, and status (`success`/`error`)
`mcp_tool_call_duration_seconds`	MCP tool call execution duration by agent, mcp_server_name, tool_name, and status
`mcp_server_deployment_status`	Current deployment state of self-hosted MCP servers by server_name and state (`not_created`/`pending`/`running`/`failed`/`succeeded`). Value is `1` for the active state. Use `count(mcp_server_deployment_status{state="running"} == 1)` to count running deployments.
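The counting rule for mcp_server_deployment_status can also be applied to raw scrape output, which is handy when checking an endpoint by hand. The sketch below is a deliberately simplified parser (it does not handle escaped quotes or trailing timestamps), and the sample text, including the server names and the gauge type line, is made up for illustration.

```python
import re

# Matches lines such as:
#   mcp_server_deployment_status{server_name="a",state="running"} 1
SAMPLE_RE = re.compile(
    r"^mcp_server_deployment_status\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def count_in_state(exposition: str, state: str = "running") -> int:
    """Count series whose state label matches and whose value is 1."""
    count = 0
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and other metrics are skipped
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("state") == state and float(match.group("value")) == 1.0:
            count += 1
    return count


# Illustrative sample only; real output depends on your deployments.
SAMPLE = """\
# TYPE mcp_server_deployment_status gauge
mcp_server_deployment_status{server_name="github",state="running"} 1
mcp_server_deployment_status{server_name="github",state="failed"} 0
mcp_server_deployment_status{server_name="jira",state="pending"} 1
mcp_server_deployment_status{server_name="jira",state="running"} 0
mcp_server_deployment_status{server_name="slack",state="running"} 1
"""

print(count_in_state(SAMPLE))
```

This mirrors the `== 1` filter in the PromQL expression: a series only counts when it is the active state for its server.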

RAG & Knowledge Base Metrics

Metric	Description
`rag_connector_syncs_total`	Total connector syncs by connector_type and status (`success`/`failed`/`partial`)
`rag_connector_sync_duration_seconds`	Connector sync duration by connector_type and status
`rag_documents_processed_total`	Total documents processed during syncs by connector_type
`rag_documents_ingested_total`	Total documents ingested (new or updated) by connector_type
`rag_chunks_created_total`	Total chunks created during document ingestion by connector_type
`rag_embedding_batches_total`	Total embedding batches processed by status (`success`/`error`)
`rag_embedding_documents_total`	Total documents embedded by status
`rag_queries_total`	Total RAG queries by search_type (`vector`/`hybrid`)
`rag_query_duration_seconds`	RAG query end-to-end duration (embedding, search, rerank) by search_type
`rag_query_results_count`	Number of results returned per RAG query by search_type

Task Queue Metrics

Metric	Description
`task_queue_tasks_enqueued_total`	Total tasks enqueued by task_type
`task_queue_tasks_completed_total`	Total tasks completed successfully by task_type
`task_queue_tasks_failed_total`	Total task processing failures (may be retried) by task_type
`task_queue_tasks_dead_total`	Total tasks moved to dead-letter (max retries exceeded) by task_type
`task_queue_task_duration_seconds`	Task processing duration by task_type
`task_queue_active_tasks`	Currently active (in-flight) tasks by task_type
`task_queue_stuck_tasks_reset_total`	Total stuck tasks reset back to pending

HTTP & Runtime Metrics

Metric	Description
`http_request_duration_seconds_count`	Total HTTP requests by method, route, and status
`http_request_duration_seconds_bucket`	Request duration histogram buckets
`http_request_summary_seconds`	Request duration summary with quantiles
`process_cpu_user_seconds_total`	CPU time in user mode
`process_resident_memory_bytes`	Physical memory usage
`nodejs_eventloop_lag_seconds`	Event loop lag (latency indicator)
`nodejs_heap_size_used_bytes`	V8 heap memory usage
`nodejs_gc_duration_seconds`	Garbage collection timing by type
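The histogram series listed above are typically queried with `rate()` and `histogram_quantile()`. The helper below only assembles PromQL strings; the label names `route` and `status` are assumptions inferred from the descriptions in the table and should be checked against the actual scrape output.

```python
def latency_quantile_by_route(quantile: float = 0.95, window: str = "5m") -> str:
    """PromQL for a per-route latency quantile computed from histogram buckets."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le, route) (rate(http_request_duration_seconds_bucket[{window}])))"
    )


def request_rate_by_status(window: str = "5m") -> str:
    """PromQL for requests per second, grouped by the (assumed) status label."""
    return f"sum by (status) (rate(http_request_duration_seconds_count[{window}]))"


print(latency_quantile_by_route())
print(request_rate_by_status("1m"))
```

Keeping `le` in the `sum by` clause matters: `histogram_quantile()` needs the bucket boundaries to interpolate.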

Securing the Metrics Endpoint

The metrics endpoint supports bearer token authentication. Set ARCHESTRA_METRICS_SECRET to require an Authorization: Bearer <token> header from all scrapers:

ARCHESTRA_METRICS_PORT=9050
ARCHESTRA_METRICS_SECRET=your-secure-metrics-token

Configure your Prometheus scrape job accordingly:

scrape_configs:
  - job_name: archestra
    static_configs:
      - targets: ["archestra-host:9050"]
    authorization:
      credentials: your-secure-metrics-token
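To check the secret outside Prometheus, a scraper can be approximated with the standard library. This is a minimal sketch: the helper names are illustrative, the token is the placeholder from the configuration above, and `fetch_metrics` is defined but not called here because it needs a running instance.

```python
import urllib.request
from typing import Optional


def build_scrape_request(url: str, secret: Optional[str]) -> urllib.request.Request:
    """Build a metrics request, adding the bearer header only when a secret is set."""
    headers = {}
    if secret:
        headers["Authorization"] = f"Bearer {secret}"
    return urllib.request.Request(url, headers=headers)


def fetch_metrics(url: str, secret: Optional[str], timeout: float = 5.0) -> str:
    """Fetch the exposition text; raises urllib.error.HTTPError on a rejected token."""
    request = build_scrape_request(url, secret)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


req = build_scrape_request("http://localhost:9050/metrics", "your-secure-metrics-token")
print(req.get_header("Authorization"))
```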

OpenTelemetry Tracing

Archestra exports OpenTelemetry traces to any OTLP-compatible backend — Jaeger, Tempo, Honeycomb, Grafana Cloud, and others.

Configuration

# OTLP endpoint (base URL for /v1/traces and /v1/logs)
ARCHESTRA_OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318

Authentication

Bearer Token

ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_BEARER=your-bearer-token

Bearer token authentication takes precedence over basic authentication when both are configured.

Basic Auth

ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_USERNAME=your-username
ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_PASSWORD=your-password

Both username and password must be provided; omitting either disables basic auth.

No Authentication

Leave all auth variables unset. Traces are sent without Authorization headers.
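The precedence rules for the three authentication modes can be summarized in a few lines. This is a sketch of the documented behavior, not Archestra's actual implementation; the function name is hypothetical.

```python
import base64
from typing import Dict, Mapping

PREFIX = "ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_"


def otlp_auth_headers(env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve the Authorization header implied by the auth environment variables."""
    bearer = env.get(PREFIX + "BEARER")
    username = env.get(PREFIX + "USERNAME")
    password = env.get(PREFIX + "PASSWORD")

    if bearer:
        # Bearer wins even when basic credentials are also present.
        return {"Authorization": f"Bearer {bearer}"}
    if username and password:
        # Basic auth needs both halves; either one missing disables it.
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    return {}  # no authentication: no Authorization header is sent


print(otlp_auth_headers({PREFIX + "USERNAME": "user", PREFIX + "PASSWORD": "pass"}))
```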

Content Capture

Archestra can capture prompt/completion content and tool call arguments/results as span events for a full audit trail. This is enabled by default.

# Disable content capture (for privacy or to reduce span sizes)
ARCHESTRA_OTEL_CAPTURE_CONTENT=false

# Maximum characters per captured event (default: 10000)
ARCHESTRA_OTEL_CONTENT_MAX_LENGTH=10000

When enabled, traces include:

LLM spans — gen_ai.content.prompt event with request messages, and gen_ai.content.completion event with response text
MCP spans — gen_ai.content.input event with tool call arguments, and gen_ai.content.output event with tool call results
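The two content-capture settings interact roughly as sketched below. Two points are assumptions, not documented behavior: that over-long content is truncated to the maximum length (rather than dropped), and that only the literal value `false` disables capture.

```python
from typing import Mapping, Optional, Tuple


def capture_settings(env: Mapping[str, str]) -> Tuple[bool, int]:
    """Read the capture flag (default: enabled) and the per-event limit (default: 10000)."""
    raw_flag = env.get("ARCHESTRA_OTEL_CAPTURE_CONTENT", "true")
    enabled = raw_flag.strip().lower() != "false"  # assumption: only "false" disables
    max_length = int(env.get("ARCHESTRA_OTEL_CONTENT_MAX_LENGTH", "10000"))
    return enabled, max_length


def event_body(content: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the text that would be attached to a span event, or None if disabled."""
    enabled, max_length = capture_settings(env)
    if not enabled:
        return None
    return content[:max_length]  # assumption: truncate rather than drop


print(event_body("abcdef", {"ARCHESTRA_OTEL_CONTENT_MAX_LENGTH": "3"}))
```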

What’s Traced

Archestra automatically traces these categories:

LLM API Calls

Every call to an LLM provider, with model, token counts, cost, and response time. Span names follow the OpenTelemetry GenAI semantic conventions format {operation} {model}, for example chat gpt-4o-mini or generate_content gemini-2.0-flash.

MCP Tool Calls

Every tool execution through the MCP Gateway, with tool name, server, duration, and whether the call was blocked by a policy.

Knowledge Base Operations

Embedding and reranking LLM calls made by the Knowledge Base system, with cost and token tracking. Identified by archestra.trigger.source=knowledge:embedding or knowledge:reranker.

Skill Sandbox Execution

The native Rust sandbox exports its own spans (service.name=archestra-sandbox-rs) — command runs, artifact reads, and container materialization — with exit_code, duration_ms, and output size as span fields.

Chat Trace Structure

Each chat turn produces a unified trace grouping LLM calls and tool executions under a single parent span:

chat {agentName}                       ← parent span (SpanKind.SERVER)
├── chat {model}                       ← LLM call via proxy (SpanKind.CLIENT)
├── execute_tool {tool_name}           ← MCP tool execution
└── chat {model}                       ← follow-up LLM call after tool result

The same structure applies across all invocation paths:

Invocation Path	`route.category`	`archestra.trigger.source`
Chat UI	`chat`	—
A2A Protocol	`a2a`	—
MS Teams	`chatops`	`ms-teams`
Email	`email`	`email`

Verbose Tracing

By default, traces include only GenAI-specific spans. To also capture infrastructure spans (HTTP routes, outgoing HTTP calls, Node.js fetch):

ARCHESTRA_OTEL_VERBOSE_TRACING=true

Verbose tracing produces significantly more spans. Use it for debugging only, not as a permanent production setting.

Metric-to-Trace Exemplars

All LLM and MCP metrics include trace exemplars. In Grafana, clicking on a data point jumps directly to the corresponding trace in Tempo. This requires:

Prometheus configured with --enable-feature=exemplar-storage
Grafana Prometheus datasource configured with exemplarTraceIdDestinations pointing to your Tempo datasource

Custom Agent Labels

Labels are key-value pairs configured on agents in the Archestra UI. Once added, they appear in:

Metrics — as additional label dimensions on all LLM and MCP metrics (kebab-case labels are converted to snake_case for Prometheus naming compatibility)
Traces — as archestra.label.<key> span attributes
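The naming rules for labels amount to two small transformations. The sketch below assumes the kebab-to-snake conversion is a plain hyphen-to-underscore replacement; the label key `cost-center` is a hypothetical example.

```python
def prometheus_label_name(key: str) -> str:
    """Convert a kebab-case agent label key to a snake_case Prometheus label name."""
    return key.replace("-", "_")


def span_attribute_key(key: str) -> str:
    """Build the span attribute name under which the label appears in traces."""
    return f"archestra.label.{key}"


key = "cost-center"  # hypothetical label configured on an agent
print(prometheus_label_name(key))
print(span_attribute_key(key))
```

The conversion is needed only on the metrics side because Prometheus label names may not contain hyphens, whereas span attribute keys may.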

Grafana Dashboards

Archestra provides five pre-built Grafana dashboards:

GenAI Observability

LLM request metrics, token usage, cost analysis, latency, and traces

MCP Monitoring

MCP tool call metrics, error rates, duration, and traces

Agent Sessions

Session-level agent audit trail with drill-down into LLM calls, MCP tool calls, and correlated logs

Application Metrics

HTTP traffic, Node.js runtime health, task queue processing, and PostgreSQL database monitoring

RAG & Knowledge Base

Connector sync monitoring, embedding pipeline, and RAG query performance

Importing All Dashboards

Create a Grafana Service Account token with the Editor role and run:

GRAFANA_URL=https://your-grafana-instance GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh)

This creates an Archestra folder and imports all five dashboards. The script is idempotent — safe to re-run after updates.

PostgreSQL Metrics Provider

The Application Metrics dashboard supports multiple PostgreSQL metrics sources. Use the --postgres-provider flag to select the right one:

# OTel Collector PostgreSQL Receiver (RDS, Cloud SQL, Azure, or any PostgreSQL)
GRAFANA_URL=https://example.grafana.net GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh) \
  --postgres-provider otel

# GCP Cloud SQL via Stackdriver Exporter
GRAFANA_URL=https://example.grafana.net GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh) \
  --postgres-provider cloudsql

# Azure Database for PostgreSQL via Azure Monitor
GRAFANA_URL=https://example.grafana.net GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh) \
  --postgres-provider azure

Provider	Metric Prefix	Use When
`helm` (default)	`pg_*`	Using the Bitnami PostgreSQL Helm subchart with metrics sidecar
`otel`	`postgresql_*`	Using OTel Collector PostgreSQL Receiver against any PostgreSQL instance
`cloudsql`	`stackdriver_cloudsql_*`	Scraping GCP Cloud Monitoring via the Stackdriver Exporter
`azure`	`azure_*`	Scraping Azure Monitor metrics for Azure Database for PostgreSQL

Performance Benchmarks

Archestra adds approximately 30-50ms latency per request while providing enterprise-grade security and policy enforcement for LLM applications.

Metric	Value
Backend processing	20-23ms
End-to-end P50	25ms
End-to-end P95	31ms
End-to-end P99	41ms
Database overhead	<0.5ms
Throughput @ concurrency=10	155 req/s
Throughput @ concurrency=500	272 req/s
CPU utilization (single process)	0.44%
Memory (single process)	222MB RAM

Test configuration: Single-threaded Node.js process on GCP e2-standard-2 (2 vCPU, 8 GB RAM) with Cloud SQL PostgreSQL 16 (8 vCPU, 32 GB RAM). LLM calls run in mock mode to isolate platform overhead.

Recommended Deployment Configurations

Tier	Requests/Day	Requests/Second	Platform Resources	Database Resources
Small	under 100K	1-100	1 instance: 2 vCPU, 4 GB RAM	2 vCPU, 4 GB RAM
Medium	100K-1M	100-500	2-4 instances: 4 vCPU, 8 GB RAM each	4 vCPU, 8 GB RAM, read replicas
Large	1M-10M	500-2K	4-8 instances: 4 vCPU, 16 GB RAM each	8 vCPU, 16 GB RAM, connection pooling
Enterprise	10M+	2K+	8+ instances: 8 vCPU, 16 GB RAM each	8+ vCPU, 32 GB RAM, sharding
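To place a workload in the table, it can help to convert a daily volume into an average request rate. The sketch below does that arithmetic and picks a tier from the Requests/Day column; treating each lower bound as inclusive (so exactly 1M requests/day maps to Large) is an assumption, since the table's ranges share their endpoints.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# (inclusive lower bound in requests/day, tier name), highest first.
TIERS = [
    (10_000_000, "Enterprise"),
    (1_000_000, "Large"),
    (100_000, "Medium"),
    (0, "Small"),
]


def average_rps(requests_per_day: float) -> float:
    """Mean requests per second implied by a daily request volume."""
    return requests_per_day / SECONDS_PER_DAY


def tier_for(requests_per_day: float) -> str:
    """Pick the deployment tier whose Requests/Day range contains the volume."""
    if requests_per_day < 0:
        raise ValueError("requests_per_day must be non-negative")
    for lower_bound, name in TIERS:
        if requests_per_day >= lower_bound:
            return name
    raise AssertionError("unreachable: the Small tier starts at 0")


for volume in (50_000, 1_000_000, 25_000_000):
    print(volume, tier_for(volume), round(average_rps(volume), 2))
```

The averages computed this way sit well below the table's Requests/Second ranges (1M requests/day is about 11.6 req/s on average), which suggests that column describes peak rather than mean load; that reading is an interpretation, not something the table states.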
