The RCA Agent ships with five built-in connectors covering the most common observability platforms used in production DevOps environments. Each connector translates platform-specific query results into one or more normalized Signal objects — typed as log, metric, or trace — so that the Agent Core can reason across heterogeneous data sources with a consistent interface. Connectors are enabled selectively per analysis run; only the connectors whose required environment variables are present and whose health_check() passes will be included in the signal-fetching stage.
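The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: only `health_check()` is named in this page, while `required_env`, `select_connectors`, and `FakeConnector` are hypothetical names introduced for the example.

```python
import os
from typing import Iterable, Mapping, Protocol


class Connector(Protocol):
    """Hypothetical connector shape; only health_check() is named in the docs."""

    required_env: tuple[str, ...]

    def health_check(self) -> bool: ...


def select_connectors(candidates: Iterable[Connector],
                      env: Mapping[str, str] = os.environ) -> list[Connector]:
    """Keep connectors whose required variables are set and whose probe passes."""
    enabled = []
    for connector in candidates:
        if not all(env.get(name) for name in connector.required_env):
            continue  # missing configuration: skip without probing
        try:
            healthy = connector.health_check()
        except Exception:
            healthy = False  # a probe that raises is treated as unavailable
        if healthy:
            enabled.append(connector)
    return enabled


class FakeConnector:
    """Stand-in used only to demonstrate the selection logic."""

    def __init__(self, name, required_env, healthy):
        self.name = name
        self.required_env = required_env
        self._healthy = healthy

    def health_check(self) -> bool:
        return self._healthy


env = {"PROMETHEUS_URL": "http://prometheus:9090", "LOKI_URL": "http://loki:3100"}
candidates = [
    FakeConnector("prometheus", ("PROMETHEUS_URL",), healthy=True),
    FakeConnector("loki", ("LOKI_URL",), healthy=False),      # probe fails
    FakeConnector("jaeger", ("JAEGER_QUERY_URL",), healthy=True),  # variable missing
]
enabled_names = [c.name for c in select_connectors(candidates, env)]
print(enabled_names)
```

Passing the environment as a parameter keeps the rule easy to test without touching the real process environment.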

Connector Reference

Prometheus

Signal type: metric

Prometheus is queried using PromQL against the /api/v1/query_range endpoint. The connector retrieves the time series that match the configured metric names and label selectors over the analysis time window and returns them as numeric Signal objects annotated with metric name, labels, and step interval.

Required environment variables

| Variable | Description |
| --- | --- |
| PROMETHEUS_URL | Base URL of the Prometheus server, e.g. http://prometheus:9090 |

Optional environment variables

| Variable | Default | Description |
| --- | --- | --- |
| PROMETHEUS_BEARER_TOKEN | (none) | Bearer token for Prometheus instances protected by an authentication proxy |
| PROMETHEUS_STEP_SECONDS | 60 | Resolution step for query_range requests, in seconds |
| PROMETHEUS_LABEL_SELECTORS | {} | JSON object of label key/value pairs used to filter metric series |

What the connector fetches

For each metric name listed in PROMETHEUS_METRIC_NAMES (comma-separated), the connector issues a query_range call spanning the full analysis window. Returned samples are normalized into Signal objects with signal_type="metric", source="prometheus", and a values list of (timestamp, float) pairs.

```python
# Example: fetching CPU usage over a 30-minute window.
# PrometheusConnector must first be imported from the agent's connector
# package (the import path is not shown on this page).
from datetime import datetime, timezone

signals = PrometheusConnector().fetch_signals(
    start_time=datetime(2024, 6, 1, 10, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 6, 1, 10, 30, 0, tzinfo=timezone.utc),
)
# Returns Signal objects such as
# Signal(signal_type="metric", name="node_cpu_seconds_total", ...)
```

Known limitations
  • Metrics must already be scraped by Prometheus before the analysis window begins; the connector cannot backfill missing data.
  • Very high-cardinality label sets can produce extremely large response payloads. Use PROMETHEUS_LABEL_SELECTORS to pre-filter series.
  • Remote-write replicas or Thanos sidecars may require additional authentication headers not covered by PROMETHEUS_BEARER_TOKEN.
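
As a rough illustration of the Prometheus fetch path described above, the sketch below builds the query_range parameters and normalizes a matrix response into (timestamp, float) pairs. The helper names and the dictionary-based signal shape are assumptions made for this example, not the connector's real internals; the response layout follows the Prometheus HTTP API. Label values are not escaped here, which a production implementation would need to handle.

```python
import json
from datetime import datetime, timezone


def build_selector(metric_name: str, labels: dict) -> str:
    """Render a PromQL selector such as name{k="v"} (no escaping of values)."""
    if not labels:
        return metric_name
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric_name}{{{pairs}}}"


def build_query_range_params(metric_name, labels, start, end, step_seconds=60):
    """Query parameters accepted by the /api/v1/query_range endpoint."""
    return {
        "query": build_selector(metric_name, labels),
        "start": start.timestamp(),
        "end": end.timestamp(),
        "step": step_seconds,
    }


def normalize_matrix(response: dict) -> list:
    """Convert a query_range 'matrix' result into plain signal dictionaries."""
    signals = []
    for series in response["data"]["result"]:
        labels = dict(series["metric"])
        name = labels.pop("__name__", "")
        # Prometheus encodes each sample as [unix_seconds, "value-as-string"].
        values = [(float(ts), float(raw)) for ts, raw in series["values"]]
        signals.append({"signal_type": "metric", "source": "prometheus",
                        "name": name, "labels": labels, "values": values})
    return signals


# PROMETHEUS_LABEL_SELECTORS is documented as a JSON object of label pairs.
selectors = json.loads('{"instance": "web-1", "mode": "user"}')
params = build_query_range_params(
    "node_cpu_seconds_total", selectors,
    datetime(2024, 6, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 6, 1, 10, 30, tzinfo=timezone.utc),
)

# Hand-written sample in the shape Prometheus returns for range queries.
sample_response = {
    "status": "success",
    "data": {"resultType": "matrix", "result": [{
        "metric": {"__name__": "node_cpu_seconds_total", "instance": "web-1"},
        "values": [[1717236000, "0.25"], [1717236060, "0.5"]],
    }]},
}
signals = normalize_matrix(sample_response)
```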

Elasticsearch

Signal type: log

The Elasticsearch connector queries a target index (or index pattern) using the _search API, filtering documents by a timestamp field within the analysis window. Matched log documents are returned as Signal objects with signal_type="log", preserving the original message text and severity level where present.

Required environment variables

| Variable | Description |
| --- | --- |
| ELASTICSEARCH_URL | Base URL of the Elasticsearch cluster, e.g. https://es-cluster:9200 |
| ELASTICSEARCH_USERNAME | Username for HTTP Basic authentication |
| ELASTICSEARCH_PASSWORD | Password for HTTP Basic authentication |
| ELASTICSEARCH_INDEX | Index name or pattern to query, e.g. logs-* |

Optional environment variables

| Variable | Default | Description |
| --- | --- | --- |
| ELASTICSEARCH_TIMESTAMP_FIELD | @timestamp | Name of the document field used for time-range filtering |
| ELASTICSEARCH_MAX_RESULTS | 1000 | Maximum number of log documents to retrieve per analysis |
| ELASTICSEARCH_SEVERITY_FIELD | level | Document field name mapped to log severity |

What the connector fetches

The connector issues a bool query with a range filter on the timestamp field and an optional match filter derived from the incident context string. Results are sorted by timestamp descending and limited to ELASTICSEARCH_MAX_RESULTS. Each hit is converted to a Signal with the message body, severity, and originating index name.

Known limitations
  • Very large indices with billions of documents may cause query timeouts. Reduce ELASTICSEARCH_MAX_RESULTS or narrow the index pattern if latency is a concern.
  • The connector authenticates via HTTP Basic auth only. API key authentication is not yet supported.
  • Index lifecycle management (ILM) rollover aliases are supported, but cross-cluster search (CCS) targets require the remote cluster to be reachable from the worker network.
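
The Elasticsearch request described above might look roughly like the following. This is a sketch under assumptions: the function names and the `message` field are hypothetical, and the real connector may structure its query differently. The request body and the hits layout use standard Elasticsearch Query DSL and _search response shapes.

```python
from datetime import datetime, timezone


def build_search_body(start, end, context=None, timestamp_field="@timestamp",
                      max_results=1000, message_field="message"):
    """Build a _search body: time-range filter, optional match, newest first."""
    bool_query = {
        "filter": [{"range": {timestamp_field: {
            "gte": start.isoformat(), "lte": end.isoformat()}}}],
    }
    if context:
        # Assumption: the incident context is matched against the message text.
        bool_query["must"] = [{"match": {message_field: context}}]
    return {
        "size": max_results,
        "sort": [{timestamp_field: {"order": "desc"}}],
        "query": {"bool": bool_query},
    }


def normalize_hits(response, severity_field="level", message_field="message"):
    """Turn _search hits into log signal dictionaries."""
    signals = []
    for hit in response["hits"]["hits"]:
        doc = hit.get("_source", {})
        signals.append({
            "signal_type": "log",
            "source": "elasticsearch",
            "message": doc.get(message_field, ""),
            "severity": doc.get(severity_field),  # None when the field is absent
            "index": hit.get("_index"),
        })
    return signals


body = build_search_body(
    datetime(2024, 6, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 6, 1, 10, 30, tzinfo=timezone.utc),
    context="payment timeout",
)

# Hand-written sample in the shape returned by the _search API.
sample_response = {"hits": {"hits": [
    {"_index": "logs-2024.06.01",
     "_source": {"message": "payment timeout after 30s", "level": "error"}},
    {"_index": "logs-2024.06.01", "_source": {"message": "retrying request"}},
]}}
log_signals = normalize_hits(sample_response)
```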

Loki

Signal type: log

The Loki connector uses the LogQL query language against the /loki/api/v1/query_range endpoint to retrieve log streams matching a label selector. It is the preferred connector for Kubernetes workload logs when Loki is deployed as part of the Grafana observability stack.

Required environment variables

| Variable | Description |
| --- | --- |
| LOKI_URL | Base URL of the Loki server, e.g. http://loki:3100 |

Optional environment variables

| Variable | Default | Description |
| --- | --- | --- |
| LOKI_TENANT_ID | (none) | X-Scope-OrgID header value for multi-tenant Loki deployments |
| LOKI_LABEL_SELECTOR | {job=~".+"} | LogQL stream selector used to scope log retrieval |
| LOKI_MAX_LINES | 5000 | Maximum number of log lines to retrieve per analysis |
| LOKI_QUERY_TIMEOUT_SECONDS | 30 | Per-request timeout when querying the Loki API |

What the connector fetches

The connector calls query_range with the configured LOKI_LABEL_SELECTOR and the analysis time window expressed as nanosecond UNIX timestamps. Each returned log entry is mapped to a Signal with signal_type="log", preserving the stream labels as metadata.

```shell
# Example LogQL selector for a specific Kubernetes namespace
LOKI_LABEL_SELECTOR='{namespace="payments", container="api"}'
```

Known limitations
  • Requires Loki 2.x or later; the /loki/api/v1/query_range endpoint is not available on older releases.
  • Very broad label selectors (e.g., matching all jobs) can hit Loki's server-side max_entries_limit_per_query even when LOKI_MAX_LINES is set, because the server limit is enforced independently of the requested line count.
  • Loki indexes only stream labels, not log content; context-based filtering is done client-side by the connector after retrieval.
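
To make the nanosecond time window and the client-side filtering concrete, here is a small sketch. The helper names are hypothetical and the case-insensitive substring match is an assumption about how context filtering could work; the parameter names and the streams response layout follow the Loki HTTP API.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanos(moment: datetime) -> int:
    """Integer nanoseconds since the epoch, avoiding float rounding."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def build_query_range_params(selector, start, end, max_lines=5000):
    """Parameters for GET /loki/api/v1/query_range."""
    return {
        "query": selector,
        "start": str(to_unix_nanos(start)),
        "end": str(to_unix_nanos(end)),
        "limit": max_lines,
        "direction": "backward",  # newest entries first
    }


def flatten_streams(response, context=None):
    """Map each log line to a signal dict, filtering by context client-side."""
    signals = []
    for stream in response["data"]["result"]:
        labels = stream["stream"]
        for timestamp_ns, line in stream["values"]:
            if context and context.lower() not in line.lower():
                continue  # assumed filtering rule: case-insensitive substring
            signals.append({"signal_type": "log", "source": "loki",
                            "timestamp_ns": int(timestamp_ns),
                            "message": line, "metadata": dict(labels)})
    return signals


params = build_query_range_params(
    '{namespace="payments", container="api"}',
    datetime(2024, 6, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 6, 1, 10, 30, tzinfo=timezone.utc),
)

# Hand-written sample in the shape Loki returns for log (stream) queries.
sample_response = {"status": "success", "data": {"resultType": "streams", "result": [{
    "stream": {"namespace": "payments", "container": "api"},
    "values": [["1717236000000000000", "payment failed: Timeout"],
               ["1717236001000000000", "health ok"]],
}]}}
matching = flatten_streams(sample_response, context="timeout")
```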

Jaeger

Signal type: trace

The Jaeger connector retrieves distributed traces from the Jaeger Query API, scoped by service name and the analysis time window. Traces are converted to Signal objects carrying span count, error rate, and P99 latency, which the LLM uses to identify services with anomalous request behavior.

Required environment variables

| Variable | Description |
| --- | --- |
| JAEGER_QUERY_URL | Base URL of the Jaeger Query service, e.g. http://jaeger:16686 |

Optional environment variables

| Variable | Default | Description |
| --- | --- | --- |
| JAEGER_SERVICES | (all) | Comma-separated list of service names to query; defaults to all services discovered via /api/services |
| JAEGER_MAX_TRACES | 200 | Maximum number of traces to retrieve per service |
| JAEGER_MIN_DURATION_MICROS | (none) | Filter traces by minimum duration (microseconds) to focus on slow requests |

What the connector fetches

For each service in scope, the connector calls /api/traces?service=<name>&start=<us>&end=<us>&limit=<n>. Each trace is summarized into a Signal recording the root span's operation name, total duration, error flag, and child span count. Only completed traces (those with a root span present) are included.

Known limitations
  • Only completed traces are returned; in-flight spans at the time of the query are excluded.
  • The Jaeger HTTP API does not support server-side filtering by error status; error filtering is applied client-side, which means JAEGER_MAX_TRACES is consumed before error filtering occurs.
  • OpenTelemetry-native deployments should use the OpenTelemetry collector’s Jaeger-compatible receiver to expose this API, or wait for the planned native OTLP connector.
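
The per-trace summary and the ordering issue noted in the limitations (the trace limit is applied before error filtering) can be sketched as below. The function names and the output dictionary are hypothetical, and treating "a span with no references" as the root is an assumption made for this example; the span fields follow the JSON returned by the Jaeger Query API.

```python
def summarize_trace(trace: dict):
    """Summarize one Jaeger JSON trace, or return None if it has no root span."""
    spans = trace.get("spans", [])
    # Assumption: the root span is the one without any parent reference.
    roots = [span for span in spans if not span.get("references")]
    if not roots:
        return None  # incomplete trace: excluded
    root = min(roots, key=lambda span: span["startTime"])
    has_error = any(
        tag.get("key") == "error" and tag.get("value") in (True, "true")
        for span in spans
        for tag in span.get("tags", [])
    )
    return {
        "signal_type": "trace",
        "source": "jaeger",
        "name": root["operationName"],
        "duration_us": root["duration"],
        "error": has_error,
        "child_span_count": len(spans) - 1,
    }


def fetch_error_summaries(traces, max_traces):
    """Apply the limit first, then filter by error status client-side."""
    limited = traces[:max_traces]
    summaries = [summarize_trace(trace) for trace in limited]
    return [s for s in summaries if s is not None and s["error"]]


def make_trace(trace_id, failed):
    """Build a two-span sample trace (root plus one child) for the demo."""
    child_tags = [{"key": "error", "type": "bool", "value": True}] if failed else []
    return {"traceID": trace_id, "spans": [
        {"spanID": "a", "operationName": "POST /charge", "references": [],
         "startTime": 1717236000000000, "duration": 900, "tags": []},
        {"spanID": "b", "operationName": "db.query",
         "references": [{"refType": "CHILD_OF", "traceID": trace_id, "spanID": "a"}],
         "startTime": 1717236000000100, "duration": 400, "tags": child_tags},
    ]}


traces = [make_trace("t1", False), make_trace("t2", False), make_trace("t3", True)]
within_limit = fetch_error_summaries(traces, max_traces=2)  # failing trace is cut off
full_scan = fetch_error_summaries(traces, max_traces=3)
```

With a limit of 2 the only failing trace never reaches the filter, which is why raising JAEGER_MAX_TRACES or setting JAEGER_MIN_DURATION_MICROS can matter when errors are rare.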

Datadog

Signal type: metric + log

The Datadog connector calls two Datadog API endpoints (the Metrics Query API for time-series data and the Logs Search API for log events) and returns both metric and log Signal objects in a single fetch_signals call. It is the only built-in connector that returns two signal types from a single source.

Required environment variables

| Variable | Description |
| --- | --- |
| DATADOG_API_KEY | Datadog API key (write + read permissions) |
| DATADOG_APP_KEY | Datadog Application key (required for metrics and logs query) |
| DATADOG_SITE | Datadog site identifier, e.g. datadoghq.com or datadoghq.eu |

Optional environment variables

| Variable | Default | Description |
| --- | --- | --- |
| DATADOG_METRIC_QUERY | avg:system.cpu.user{*} | Datadog metrics query string sent to the /api/v1/query endpoint |
| DATADOG_LOG_QUERY | status:error | Datadog log search query |
| DATADOG_MAX_LOG_EVENTS | 500 | Maximum number of log events to retrieve |

What the connector fetches

Metrics are retrieved via GET /api/v1/query with the analysis window expressed as UNIX timestamps. Log events are retrieved via POST /api/v2/logs/events/search filtered by DATADOG_LOG_QUERY and the same time window. Both result sets are merged into a single Signal list before being returned.

Known limitations
  • Datadog's API enforces rate limits that vary by plan. Long analysis windows or high-frequency polling may result in 429 Too Many Requests responses. The connector implements exponential backoff but will ultimately fail if the rate-limit budget is exhausted.
  • Custom metrics (those with a custom. prefix) count against your Datadog custom metric quota and may not be available in all subscription tiers.
  • The DATADOG_SITE value must match the site used when the API key was generated; cross-site key usage will return a 403 Forbidden response.
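
The exponential backoff mentioned in the first limitation could be implemented along these lines. This is a generic sketch, not the connector's actual retry code: `RateLimited`, `call_with_backoff`, and all default values are assumptions chosen for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) request callable on a 429 response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      jitter=0.1, sleep=time.sleep):
    """Retry `request` on RateLimited, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A small random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay * jitter))


attempts = {"count": 0}


def flaky_request():
    """Fails twice with a rate-limit error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("429 Too Many Requests")
    return {"status": "ok"}


# Record delays instead of sleeping so the demo runs instantly.
recorded_delays = []
result = call_with_backoff(flaky_request, jitter=0.0, sleep=recorded_delays.append)
```

Injecting `sleep` keeps the retry policy testable without real waiting.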

Signal Type Reference

All connectors normalize their platform-specific output into one of three standardized signal types before returning results to the Agent Core. When more than one signal type is available for a run, the LLM prompt receives a mix of them, enabling cross-source correlation.

| Signal Type | Description | Typical Sources |
| --- | --- | --- |
| log | A timestamped, human-readable text event emitted by an application, system service, or infrastructure component | Elasticsearch, Loki |
| metric | A numeric time-series measurement sampled at regular intervals, representing resource utilization, throughput, error rates, or custom application KPIs | Prometheus, Datadog |
| trace | A distributed request span capturing the full execution path across services, including per-span timing, error status, and metadata | Jaeger, OpenTelemetry |

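One way to picture the normalized type is a small validated record, sketched below. The field names other than signal_type and source, and the `group_by_type` helper, are hypothetical; the agent's real Signal class may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any

SIGNAL_TYPES = ("log", "metric", "trace")


@dataclass(frozen=True)
class Signal:
    """Illustrative stand-in for the agent's normalized Signal object."""

    signal_type: str
    source: str
    name: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Reject anything outside the three documented signal types.
        if self.signal_type not in SIGNAL_TYPES:
            raise ValueError(f"unknown signal_type: {self.signal_type!r}")


def group_by_type(signals):
    """Bucket signals by type so they can be correlated across sources."""
    grouped = defaultdict(list)
    for signal in signals:
        grouped[signal.signal_type].append(signal)
    return dict(grouped)


signals = [
    Signal("metric", "prometheus", name="node_cpu_seconds_total"),
    Signal("log", "loki", metadata={"namespace": "payments"}),
    Signal("log", "datadog"),
    Signal("trace", "jaeger", name="POST /charge"),
]
grouped = group_by_type(signals)
sources_with_logs = [s.source for s in grouped["log"]]
print(sources_with_logs)
```
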
Need to pull signals from an internal platform not listed here? Follow the Custom Data Sources guide to implement and register your own BaseConnector subclass without modifying core agent code.
