Every analysis the RCA Agent produces follows a consistent structure: a ranked list of hypotheses, each backed by signal excerpts pulled from your data sources, scored by the LLM against the incident context you provided. Understanding how to read that structure — what a confidence score actually means, how to trace a hypothesis back to the raw evidence, and when to trust the output versus widen the investigation — is the key skill for getting value out of the agent during an active incident.

The Result Schema

The AnalysisResult object returned by agent.analyze() (and surfaced in the Streamlit UI) contains the following top-level fields:
| Field | Type | Description |
| --- | --- | --- |
| hypotheses | List[Hypothesis] | Root cause candidates, sorted by confidence descending. |
| analysis_window | dict | The start_time and end_time passed to the analysis run. |
| sources_queried | List[str] | Connector names that successfully returned signals. |
| analysis_duration_seconds | float | Wall-clock time from task dispatch to result ready. |

Each entry in hypotheses is a Hypothesis object with these fields:
| Field | Type | Description |
| --- | --- | --- |
| title | str | A short, human-readable description of the candidate root cause. |
| confidence | float | Score from 0.0 to 1.0 reflecting the agent's certainty. |
| evidence_summary | str | A one-to-two sentence LLM-generated summary of the supporting signals. |
| supporting_signals | List[SignalExcerpt] | Individual signal excerpts that contributed to this hypothesis. Each has a source, timestamp, signal_type, and content field. |
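
To make the shape of the result concrete, here is a minimal sketch of the schema as plain dataclasses. It only mirrors the field names and types listed in the tables above; the agent's real classes are probably Pydantic models and may differ. The string type of timestamp and the top_hypothesis helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SignalExcerpt:
    source: str        # connector name, e.g. "elasticsearch"
    timestamp: str     # assumed to be an ISO 8601 string; the real type may differ
    signal_type: str   # "log", "metric", or "trace"
    content: str       # raw log line, metric value string, or trace span summary


@dataclass
class Hypothesis:
    title: str
    confidence: float  # 0.0 to 1.0
    evidence_summary: str
    supporting_signals: List[SignalExcerpt] = field(default_factory=list)


@dataclass
class AnalysisResult:
    hypotheses: List[Hypothesis]
    analysis_window: dict
    sources_queried: List[str]
    analysis_duration_seconds: float


def top_hypothesis(result: AnalysisResult) -> Optional[Hypothesis]:
    """Hypothetical helper: return the highest-confidence hypothesis, or None."""
    if not result.hypotheses:
        return None
    # Does not rely on the list already being sorted by confidence.
    return max(result.hypotheses, key=lambda h: h.confidence)
```

Because hypotheses is documented as sorted by confidence descending, result.hypotheses[0] should give the same answer on a non-empty result; the helper simply avoids an IndexError when the list is empty.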

Confidence Scores

Confidence scores are produced by the LLM reasoning step, which evaluates how well each hypothesis is corroborated by the retrieved signals relative to the incident context.
  • Above 0.8 — High confidence. Multiple independent signals from at least two different sources converge on the same root cause. Safe to act on as a strong lead.
  • 0.5 – 0.8 — Moderate confidence. The hypothesis is plausible and supported by some evidence, but the signal may be noisy or incomplete. Worth investigating — escalate to the service owner or pull additional data before taking remediation action.
  • Below 0.5 — Low confidence. Treat as a starting point for manual investigation rather than a conclusion. The agent found a weak correlation but lacks sufficient corroborating evidence.
Confidence scores are relative to one another within a single analysis run — a 0.9 in one run is not directly comparable to a 0.9 from a different run against a different data set. When comparing runs, focus on the rank ordering and the evidence quality rather than the absolute score values.
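
The three bands above can be expressed as a small triage helper. This is a sketch, not part of the agent's API: the function name and the band labels are hypothetical, and the boundary handling (exactly 0.8 and exactly 0.5 both count as moderate) is one reading of the ranges listed above.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the band described in this section."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {score}")
    if score > 0.8:
        return "high"       # strong lead, safe to act on
    if score >= 0.5:
        return "moderate"   # investigate further before remediating
    return "low"            # starting point for manual investigation
```

Since scores are only comparable within a single run, a helper like this is best used to triage the hypotheses of one result, not to compare results across runs.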

Evidence Excerpts

Each hypothesis links to a set of supporting_signals — the raw evidence the agent used to form its conclusion. In the Streamlit UI, these appear as a collapsed list beneath each hypothesis card. Click Expand evidence to reveal the individual excerpts. Each excerpt shows:
  • Source: the connector that retrieved it (e.g. elasticsearch, jaeger)
  • Signal type: log, metric, or trace
  • Timestamp: when the signal occurred within your analysis window
  • Content: the raw log line, metric value string, or trace span summary
For log signals, look for recurring error messages or stack traces that coincide with the incident start time. For metric signals, the content field typically shows the metric name, value, and any relevant labels — a spike in http_requests_total{status="500"} is more diagnostic than a generic CPU reading. Trace signals surface span errors and abnormal latencies, helping you pinpoint which downstream service call broke first. When the top hypothesis doesn’t feel right, read the raw excerpts directly rather than relying solely on the LLM summary — the summary can occasionally smooth over contradictory evidence that the raw signals reveal.
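
When reading the raw excerpts from the Python API, it can help to group them by signal type and order them by time, so the earliest log, metric, and trace evidence is easy to compare. The sketch below assumes each excerpt exposes signal_type and timestamp as attributes, and that timestamps sort chronologically (for example datetime objects or ISO 8601 strings in one time zone); the function name is hypothetical.

```python
from collections import defaultdict


def group_signals_by_type(signals):
    """Return {signal_type: [excerpts]} with each list in timestamp order."""
    grouped = defaultdict(list)
    # Sort once up front so every per-type list comes out chronological.
    for signal in sorted(signals, key=lambda s: s.timestamp):
        grouped[signal.signal_type].append(signal)
    return dict(grouped)
```

Calling group_signals_by_type(hypothesis.supporting_signals) and reading the first entry of each list is a quick way to see which kind of signal showed trouble first.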

When to Widen the Analysis

Low-confidence or missing results usually mean the agent needs more signal. The scenarios below cover the most common causes and how to address each one.
If every hypothesis scores below 0.5, the agent likely didn’t retrieve enough signal to form strong conclusions. Start by widening the time window by 30–60 minutes on each side — precursor events often precede the visible symptom. If that doesn’t help, enable additional data sources that cover different layers of your stack (e.g. add infrastructure metrics if you only had application logs). Check the sources_queried field to confirm all expected connectors responded successfully.
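
The two checks in this scenario, padding the window and confirming that every expected connector responded, can be sketched as follows. Both function names are hypothetical, and how the widened window is passed back to the agent depends on the signature of agent.analyze(), which is not shown here.

```python
from datetime import datetime, timedelta


def widen_window(start_time: datetime, end_time: datetime, minutes: int = 30):
    """Pad the analysis window by the same number of minutes on each side."""
    pad = timedelta(minutes=minutes)
    return start_time - pad, end_time + pad


def missing_sources(expected, sources_queried):
    """List expected connector names that are absent from sources_queried."""
    return sorted(set(expected) - set(sources_queried))
```

For example, missing_sources(["elasticsearch", "jaeger"], result.sources_queried) returns the connectors that did not return signals, which is worth checking before re-running with a wider window.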
If the highest-ranked hypothesis contradicts what your team already suspects, verify that the relevant data source is enabled and actively returning signals. Open the Expand evidence panel and check whether the supporting signals actually reference the service or component you’d expect. A common cause is a misconfigured connector that returns signals from the wrong index or namespace, inadvertently steering the LLM toward an unrelated root cause.
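
The same check can be done programmatically: count how many supporting signals actually mention the service you expected. This is a rough sketch using a case-insensitive substring match on the content field; the function name is hypothetical, and a substring match will miss signals that identify the service only through labels or IDs.

```python
def signals_mentioning(hypothesis, service_name: str):
    """Return the supporting signals whose content mentions service_name."""
    needle = service_name.lower()
    return [
        signal
        for signal in hypothesis.supporting_signals
        if needle in signal.content.lower()
    ]
```

If this returns an empty list for the top hypothesis, that is a hint to inspect the connector's index or namespace configuration before trusting the ranking.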
An empty hypotheses list means the agent found insufficient signal to propose any candidates. This can happen when: the time window is too narrow to capture the incident, all connectors returned empty results (check connector health with agent.health_check()), or the LLM could not map any retrieved signals to a plausible root cause given the provided context. Start by verifying connector health, then try re-running with a broader window and a more specific context string that names the affected service.
After resolving an incident, export the full result to JSON for your post-incident review. From the Python API, call result.model_dump_json(indent=2) and write the output to a file. In the Streamlit UI, use the Download JSON button in the results panel. Storing these files in your incident log gives your team a searchable history of past root causes and the signals that pointed to them.
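
A minimal sketch of the export step from the Python API is shown below. It relies only on the result.model_dump_json(indent=2) call mentioned above; the export_result name and the file path are placeholders chosen for illustration.

```python
from pathlib import Path


def export_result(result, path):
    """Write the analysis result to a JSON file and return the file path."""
    out = Path(path)
    # model_dump_json(indent=2) returns a pretty-printed JSON string.
    out.write_text(result.model_dump_json(indent=2), encoding="utf-8")
    return out
```

A call such as export_result(result, "incident-rca.json") would then produce a file you can attach to the incident log.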

Ready to extend the agent with a new data source? See Building Custom Data Source Connectors for the full connector API.
