Every analysis the RCA Agent produces follows a consistent structure: a ranked list of hypotheses, each backed by signal excerpts pulled from your data sources, scored by the LLM against the incident context you provided. Understanding how to read that structure (what a confidence score actually means, how to trace a hypothesis back to the raw evidence, and when to trust the output versus widen the investigation) is the key skill for getting value out of the agent during an active incident.
The Result Schema
The AnalysisResult object returned by agent.analyze() (and surfaced in the Streamlit UI) contains the following top-level fields:
| Field | Type | Description |
|---|---|---|
| hypotheses | List[Hypothesis] | Root cause candidates, sorted by confidence descending. |
| analysis_window | dict | The start_time and end_time passed to the analysis run. |
| sources_queried | List[str] | Connector names that successfully returned signals. |
| analysis_duration_seconds | float | Wall-clock time from task dispatch to result ready. |
Each entry in hypotheses is a Hypothesis object with these fields:
| Field | Type | Description |
|---|---|---|
| title | str | A short, human-readable description of the candidate root cause. |
| confidence | float | Score from 0.0 to 1.0 reflecting the agent’s certainty. |
| evidence_summary | str | A one-to-two sentence LLM-generated summary of the supporting signals. |
| supporting_signals | List[SignalExcerpt] | Individual signal excerpts that contributed to this hypothesis. Each has a source, timestamp, signal_type, and content field. |
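The two tables above can be pictured as three nested Python types. The sketch below is only a mental model: the field names and types come from the tables, but the use of dataclasses, the string timestamp, and the sample values are assumptions made for illustration, and the agent's real classes may be defined differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SignalExcerpt:
    source: str        # connector name, e.g. "elasticsearch"
    timestamp: str     # assumed to be an ISO 8601 string; the real type may differ
    signal_type: str   # "log", "metric", or "trace"
    content: str       # raw log line, metric value string, or trace span summary


@dataclass
class Hypothesis:
    title: str
    confidence: float  # 0.0 to 1.0
    evidence_summary: str
    supporting_signals: List[SignalExcerpt] = field(default_factory=list)


@dataclass
class AnalysisResult:
    hypotheses: List[Hypothesis]   # sorted by confidence, descending
    analysis_window: dict          # holds start_time and end_time
    sources_queried: List[str]
    analysis_duration_seconds: float


# Hand-built placeholder data, not real agent output.
result = AnalysisResult(
    hypotheses=[
        Hypothesis(
            title="Example hypothesis",
            confidence=0.85,
            evidence_summary="Example summary.",
            supporting_signals=[
                SignalExcerpt("elasticsearch", "2024-01-01T12:00:00Z", "log", "example log line"),
            ],
        )
    ],
    analysis_window={"start_time": "2024-01-01T11:30:00Z", "end_time": "2024-01-01T12:30:00Z"},
    sources_queried=["elasticsearch"],
    analysis_duration_seconds=12.5,
)

# The first entry is the highest-ranked candidate.
top = result.hypotheses[0]
print(f"{top.confidence:.2f}  {top.title}")
```

Because hypotheses is sorted by confidence, the first entry is always the agent's leading candidate, and each hypothesis carries its own evidence in supporting_signals.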
Confidence Scores
Confidence scores are produced by the LLM reasoning step, which evaluates how well each hypothesis is corroborated by the retrieved signals relative to the incident context.
- Above 0.8: High confidence. Multiple independent signals from at least two different sources converge on the same root cause. Safe to act on as a strong lead.
- 0.5 – 0.8: Moderate confidence. The hypothesis is plausible and supported by some evidence, but the signal may be noisy or incomplete. Worth investigating: escalate to the service owner or pull additional data before taking remediation action.
- Below 0.5: Low confidence. Treat as a starting point for manual investigation rather than a conclusion. The agent found a weak correlation but lacks sufficient corroborating evidence.
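If you post-process results in a script, the three bands above can be expressed as a small helper. This is a minimal sketch: confidence_band is a hypothetical function, not part of the agent's API, and it assumes that scores of exactly 0.5 and 0.8 belong to the moderate band, following the "0.5 – 0.8" range.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the band names used in this guide.

    Hypothetical helper; the thresholds follow the documented bands.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be within [0.0, 1.0], got {score}")
    if score > 0.8:
        return "high"      # strong lead, corroborated by multiple sources
    if score >= 0.5:
        return "moderate"  # plausible, gather more data before remediating
    return "low"           # starting point for manual investigation


for score in (0.92, 0.8, 0.31):
    print(score, confidence_band(score))
```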
Confidence scores are relative to one another within a single analysis run: a 0.9 in one run is not directly comparable to a 0.9 from a different run against a different data set. When comparing runs, focus on the rank ordering and the evidence quality rather than the absolute score values.
Evidence Excerpts
Each hypothesis links to a set of supporting_signals, the raw evidence the agent used to form its conclusion. In the Streamlit UI, these appear as a collapsed list beneath each hypothesis card. Click Expand evidence to reveal the individual excerpts.
Each excerpt shows:
- Source: the connector that retrieved it (e.g. elasticsearch, jaeger)
- Signal type: log, metric, or trace
- Timestamp: when the signal occurred within your analysis window
- Content: the raw log line, metric value string, or trace span summary
A targeted metric such as http_requests_total{status="500"} is more diagnostic than a generic CPU reading. Trace signals surface span errors and abnormal latencies, helping you pinpoint which downstream service call broke first.
When the top hypothesis doesn’t feel right, read the raw excerpts directly rather than relying solely on the LLM summary — the summary can occasionally smooth over contradictory evidence that the raw signals reveal.
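Outside the UI, one way to read the raw excerpts directly is to print them in chronological order. The sketch below uses a stand-in Excerpt tuple with the four documented fields; the tuple type, the format_excerpts helper, and the sample values are all hypothetical, and the sort assumes timestamps are ISO 8601 strings in a single timezone.

```python
from typing import Iterable, List, NamedTuple


class Excerpt(NamedTuple):
    """Stand-in carrying the four documented SignalExcerpt fields."""
    source: str
    timestamp: str
    signal_type: str
    content: str


def format_excerpts(signals: Iterable[Excerpt]) -> List[str]:
    """Return one display line per excerpt, oldest first.

    Assumes ISO 8601 timestamps in the same timezone, so that string
    order matches chronological order.
    """
    ordered = sorted(signals, key=lambda s: s.timestamp)
    return [
        f"{s.timestamp} [{s.source}/{s.signal_type}] {s.content}"
        for s in ordered
    ]


# Placeholder excerpts, not real agent output.
sample = [
    Excerpt("jaeger", "2024-01-01T12:05:00Z", "trace", "placeholder span summary"),
    Excerpt("elasticsearch", "2024-01-01T12:01:00Z", "log", "placeholder log line"),
]
for line in format_excerpts(sample):
    print(line)
```

Reading the excerpts in time order makes it easier to see which signal appeared first, which the LLM summary does not always preserve.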
When to Widen the Analysis
Low-confidence or missing results usually mean the agent needs more signal. The scenarios below cover the most common causes and how to address each one.
Confidence is low on all hypotheses
If every hypothesis scores below 0.5, the agent likely didn’t retrieve enough signal to form strong conclusions. Start by widening the time window by 30–60 minutes on each side, since the triggering events often occur well before the visible symptom. If that doesn’t help, enable additional data sources that cover different layers of your stack (e.g. add infrastructure metrics if you only had application logs). Check the sources_queried field to confirm all expected connectors responded successfully.
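The two checks in this scenario are simple to script. The sketch below pads a window symmetrically and lists connectors that are missing from sources_queried. Both helpers are hypothetical and use plain Python values; how the widened window is passed back to the agent depends on your setup and is not shown.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def widen_window(start: datetime, end: datetime, minutes: int = 30) -> Tuple[datetime, datetime]:
    """Pad an analysis window by the same number of minutes on each side."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if end < start:
        raise ValueError("end must not be earlier than start")
    pad = timedelta(minutes=minutes)
    return start - pad, end + pad


def missing_sources(expected: Iterable[str], queried: Iterable[str]) -> List[str]:
    """Return expected connector names that are absent from sources_queried."""
    return sorted(set(expected) - set(queried))


new_start, new_end = widen_window(
    datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 30), minutes=30
)
print(new_start, new_end)
print(missing_sources(["elasticsearch", "jaeger"], ["elasticsearch"]))
```

A non-empty result from missing_sources points to a connector problem rather than a genuinely weak signal, so it is worth checking before widening the window further.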
The top hypothesis doesn't match your intuition
If the highest-ranked hypothesis contradicts what your team already suspects, verify that the relevant data source is enabled and actively returning signals. Open the Expand evidence panel and check whether the supporting signals actually reference the service or component you’d expect. A common cause is a misconfigured connector that returns signals from the wrong index or namespace, inadvertently steering the LLM toward an unrelated root cause.
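A quick sanity check for this scenario is to measure how many excerpts mention the service you expected. The helper below is a rough heuristic built on case-insensitive substring matching; mention_ratio and the sample strings are hypothetical, and it takes plain content strings rather than the agent's own objects.

```python
from typing import Iterable


def mention_ratio(contents: Iterable[str], service_name: str) -> float:
    """Fraction of excerpt contents that mention service_name.

    Case-insensitive substring match; returns 0.0 for an empty input.
    """
    items = list(contents)
    if not items:
        return 0.0
    needle = service_name.lower()
    hits = sum(1 for text in items if needle in text.lower())
    return hits / len(items)


# Placeholder excerpt contents, not real agent output.
contents = [
    "checkout-api request timed out",
    "unrelated-service heartbeat ok",
]
print(mention_ratio(contents, "checkout-api"))
```

A ratio near zero for the service your team suspects suggests the connector may be reading from the wrong index or namespace, as described above.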
No hypotheses returned
An empty hypotheses list means the agent found insufficient signal to propose any candidates. This can happen when the time window is too narrow to capture the incident, when all connectors returned empty results (check connector health with agent.health_check()), or when the LLM could not map any retrieved signals to a plausible root cause given the provided context. Start by verifying connector health, then try re-running with a broader window and a more specific context string that names the affected service.
Ready to extend the agent with a new data source? See Building Custom Data Source Connectors for the full connector API.