Interpreting Root Cause Analysis Results and Evidence

Every analysis the RCA Agent produces follows a consistent structure: a ranked list of hypotheses, each backed by signal excerpts pulled from your data sources and scored by the LLM against the incident context you provided. The key skill for getting value out of the agent during an active incident is knowing how to read that structure: what a confidence score actually means, how to trace a hypothesis back to the raw evidence, and when to trust the output versus widen the investigation.

The Result Schema

The AnalysisResult object returned by agent.analyze() (and surfaced in the Streamlit UI) contains the following top-level fields:

| Field | Type | Description |
| --- | --- | --- |
| `hypotheses` | `List[Hypothesis]` | Root cause candidates, sorted by `confidence` descending. |
| `analysis_window` | `dict` | The `start_time` and `end_time` passed to the analysis run. |
| `sources_queried` | `List[str]` | Connector names that successfully returned signals. |
| `analysis_duration_seconds` | `float` | Wall-clock time from task dispatch to result ready. |

Each entry in hypotheses is a Hypothesis object with these fields:

| Field | Type | Description |
| --- | --- | --- |
| `title` | `str` | A short, human-readable description of the candidate root cause. |
| `confidence` | `float` | Score from `0.0` to `1.0` reflecting the agent’s certainty. |
| `evidence_summary` | `str` | A one-to-two sentence LLM-generated summary of the supporting signals. |
| `supporting_signals` | `List[SignalExcerpt]` | Individual signal excerpts that contributed to this hypothesis. Each has a `source`, `timestamp`, `signal_type`, and `content` field. |
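
To make the shape of the two tables concrete, here is a minimal sketch that mirrors the documented fields with standard-library dataclasses. These classes are stand-ins for illustration, not the agent's own definitions: the real objects may be implemented differently, the `timestamp` type is an assumption (shown here as an ISO 8601 string), and `summarize` and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SignalExcerpt:
    source: str        # connector name, e.g. "elasticsearch"
    timestamp: str     # assumption: ISO 8601 string; the real type may differ
    signal_type: str   # "log", "metric", or "trace"
    content: str       # raw log line, metric value string, or span summary


@dataclass
class Hypothesis:
    title: str
    confidence: float
    evidence_summary: str
    supporting_signals: List[SignalExcerpt] = field(default_factory=list)


@dataclass
class AnalysisResult:
    hypotheses: List[Hypothesis]
    analysis_window: dict
    sources_queried: List[str]
    analysis_duration_seconds: float


def summarize(result: AnalysisResult) -> List[str]:
    """Return one line per hypothesis, in the order the result lists them."""
    return [
        f"{h.confidence:.2f}  {h.title}  ({len(h.supporting_signals)} signals)"
        for h in result.hypotheses
    ]


# Hypothetical sample data, for illustration only.
sample = AnalysisResult(
    hypotheses=[
        Hypothesis(
            title="Connection pool exhaustion in checkout-service",
            confidence=0.86,
            evidence_summary="Pool timeout errors coincide with the 500 spike.",
            supporting_signals=[
                SignalExcerpt("elasticsearch", "2024-01-01T10:02:00Z", "log",
                              "TimeoutError: could not acquire connection"),
            ],
        ),
    ],
    analysis_window={"start_time": "2024-01-01T10:00:00Z",
                     "end_time": "2024-01-01T10:30:00Z"},
    sources_queried=["elasticsearch"],
    analysis_duration_seconds=12.4,
)

for line in summarize(sample):
    print(line)
```

Because `hypotheses` is already sorted by `confidence` descending, the first line printed corresponds to the top-ranked candidate.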

Confidence Scores

Confidence scores are produced by the LLM reasoning step, which evaluates how well each hypothesis is corroborated by the retrieved signals relative to the incident context.

Above 0.8: High confidence. Multiple independent signals from at least two different sources converge on the same root cause. Safe to act on as a strong lead.
0.5 to 0.8 (inclusive): Moderate confidence. The hypothesis is plausible and supported by some evidence, but the signals may be noisy or incomplete. Escalate to the service owner or pull additional data before taking remediation action.
Below 0.5: Low confidence. The agent found a weak correlation but lacks sufficient corroborating evidence. Treat the hypothesis as a starting point for manual investigation rather than a conclusion.
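
If you triage results in a script, the three bands above can be encoded as a small helper. This is a sketch, not part of the agent's API: `confidence_band` is a hypothetical name, and placing exactly 0.8 and exactly 0.5 in the moderate band follows from the "above 0.8" and "below 0.5" wording.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the band described in this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be within [0.0, 1.0], got {score}")
    if score > 0.8:
        return "high"      # strong lead, safe to act on
    if score >= 0.5:
        return "moderate"  # gather more data before remediating
    return "low"           # starting point for manual investigation


for score in (0.92, 0.8, 0.5, 0.31):
    print(score, confidence_band(score))
```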

Confidence scores are relative to one another within a single analysis run — a 0.9 in one run is not directly comparable to a 0.9 from a different run against a different data set. When comparing runs, focus on the rank ordering and the evidence quality rather than the absolute score values.

Evidence Excerpts

Each hypothesis links to a set of supporting_signals — the raw evidence the agent used to form its conclusion. In the Streamlit UI, these appear as a collapsed list beneath each hypothesis card. Click Expand evidence to reveal the individual excerpts. Each excerpt shows:

Source — the connector that retrieved it (e.g. elasticsearch, jaeger)
Signal type — log, metric, or trace
Timestamp — when the signal occurred within your analysis window
Content — the raw log line, metric value string, or trace span summary

For log signals, look for recurring error messages or stack traces that coincide with the incident start time. For metric signals, the content field typically shows the metric name, value, and any relevant labels; a spike in http_requests_total{status="500"} is more diagnostic than a generic CPU reading. Trace signals surface span errors and abnormal latencies, helping you pinpoint which downstream service call broke first.

When the top hypothesis doesn’t feel right, read the raw excerpts directly rather than relying solely on the LLM summary, which can occasionally smooth over contradictory evidence that the raw signals reveal.
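
When reading raw excerpts from the Python API, it can help to group them by signal type and order each group by time, so the earliest log, metric, and trace evidence stands out. The sketch below is illustrative only: `group_by_type` is a hypothetical helper, `SimpleNamespace` stands in for the real excerpt objects, and it assumes `timestamp` is a `datetime` (parse it first if the agent returns strings).

```python
from collections import defaultdict
from datetime import datetime
from types import SimpleNamespace


def group_by_type(signals):
    """Group excerpts by signal_type, each group sorted oldest first."""
    groups = defaultdict(list)
    for signal in signals:
        groups[signal.signal_type].append(signal)
    return {
        kind: sorted(items, key=lambda s: s.timestamp)
        for kind, items in groups.items()
    }


# Hypothetical excerpts; SimpleNamespace stands in for SignalExcerpt.
signals = [
    SimpleNamespace(source="jaeger", signal_type="trace",
                    timestamp=datetime(2024, 1, 1, 10, 3),
                    content="span payment.charge error=true duration=5.2s"),
    SimpleNamespace(source="elasticsearch", signal_type="log",
                    timestamp=datetime(2024, 1, 1, 10, 2),
                    content="TimeoutError: could not acquire connection"),
    SimpleNamespace(source="elasticsearch", signal_type="log",
                    timestamp=datetime(2024, 1, 1, 10, 1),
                    content="WARN pool usage at 95%"),
]

grouped = group_by_type(signals)
for kind, items in grouped.items():
    first = items[0]
    print(f"{kind}: earliest at {first.timestamp.isoformat()} from {first.source}")
```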

When to Widen the Analysis

Low-confidence or missing results usually mean the agent needs more signal. The scenarios below cover the most common causes and how to address each one.

Confidence is low on all hypotheses

If every hypothesis scores below 0.5, the agent likely didn’t retrieve enough signal to form strong conclusions. Start by widening the time window by 30–60 minutes on each side — precursor events often precede the visible symptom. If that doesn’t help, enable additional data sources that cover different layers of your stack (e.g. add infrastructure metrics if you only had application logs). Check the sources_queried field to confirm all expected connectors responded successfully.
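
The checks in this scenario can be sketched as three small helpers. All function names here are hypothetical, the connector name `prometheus` is only an example, and the window values are assumed to be `datetime` objects; how you pass the widened window back into a new analysis run depends on your setup.

```python
from datetime import datetime, timedelta


def all_low_confidence(hypotheses, threshold=0.5):
    """True when there is at least one hypothesis and all score below threshold."""
    return bool(hypotheses) and all(h.confidence < threshold for h in hypotheses)


def widen_window(start, end, minutes=30):
    """Extend the analysis window by the same padding on each side."""
    pad = timedelta(minutes=minutes)
    return start - pad, end + pad


def missing_sources(expected, queried):
    """Connectors you expected that do not appear in sources_queried."""
    return sorted(set(expected) - set(queried))


# Hypothetical values, for illustration only.
start = datetime(2024, 1, 1, 10, 0)
end = datetime(2024, 1, 1, 10, 30)
new_start, new_end = widen_window(start, end, minutes=30)
print(new_start, new_end)
print(missing_sources(["elasticsearch", "prometheus", "jaeger"], ["elasticsearch"]))
```

Running the source check before widening the window is usually cheaper: a connector that never responded will not start returning signals just because the window grew.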

The top hypothesis doesn't match your intuition

If the highest-ranked hypothesis contradicts what your team already suspects, verify that the relevant data source is enabled and actively returning signals. Open the Expand evidence panel and check whether the supporting signals actually reference the service or component you’d expect. A common cause is a misconfigured connector that returns signals from the wrong index or namespace, inadvertently steering the LLM toward an unrelated root cause.
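
One way to run that check from the Python API is to count how many supporting signals actually mention the service you suspect, and which connectors they came from. This is a rough sketch under assumptions: both helpers are hypothetical, the match is a plain case-insensitive substring test on `content`, and `SimpleNamespace` stands in for the real hypothesis and excerpt objects.

```python
from types import SimpleNamespace


def signals_mentioning(hypothesis, term):
    """Supporting signals whose content contains term (case-insensitive)."""
    needle = term.lower()
    return [s for s in hypothesis.supporting_signals if needle in s.content.lower()]


def sources_used(hypothesis):
    """Distinct connector names behind a hypothesis, sorted alphabetically."""
    return sorted({s.source for s in hypothesis.supporting_signals})


# Hypothetical hypothesis with two excerpts, for illustration only.
hypothesis = SimpleNamespace(
    title="Disk pressure on batch-worker nodes",
    supporting_signals=[
        SimpleNamespace(source="elasticsearch",
                        content="batch-worker-3: no space left on device"),
        SimpleNamespace(source="elasticsearch",
                        content="Checkout-Service: upstream timeout"),
    ],
)

matches = signals_mentioning(hypothesis, "checkout-service")
print(f"{len(matches)} of {len(hypothesis.supporting_signals)} signals mention the service")
print("sources:", sources_used(hypothesis))
```

A low match count, or a single unexpected source, is a hint to inspect that connector's index or namespace settings before trusting the ranking.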

No hypotheses returned

An empty hypotheses list means the agent found insufficient signal to propose any candidates. This can happen when the time window is too narrow to capture the incident, when all connectors returned empty results (check connector health with agent.health_check()), or when the LLM could not map any retrieved signals to a plausible root cause given the provided context. Start by verifying connector health, then re-run with a broader window and a more specific context string that names the affected service.

After resolving an incident, export the full result to JSON for your post-incident review. From the Python API, call result.model_dump_json(indent=2) and write the output to a file. In the Streamlit UI, use the Download JSON button in the results panel. Storing these files in your incident log gives your team a searchable history of past root causes and the signals that pointed to them.
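
A minimal sketch of the Python export step follows. Only `model_dump_json(indent=2)` comes from this guide; `export_result`, the file name, and `StubResult` (a stand-in so the example runs without the agent installed) are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def export_result(result, path):
    """Write result.model_dump_json(indent=2) to path and return the path."""
    path = Path(path)
    path.write_text(result.model_dump_json(indent=2), encoding="utf-8")
    return path


class StubResult:
    """Hypothetical stand-in for the object returned by agent.analyze()."""

    def model_dump_json(self, indent=None):
        return json.dumps({"hypotheses": [], "sources_queried": []}, indent=indent)


# A temporary directory keeps the example self-contained; in practice,
# write to wherever your incident log lives.
with tempfile.TemporaryDirectory() as tmp:
    out = export_result(StubResult(), Path(tmp) / "incident-result.json")
    exported = json.loads(out.read_text(encoding="utf-8"))
    print(exported)
```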

Ready to extend the agent with a new data source? See Building Custom Data Source Connectors for the full connector API.
