Every metric class in TrustifAI returns a MetricResult. The structure is identical across all four offline metrics and the confidence metric, so downstream code can process results uniformly regardless of which metric produced them. The .to_dict() method serializes the result to a plain dict suitable for JSON responses, dataframe rows, or logging.
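Because the shape is uniform, downstream consumers can handle results generically. A minimal hypothetical helper (the function name and sample values are illustrative, not part of the library):

```python
def summarize(result_dict: dict) -> str:
    """Format any serialized MetricResult as a one-line summary.

    Works for every metric because all results share the same
    shape: a score, a label, and a details dict.
    """
    return f"{result_dict['label']} (score={result_dict['score']:.2f})"


# Example with illustrative values:
summary = summarize({"score": 0.9, "label": "High Trust", "details": {}})
```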
Fields

score
Normalized trust score in the range [0.0, 1.0]. Higher values indicate stronger trustworthiness. The meaning of the scale is metric-specific; see the offline metrics reference for per-metric interpretation.

label
Human-readable classification of the score, produced by ThresholdEvaluator using thresholds from your config. Common label values include:
- "Strong Grounding" / "Partial Grounding" / "Likely Hallucinated Answer" (evidence coverage and semantic drift)
- "Stable Consistency" / "Fragile Consistency" / "Unreliable" (epistemic consistency)
- "High Trust" / "Moderate Trust" / "Low Trust" (source diversity)
- "High Confidence" / "Medium Confidence" / "Low Confidence" (confidence)

details
Metric-specific breakdown of how the score was derived. Keys vary by metric. All details dicts include an "explanation" string when grading succeeded.

execution_metadata
Present when the metric incurred external API costs. Omitted from .to_dict() output when None.

.to_dict() method
Serializes the result to a plain Python dict. The score is rounded to 2 decimal places. execution_metadata is included only if it is not None.
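The serialization rules above can be sketched as follows. This is an illustrative stand-in, not the library's actual implementation; the dataclass shape is assumed from the field descriptions on this page:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricResultSketch:
    """Illustrative stand-in for TrustifAI's MetricResult (shape assumed)."""
    score: float
    label: str
    details: dict
    execution_metadata: Optional[dict] = None

    def to_dict(self) -> dict:
        # The score is rounded to 2 decimal places in the serialized form.
        out = {
            "score": round(self.score, 2),
            "label": self.label,
            "details": self.details,
        }
        # execution_metadata is omitted entirely when None, so JSON
        # consumers never see a null-valued key.
        if self.execution_metadata is not None:
            out["execution_metadata"] = self.execution_metadata
        return out
```

For example, serializing a result built with score 0.873 and no execution metadata yields a dict with "score": 0.87 and no "execution_metadata" key at all.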
Example output
The following shows a MetricResult returned by EvidenceCoverageMetric and its .to_dict() representation:
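A hypothetical .to_dict() payload is sketched below. The field names follow the descriptions above, but the score, explanation text, and detail keys are illustrative assumptions, not real library output (detail keys vary by metric):

```python
# Hypothetical serialized MetricResult (illustrative values only).
result_dict = {
    "score": 0.91,
    "label": "Strong Grounding",
    "details": {
        # Keys vary by metric; "explanation" is present when grading
        # succeeded.
        "explanation": "The answer is well supported by the retrieved evidence.",
    },
    # "execution_metadata" appears only when the metric incurred external
    # API costs; it is omitted entirely when it is None.
}
```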