

ConfidenceMetric is TrustifAI’s only online metric — it runs during LLM generation rather than post-hoc. It converts the per-token log probability stream emitted by the LLM into a single confidence score that reflects both the model’s average certainty and the consistency of that certainty across the generated sequence. Unlike offline metrics, you do not call ConfidenceMetric.calculate directly: it is invoked automatically inside Trustifai.generate().
ConfidenceMetric requires a language model that exposes token-level log probabilities. OpenAI-compatible APIs (including Gemini via response_logprobs=True) and most self-hosted models support this. If your LLM does not return logprobs, generate() returns score: 0.0 with label: "N/A".

Static method

class ConfidenceMetric:
    @staticmethod
    def calculate(
        logprobs: List[float],
        evaluator: ThresholdEvaluator,
    ) -> Dict[str, Any]: ...
logprobs (List[float], required)
List of per-token log probability values (negative floats) as returned by the LLM API. The list corresponds to the generated response tokens in order. An empty list yields score: 0.0, label: "N/A".

evaluator (ThresholdEvaluator, required)
A ThresholdEvaluator instance used to map the computed score to a (label, explanation) pair. TrustifAI passes this automatically when calling from generate().

Score computation

The score is derived in three steps:
  1. Average log probability: avg_logprob = mean(logprobs), a length-normalized proxy for sequence probability.
  2. Variance penalty: penalty = exp(−var(logprobs)), which reduces the score when the model was inconsistently uncertain across tokens.
  3. Final score: score = exp(avg_logprob) × penalty, which stays within [0.0, 1.0] by the natural range of the formula (see the sketch below).
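
Putting the three steps together, here is a minimal sketch of the computation. It is illustrative only: the function name and the use of Python's statistics module are assumptions, not TrustifAI's internal implementation.

import math
from statistics import mean, pvariance
from typing import List

def confidence_score(logprobs: List[float]) -> float:
    # Illustrative re-implementation of the three steps above, not TrustifAI's code.
    if not logprobs:
        return 0.0  # empty logprob list -> score 0.0, label "N/A"
    avg_logprob = mean(logprobs)               # step 1: length-normalized certainty
    penalty = math.exp(-pvariance(logprobs))   # step 2: penalize inconsistent certainty
    return math.exp(avg_logprob) * penalty     # step 3: both factors lie in (0, 1], so the score does too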

Return value

calculate returns a plain dict (not a MetricResult) for compatibility with the generate() response envelope:
{
    "score": float,      # 0.0 – 1.0
    "label": str,        # human-readable confidence label
    "details": {
        "explanation": str,
        "avg_logprob": float,   # rounded to 2 d.p.
        "variance": float,      # rounded to 2 d.p.
        "token_count": int,
    }
}
Labels (configurable via thresholds in your config):
  • "High Confidence" — model is highly certain
  • "Medium Confidence" — moderate uncertainty
  • "Low Confidence" — model is uncertain about its output

Usage via Trustifai.generate()

You access ConfidenceMetric through the generate() method, which handles logprob collection and metric calculation automatically:
from trustifai import Trustifai

engine = Trustifai("config_file.yaml")

output = engine.generate(
    prompt="What is the boiling point of water at sea level?",
    system_prompt="You are a helpful assistant."
)

print(output["metadata"]["confidence_score"])
# 0.81

print(output["metadata"]["confidence_label"])
# "High Confidence"
The full generate() return value has two top-level keys — "response" (the generated text) and "metadata" (confidence and cost info):
{
    "response": "Water boils at 100 °C (212 °F) at standard sea-level pressure.",
    "metadata": {
        "confidence_score": 0.81,
        "confidence_label": "High Confidence",
        "confidence_details": {
            "explanation": "Model is highly confident in its response based on logprobs.",
            "avg_logprob": -0.14,
            "variance": 0.07,
            "token_count": 23,
        },
        "logprobs_available": True,
        "execution_metadata": {
            "total_cost_usd": 0.000031
        }
    }
}
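Because providers without logprob support still return the same envelope (with score 0.0 and label "N/A"), it is worth checking logprobs_available before acting on the score. A brief sketch, reusing the engine from the example above:

output = engine.generate(prompt="What is the boiling point of water at sea level?")

if output["metadata"]["logprobs_available"]:
    print(output["metadata"]["confidence_label"], output["metadata"]["confidence_score"])
else:
    # The provider returned no token logprobs, so the confidence score is not meaningful.
    print("Confidence metric unavailable for this model")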
Do not call ConfidenceMetric.calculate directly in production code. The logprobs list must be in the exact format returned by the LLM integration layer. Calling generate() ensures the logprobs are captured and formatted correctly before being passed to the metric.
