

Online metrics are computed during LLM generation, not after. Rather than evaluating a finished response against retrieved documents, they tap into the generation process itself — specifically the log probabilities that the LLM assigns to each output token. This makes them fast (no extra API calls) and complementary to offline metrics, giving you a real-time view into how certain the model was while producing its response.

The Confidence Score

The Confidence Score is TrustifAI's single online metric. It quantifies how sure the LLM was about its own output by analyzing the probability distribution across generated tokens.

How it works: TrustifAI captures the per-token log probability (logprob) values returned by the LLM alongside the generated text. From these it computes:
  1. Geometric mean probability: exp(mean(logprobs)) gives the normalized per-token probability for the entire sequence. This captures the model's average certainty across all tokens.
  2. Variance penalty: exp(-variance(logprobs)) penalizes sequences where the model oscillated between high- and low-confidence tokens. Consistent uncertainty (uniform logprobs) is rated less harshly than erratic uncertainty.
The final score combines both:
confidence_score = exp(mean(logprobs)) × exp(-variance(logprobs))
The result is a value in [0, 1] where higher means the model was more consistently certain about its output.
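To make the formula concrete, here is a minimal sketch of the computation described above (illustrative only, not TrustifAI's internal implementation; the helper name is hypothetical):

```python
import math

def confidence_score(logprobs: list[float]) -> float:
    # Geometric mean probability: the model's average certainty across all tokens
    mean_lp = sum(logprobs) / len(logprobs)
    # Variance penalty: erratic swings between certain and uncertain tokens lower the score
    variance = sum((lp - mean_lp) ** 2 for lp in logprobs) / len(logprobs)
    return math.exp(mean_lp) * math.exp(-variance)

# Four consistently high-probability tokens -> score close to 1
print(confidence_score([-0.05, -0.02, -0.10, -0.03]))  # ~0.95
```

Because logprobs are never positive, both factors lie in (0, 1], which keeps the combined score in [0, 1].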
The Confidence Score is only available for LLMs that expose token log probabilities. This includes OpenAI models (e.g., gpt-4o, gpt-4, gpt-4-turbo) when logprobs: true is set in your config. Models served through providers that strip logprob data will return score: 0.0 with label N/A.

Threshold labels

| Score | Label | Interpretation |
| --- | --- | --- |
| ≥ 0.90 | High Confidence | Model is highly certain about its output |
| ≥ 0.70 | Medium Confidence | Model shows moderate uncertainty |
| < 0.70 | Low Confidence | Model is uncertain; treat output with caution |
These thresholds have defaults of 0.90 and 0.70. You can override them by adding them to any metric’s params section in your YAML — for example, alongside the trust_score thresholds:
```yaml
metrics:
  - type: "trust_score"
    params:
      RELIABLE_TRUST: 0.80
      ACCEPTABLE_TRUST: 0.60
      HIGH_CONFIDENCE: 0.90    # override confidence threshold
      MEDIUM_CONFIDENCE: 0.70
```
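The label assignment itself reduces to a comparison against these two cut-offs. The sketch below is a hypothetical helper (not part of the SDK) using the default thresholds:

```python
def confidence_label(score: float, high: float = 0.90, medium: float = 0.70) -> str:
    # Mirrors the threshold table: >= high, >= medium, otherwise low
    if score >= high:
        return "High Confidence"
    if score >= medium:
        return "Medium Confidence"
    return "Low Confidence"

print(confidence_label(0.94))  # High Confidence
print(confidence_label(0.65))  # Low Confidence
```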

Using generate() to get Confidence Scores

The generate() method wraps an LLM call and automatically computes the Confidence Score from the returned logprobs.
1. Initialize the engine

The engine reads your LLM config, including the model name and logprob settings, from the YAML file.
```python
from trustifai import Trustifai

trust_engine = Trustifai(config_path="config_file.yaml")
```
2. Call generate()

Pass your prompt (and an optional system prompt). TrustifAI automatically requests logprobs from the LLM.
```python
result = trust_engine.generate(
    prompt="What is the capital of France?",
    system_prompt="You are a helpful assistant."
)
```
3. Read the response and confidence metadata

The return value contains both the generated text and the full confidence breakdown.
print(f"Response:    {result['response']}")
print(f"Confidence:  {result['metadata']['confidence_score']}")
print(f"Label:       {result['metadata']['confidence_label']}")

Interpreting the result dict

generate() returns a dictionary with two top-level keys:
```json
{
    "response": "The capital of France is Paris.",
    "metadata": {
        "confidence_score": 0.94,
        "confidence_label": "High Confidence",
        "confidence_details": {
            "explanation": "Model is highly confident in its response based on logprobs.",
            "avg_logprob": -0.06,
            "variance": 0.03,
            "token_count": 8
        },
        "logprobs_available": true,
        "execution_metadata": {
            "total_cost_usd": 0.000031
        }
    }
}
```
| Field | Description |
| --- | --- |
| `response` | The generated text |
| `metadata.confidence_score` | Confidence Score in [0, 1] |
| `metadata.confidence_label` | High Confidence, Medium Confidence, or Low Confidence |
| `metadata.confidence_details.avg_logprob` | Mean log probability across tokens |
| `metadata.confidence_details.variance` | Variance of log probabilities (higher = more erratic) |
| `metadata.confidence_details.token_count` | Number of tokens scored |
| `metadata.logprobs_available` | false if the LLM did not return logprobs |
| `metadata.execution_metadata.total_cost_usd` | API cost for this call |
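In practice you typically branch on these fields before trusting the output. Continuing from the generate() call above, the routing policy below is an assumption for illustration, not a TrustifAI recommendation:

```python
meta = result["metadata"]

if not meta["logprobs_available"]:
    # Provider stripped logprob data: the score is 0.0 / "N/A", so don't gate on it
    print("No logprobs returned; skipping the confidence check")
elif meta["confidence_label"] == "Low Confidence":
    # Assumed policy: send low-confidence answers to human review
    print(f"Flagging for review: {result['response']}")
else:
    print(f"Serving response: {result['response']}")
```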

Enabling logprobs in config

Make sure your LLM config requests logprobs. TrustifAI sets this automatically when generate() is called, but having it in the config ensures consistency:
```yaml
llm:
  type: "openai"
  params:
    model_name: "gpt-4o"
    api_type: "chat_completion"
  kwargs:
    logprobs: true
    max_tokens: 2048
    temperature: 0.01
```
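For reference, this is roughly what the underlying request looks like if you call the OpenAI Chat Completions API with logprobs enabled yourself (illustrative only; TrustifAI performs the equivalent call for you when generate() runs):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    logprobs=True,
    max_tokens=2048,
    temperature=0.01,
)

# One log probability per generated token; values like these feed the Confidence Score
token_logprobs = [t.logprob for t in completion.choices[0].logprobs.content]
```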
The Confidence Score is only as reliable as the LLM’s calibration. A well-calibrated model assigns high log probabilities to tokens it is genuinely likely to get right, and lower log probabilities when it is uncertain. Many models — especially smaller, fine-tuned, or instruction-tuned ones — are poorly calibrated and may express high confidence even when hallucinating. Treat the Confidence Score as a useful signal, not a guarantee, and always combine it with offline metrics for full trustworthiness evaluation.

Combining online and offline metrics

The Confidence Score operates independently of the offline metrics. A typical production workflow combines both:
```python
from trustifai import Trustifai, MetricContext
from langchain_core.documents import Document

trust_engine = Trustifai(config_path="config_file.yaml")

# Step 1: Generate with real-time confidence
generation = trust_engine.generate(
    prompt="Summarize the key findings from the documents.",
    system_prompt="You are a research assistant."
)

# Step 2: Evaluate the generated answer offline
context = MetricContext(
    query="Summarize the key findings from the documents.",
    answer=generation["response"],
    documents=[Document(page_content="...", metadata={"source": "report.pdf"})]
)
trust_result = trust_engine.get_trust_score(context)

print(f"Online confidence:  {generation['metadata']['confidence_score']}")
print(f"Offline Trust Score: {trust_result['score']} ({trust_result['label']})")
```
This two-step pattern gives you the fullest picture: online confidence tells you how certain the model was during generation, and the offline Trust Score tells you how well the finished response holds up against your retrieved documents.
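If you want to act on both signals in one place, a simple gate works; the thresholds and handler names below are illustrative assumptions, not TrustifAI defaults:

```python
# Assumed acceptance policy: non-low online confidence plus a reasonably high Trust Score
confidence_ok = generation["metadata"]["confidence_label"] != "Low Confidence"
trust_ok = trust_result["score"] >= 0.80

if confidence_ok and trust_ok:
    deliver(generation["response"])               # hypothetical downstream handler
else:
    escalate_for_review(generation["response"])   # hypothetical review queue
```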
