TrustifAI gives you a principled, multi-dimensional Trust Score for any LLM or RAG response — going far beyond a single correctness check. Instead of a black-box number, TrustifAI breaks trustworthiness into four orthogonal signals (evidence coverage, epistemic consistency, semantic drift, and source diversity), combines them with configurable weights, and renders an interactive reasoning graph that shows exactly why a response was deemed reliable or unreliable.

Documentation Index
Fetch the complete documentation index at: https://mintlify.com/TrustifAI/trustifai/llms.txt
Use this file to discover all available pages before exploring further.
Quickstart
Score your first RAG response in under five minutes with a working code example.
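As a preview, here is a minimal sketch of what a first evaluation might look like. The class names `Trustifai` and `MetricContext` come from the API Reference below; the constructor arguments, the `evaluate` entry point, and the result attributes are assumptions for illustration, not the library's confirmed API:

```python
# Hypothetical quickstart sketch; see the Quickstart page for the real code.
# Assumes the package was installed with `pip install trustifai` (name assumed)
# and provider credentials are set via environment variables.
from trustifai import Trustifai, MetricContext

client = Trustifai()

# Bundle the RAG inputs and output to be evaluated (field names assumed).
context = MetricContext(
    query="What is the boiling point of water at sea level?",
    response="Water boils at 100 °C (212 °F) at sea level.",
    documents=["At standard atmospheric pressure, water boils at 100 °C."],
)

result = client.evaluate(context)  # assumed entry point
print(result.trust_score)          # aggregated, weighted Trust Score
print(result.signals)              # per-metric scores behind the aggregate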
Installation
Install TrustifAI via pip and configure your environment variables.
Core Concepts
Understand how Trust Score is computed from four independent trust signals.
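A plausible shape for that computation, assuming a normalized weighted sum (the exact formula is specified on the Core Concepts page), where $s_i$ are the four signal scores and $w_i$ their configured weights:

$$
\text{TrustScore} = \frac{\sum_{i=1}^{4} w_i \, s_i}{\sum_{i=1}^{4} w_i}, \qquad s_i \in [0, 1]
$$

With equal weights this reduces to a simple average; raising the weight on evidence coverage makes hallucination detection dominate the aggregate.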
API Reference
Explore the full public API: Trustifai, AsyncTrustifai, MetricContext, and more.
What TrustifAI evaluates
TrustifAI computes trust across two evaluation modes.

Offline metrics — for already-generated RAG responses:

| Metric | What it detects |
|---|---|
| Evidence Coverage | Hallucinations — verifies every claim against retrieved documents |
| Epistemic Consistency | Model inconsistency — measures semantic stability across stochastic re-generations |
| Semantic Drift | Topic drift — ensures the answer stays within the document’s semantic envelope |
| Source Diversity | Over-reliance on a single source vs. synthesis across multiple sources |
Online metrics — for responses as they are being generated:

| Metric | What it detects |
|---|---|
| Confidence Score | Real-time certainty via token log probability analysis |
Why TrustifAI
Most evaluation frameworks return a single scalar. TrustifAI returns a structured, explainable result:

- Weighted aggregation — tune metric importance to your use case (e.g., weight evidence coverage higher for medical Q&A); see the sketch after this list
- Reasoning graph — a DAG visualization showing metric scores, aggregation, and the final decision
- Custom metrics — plug in your own evaluation logic without touching core library code
- Async-first — native async pipeline with concurrency control and rate limiting for large-scale batch evaluation
- LiteLLM-backed — works with OpenAI, Anthropic, Gemini, Azure, Mistral, Ollama, and more
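The sketch below illustrates the weighted-aggregation and async batch points together. `AsyncTrustifai` and `MetricContext` are named in the API Reference; the `weights` and `max_concurrency` parameters and the `evaluate_batch` method are assumptions chosen for illustration:

```python
# A hypothetical sketch of weighted aggregation plus async batch scoring.
# Parameter and method names are assumed, not the library's confirmed API.
import asyncio
from trustifai import AsyncTrustifai, MetricContext

# Up-weight evidence coverage for a medical Q&A deployment (keys assumed).
client = AsyncTrustifai(
    weights={
        "evidence_coverage": 0.4,
        "epistemic_consistency": 0.3,
        "semantic_drift": 0.2,
        "source_diversity": 0.1,
    },
    max_concurrency=8,  # assumed concurrency / rate-limiting knob
)

async def main() -> None:
    contexts = [
        MetricContext(
            query="Which sources support the claim?",
            response="Two independent reports confirm it.",
            documents=["Report A ...", "Report B ..."],
        )
    ]
    results = await client.evaluate_batch(contexts)  # assumed batch entry point
    for res in results:
        print(res.trust_score)

asyncio.run(main())
```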
Offline metrics explained
Deep dive into Evidence Coverage, Epistemic Consistency, Semantic Drift, and Source Diversity.
Online metrics explained
Learn how the Confidence Score uses token log probabilities for real-time trust signals.
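The underlying idea, sketched here independently of TrustifAI's actual implementation: average the per-token log probabilities the model reports during generation and map them back to a probability in [0, 1]. TrustifAI's real computation may differ:

```python
# Illustrative only: one common way to turn token log probabilities into a
# confidence signal (geometric-mean token probability).
import math

def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """Return exp(mean log prob), a value in [0, 1]."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Example: confidently generated tokens yield a high score.
print(confidence_from_logprobs([-0.05, -0.10, -0.02]))  # ~0.94
```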
Reasoning graphs
Understand and customize the interactive DAG visualization of your evaluation logic.
Custom metrics
Extend TrustifAI with your own evaluation logic using the BaseMetric interface.
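`BaseMetric` is named above; the method and attribute names in this sketch are assumptions about what such an interface could look like, not the documented contract:

```python
# A hypothetical custom metric. `BaseMetric` and `MetricContext` are named in
# the docs, but the `name` attribute and `score()` method are assumed details.
from trustifai import BaseMetric, MetricContext

class ResponseLengthMetric(BaseMetric):
    """Penalizes answers that are suspiciously short relative to the query."""

    name = "response_length"

    def score(self, context: MetricContext) -> float:
        # Return a trust signal in [0, 1]: 1.0 once the response is at least
        # as long as the query, scaling down linearly below that.
        q, r = len(context.query), len(context.response)
        return min(1.0, r / q) if q else 1.0
```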