TrustifAI evaluates LLM and RAG responses across multiple independent dimensions and combines them into a single Trust Score — a number between 0 and 1 that answers the question: can you rely on this response? Rather than treating quality as a single axis, TrustifAI treats it as a multi-dimensional signal, which makes it far harder to game and far more informative when something goes wrong.
## Multi-dimensional evaluation model
The Trust Score is built from up to five component metrics, computed in parallel:

- 4 offline metrics — calculated after a response has already been generated, against the retrieved documents.
- 1 online metric — calculated in real time during generation, using token log probabilities.

Each component metric produces a score in the [0, 1] range. These component scores are then combined into the final Trust Score using a configurable weighted sum.
## Weighted aggregation formula
The aggregation formula is a simple weighted linear combination:

$$\text{TrustScore} = \sum_{i} w_i \, s_i$$

where $s_i$ is the score of component metric $i$ and $w_i$ is its weight. Weights are defined in `config_file.yaml` and automatically normalized so they always sum to 1.0. This means you can freely adjust relative priorities without worrying about the math.
### Default weights
| Metric | Key | Default Weight |
|---|---|---|
| Evidence Coverage | evidence_coverage | 0.40 |
| Semantic Drift | semantic_drift | 0.30 |
| Epistemic Consistency | consistency | 0.20 |
| Source Diversity | source_diversity | 0.10 |
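
To make the aggregation concrete, here is a minimal Python sketch of the math (not TrustifAI's internal code), using the default weights above and made-up component scores:

```python
def aggregate_trust_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted linear combination of component scores (illustrative sketch)."""
    # Metrics with weight 0 (or missing) are dropped, mirroring TrustifAI's behavior.
    active = {name: w for name, w in weights.items() if w > 0}
    total = sum(active.values())
    # Normalize so the effective weights always sum to 1.0.
    return round(sum((w / total) * scores[name] for name, w in active.items()), 2)

# Default weights from the table above; the component scores are made up.
weights = {"evidence_coverage": 0.40, "semantic_drift": 0.30,
           "consistency": 0.20, "source_diversity": 0.10}
scores = {"evidence_coverage": 0.90, "semantic_drift": 0.80,
          "consistency": 0.70, "source_diversity": 0.60}

print(aggregate_trust_score(scores, weights))  # 0.8 -> RELIABLE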
### Configuring weights in YAML
Any metric with `weight: 0` (or omitted entirely) is automatically excluded from both the computation and the reasoning graph, so you can disable metrics without deleting their config block.
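
As a sketch of what such a block might look like (the nesting and key names below are assumptions, not the authoritative schema; check your own `config_file.yaml`):

```yaml
# Illustrative sketch only — key nesting is assumed.
metrics:
  - type: trust_score
    weights:
      evidence_coverage: 0.40
      semantic_drift: 0.30
      consistency: 0.20
      source_diversity: 0   # weight 0 (or omitted): metric is skipped entirely
```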
## Decision labels
After computing the final score, TrustifAI maps it to one of three human-readable decision labels using configurable thresholds:

| Score range | Label | Meaning |
|---|---|---|
| ≥ 0.80 | RELIABLE | Response is well-grounded and consistent. Safe to use. |
| ≥ 0.60 | ACCEPTABLE (WITH CAUTION) | Response has some weaknesses. Review before acting on it. |
| < 0.60 | UNRELIABLE | Response has significant trust issues. Do not use without human review. |
Both cutoffs are configurable under the `metrics[type=trust_score]` entry in your config.
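
A sketch of what that might look like (the threshold key names are assumptions for illustration):

```yaml
# Illustrative sketch only — threshold key names are assumed.
metrics:
  - type: trust_score
    thresholds:
      reliable: 0.80     # score >= 0.80 -> RELIABLE
      acceptable: 0.60   # score >= 0.60 -> ACCEPTABLE (WITH CAUTION)
                         # anything below -> UNRELIABLE
```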
## Calling get_trust_score()
### Build a MetricContext
Wrap your query, answer, and retrieved documents in a `MetricContext` object. TrustifAI accepts LangChain Document objects, LlamaIndex nodes, plain strings, dicts, or lists — it normalizes them automatically.
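
A minimal sketch of this step. `MetricContext` is named in these docs, but the import path and the keyword argument names (`query`, `answer`, `documents`) are assumptions for illustration:

```python
from trustifai import MetricContext  # import path assumed for illustration

context = MetricContext(
    query="What is the refund window for annual plans?",
    answer="Annual plans can be refunded within 30 days of purchase.",
    # Plain strings are fine here; LangChain Documents, LlamaIndex nodes,
    # dicts, and lists are normalized automatically.
    documents=[
        "Refund policy: annual subscriptions are refundable within 30 days.",
        "Monthly subscriptions are non-refundable after the first 7 days.",
    ],
)
```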
### Initialize the engine

Point `Trustifai` at your config file. The engine reads weights, thresholds, and LLM/embedding settings from YAML.
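
A sketch of initialization, assuming the config path is passed to the constructor (the import path and parameter name are illustrative):

```python
from trustifai import Trustifai  # import path assumed for illustration

# Weights, thresholds, and LLM/embedding settings are all read from the YAML file.
engine = Trustifai(config_path="config_file.yaml")
```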
### Interpreting the result dict

`get_trust_score()` returns a dictionary with four keys:
| Field | Type | Description |
|---|---|---|
| score | float | Final Trust Score, rounded to 2 decimal places |
| label | str | One of RELIABLE, ACCEPTABLE (WITH CAUTION), or UNRELIABLE |
| details | dict | Per-metric score, label, and diagnostic details |
| execution_metadata.total_cost_usd | float | Sum of all LLM and embedding API costs incurred |
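
Putting the steps together, a sketch of reading the result. The field names come from the table above; calling `get_trust_score()` on the engine with the context is an assumption about the call pattern:

```python
result = engine.get_trust_score(context)

print(result["score"])   # e.g. 0.84
print(result["label"])   # e.g. "RELIABLE"

# Per-metric breakdown, useful when diagnosing a low score.
for name, detail in result["details"].items():
    print(name, detail)

# Total LLM + embedding spend for this evaluation.
print(result["execution_metadata"]["total_cost_usd"])
```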
If no documents are provided, `get_trust_score()` immediately returns a score of 0.0 with the label UNRELIABLE, without making any API calls. Always pass at least one retrieved document.
## Async usage

For high-concurrency server deployments, use the native async variant to avoid blocking the event loop. It runs the component metrics concurrently with asyncio.gather and uses an async embedding pipeline, making it significantly faster under load.
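
A sketch of the async path. The method name `aget_trust_score` below is a hypothetical placeholder for the async variant; check the API reference for the real name:

```python
import asyncio

async def score_batch(engine, contexts):
    # "aget_trust_score" is a hypothetical name for the async variant.
    tasks = [engine.aget_trust_score(ctx) for ctx in contexts]
    # Evaluate many responses concurrently without blocking the event loop.
    return await asyncio.gather(*tasks)

results = asyncio.run(score_batch(engine, [context]))
```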