TrustifAI’s metric system is a plugin registry. Every built-in metric — evidence coverage, epistemic consistency, semantic drift, and source diversity — is registered against a string key and instantiated at evaluation time. You can add your own metrics by following the same three-step pattern: inherit, register, configure.
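The registry pattern described above can be sketched in plain Python. This is an illustrative stand-in, not TrustifAI's internals: it only shows how a string-keyed class registry instantiates metrics at evaluation time.

```python
# Illustrative sketch (not TrustifAI source): a string-keyed plugin
# registry that instantiates metric classes on demand.
class MetricRegistry:
    _metrics: dict[str, type] = {}

    @classmethod
    def register_metric(cls, key: str, metric_cls: type) -> None:
        # Class-level registration: the key later appears in config
        cls._metrics[key] = metric_cls

    @classmethod
    def create(cls, key: str):
        # Looked up by string key and instantiated at evaluation time
        return cls._metrics[key]()


class EvidenceCoverage:
    def calculate(self) -> float:
        return 0.9


MetricRegistry.register_metric("evidence_coverage", EvidenceCoverage)
metric = MetricRegistry.create("evidence_coverage")
print(metric.calculate())  # → 0.9
```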
1. Inherit from BaseMetric

Your class must inherit from BaseMetric and implement calculate(context: MetricContext) -> MetricResult. The context argument carries the query, answer, documents, and pre-computed embeddings for a single evaluation.
```python
from trustifai.metrics import BaseMetric
from trustifai.structures import MetricContext, MetricResult


class MyCustomMetric(BaseMetric):
    def calculate(self, context: MetricContext) -> MetricResult:
        # Your evaluation logic here
        score = 0.9  # float in [0.0, 1.0]
        return MetricResult(
            score=score,
            label="High",
            details={"note": "example"},
        )
```
2. Register the metric class
Call Trustifai.register_metric with a unique string key. This key must match the type field you will add to config_file.yaml.
```python
from trustifai import Trustifai

Trustifai.register_metric("my_custom_metric", MyCustomMetric)
```
Registration is a class-level operation — call it once, before you instantiate any Trustifai engine.
3. Add the metric to config_file.yaml
Add entries to both the metrics list (to set thresholds and mark it enabled) and the score_weights list (to assign its contribution to the final Trust Score). Weights across all enabled metrics must sum to at most 1.0.
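A sketch of what the two entries might look like. Only the `type` key is confirmed by this page; the remaining field names (`enabled`, `thresholds`, `weight`) are assumptions about the schema, so check the Configuration reference for the exact shape.

```yaml
# Hypothetical config_file.yaml fragment; field names other than
# `type` are assumptions about the schema.
metrics:
  - type: my_custom_metric
    enabled: true
    thresholds:
      HIGH: 0.8
      LOW: 0.5

score_weights:
  - type: my_custom_metric
    weight: 0.1   # all enabled weights must sum to at most 1.0
```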
The following example detects temporal hallucinations — cases where the answer references dates or times that are not present in the retrieved documents. It is the canonical custom metric example from the TrustifAI README.
```python
from trustifai.metrics import BaseMetric
from trustifai.structures import MetricContext, MetricResult


class TemporalConsistencyMetric(BaseMetric):
    """Detects temporal hallucinations — when the answer references
    dates or times that don't match the retrieved documents."""

    def calculate(self, context: MetricContext) -> MetricResult:
        # Extract dates from answer and documents
        answer_dates = self._extract_dates(context.answer)
        doc_dates = set()
        for doc in context.documents:
            doc_dates.update(self._extract_dates(doc.page_content))

        # No temporal claims in the answer — award full score
        if not answer_dates:
            return MetricResult(
                score=1.0,
                label="No Temporal Claims",
                details={"answer_dates": [], "doc_dates": list(doc_dates)},
            )

        supported_dates = [d for d in answer_dates if d in doc_dates]
        unsupported_dates = [d for d in answer_dates if d not in doc_dates]
        score = len(supported_dates) / len(answer_dates)

        # Read custom thresholds from config (with sensible defaults)
        high_threshold = getattr(self.config.thresholds, "TEMPORALLY_CONSISTENT", 0.8)
        low_threshold = getattr(self.config.thresholds, "PARTIAL_TEMPORAL_ISSUES", 0.5)

        if score >= high_threshold:
            label = "Temporally Consistent"
        elif score >= low_threshold:
            label = "Partial Temporal Issues"
        else:
            label = "Temporal Hallucination Detected"

        return MetricResult(
            score=score,
            label=label,
            details={
                "answer_dates": answer_dates,
                "supported_dates": supported_dates,
                "unsupported_dates": unsupported_dates,
                "doc_dates": list(doc_dates),
            },
        )

    def _extract_dates(self, text: str) -> list[str]:
        """Stub — replace with a real date extraction implementation."""
        import re

        pattern = r"\b\d{4}\b"  # simple 4-digit year extraction
        return re.findall(pattern, text)
```
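The `_extract_dates` stub above only matches bare four-digit years. A slightly more robust, self-contained sketch (stdlib `re` only, independent of TrustifAI) that also catches ISO-format dates:

```python
import re

# Hypothetical replacement for the _extract_dates stub: tries ISO
# dates (e.g. 2024-01-15) before falling back to bare 4-digit years.
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{4})\b")


def extract_dates(text: str) -> list[str]:
    """Return ISO dates and standalone years found in the text."""
    return DATE_PATTERN.findall(text)


print(extract_dates("Built 1887-1889, reopened on 2024-01-15."))
# → ['1887', '1889', '2024-01-15']
```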
```python
from trustifai import Trustifai, MetricContext

# Register before instantiating any engine
Trustifai.register_metric("temporal_consistency", TemporalConsistencyMetric)

# The engine picks up the new metric from config_file.yaml
trust_engine = Trustifai("config_file.yaml")

context = MetricContext(
    query="When was the Eiffel Tower built?",
    answer="The Eiffel Tower was built in 1889.",
    documents=["The Eiffel Tower was constructed from 1887 to 1889."],
)

result = trust_engine.get_trust_score(context)
print(result)
```
Every calculate() implementation must return a MetricResult. The to_dict() method serializes it into the format consumed by the trust score aggregator.
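TrustifAI's actual serialization format is documented in the BaseMetric API reference; as a self-contained illustration of the score/label/details shape, a minimal stand-in (not the real class) might look like:

```python
from dataclasses import dataclass, field, asdict
from typing import Any


# Minimal stand-in for illustration only, not TrustifAI's MetricResult:
# shows the score/label/details shape and a to_dict() serializer.
@dataclass
class MetricResult:
    score: float
    label: str
    details: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


result = MetricResult(score=0.9, label="High", details={"note": "example"})
print(result.to_dict())
# → {'score': 0.9, 'label': 'High', 'details': {'note': 'example'}}
```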
When you inherit from BaseMetric, your class automatically gets access to:
| Attribute | Type | Description |
| --- | --- | --- |
| `self.service` | `ExternalService` | Provides `llm_call`, `embedding_call`, `embedding_call_batch`, and `reranker_call` |
| `self.config` | `Config` | Full parsed configuration, including `config.thresholds` and `config.weights` |
| `self.cosine_calc` | `CosineSimCalculator` | Utility for computing cosine similarity between embedding vectors |
| `self.threshold_evaluator` | `ThresholdEvaluator` | Classifies a float score against the configured threshold pairs |
Use self.service.llm_call(prompt, system_prompt) if your metric needs an LLM inference step, and self.service.embedding_call(text) for additional embeddings beyond what the engine pre-computes.
When adding a new metric, make sure the total of all score_weights still sums to at most 1.0. TrustifAI raises a ValueError at startup if the sum exceeds this limit. Reduce existing weights proportionally to accommodate the new metric’s weight.
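The startup check described above can be sketched as follows; this is a self-contained illustration of the rule, not TrustifAI source, and the function name is hypothetical.

```python
def validate_score_weights(weights: dict[str, float]) -> None:
    """Raise ValueError if enabled metric weights sum to more than 1.0."""
    total = sum(weights.values())
    if total > 1.0:
        raise ValueError(
            f"score_weights sum to {total:.2f}; must be at most 1.0"
        )


# 0.5 + 0.4 = 0.9 <= 1.0, passes silently
validate_score_weights({"evidence_coverage": 0.5, "temporal_consistency": 0.4})

try:
    # 0.8 + 0.4 = 1.2 > 1.0, rejected at startup
    validate_score_weights({"evidence_coverage": 0.8, "temporal_consistency": 0.4})
except ValueError as e:
    print(e)  # → score_weights sum to 1.20; must be at most 1.0
```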
- Configuration: learn the full config_file.yaml schema, including metric thresholds and weights.
- BaseMetric API: full API reference for BaseMetric, MetricResult, and MetricContext.