BaseMetric is the abstract foundation for every metric in TrustifAI, including all four built-in offline metrics. You can subclass it to create custom trust signals that receive the same service dependencies and integrate transparently with get_trust_score and the async batch pipeline. The only method you must implement is calculate.
Constructor
```python
from trustifai.metrics.base import BaseMetric

class MyMetric(BaseMetric):
    def __init__(self, service, config):
        super().__init__(service, config)
        # add any metric-specific setup here
```
service: The shared service layer for LLM calls, embedding calls, and document text extraction. TrustifAI injects this automatically; you do not construct it directly.
config: The parsed configuration object loaded from your YAML config file. Exposes threshold values, model names, and pipeline settings. TrustifAI injects this automatically.
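As a concrete illustration of metric-specific setup, here is a minimal sketch that caches a documented config value at construction time. The class name is hypothetical; config.k_samples is listed in the attribute table below.

```python
from trustifai.metrics.base import BaseMetric

class SampledMetric(BaseMetric):
    """Hypothetical metric that draws k samples per evaluation."""

    def __init__(self, service, config):
        super().__init__(service, config)
        # Cache the documented sample-count setting once at construction time.
        self.k_samples = config.k_samples
        # calculate(...) omitted for brevity; it is still required.
```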
Inherited attributes
All subclasses have access to these attributes after calling super().__init__():
| Attribute | Type | Description |
|---|---|---|
| self.service | ExternalService | Makes LLM calls (llm_call, llm_call_async), embedding calls (embedding_call_batch), and document text extraction (extract_document). |
| self.config | Config | Exposes config.thresholds, config.k_samples, and all other settings from your YAML file. |
| self.cosine_calc | CosineSimCalculator | Computes cosine similarity between two numpy embedding vectors. Call self.cosine_calc.calculate(emb1, emb2) to get a float in [0, 1]. |
| self.threshold_evaluator | ThresholdEvaluator | Maps a raw score to a (label, explanation) tuple using configured thresholds. Methods: evaluate_grounding, evaluate_drift, evaluate_consistency, evaluate_diversity, evaluate_confidence. |
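For orientation, the sketch below exercises two of these attributes together: it embeds two strings via the service and compares them with the cosine calculator. The method embedding_call_batch is named in the table, but its exact signature (a list of strings in, one vector per string out) is an assumption to verify against your ExternalService; the class and helper names are hypothetical.

```python
import numpy as np

from trustifai.metrics.base import BaseMetric

class EmbeddingHelperMetric(BaseMetric):
    """Hypothetical subclass showing the injected helpers together."""

    def _similarity(self, text_a: str, text_b: str) -> float:
        # Assumed signature: embedding_call_batch(texts) returns one vector per text.
        emb_a, emb_b = self.service.embedding_call_batch([text_a, text_b])
        # cosine_calc.calculate returns a float in [0, 1] (see the table above).
        return self.cosine_calc.calculate(np.atleast_2d(emb_a), np.atleast_2d(emb_b))
        # calculate(...) omitted for brevity; it is still required.
```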
Abstract methods
calculate
```python
@abstractmethod
def calculate(self, context: MetricContext) -> MetricResult:
    ...
```
The synchronous evaluation entry point. You must implement this method. Receives a fully populated MetricContext (with embeddings already computed) and must return a MetricResult.
a_calculate (optional override)
```python
async def a_calculate(self, context: MetricContext) -> MetricResult:
    return self.calculate(context)
```
The async variant. The default implementation simply delegates to calculate via the synchronous path. Override this method if your metric can make non-blocking LLM or embedding calls natively — for example, using await self.service.llm_call_async(...).
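As a sketch, an override for a metric that grades answers with an LLM judge might look like the following. Only the method name llm_call_async is taken from the attribute table; its assumed signature (a prompt string in, a completion string out), the query/answer fields on MetricContext, and the score-parsing logic are all assumptions to check against your installation.

```python
import asyncio

from trustifai.metrics.base import BaseMetric
from trustifai.structures import MetricContext, MetricResult

class LLMJudgeMetric(BaseMetric):
    """Hypothetical metric that asks an LLM to grade the answer."""

    async def a_calculate(self, context: MetricContext) -> MetricResult:
        # Assumed: llm_call_async(prompt: str) -> str, and that MetricContext
        # exposes raw query/answer text. Verify both before relying on this.
        verdict = await self.service.llm_call_async(
            "Rate from 0 to 10 how well the answer addresses the query. "
            "Reply with a single number.\n"
            f"Query: {context.query}\nAnswer: {context.answer}"
        )
        try:
            score = max(0.0, min(1.0, float(verdict.strip()) / 10.0))
        except ValueError:
            score = 0.0  # unparseable verdict; treat as lowest trust
        return MetricResult(
            score=score,
            label="LLM Judge",
            details={"explanation": f"Raw verdict: {verdict.strip()!r}"},
        )

    def calculate(self, context: MetricContext) -> MetricResult:
        # calculate is abstract and still required; run the async path here.
        return asyncio.run(self.a_calculate(context))
```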
Custom metric example
The following example implements a query-answer relevance metric using cosine similarity between the query and answer embeddings:
```python
import numpy as np

from trustifai.metrics.base import BaseMetric
from trustifai.structures import MetricContext, MetricResult


class QueryAnswerRelevanceMetric(BaseMetric):
    """Measures cosine similarity between query and answer embeddings."""

    def calculate(self, context: MetricContext) -> MetricResult:
        if context.query_embeddings is None or context.answer_embeddings is None:
            return MetricResult(
                score=0.0,
                label="Missing Embeddings",
                details={"explanation": "Query or answer embeddings not available."},
            )
        score = self.cosine_calc.calculate(
            np.atleast_2d(context.query_embeddings),
            np.atleast_2d(context.answer_embeddings),
        )
        # Reuse a built-in evaluator or write your own label logic
        if score >= 0.8:
            label, explanation = "High Relevance", "Answer closely addresses the query."
        elif score >= 0.5:
            label, explanation = "Moderate Relevance", "Answer partially addresses the query."
        else:
            label, explanation = "Low Relevance", "Answer diverges from the query topic."
        return MetricResult(
            score=round(score, 4),
            label=label,
            details={"explanation": explanation},
        )
```
Registering the custom metric
Use Trustifai.register_metric to add your class to the global metric registry, then configure its weight in config_file.yaml. Call register_metric before instantiating any engine.
```python
from trustifai import Trustifai

# Register the metric globally (class-level operation)
Trustifai.register_metric("query_answer_relevance", QueryAnswerRelevanceMetric)

# Now instantiate the engine and include the metric in config_file.yaml
engine = Trustifai("config_file.yaml")
result = engine.get_trust_score(context)
# result now includes "query_answer_relevance" in its details dict
```
Your custom metric must appear in two places in the YAML configuration: the metrics list (which enables it and holds thresholds and other params) and score_weights (which sets its contribution weight):
```yaml
metrics:
  - type: "query_answer_relevance"
    enabled: true
    params: {}

score_weights:
  - type: "query_answer_relevance"
    params:
      weight: 0.10
      # reduce other weights so total ≤ 1.0
```
Use self.threshold_evaluator with one of its built-in evaluate_* methods whenever the score range aligns with an existing metric category. This ensures your custom metric respects the same configurable thresholds as the built-in metrics.
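As a sketch, the if/elif ladder in the relevance example above could be replaced with the evaluator. The assumed call shape, evaluate_*(score) returning a (label, explanation) pair, follows the attribute table; evaluate_grounding is only an illustrative choice of category.

```python
# Inside QueryAnswerRelevanceMetric.calculate, replacing the if/elif ladder.
# Assumed call shape: evaluate_grounding(score) -> (label, explanation).
label, explanation = self.threshold_evaluator.evaluate_grounding(score)
return MetricResult(
    score=round(score, 4),
    label=label,
    details={"explanation": explanation},
)
```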
If your custom metric makes blocking I/O calls (LLM or embedding APIs), the default a_calculate will block the async event loop when used with evaluate_dataset. Override a_calculate with a native async implementation to maintain concurrency.