
BaseMetric is the abstract foundation for every metric in TrustifAI, including all four built-in offline metrics. You can subclass it to create custom trust signals that receive the same service dependencies and integrate transparently with get_trust_score and the async batch pipeline. The only method you must implement is calculate.

Constructor

from trustifai.metrics.base import BaseMetric

class MyMetric(BaseMetric):
    def __init__(self, service, config):
        super().__init__(service, config)
        # add any metric-specific setup here
service
ExternalService
required
The shared service layer for LLM calls, embedding calls, and document text extraction. TrustifAI injects this automatically — you do not construct it directly.
config
Config
required
The parsed configuration object loaded from your YAML config file. Exposes threshold values, model names, and pipeline settings. TrustifAI injects this automatically.

Inherited attributes

All subclasses have access to these attributes after calling super().__init__():
self.service
ExternalService
Makes LLM calls (llm_call, llm_call_async), embedding calls (embedding_call_batch), and document text extraction (extract_document).

self.config
Config
Exposes config.thresholds, config.k_samples, and all other settings from your YAML file.

self.cosine_calc
CosineSimCalculator
Computes cosine similarity between two numpy embedding vectors. Call self.cosine_calc.calculate(emb1, emb2) to get a float in [0, 1].

self.threshold_evaluator
ThresholdEvaluator
Maps a raw score to a (label, explanation) tuple using configured thresholds. Methods: evaluate_grounding, evaluate_drift, evaluate_consistency, evaluate_diversity, evaluate_confidence.
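For intuition, here is a plain-numpy sketch of the computation self.cosine_calc performs. The CosineSimCalculator class itself ships with TrustifAI; this stand-alone function is only an illustration, and the clipping to [0, 1] is an assumption based on the documented return range:

```python
import numpy as np

def cosine_similarity(emb1: np.ndarray, emb2: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, clipped to [0, 1]."""
    emb1, emb2 = np.ravel(emb1), np.ravel(emb2)
    denom = np.linalg.norm(emb1) * np.linalg.norm(emb2)
    if denom == 0.0:
        return 0.0  # degenerate (all-zero) embedding
    return float(np.clip(np.dot(emb1, emb2) / denom, 0.0, 1.0))

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```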

Abstract methods

calculate

@abstractmethod
def calculate(self, context: MetricContext) -> MetricResult:
    ...
The synchronous evaluation entry point. You must implement this method. It receives a fully populated MetricContext (with embeddings already computed) and must return a MetricResult.

a_calculate (optional override)

async def a_calculate(self, context: MetricContext) -> MetricResult:
    return self.calculate(context)
The async variant. The default implementation simply delegates to calculate via the synchronous path. Override this method if your metric can make non-blocking LLM or embedding calls natively — for example, using await self.service.llm_call_async(...).
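To see why the override matters, here is a stand-alone sketch of the delegation pattern. It does not import trustifai; the two classes are stand-ins that mirror the default behavior and a native-async override:

```python
import asyncio

class SyncOnlyMetric:
    """Mirrors the default a_calculate: it just delegates to the sync path."""
    def calculate(self, context):
        return {"score": 1.0}

    async def a_calculate(self, context):
        return self.calculate(context)  # runs synchronously on the event loop

class NativeAsyncMetric(SyncOnlyMetric):
    """Overrides a_calculate with a genuinely non-blocking implementation."""
    async def a_calculate(self, context):
        await asyncio.sleep(0)  # stands in for `await self.service.llm_call_async(...)`
        return {"score": 0.9}

async def main():
    # Both variants can be awaited together; only the second yields control.
    return await asyncio.gather(
        SyncOnlyMetric().a_calculate({}),
        NativeAsyncMetric().a_calculate({}),
    )

print(asyncio.run(main()))  # [{'score': 1.0}, {'score': 0.9}]
```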

Custom metric example

The following example implements a query-answer relevance metric using cosine similarity between the query and answer embeddings:
import numpy as np

from trustifai.metrics.base import BaseMetric
from trustifai.structures import MetricContext, MetricResult


class QueryAnswerRelevanceMetric(BaseMetric):
    """Measures cosine similarity between query and answer embeddings."""

    def calculate(self, context: MetricContext) -> MetricResult:
        if context.query_embeddings is None or context.answer_embeddings is None:
            return MetricResult(
                score=0.0,
                label="Missing Embeddings",
                details={"explanation": "Query or answer embeddings not available."},
            )

        score = self.cosine_calc.calculate(
            np.atleast_2d(context.query_embeddings),
            np.atleast_2d(context.answer_embeddings),
        )

        # Reuse a built-in evaluator or write your own label logic
        if score >= 0.8:
            label, explanation = "High Relevance", "Answer closely addresses the query."
        elif score >= 0.5:
            label, explanation = "Moderate Relevance", "Answer partially addresses the query."
        else:
            label, explanation = "Low Relevance", "Answer diverges from the query topic."

        return MetricResult(
            score=round(score, 4),
            label=label,
            details={"explanation": explanation},
        )

Registering the custom metric

Use Trustifai.register_metric to add your class to the global metric registry, then configure its weight in config_file.yaml. Call register_metric before instantiating any engine.
from trustifai import Trustifai

# Register the metric globally (class-level operation)
Trustifai.register_metric("query_answer_relevance", QueryAnswerRelevanceMetric)

# Now instantiate the engine and include the metric in config_file.yaml
engine = Trustifai("config_file.yaml")

result = engine.get_trust_score(context)
# result now includes "query_answer_relevance" in its details dict
The YAML configuration for your custom metric must appear in both metrics (for thresholds) and score_weights (for its contribution weight):
metrics:
  - type: "query_answer_relevance"
    enabled: true
    params: {}

score_weights:
  - type: "query_answer_relevance"
    params:
      weight: 0.10
  # reduce other weights so total ≤ 1.0
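The exact aggregation is defined inside TrustifAI and is not shown here, but under the assumption that the overall trust score is a weighted sum of per-metric scores, the arithmetic behind score_weights looks like this (metric names are illustrative):

```python
def weighted_trust_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-metric scores; weights are expected to total <= 1.0."""
    return round(sum(scores[name] * weights.get(name, 0.0) for name in scores), 4)

scores = {"grounding": 0.9, "consistency": 0.8, "query_answer_relevance": 0.7}
weights = {"grounding": 0.5, "consistency": 0.4, "query_answer_relevance": 0.10}
print(weighted_trust_score(scores, weights))  # 0.84
```

This is why the comment above matters: if the weights sum past 1.0, the combined score can exceed the metric scores it is built from.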
Use self.threshold_evaluator with one of its built-in evaluate_* methods whenever the score range aligns with an existing metric category. This ensures your custom metric respects the same configurable thresholds as the built-in metrics.
If your custom metric makes blocking I/O calls (LLM or embedding APIs), the default a_calculate will block the async event loop when used with evaluate_dataset. Override a_calculate with a native async implementation to maintain concurrency.
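If a full native-async rewrite is not practical, one common stopgap (a general asyncio pattern, not a documented TrustifAI API) is to push the blocking sync path onto a worker thread so the event loop stays free. A stand-alone sketch with a stand-in class:

```python
import asyncio
import time

class BlockingMetric:
    def calculate(self, context):
        time.sleep(0.05)  # stands in for a blocking LLM or embedding API call
        return {"score": 0.5}

    async def a_calculate(self, context):
        # Runs the blocking sync path in a worker thread, keeping the loop free.
        return await asyncio.to_thread(self.calculate, context)

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(BlockingMetric().a_calculate({}) for _ in range(4)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# The four 50 ms calls overlap in threads instead of running serially on the loop.
print(len(results), elapsed)
```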
