

TrustifAI can log metric scores to MLflow as part of every evaluation. Offline metric scores (evidence coverage, epistemic consistency, semantic drift, source diversity) are logged under the offline/ prefix, and the online confidence score from Trustifai.generate() is logged under online/. Tracing is optional and disabled by default — your evaluation pipeline works identically with or without it.

Installation

MLflow tracing requires the trace optional extra:
pip install trustifai[trace]
This installs mlflow alongside the core TrustifAI package. Without it, the tracing code path is silently skipped even if enabled: true is set in your config.
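To confirm MLflow is importable after installing the extra:
python -c "import mlflow; print(mlflow.__version__)"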

Enabling tracing in config_file.yaml

Set enabled: true in the tracing section and point tracking_uri at your MLflow server:
tracing:
  type: "default"
  params:
    enabled: true
    tracking_uri: "http://localhost:5000"   # or an MLflow-hosted URI
    experiment_name: "trustifai_experiment"
If tracking_uri is null or omitted, MLflow defaults to a local ./mlruns directory in your working directory — useful for local development without a running server.
Field            Description
enabled          true to activate MLflow logging, false (default) to disable
tracking_uri     URI of your MLflow tracking server. null uses the local ./mlruns default
experiment_name  Name of the MLflow experiment that runs are grouped under
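For purely local development you can rely on the ./mlruns fallback by leaving tracking_uri unset; the experiment name here is just an example:
tracing:
  type: "default"
  params:
    enabled: true
    tracking_uri: null   # falls back to ./mlruns in the working directory
    experiment_name: "local_dev"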

What gets logged

TrustifAI uses mlflow.set_tag to log metric data as run tags on the active MLflow run.
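Conceptually, the logging for a single evaluation is equivalent to calls like these (a sketch with illustrative values; the actual keys depend on which metrics are active):
import mlflow

with mlflow.start_run():
    # What TrustifAI effectively does for one offline evaluation
    mlflow.set_tag("offline/evidence_coverage", 0.91)
    mlflow.set_tag("trust_score/final", 0.87)
    mlflow.set_tag("decision", "RELIABLE")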

Offline metrics (get_trust_score)

When you call get_trust_score, TrustifAI logs each active metric score and the final decision as tags:
MLflow tag             Value
offline/<metric_name>  Float score for each active metric (e.g. offline/evidence_coverage)
decision               Final label: RELIABLE, ACCEPTABLE (WITH CAUTION), or UNRELIABLE
trust_score/final      Aggregated weighted trust score
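To read these tags back programmatically, the standard MLflow client API works; a minimal sketch (substitute your own run ID):
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5000")
run = client.get_run("<run_id>")  # the run that wrapped get_trust_score
print(run.data.tags.get("trust_score/final"))
print(run.data.tags.get("decision"))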

Online metrics (generate)

When you call Trustifai.generate(), the confidence score is logged as a tag:
MLflow tag               Value
online/confidence_score  Float confidence score derived from log probabilities
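For example, you can wrap a generation call in a run so the confidence tag is captured. This is a sketch that assumes generate() accepts the user query; check the generate() reference for its exact signature:
import mlflow
from trustifai import Trustifai

trust_engine = Trustifai("config_file.yaml")  # tracing.enabled: true

with mlflow.start_run(run_name="online_generation"):
    # Assumed call shape, for illustration only
    response = trust_engine.generate("What is the capital of France?")
    # online/confidence_score is tagged on this run automatically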

Using tracing inside an MLflow run

TrustifAI logs to the currently active MLflow run. Start a run using the standard mlflow.start_run() context manager before calling get_trust_score or generate:
import mlflow
from trustifai import Trustifai, MetricContext

trust_engine = Trustifai("config_file.yaml")  # tracing.enabled: true

context = MetricContext(
    query="What is the capital of France?",
    answer="The capital of France is Paris.",
    documents=["Paris is the capital and most populous city of France."],
)

with mlflow.start_run(run_name="rag_evaluation_v1"):
    result = trust_engine.get_trust_score(context)
    # Scores are automatically logged to the active run
    print(f"Trust Score: {result['score']}  Label: {result['label']}")
You can log additional parameters and artifacts alongside TrustifAI’s automatic logs:
with mlflow.start_run(run_name="production_audit"):
    mlflow.log_param("model_version", "gpt-4o-2024-11-20")
    mlflow.log_param("retriever", "faiss-top-4")

    result = trust_engine.get_trust_score(context)

    # TrustifAI logs offline/evidence_coverage, trust_score/final, etc.
    # You can add your own metrics alongside them
    mlflow.log_metric("custom/query_length", len(context.query))

Batch evaluation with tracing

When running evaluate_dataset, each evaluation is a separate get_trust_score call on the engine. Wrap the batch in a parent run so the aggregate statistics are logged in one place; a per-row variant using nested child runs is sketched after this example:
import mlflow
from trustifai.async_pipeline import AsyncTrustifai, evaluate_dataset
from trustifai import MetricContext

engine = AsyncTrustifai("config_file.yaml")

async def run_traced_batch(contexts: list[MetricContext]):
    with mlflow.start_run(run_name="batch_eval"):
        batch = await evaluate_dataset(engine, contexts, concurrency=5)

        # Log aggregate statistics to the parent run
        mlflow.log_metric("batch/mean_score", batch.mean_score)
        mlflow.log_metric("batch/failure_rate", batch.failure_rate)
        mlflow.log_metric("batch/total", batch.total)

    return batch
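To get one MLflow run per row instead, you can trade evaluate_dataset's concurrency for explicit nested child runs. A sketch, continuing the imports above and assuming AsyncTrustifai exposes an awaitable get_trust_score:
async def run_per_item(contexts: list[MetricContext]):
    with mlflow.start_run(run_name="batch_eval_parent"):
        for i, ctx in enumerate(contexts):
            # One nested child run per row; tags land on the child run
            with mlflow.start_run(run_name=f"item_{i}", nested=True):
                await engine.get_trust_score(ctx)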
If no MLflow run is active when get_trust_score is called, TrustifAI skips logging silently. You must start a run explicitly with mlflow.start_run() for metrics to appear in the UI.

Starting a local MLflow server

For local development, start the MLflow tracking server with:
mlflow server --host 127.0.0.1 --port 5000
Then set tracking_uri: "http://localhost:5000" in your config. Open http://localhost:5000 in a browser to view experiments, compare runs, and inspect logged metrics.

Disabling tracing

Tracing is disabled by default. To disable it explicitly, or to turn it off in an environment where MLflow is not available, set enabled: false:
tracing:
  type: "default"
  params:
    enabled: false
You can also omit the tracing section entirely — TrustifAI treats a missing section as disabled.
If enabled: true but trustifai[trace] is not installed, TrustifAI catches the ImportError and skips logging rather than raising an exception. Install trustifai[trace] to activate tracing.
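The effective behavior is a guarded no-op, roughly like this sketch (illustrative only, not TrustifAI's actual source):
try:
    import mlflow
    _MLFLOW_AVAILABLE = True
except ImportError:
    _MLFLOW_AVAILABLE = False

def _log_tag(key: str, value) -> None:
    # Skips silently when mlflow is missing or no run is active
    if _MLFLOW_AVAILABLE and mlflow.active_run() is not None:
        mlflow.set_tag(key, value)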

Configuration

See the full config_file.yaml reference including the tracing section.

Batch evaluation

Run large-scale evaluations and log aggregate results to MLflow.
