

TrustifAI can log metric scores to MLflow as part of every evaluation. Offline metric scores (evidence coverage, epistemic consistency, semantic drift, source diversity) are logged under the offline/ prefix, and the online confidence score from Trustifai.generate() is logged under online/. Tracing is optional and disabled by default — your evaluation pipeline works identically with or without it.

Installation

MLflow tracing requires the trace optional extra:
pip install trustifai[trace]
This installs mlflow alongside the core TrustifAI package. Without it, the tracing code path is silently skipped even if enabled: true is set in your config.
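To confirm MLflow is importable after installing the extra:
python -c "import mlflow; print(mlflow.__version__)"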

Enabling tracing in config_file.yaml

Set enabled: true in the tracing section and point tracking_uri at your MLflow server:
tracing:
  type: "default"
  params:
    enabled: true
    tracking_uri: "http://localhost:5000"   # or an MLflow-hosted URI
    experiment_name: "trustifai_experiment"
If tracking_uri is null or omitted, MLflow defaults to a local ./mlruns directory in your working directory — useful for local development without a running server.
Field            Description
enabled          true to activate MLflow logging, false (default) to disable
tracking_uri     URI of your MLflow tracking server. null uses the local ./mlruns default
experiment_name  Name of the MLflow experiment that runs are grouped under
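For purely local development you can rely on the ./mlruns fallback by leaving tracking_uri unset; the experiment name here is just an example:
tracing:
  type: "default"
  params:
    enabled: true
    tracking_uri: null   # falls back to ./mlruns in the working directory
    experiment_name: "local_dev"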

What gets logged

TrustifAI uses mlflow.set_tag to log metric data as run tags on the active MLflow run.
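Conceptually, the logging for a single evaluation is equivalent to calls like these (a sketch with illustrative values; the actual keys depend on which metrics are active):
import mlflow

with mlflow.start_run():
    # What TrustifAI effectively does for one offline evaluation
    mlflow.set_tag("offline/evidence_coverage", 0.91)
    mlflow.set_tag("trust_score/final", 0.87)
    mlflow.set_tag("decision", "RELIABLE")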

Offline metrics (get_trust_score)

When you call get_trust_score, TrustifAI logs each active metric score and the final decision as tags:
MLflow tag             Value
offline/<metric_name>  Float score for each active metric (e.g. offline/evidence_coverage)
decision               Final label: RELIABLE, ACCEPTABLE (WITH CAUTION), or UNRELIABLE
trust_score/final      Aggregated weighted trust score
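To read these tags back programmatically, the standard MLflow client API works; a minimal sketch (substitute your own run ID):
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5000")
run = client.get_run("<run_id>")  # the run that wrapped get_trust_score
print(run.data.tags.get("trust_score/final"))
print(run.data.tags.get("decision"))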

Online metrics (generate)

When you call Trustifai.generate(), the confidence score is logged as a tag:
MLflow tag               Value
online/confidence_score  Float confidence score derived from log probabilities
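For example, you can wrap a generation call in a run so the confidence tag is captured. This is a sketch that assumes generate() accepts the user query; check the generate() reference for its exact signature:
import mlflow
from trustifai import Trustifai

trust_engine = Trustifai("config_file.yaml")  # tracing.enabled: true

with mlflow.start_run(run_name="online_generation"):
    # Assumed call shape, for illustration only
    response = trust_engine.generate("What is the capital of France?")
    # online/confidence_score is tagged on this run automatically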

Using tracing inside an MLflow run

TrustifAI logs to the currently active MLflow run. Start a run using the standard mlflow.start_run() context manager before calling get_trust_score or generate:
import mlflow
from trustifai import Trustifai, MetricContext

trust_engine = Trustifai("config_file.yaml")  # tracing.enabled: true

context = MetricContext(
    query="What is the capital of France?",
    answer="The capital of France is Paris.",
    documents=["Paris is the capital and most populous city of France."],
)

with mlflow.start_run(run_name="rag_evaluation_v1"):
    result = trust_engine.get_trust_score(context)
    # Scores are automatically logged to the active run
    print(f"Trust Score: {result['score']}  Label: {result['label']}")
You can log additional parameters and artifacts alongside TrustifAI’s automatic logs:
with mlflow.start_run(run_name="production_audit"):
    mlflow.log_param("model_version", "gpt-4o-2024-11-20")
    mlflow.log_param("retriever", "faiss-top-4")

    result = trust_engine.get_trust_score(context)

    # TrustifAI logs offline/evidence_coverage, trust_score/final, etc.
    # You can add your own metrics alongside them
    mlflow.log_metric("custom/query_length", len(context.query))

Batch evaluation with tracing

When running evaluate_dataset, each evaluation is a separate get_trust_score call on the engine. Wrap the batch in a parent run so the aggregate statistics are logged in one place; a per-row variant using nested child runs is sketched after this example:
import mlflow
from trustifai.async_pipeline import AsyncTrustifai, evaluate_dataset
from trustifai import MetricContext

engine = AsyncTrustifai("config_file.yaml")

async def run_traced_batch(contexts: list[MetricContext]):
    with mlflow.start_run(run_name="batch_eval"):
        batch = await evaluate_dataset(engine, contexts, concurrency=5)

        # Log aggregate statistics to the parent run
        mlflow.log_metric("batch/mean_score", batch.mean_score)
        mlflow.log_metric("batch/failure_rate", batch.failure_rate)
        mlflow.log_metric("batch/total", batch.total)

    return batch
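To get one MLflow run per row instead, you can trade evaluate_dataset's concurrency for explicit nested child runs. A sketch, continuing the imports above and assuming AsyncTrustifai exposes an awaitable get_trust_score:
async def run_per_item(contexts: list[MetricContext]):
    with mlflow.start_run(run_name="batch_eval_parent"):
        for i, ctx in enumerate(contexts):
            # One nested child run per row; tags land on the child run
            with mlflow.start_run(run_name=f"item_{i}", nested=True):
                await engine.get_trust_score(ctx)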
If no MLflow run is active when get_trust_score is called, TrustifAI skips logging silently. You must start a run explicitly with mlflow.start_run() for metrics to appear in the UI.

Starting a local MLflow server

For local development, start the MLflow tracking server with:
mlflow server --host 127.0.0.1 --port 5000
Then set tracking_uri: "http://localhost:5000" in your config. Open http://localhost:5000 in a browser to view experiments, compare runs, and inspect logged metrics.

Disabling tracing

Tracing is disabled by default. To disable it explicitly, or to turn it off in an environment where MLflow is not available, set enabled: false:
tracing:
  type: "default"
  params:
    enabled: false
You can also omit the tracing section entirely — TrustifAI treats a missing section as disabled.
If enabled: true but trustifai[trace] is not installed, TrustifAI catches the ImportError and skips logging rather than raising an exception. Install trustifai[trace] to activate tracing.
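The effective behavior is a guarded no-op, roughly like this sketch (illustrative only, not TrustifAI's actual source):
try:
    import mlflow
    _MLFLOW_AVAILABLE = True
except ImportError:
    _MLFLOW_AVAILABLE = False

def _log_tag(key: str, value) -> None:
    # Skips silently when mlflow is missing or no run is active
    if _MLFLOW_AVAILABLE and mlflow.active_run() is not None:
        mlflow.set_tag(key, value)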

Configuration

See the full config_file.yaml reference including the tracing section.

Batch evaluation

Run large-scale evaluations and log aggregate results to MLflow.
