evaluate.py closes the training loop by promoting the single best-performing model to production. It queries every MLflow run logged by train.py, picks the one with the highest accuracy metric, registers that run’s artifact under the name spotify-genre-classifier in the MLflow Model Registry, and stamps it with the @champion alias. The Dockerfile uses this alias to pull the correct model at container build time without ever hard-coding a version number.

Function Signature

def evaluate_and_register(train_data_path: str = "data/train.csv")

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| train_data_path | str | "data/train.csv" | Path to the training CSV (used for context; the function primarily interacts with MLflow) |

How It Works

1. Connect to MLflow

Read the tracking server address from the MLFLOW_TRACKING_URI environment variable, defaulting to http://localhost:5000 if unset. Set the URI with mlflow.set_tracking_uri(tracking_uri) and instantiate mlflow.tracking.MlflowClient().
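
A minimal sketch of this step, assuming the mlflow package is installed. The helper names resolve_tracking_uri and connect are hypothetical and not taken from evaluate.py; the mlflow import is deferred so the URI logic can be exercised without a tracking server.

```python
import os

DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(env=None):
    """Return MLFLOW_TRACKING_URI, or the local default if it is unset."""
    env = os.environ if env is None else env
    return env.get("MLFLOW_TRACKING_URI", DEFAULT_TRACKING_URI)


def connect():
    """Point MLflow at the tracking server and return a client (hypothetical helper)."""
    import mlflow  # deferred import: only needed when actually connecting
    from mlflow.tracking import MlflowClient

    tracking_uri = resolve_tracking_uri()
    mlflow.set_tracking_uri(tracking_uri)
    return MlflowClient()
```

Passing a plain dict as env makes the default easy to check without touching the real environment.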
2. Find the default experiment

Look up the experiment by name. The scaffold passes None as the name, so when the lookup returns nothing the or clause falls back to experiment ID "0", the default experiment that MLflow creates automatically.
experiment = client.get_experiment_by_name(None) or client.get_experiment("0")
3. Search all runs ordered by accuracy

Retrieve up to 100 runs from the experiment, sorted by metrics.accuracy descending so the best run is always first:
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=100
)
4. Pick the best run

Select runs[0] as best_run. If no runs exist, the function logs an error and returns early rather than raising an exception.
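
The early-return guard could look roughly like the sketch below. The helper name pick_best_run is hypothetical; the real function may inline this logic instead. The log message is the one the script is documented to emit.

```python
import logging

logger = logging.getLogger(__name__)


def pick_best_run(runs):
    """Return the first run of an accuracy-sorted list, or None if it is empty."""
    if not runs:
        # Log and return instead of raising, so the caller can exit quietly.
        logger.error("No runs found. Did you run train.py?")
        return None
    return runs[0]
```

Because search_runs already sorts by metrics.accuracy descending, no further comparison is needed here.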
5. Register the model version (TODO)

This step is a student exercise. Call client.create_model_version() to attach the run’s model artifact to a named entry in the Model Registry. The scaffolded variables model_name, model_uri, and best_run are already available:
# TODO: implement in evaluate.py
model_version = client.create_model_version(
    name=model_name,
    source=model_uri,
    run_id=best_run.info.run_id
)
client.create_model_version() returns a ModelVersion object whose .version attribute is needed in the next step.
6. Assign the @champion alias (TODO)

This step is a student exercise. Call client.set_registered_model_alias() to tag the new version so downstream consumers (the Dockerfile) can reference it by alias rather than version number:
# TODO: implement in evaluate.py
client.set_registered_model_alias(
    name=model_name,
    alias="champion",
    version=model_version.version
)
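
One possible way to combine steps 5 and 6 is sketched below. The function register_champion and the RecordingClient stand-in are hypothetical names introduced for illustration; the stand-in only mimics the two MlflowClient methods used, so the flow can be tried without a tracking server. With a real MlflowClient, create_model_version typically expects the registered model to exist already (for example via client.create_registered_model); whether the scaffold handles that is not shown here.

```python
from types import SimpleNamespace


def register_champion(client, model_name, model_uri, run_id, alias="champion"):
    """Create a model version from a run and point the alias at it."""
    model_version = client.create_model_version(
        name=model_name,
        source=model_uri,
        run_id=run_id,
    )
    client.set_registered_model_alias(
        name=model_name,
        alias=alias,
        version=model_version.version,
    )
    return model_version.version


class RecordingClient:
    """Hypothetical stand-in for MlflowClient that records calls locally."""

    def __init__(self):
        self.aliases = {}

    def create_model_version(self, name, source, run_id=None):
        # A real client returns a ModelVersion; only .version is used here.
        return SimpleNamespace(name=name, source=source, run_id=run_id, version="1")

    def set_registered_model_alias(self, name, alias, version):
        self.aliases[(name, alias)] = version
```

In evaluate.py the same function would receive the real client together with the scaffolded model_name, model_uri, and best_run.info.run_id.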
7. Write metrics.json

Serialise evaluation results to metrics.json in the current working directory. DVC reads this file as a pipeline metric so dvc metrics show can surface it without launching the MLflow UI.

Output: metrics.json

{
  "best_run_id": "<mlflow-run-id>",
  "best_accuracy": 0.87,
  "model_type": "xgboost",
  "model_name": "spotify-genre-classifier",
  "champion_alias": "champion"
}

| Field | Description |
| --- | --- |
| best_run_id | The MLflow run UUID of the champion model |
| best_accuracy | Training accuracy of the winning run |
| model_type | Value of the model param logged to the run (e.g. "xgboost") |
| model_name | Registry name used for all versions: "spotify-genre-classifier" |
| champion_alias | Always "champion", the alias the Dockerfile resolves at build time |
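
A file with these fields could be written along the lines of the sketch below. The helper write_metrics and its signature are assumptions made for illustration, not the script's actual code.

```python
import json
from pathlib import Path


def write_metrics(best_run_id, best_accuracy, model_type, path="metrics.json"):
    """Serialise the evaluation summary to a JSON file and return it as a dict."""
    metrics = {
        "best_run_id": best_run_id,
        "best_accuracy": best_accuracy,
        "model_type": model_type,
        "model_name": "spotify-genre-classifier",
        "champion_alias": "champion",
    }
    Path(path).write_text(json.dumps(metrics, indent=2))
    return metrics
```

With an MLflow Run object, the inputs would presumably come from best_run.info.run_id, best_run.data.metrics["accuracy"], and best_run.data.params.get("model").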

DVC Stage

evaluate:
  cmd: python src/evaluate.py --train_data data/train.csv
  deps:
    - src/evaluate.py
    - models/
  metrics:
    - metrics.json:
        cache: false
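metrics.json is declared under metrics: (not outs:) so DVC treats it as a structured metric file, and cache: false keeps it out of the DVC cache so it can be committed to Git directly. Run dvc metrics show to display the current values, or dvc metrics diff to compare them across Git commits.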

CLI Usage

python src/evaluate.py --train_data data/train.csv
evaluate.py must be run after train.py has completed at least one successful MLflow run. If no runs exist in the default experiment, the function logs "No runs found. Did you run train.py?" and exits without writing metrics.json or registering anything.
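
The command-line entry point is not shown on this page; a minimal argparse sketch that would accept the flag above might look like this. The parse_args helper and the CLI default (copied from the function signature's default) are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the --train_data flag; argv=None reads from sys.argv."""
    parser = argparse.ArgumentParser(
        description="Evaluate MLflow runs and register the champion model."
    )
    parser.add_argument(
        "--train_data",
        default="data/train.csv",  # assumed to mirror the function default
        help="Path to the training CSV.",
    )
    return parser.parse_args(argv)


# Typical wiring (evaluate_and_register is defined in evaluate.py):
#     args = parse_args()
#     evaluate_and_register(args.train_data)
```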
