## Overview
ML Experiment Autopilot automatically logs all experiments to MLflow, providing a web UI for exploring metrics, parameters, and artifacts. This guide covers how to launch and navigate the MLflow interface.

## Quick Start
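To launch the UI against the local tracking store, the `mlflow ui` command can be pointed at the tracking directory. A minimal sketch, assuming the default store path and port used throughout this guide:

```python
# Assemble the `mlflow ui` invocation for the local file-based tracking store.
# Store path and port are the defaults assumed in this guide.
import subprocess

def mlflow_ui_command(store="outputs/mlruns", port=5000):
    """Build the CLI invocation; pass a different port if 5000 is taken."""
    return ["mlflow", "ui", "--backend-store-uri", store, "--port", str(port)]

cmd = mlflow_ui_command()
print(" ".join(cmd))
# subprocess.run(cmd)  # uncomment to start the server, then open http://localhost:5000
```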
## MLflow Storage Location
All tracking data is stored locally in the `outputs/mlruns/` directory (defined in `src/config.py:22`).
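As an illustration only, the constant in `src/config.py` presumably looks something like the following (the names `OUTPUTS_DIR` and `MLRUNS_DIR` are assumptions, not the verified source):

```python
# Hypothetical sketch of the storage constant; the real definition
# lives in src/config.py:22 and may use different names.
from pathlib import Path

OUTPUTS_DIR = Path("outputs")        # assumed project output root
MLRUNS_DIR = OUTPUTS_DIR / "mlruns"  # MLflow file store used by the tracker
```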
Each experiment session creates a new MLflow experiment. Multiple runs are logged within each experiment.
## What Gets Logged
The `MLflowTracker` class (defined in `src/persistence/mlflow_tracker.py`) logs comprehensive data for each iteration:
### Parameters
Logged in `log_experiment()` at `src/persistence/mlflow_tracker.py:103-112`:
- Model class name (e.g., `XGBRegressor`, `RandomForestClassifier`)
- Iteration number (0 = baseline)
- All model hyperparameters with a `model_` prefix:
  - `model_max_depth: 5`
  - `model_learning_rate: 0.05`
  - `model_n_estimators: 100`
- Preprocessing configuration:
  - `preprocessing_missing: "median"`
  - `preprocessing_scaling: "standard"`
  - `preprocessing_encoding: "onehot"`
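The prefixing scheme above can be sketched as a pure function. This is illustrative only; the actual logging lives in `MLflowTracker.log_experiment()`, and the function name here is an assumption:

```python
# Illustrative sketch of the model_/preprocessing_ prefixing scheme
# described above; not the real implementation.
def flatten_params(model_name, iteration, hyperparams, preprocessing):
    """Build the flat MLflow parameter dict for one run."""
    params = {"model_type": model_name, "iteration": iteration}
    params.update({f"model_{k}": v for k, v in hyperparams.items()})
    params.update({f"preprocessing_{k}": v for k, v in preprocessing.items()})
    return params

p = flatten_params("XGBRegressor", 3, {"max_depth": 5}, {"missing": "median"})
# p now contains model_max_depth and preprocessing_missing keys
```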
### Metrics
Logged in `log_experiment()` at `src/persistence/mlflow_tracker.py:114-120`:
- All metrics from the experiment result:
  - Regression: `rmse`, `mae`, `r2`
  - Classification: `accuracy`, `f1`, `precision`, `recall`, `roc_auc`
- Time in seconds to train and evaluate the model
- `success`: 1 if the experiment succeeded, 0 if it failed
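The metric payload can be pictured as follows. The time-metric name below is an assumption; `success` matches the filter syntax used elsewhere in this guide:

```python
# Illustrative sketch of the metrics dict logged per run; not the real code.
def metrics_payload(result_metrics, elapsed_seconds, succeeded):
    payload = dict(result_metrics)  # e.g. {"rmse": ..., "mae": ..., "r2": ...}
    payload["training_time_seconds"] = elapsed_seconds  # metric name assumed
    payload["success"] = 1 if succeeded else 0          # 1 = succeeded, 0 = failed
    return payload

m = metrics_payload({"rmse": 0.42, "r2": 0.78}, 12.3, True)
```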
### Tags
Logged in `log_experiment()` at `src/persistence/mlflow_tracker.py:122-126`:
- The hypothesis being tested (truncated to 250 chars)
- A "True" or "False" string representation
### Artifacts
Logged files attached to each run:

| Artifact | Path | Description |
|---|---|---|
| reasoning.txt | artifacts/reasoning.txt | Gemini's reasoning for experiment design |
| experiment_{N}.py | artifacts/experiment_{N}.py | Generated Python training script |
| error.txt | artifacts/error.txt | Error message if experiment failed |
| visualizations | artifacts/*.png | Charts (final_summary run only) |
## Navigating the MLflow UI
### Experiments List
The home page shows all experiments.

### Runs Table
The runs table displays all iterations:

| Run Name | Start Time | Duration | rmse | r2 | success |
|---|---|---|---|---|---|
| data_profile | 2026-03-02 10:00:00 | 2s | - | - | - |
| baseline | 2026-03-02 10:00:05 | 8s | 0.7456 | 0.6012 | 1 |
| log_transform_rf | 2026-03-02 10:00:15 | 12s | 0.4201 | 0.7834 | 1 |
| xgboost_tuned | 2026-03-02 10:00:30 | 15s | 0.1332 | 0.8456 | 1 |
| final_summary | 2026-03-02 10:00:50 | 1s | - | - | - |
### Special Runs

#### data_profile

Logged at the start of each session (in `log_data_profile()` at `src/persistence/mlflow_tracker.py:60-91`).

Parameters:

- `n_rows`, `n_columns`
- `n_numeric_features`, `n_categorical_features`
- `target_column`, `target_type`
- `total_missing_values`

Artifacts:

- `data_profile.json`: full data profile

#### final_summary

Logged at the end of each session (in `log_final_summary()` at `src/persistence/mlflow_tracker.py:151-185`).

Metrics:

- `total_iterations`
- `successful_experiments`
- `total_time_seconds`
- `best_metric`

Also logged: `best_experiment`, `termination_reason`, `phase`

Artifacts:

- `final_state.json`: complete experiment state
- Visualization plots (`*.png`)
### Run Detail View
Click any run name to view detailed information across four tabs:

- Overview
- Parameters
- Metrics
- Artifacts

The Overview tab shows high-level run information:
- Run ID and name
- Start time and duration
- User and source
## Comparing Runs

Select multiple runs with the checkboxes in the runs table, then click Compare to view them side by side.
### Example Comparison
Comparing iterations 1, 2, and 3:

| Run | model_type | model_max_depth | rmse | r2 |
|---|---|---|---|---|
| log_transform_rf | RandomForestRegressor | 10 | 0.4201 | 0.7834 |
| xgboost_initial | XGBRegressor | 3 | 0.3567 | 0.8123 |
| xgboost_tuned | XGBRegressor | 5 | 0.1332 | 0.8456 |
- Deeper trees (max_depth 5 vs 3) improved XGBoost performance
- XGBoost outperforms RandomForest on this dataset
- RMSE reduced by 68% from iteration 1 to 3
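The 68% figure follows directly from the rmse column above:

```python
# Relative RMSE improvement from iteration 1 (log_transform_rf)
# to iteration 3 (xgboost_tuned), using the values in the table.
baseline_rmse, final_rmse = 0.4201, 0.1332
reduction = (baseline_rmse - final_rmse) / baseline_rmse
print(f"{reduction:.0%}")  # → 68%
```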
## Searching and Filtering
### Filter by Metric
Use the search bar with MLflow's query syntax.

### Filter by Parameter
Parameters are stored as strings in MLflow. Use string comparisons even for numeric values.
### Filter by Tag
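Putting the three filter types together, some example query strings for the search bar. The metric and parameter names match what the tracker logs; the `hypothesis` tag name is an assumption based on the Tags section above:

```
metrics.rmse < 0.5
metrics.success = 1 AND metrics.r2 > 0.75
params.model_type = 'XGBRegressor'
tags.hypothesis LIKE '%log transform%'
```

Note that parameter and tag values are quoted with single quotes, since MLflow compares them as strings.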
## Downloading Artifacts
### Useful Artifacts
#### reasoning.txt

Gemini's full reasoning for designing the experiment.
#### experiment_{N}.py

The generated Python script that was executed. Useful for:
- Reproducing results manually
- Debugging preprocessing steps
- Understanding exact model configuration
#### data_profile.json

Complete dataset analysis from the data_profile run.
## Programmatic Access

Query MLflow data programmatically using the `MLflowTracker` API.
## Troubleshooting
### No experiments appear in MLflow UI

**Cause:** Incorrect `--backend-store-uri`

**Solution:** Launch the UI with `--backend-store-uri` pointing at `outputs/mlruns/`.

### Port 5000 already in use

**Cause:** Another process is using port 5000

**Solution:** Specify a different port with `--port`.
### Runs missing metrics

**Cause:** Experiment failed before metrics were logged

**Solution:** Check the `success` metric:

- Filter for `metrics.success = 0`
- Download the `error.txt` artifact to see the failure reason
- Check `outputs/experiments/{session_id}/` for the generated code
### MLflow UI is slow

**Cause:** Large number of runs in the experiment

**Solution:**

- Use search filters to reduce visible runs
- Archive old experiments (move directories out of `mlruns/`)
- Consider upgrading to a SQLite/PostgreSQL backend for production
## Next Steps

- **Understanding Results**: Interpret metrics and analysis outputs
- **Troubleshooting**: Resolve common issues