
Overview

The ExperimentController is the central orchestrator that manages the complete ML experiment lifecycle. It coordinates data profiling, baseline modeling, iterative experiment design, code generation, execution, analysis, and reporting.

Architecture

The controller implements a multi-phase state machine:
  1. Data Profiling - Analyze dataset characteristics
  2. Baseline Modeling - Establish performance baseline
  3. Experiment Loop - Iterative design, execute, analyze
  4. Finalization - Generate report and visualizations
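The four phases above can be sketched as a simple driver loop. This is purely illustrative: the function names and return values below are placeholders, not the controller's real internals.

```python
from pathlib import Path

# Hypothetical stand-ins for the controller's internal phase handlers.
def profile_data(data_path: Path) -> dict:
    return {"rows": 0, "columns": []}  # placeholder profile

def run_baseline(profile: dict) -> float:
    return 0.5  # placeholder baseline metric

def run_experiment_loop(baseline: float, max_iterations: int) -> float:
    best = baseline
    for _ in range(max_iterations):
        candidate = best + 0.01  # pretend each iteration improves slightly
        best = max(best, candidate)
    return best

def finalize(best: float) -> str:
    return f"Report: best metric = {best:.2f}"

def run(data_path: Path, max_iterations: int = 3) -> str:
    profile = profile_data(data_path)                     # 1. Data Profiling
    baseline = run_baseline(profile)                      # 2. Baseline Modeling
    best = run_experiment_loop(baseline, max_iterations)  # 3. Experiment Loop
    return finalize(best)                                 # 4. Finalization
```

The real controller threads state, MLflow logging, and Gemini calls through each step, but the phase ordering is the same.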

Class Definition

ExperimentController

from src.orchestration.controller import ExperimentController
from pathlib import Path

controller = ExperimentController(
    data_path=Path("housing.csv"),
    target_column="price",
    task_type="regression",
    constraints="Focus on interpretable models",
    max_iterations=20,
    time_budget=3600,
    output_dir=Path("outputs/"),
    verbose=True,
    resume_path=None
)
Parameters:
  • data_path (Path, required) - Path to the dataset file (CSV or Parquet)
  • target_column (str, required) - Name of the target column for prediction
  • task_type (str, required) - Type of ML task: 'classification' or 'regression'
  • constraints (Optional[str]) - Natural language constraints for the AI, for example:
      • “Only use tree-based models”
      • “Prioritize interpretability over accuracy”
      • “Must achieve R² > 0.85”
      • “Avoid deep learning methods”
  • max_iterations (int) - Maximum number of experiment iterations. Default: 20
  • time_budget (int) - Time budget in seconds. Default: 3600 (1 hour)
  • output_dir (Optional[Path]) - Output directory for results. Defaults to the project outputs/ folder
  • verbose (bool) - Whether to show detailed reasoning and analysis. Default: False
  • resume_path (Optional[Path]) - Path to a state JSON file to resume from a previous session

Methods

run()

Run the complete experiment loop from start to finish.
controller.run()
This method:
  1. Profiles the dataset (if not already done)
  2. Runs baseline model (if not already done)
  3. Iteratively designs and executes experiments
  4. Analyzes results after each iteration
  5. Generates hypotheses for next iteration
  6. Terminates based on stopping conditions
  7. Generates final report and visualizations
Stopping conditions:
  • Maximum iterations reached
  • Time budget exhausted
  • Performance plateau detected (3 iterations without improvement)
  • Target metric achieved
  • AI agent recommends stopping
The method handles all phases automatically. If execution fails, state is saved for potential resume.
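The stopping conditions above can be pictured as a single predicate checked after each iteration. This is a hypothetical helper, not the controller's actual code; the parameter names are assumptions.

```python
def should_stop(
    iteration: int,
    max_iterations: int,
    elapsed_seconds: float,
    time_budget: float,
    iterations_without_improvement: int,
    target_achieved: bool,
    agent_recommends_stop: bool,
    plateau_patience: int = 3,  # plateau detected after 3 flat iterations
) -> bool:
    """Return True when any of the documented stopping conditions is met."""
    return (
        iteration >= max_iterations
        or elapsed_seconds >= time_budget
        or iterations_without_improvement >= plateau_patience
        or target_achieved
        or agent_recommends_stop
    )
```

Any one condition is sufficient to end the loop; the plateau threshold of 3 matches the behavior described above.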

save_state()

Save current experiment state to disk.
controller.save_state()
Saves to: {output_dir}/state_{session_id}.json
Saved information:
  • All experiment results
  • Data profile
  • Best metric tracking
  • Current phase
  • Gemini conversation history
  • Termination status
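The state file is plain JSON, so it can be inspected without the controller. A minimal sketch; the keys shown here are assumptions modeled on the get_summary() output, not a documented schema.

```python
import json
import tempfile
from pathlib import Path

def load_state(state_path: Path) -> dict:
    """Read a saved experiment state file back into a dict."""
    with state_path.open() as f:
        return json.load(f)

# Example: write and re-read a tiny state file in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state_abc123.json"
    state_path.write_text(json.dumps({"session_id": "abc123", "phase": "completed"}))
    state = load_state(state_path)
```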

Experiment Phases

The controller tracks progress through these phases:
  • INITIALIZING - Initial state before any work
  • DATA_PROFILING - Analyzing the dataset with DataProfiler
  • BASELINE_MODELING - Running the baseline model for comparison
  • EXPERIMENT_DESIGN - Using Gemini to design the next experiment
  • CODE_GENERATION - Generating a Python script from the specification
  • EXPERIMENT_EXECUTION - Running the generated script in a subprocess
  • RESULTS_ANALYSIS - Analyzing experiment results with Gemini
  • HYPOTHESIS_GENERATION - Generating hypotheses for the next iteration
  • REPORT_GENERATION - Creating the final report with Gemini
  • COMPLETED - All work finished successfully
  • FAILED - A fatal error occurred
All values are members of the ExperimentPhase enum.
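These phases map naturally onto a Python Enum. A sketch of what ExperimentPhase might look like; the string values are assumptions inferred from the summary output (which shows phase as 'experiment_design'), not the confirmed definition.

```python
from enum import Enum

class ExperimentPhase(str, Enum):
    INITIALIZING = "initializing"
    DATA_PROFILING = "data_profiling"
    BASELINE_MODELING = "baseline_modeling"
    EXPERIMENT_DESIGN = "experiment_design"
    CODE_GENERATION = "code_generation"
    EXPERIMENT_EXECUTION = "experiment_execution"
    RESULTS_ANALYSIS = "results_analysis"
    HYPOTHESIS_GENERATION = "hypothesis_generation"
    REPORT_GENERATION = "report_generation"
    COMPLETED = "completed"
    FAILED = "failed"
```

Mixing in str makes the phase JSON-serializable, which fits how the phase appears in the saved state file.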

Components

The controller initializes and coordinates these components:

Cognitive Layer (Gemini-powered)

  • GeminiClient - API client for Gemini
  • ExperimentDesigner - Designs experiments based on data and history
  • ResultsAnalyzer - Analyzes results and identifies patterns
  • HypothesisGenerator - Generates testable hypotheses
  • ReportGenerator - Creates final markdown report

Execution Layer

  • DataProfiler - Profiles dataset characteristics
  • CodeGenerator - Generates Python experiment scripts
  • ExperimentRunner - Executes scripts in subprocesses
  • VisualizationGenerator - Creates matplotlib plots

Persistence Layer

  • MLflowTracker - Logs experiments to MLflow
  • ExperimentState - Pydantic model for state management

Complete Example

from pathlib import Path
from src.orchestration.controller import ExperimentController

# Create controller
controller = ExperimentController(
    data_path=Path("data/titanic.csv"),
    target_column="survived",
    task_type="classification",
    constraints="""
        - Prioritize interpretable models
        - Must achieve F1 > 0.80
        - Avoid ensemble methods with >100 trees
    """,
    max_iterations=15,
    time_budget=1800,  # 30 minutes
    output_dir=Path("outputs/titanic_run"),
    verbose=True
)

# Run complete experiment loop
try:
    controller.run()
    print("✓ Experiment completed successfully")
    
    # Access results
    state = controller.state
    print(f"Best experiment: {state.best_experiment}")
    print(f"Best {state.config.primary_metric}: {state.best_metric:.4f}")
    print(f"Total iterations: {state.current_iteration}")
    print(f"Elapsed time: {state.get_elapsed_time():.0f}s")
    
except Exception as e:
    print(f"✗ Experiment failed: {e}")
    # State is automatically saved for resume

Resuming Experiments

Resume from a saved state file:
# Resume from previous session
controller = ExperimentController(
    data_path=Path("data/housing.csv"),
    target_column="price",
    task_type="regression",
    resume_path=Path("outputs/state_abc123.json")
)

controller.run()
When resuming, all initialization parameters except resume_path are loaded from the state file.

State Management

The controller maintains state using the ExperimentState Pydantic model:
# Access current state
state = controller.state

# Get summary
summary = state.get_summary()
print(summary)
# Output:
# {
#   'session_id': 'abc123',
#   'phase': 'experiment_design',
#   'current_iteration': 5,
#   'max_iterations': 20,
#   'elapsed_time': 450.3,
#   'best_metric': 0.876,
#   'best_experiment': 'rf_tuned_depth',
#   'total_experiments': 6,
#   'successful_experiments': 5
# }

MLflow Integration

The controller automatically logs to MLflow:
# MLflow experiment created as:
experiment_name = f"autopilot_{dataset_name}_{session_id}"

# Logged information:
# - Data profile (parameters and JSON artifact)
# - Each experiment run (parameters, metrics, code)
# - Final summary (metrics, state JSON)
# - Visualizations (PNG files)
View in MLflow UI:
mlflow ui --backend-store-uri ./mlruns
# Open http://localhost:5000

Iteration Loop Details

Each iteration follows this sequence:
def _run_iteration(self):
    # 1. Design experiment using Gemini
    spec = self._design_experiment()
    
    # 2. Generate Python script
    script_path = self.code_generator.generate(spec, ...)
    
    # 3. Execute in subprocess
    result = self.runner.run(script_path, spec, iteration)
    
    # 4. Update state
    self.state.add_experiment(result)
    
    # 5. Log to MLflow
    self.tracker.log_experiment(result)
    
    # 6. Analyze results with Gemini
    analysis = self._analyze_results(result)
    
    # 7. Generate hypotheses for next iteration
    hypotheses = self._generate_hypotheses(analysis)
    
    # 8. Save state
    self.save_state()

Constraint Parsing

The controller parses natural language constraints:
constraints = """
- Only tree-based models (RandomForest, XGBoost)
- Target RMSE < 5000
- Prefer models with <500 trees for speed
"""

# Gemini extracts:
# - Model restrictions
# - Target metric value
# - Performance vs speed tradeoffs
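In this project the extraction is done by Gemini, but the kind of parsing involved can be illustrated with a plain regex. Purely illustrative; the controller does not use this code, and the metric list is an assumption.

```python
import re

def extract_metric_target(constraints: str):
    """Pull a '<metric> <op> <value>' target out of free-form constraint text."""
    match = re.search(r"\b(RMSE|MAE|F1|R²|accuracy)\s*([<>]=?)\s*([\d.]+)", constraints)
    if not match:
        return None
    metric, op, value = match.groups()
    return metric, op, float(value)

constraints = """
- Only tree-based models (RandomForest, XGBoost)
- Target RMSE < 5000
- Prefer models with <500 trees for speed
"""
target = extract_metric_target(constraints)  # ('RMSE', '<', 5000.0)
```

Note that "<500 trees" is not matched, because the pattern requires a known metric name before the operator; an LLM-based parser handles such ambiguity far more gracefully.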

Hypothesis-Driven Design

The controller maintains cross-iteration context:
# After each iteration:
# 1. ResultsAnalyzer creates AnalysisResult
#    - Compares to baseline and previous best
#    - Identifies trend patterns
#    - Notes key observations

# 2. HypothesisGenerator creates HypothesisSet
#    - Multiple testable hypotheses
#    - Confidence scores and priorities
#    - Suggested models and parameters

# 3. ExperimentDesigner uses top hypothesis
#    - Incorporates into next experiment design
#    - Balances exploration vs exploitation
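The hypothesis objects described above can be pictured as small records ranked by priority and confidence. A sketch only; the real HypothesisSet is a Pydantic model whose fields are not documented here.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str
    confidence: float   # 0.0 - 1.0
    priority: int       # lower = more urgent
    suggested_model: str

def top_hypothesis(hypotheses: list) -> Hypothesis:
    """Pick the next hypothesis to test: highest priority, ties broken by confidence."""
    return min(hypotheses, key=lambda h: (h.priority, -h.confidence))

candidates = [
    Hypothesis("Deeper trees reduce bias", 0.6, 2, "RandomForest"),
    Hypothesis("Feature scaling helps linear models", 0.8, 1, "Ridge"),
    Hypothesis("Log-transform the target", 0.9, 1, "Ridge"),
]
best = top_hypothesis(candidates)  # priority 1, confidence 0.9 wins
```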

Output Files

The controller generates these outputs:
outputs/
├── state_{session_id}.json          # Experiment state
├── plots/
│   ├── metric_progression.png       # Metric over time
│   ├── model_comparison.png         # Model type comparison
│   └── improvement_over_baseline.png # Baseline vs best
├── report_{session_id}.md           # AI-generated report
└── experiments/
    └── {session_id}/
        ├── baseline_*.py            # Baseline script
        ├── experiment_1_*.py        # Generated scripts
        └── ...

Error Recovery

try:
    controller.run()
except Exception as e:
    # State is saved automatically
    print(f"Error: {e}")
    print(f"State saved to: {controller.output_dir}/state_*.json")
    
    # Can resume later:
    # controller = ExperimentController(
    #     data_path=...,
    #     resume_path=Path("state_abc123.json")
    # )
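When several sessions have written state files into the same directory, the most recent one can be located with pathlib. A small sketch under the assumption that state files follow the state_*.json naming shown above.

```python
from pathlib import Path

def latest_state_file(output_dir: Path):
    """Return the most recently modified state_*.json in output_dir, or None."""
    candidates = sorted(
        output_dir.glob("state_*.json"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return candidates[0] if candidates else None
```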

Verbose Mode

Enable verbose output to see detailed reasoning:
controller = ExperimentController(
    ...,
    verbose=True
)

# Shows:
# - Gemini's reasoning for each experiment design
# - Detailed analysis after each iteration
# - Hypothesis generation process
# - Conversation history length

Source Location

~/workspace/source/src/orchestration/controller.py
