Overview

The ExperimentRunner class executes generated ML experiment scripts in isolated subprocesses with comprehensive error handling, timeout management, and metrics parsing.

Features

  • Subprocess isolation - Each experiment runs in a separate Python process
  • Timeout handling - Automatic termination of long-running experiments
  • Output capture - Captures stdout/stderr for debugging
  • JSON metrics parsing - Parses structured experiment results
  • Graceful error handling - Returns structured error information on failure

Class Definition

ExperimentRunner

from src.execution.experiment_runner import ExperimentRunner

runner = ExperimentRunner(
    timeout=300,
    python_executable="python"
)
  • timeout (Optional[int]) - Timeout in seconds for each experiment. Defaults to the system config value (typically 300s)
  • python_executable (str) - Path to the Python executable. Defaults to "python"

Methods

run()

Execute an experiment script and capture results.
result = runner.run(
    script_path=Path("experiments/rf_experiment.py"),
    spec=experiment_spec,
    iteration=1
)
  • script_path (Path, required) - Path to the generated Python experiment script
  • spec (ExperimentSpec, required) - Experiment specification used to create the script
  • iteration (int, required) - Current iteration number for tracking
Returns: ExperimentResult - Pydantic model containing:
  • experiment_name (str) - Name of the experiment from the spec
  • iteration (int) - Iteration number
  • model_type (str) - Model class name (e.g., "RandomForestClassifier")
  • model_params (dict[str, Any]) - Hyperparameters used
  • preprocessing (PreprocessingConfig) - Preprocessing configuration applied
  • metrics (dict[str, float]) - Performance metrics from the experiment. Classification: accuracy, precision, recall, f1, plus roc_auc (if applicable). Regression: rmse, mae, r2
  • hypothesis (str) - Hypothesis being tested
  • reasoning (str) - Reasoning behind the experiment design
  • execution_time (float) - Wall clock time in seconds
  • success (bool) - Whether the experiment completed successfully
  • error_message (Optional[str]) - Error message if success=False
  • code_path (str) - Path to the executed script
  • timestamp (datetime) - When the experiment was run

run_script_directly()

Run a script and return raw output (primarily for testing).
output = runner.run_script_directly(
    script_path=Path("test_script.py")
)
  • script_path (Path, required) - Path to the Python script to execute
Returns: dict with keys:
  • stdout (str) - Standard output
  • stderr (str) - Standard error
  • returncode (int) - Process exit code

Expected Script Output Format

Generated experiment scripts must output JSON to stdout:
{
  "success": true,
  "metrics": {
    "accuracy": 0.87,
    "f1": 0.85,
    "precision": 0.86,
    "recall": 0.84
  }
}
For errors:
{
  "success": false,
  "error": "ValueError: Target column not found"
}
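On the script side, this contract amounts to serializing one JSON object to stdout as the final step. A minimal sketch (the helper name emit_result is illustrative, not part of the library):

```python
import json

def emit_result(metrics=None, error=None):
    """Print the runner's expected JSON contract to stdout and return the payload."""
    if error is None:
        payload = {"success": True, "metrics": metrics or {}}
    else:
        payload = {"success": False, "error": error}
    print(json.dumps(payload))
    return payload
```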

Error Handling

The runner handles multiple failure modes gracefully:

1. Script Exit Errors

# Script exits with non-zero code
result.success = False
result.error_message = "Script exited with code 1: ImportError: No module named sklearn"

2. JSON Parsing Errors

# Script outputs invalid JSON
result.success = False
result.error_message = "Failed to parse JSON output: Expecting value: line 1 column 1"

3. Timeout Errors

# Script exceeds timeout
result.success = False
result.error_message = "Experiment timed out after 300 seconds"

4. Unexpected Exceptions

# Any other exception
result.success = False
result.error_message = str(exception)
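The failure modes above suggest a parsing step along these lines. This is a sketch, assuming a hypothetical helper name; the real implementation's error-message wording may differ.

```python
import json
from typing import Optional

def parse_experiment_output(stdout: str) -> tuple[bool, dict, Optional[str]]:
    """Map a script's stdout to (success, metrics, error_message) without raising."""
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        # Invalid or empty JSON becomes a structured failure, not an exception
        return False, {}, f"Failed to parse JSON output: {exc}"
    if not payload.get("success", False):
        return False, {}, payload.get("error", "Unknown error")
    return True, payload.get("metrics", {}), None
```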

Complete Example

from pathlib import Path
from src.execution.experiment_runner import ExperimentRunner
from src.execution.code_generator import CodeGenerator
from src.orchestration.state import ExperimentSpec, PreprocessingConfig

# Create specification
spec = ExperimentSpec(
    experiment_name="logistic_baseline",
    hypothesis="Establish baseline with logistic regression",
    model_type="LogisticRegression",
    model_params={"C": 1.0, "max_iter": 1000},
    preprocessing=PreprocessingConfig(
        missing_values="median",
        scaling="standard",
        encoding="onehot"
    ),
    reasoning="Starting with simple linear model"
)

# Generate script
generator = CodeGenerator()
script_path = generator.generate(
    spec=spec,
    data_path=Path("data.csv"),
    target_column="target",
    task_type="classification"
)

# Run experiment
runner = ExperimentRunner(timeout=300)
result = runner.run(
    script_path=script_path,
    spec=spec,
    iteration=0
)

# Check results
if result.success:
    print(f"✓ Experiment succeeded in {result.execution_time:.2f}s")
    print(f"Metrics: {result.metrics}")
    
    if "accuracy" in result.metrics:
        print(f"Accuracy: {result.metrics['accuracy']:.4f}")
else:
    print(f"✗ Experiment failed: {result.error_message}")

Timeout Configuration

Default timeout from config:
from src.config import ExperimentDefaults

default_timeout = ExperimentDefaults().experiment_timeout  # Usually 300s
Custom timeout per runner:
# Short timeout for quick models
fast_runner = ExperimentRunner(timeout=60)

# Longer timeout for complex models
slow_runner = ExperimentRunner(timeout=1800)  # 30 minutes
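One convenient pattern is to key the timeout on model complexity. The table below is an illustrative assumption, not a set of library defaults:

```python
# Illustrative per-model timeouts in seconds (values are assumptions)
MODEL_TIMEOUTS = {
    "LogisticRegression": 60,
    "RandomForestClassifier": 300,
    "GradientBoostingClassifier": 1800,
}

def timeout_for(model_type: str, default: int = 300) -> int:
    """Pick a timeout for a model type, falling back to the default."""
    return MODEL_TIMEOUTS.get(model_type, default)
```

A per-spec value such as timeout_for(spec.model_type) could then be passed as ExperimentRunner(timeout=...).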

Process Isolation

Each experiment runs in a clean subprocess:
subprocess.run(
    [python_executable, str(script_path)],
    capture_output=True,
    text=True,
    timeout=timeout,
    cwd=script_path.parent
)
Working directory is set to the script’s parent directory, so relative paths in the script work correctly.
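Timeout enforcement falls out of subprocess.run's timeout parameter, which raises subprocess.TimeoutExpired when exceeded. A self-contained sketch of the pattern (running a -c code string rather than a script path, purely for illustration):

```python
import subprocess
import sys

def run_isolated(code: str, timeout: float) -> dict:
    """Execute Python code in a clean subprocess, capturing output, with a hard timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "returncode": proc.returncode, "timed_out": False}
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run; report the timeout as data
        return {"stdout": "", "stderr": "", "returncode": -1, "timed_out": True}
```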

Metrics Access

Access metrics from results:
# Get specific metric
accuracy = result.get_primary_metric("accuracy")

# Iterate all metrics
for metric_name, value in result.metrics.items():
    print(f"{metric_name}: {value:.4f}")

# Check if metric exists
if "roc_auc" in result.metrics:
    auc = result.metrics["roc_auc"]

Error Types

ExperimentExecutionError

Raised for unrecoverable execution errors:
from src.execution.experiment_runner import ExperimentExecutionError

try:
    result = runner.run(script_path, spec, iteration)
except ExperimentExecutionError as e:
    print(f"Fatal execution error: {e}")
Most errors are captured gracefully in ExperimentResult.error_message rather than raising exceptions.
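Because failures usually surface as success=False rather than exceptions, a simple retry policy can sit on top of run(). A generic sketch (run_with_retries is hypothetical, shown here with plain dicts standing in for ExperimentResult):

```python
def run_with_retries(run_fn, attempts: int = 2):
    """Call run_fn until it reports success or attempts run out; return the last result."""
    last = None
    for _ in range(attempts):
        last = run_fn()
        if last.get("success"):
            return last
    return last
```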

Source Location

~/workspace/source/src/execution/experiment_runner.py