
Beyond Black-Box AutoML

ML Experiment Autopilot reimagines automated machine learning by combining autonomous experimentation with explainable AI. Unlike traditional AutoML tools that operate as black boxes, our system uses Google Gemini 3 to explain every decision, test data-driven hypotheses, and generate publication-ready reports.

Explainable Decisions

Every experiment design, model selection, and hyperparameter choice comes with Gemini’s detailed reasoning

Hypothesis Testing

Generates and tests data-driven hypotheses across iterations, learning from successes and failures

Natural Language Constraints

Guide experiments using plain English preferences instead of complex configuration files

Publication-Ready Reports

Automatically generates narrative Markdown reports with methodology, insights, and recommendations

How It Works

The autopilot runs an autonomous experiment loop powered by Gemini 3’s Marathon Agent capabilities:
1. Data Profiling: Analyzes your dataset’s schema, distributions, missing values, and statistical properties
2. Baseline Model: Establishes a performance floor with a simple model
3. Iterative Experimentation: Gemini designs experiments, generates Python code, executes training, analyzes results, and proposes next steps — all autonomously
4. Intelligent Termination: Stops when performance plateaus, the time budget expires, or target metrics are achieved
5. Report Generation: Synthesizes findings into a comprehensive narrative report with visualizations
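The five steps above form a design–execute–analyze loop. Here is a minimal sketch of that loop in Python; the component callables and stop-condition parameters are stand-ins for illustration, not the project's actual ExperimentController API:

```python
def run_autopilot(design, execute, analyze, report,
                  profile=None, budget_iters=20, target=None, patience=3):
    """Design -> execute -> analyze until a stop condition fires.

    `design`, `execute`, `analyze`, and `report` are callables standing in
    for the ExperimentDesigner, code execution, ResultsAnalyzer, and
    ReportGenerator components.
    """
    history = []           # analyzed results of every iteration
    best = float("-inf")   # best metric seen so far
    stale = 0              # iterations since the last improvement
    for _ in range(budget_iters):
        spec = design(profile, history)          # next experiment to try
        result = execute(spec)                   # train and score the model
        history.append(analyze(result, history))
        if result["metric"] > best:
            best, stale = result["metric"], 0
        else:
            stale += 1
        if target is not None and best >= target:
            break                                # target metric achieved
        if stale >= patience:
            break                                # performance plateau
    return report(history)                       # final narrative report
```

The loop stops for exactly the reasons listed under Intelligent Termination: the iteration budget, a reached target metric, or a plateau.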

Key Features

Thought Signatures for Reasoning Continuity

All cognitive components share a single Gemini conversation, maintaining context across 100+ API calls. This means Gemini can reference results from iteration 1 when designing iteration 10.
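The idea can be illustrated with a toy session object; this is a stand-in for the real Gemini chat (which would send the accumulated history with every API call), not the project's implementation:

```python
class SharedSession:
    """Stand-in for one long-lived Gemini chat shared by all components.

    Every component appends to the same history, so a later call carries
    all earlier turns (e.g. iteration 10 can see iteration 1's results).
    """
    def __init__(self):
        self.history = []  # (component, text) turns, oldest first

    def send(self, component, text):
        self.history.append((component, text))
        # A real implementation would call the Gemini API here with the
        # full history attached; we just report how much context it carries.
        return f"{component} responded with {len(self.history)} turns of context"

session = SharedSession()
session.send("ExperimentDesigner", "design iteration 1")
session.send("ResultsAnalyzer", "analyze iteration 1 results")
reply = session.send("ExperimentDesigner", "design iteration 2")
# The designer's second call sees both earlier turns.
```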

Four Cognitive Components

| Component | Role | Output |
| --- | --- | --- |
| ExperimentDesigner | Designs next experiment based on data profile, history, and constraints | Structured JSON: model, hyperparameters, preprocessing |
| ResultsAnalyzer | Compares current results against baseline and best | Trend detection, metric comparison, observations |
| HypothesisGenerator | Synthesizes all iterations into ranked next steps | Hypotheses with confidence scores, explore/exploit strategy |
| ReportGenerator | Writes final narrative report | Markdown with executive summary, methodology, insights |
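The ExperimentDesigner's structured JSON might be validated by a schema along these lines. This is an illustrative sketch using stdlib dataclasses (the project itself uses Pydantic), and the field names are assumptions, not the real schema:

```python
import json
from dataclasses import dataclass, field

@dataclass
class ExperimentSpec:
    """Illustrative schema for the ExperimentDesigner's JSON output."""
    model: str                           # e.g. "RandomForestClassifier"
    hyperparameters: dict = field(default_factory=dict)
    preprocessing: list = field(default_factory=list)
    rationale: str = ""                  # Gemini's explanation of the choice

    @classmethod
    def from_json(cls, text: str) -> "ExperimentSpec":
        # Keep only known fields so extra keys in the response don't crash us.
        raw = json.loads(text)
        return cls(**{k: raw[k] for k in raw if k in cls.__dataclass_fields__})

spec = ExperimentSpec.from_json(
    '{"model": "RandomForestClassifier",'
    ' "hyperparameters": {"n_estimators": 200},'
    ' "rationale": "baseline tree ensemble"}'
)
```

Validating the model's JSON at the boundary is what lets the orchestration layer trust the cognitive core's output.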

Supported Models

  • scikit-learn: LinearRegression, LogisticRegression, RandomForest, GradientBoosting, SVM, and more
  • XGBoost: XGBRegressor, XGBClassifier
  • LightGBM: LGBMRegressor, LGBMClassifier
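One way to map these names onto importable classes is a lazy registry. The module paths below are the real scikit-learn, XGBoost, and LightGBM locations, but the registry itself is an assumption about how the autopilot might resolve names, not its actual code:

```python
from importlib import import_module

# Model name -> (module, class). Resolving lazily keeps XGBoost and
# LightGBM optional until an experiment actually asks for them.
MODEL_REGISTRY = {
    "LinearRegression":   ("sklearn.linear_model", "LinearRegression"),
    "LogisticRegression": ("sklearn.linear_model", "LogisticRegression"),
    "RandomForest":       ("sklearn.ensemble", "RandomForestClassifier"),
    "GradientBoosting":   ("sklearn.ensemble", "GradientBoostingClassifier"),
    "SVM":                ("sklearn.svm", "SVC"),
    "XGBRegressor":       ("xgboost", "XGBRegressor"),
    "XGBClassifier":      ("xgboost", "XGBClassifier"),
    "LGBMRegressor":      ("lightgbm", "LGBMRegressor"),
    "LGBMClassifier":     ("lightgbm", "LGBMClassifier"),
}

def resolve_model(name: str):
    """Import and return the estimator class registered under `name`."""
    module, cls = MODEL_REGISTRY[name]
    return getattr(import_module(module), cls)
```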

Automatic Preprocessing

Gemini decides per experiment — no fixed pipeline:
  • Missing value handling (drop, mean, median, mode imputation)
  • Feature scaling (standard, min-max, or none)
  • Categorical encoding (one-hot, ordinal)
  • Target transformations (log, sqrt for skewed distributions)
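Applied to a single numeric column, the spec-driven idea looks roughly like this. It is a pure-Python sketch; a real run would use pandas/scikit-learn transformers, and the spec keys are assumptions for illustration:

```python
import math

def preprocess_column(values, spec):
    """Apply a small preprocessing spec to one numeric column.

    Spec keys (assumptions for this sketch):
      impute:    "mean" | "median" | None  -- fill for missing (None) values
      transform: "log" | "sqrt" | None     -- for skewed distributions
      scale:     "minmax" | "standard" | None
    """
    observed = [v for v in values if v is not None]
    if spec.get("impute") == "mean":
        fill = sum(observed) / len(observed)
    elif spec.get("impute") == "median":
        s = sorted(observed)
        mid = len(s) // 2
        fill = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    else:
        fill = None
    out = [fill if v is None else v for v in values]
    if spec.get("transform") == "log":
        out = [math.log1p(v) for v in out]
    elif spec.get("transform") == "sqrt":
        out = [math.sqrt(v) for v in out]
    if spec.get("scale") == "minmax":
        lo, hi = min(out), max(out)
        out = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in out]
    elif spec.get("scale") == "standard":
        mean = sum(out) / len(out)
        std = (sum((v - mean) ** 2 for v in out) / len(out)) ** 0.5
        out = [(v - mean) / std if std else 0.0 for v in out]
    return out
```

Because the spec is just data, Gemini can emit a different one per experiment and the execution layer applies it mechanically.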

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   ML EXPERIMENT AUTOPILOT                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                 ORCHESTRATION LAYER                    │ │
│  │ ExperimentController — main loop & state machine       │ │
│  │ Pydantic state management — JSON with type validation  │ │
│  └────────────────────────────────────────────────────────┘ │
│                             │                               │
│                             ▼                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │           COGNITIVE CORE  (Gemini 3 Flash)             │ │
│  │  ExperimentDesigner — designs next experiment          │ │
│  │  ResultsAnalyzer — compares results, detects trends    │ │
│  │  HypothesisGenerator — hypotheses with confidence      │ │
│  │  ReportGenerator — publication-ready narrative reports │ │
│  │  Thought Signatures maintain reasoning continuity      │ │
│  └────────────────────────────────────────────────────────┘ │
│                             │                               │
│                             ▼                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                 EXECUTION LAYER                        │ │
│  │  DataProfiler — schema, stats, missing values          │ │
│  │  CodeGenerator — Jinja2 template-based Python scripts  │ │
│  │  ExperimentRunner — subprocess execution with timeout  │ │
│  └────────────────────────────────────────────────────────┘ │
│                             │                               │
│                             ▼                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                  PERSISTENCE LAYER                     │ │
│  │  MLflow tracking — metrics, params, artifacts          │ │
│  │  JSON state files — resumable experiment sessions      │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Why This Qualifies for “The Marathon Agent”

  • Autonomous: Runs 20+ iterations without human intervention
  • Long-Running: Maintains context across multi-hour execution via Thought Signatures
  • Self-Correcting: Learns from failures, adjusts strategy, detects performance plateaus
  • Explainable: Every decision is documented with Gemini’s reasoning
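Plateau detection can be as simple as comparing the recent best score against the earlier best and requiring a minimum improvement. A sketch, where the window and tolerance values are assumptions rather than the project's defaults:

```python
def has_plateaued(metrics, window=3, min_delta=1e-3):
    """True if the last `window` iterations improved on the prior best
    by less than `min_delta` (higher metric = better)."""
    if len(metrics) <= window:
        return False  # not enough history to judge
    best_before = max(metrics[:-window])
    best_recent = max(metrics[-window:])
    return best_recent - best_before < min_delta
```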

Next Steps

Quickstart

Run your first experiment in under 5 minutes

Installation

Complete setup and configuration guide

CLI Reference

All command-line options and arguments

Examples

Real-world examples with regression and classification
