
Beyond Black-Box AutoML

ML Experiment Autopilot reimagines automated machine learning by combining autonomous experimentation with explainable AI. Unlike traditional AutoML tools that operate as black boxes, our system uses Google Gemini 3 to explain every decision, test data-driven hypotheses, and generate publication-ready reports.

Explainable Decisions

Every experiment design, model selection, and hyperparameter choice comes with Gemini’s detailed reasoning

Hypothesis Testing

Generates and tests data-driven hypotheses across iterations, learning from successes and failures

Natural Language Constraints

Guide experiments using plain English preferences instead of complex configuration files

Publication-Ready Reports

Automatically generates narrative Markdown reports with methodology, insights, and recommendations

How It Works

The autopilot runs an autonomous experiment loop powered by Gemini 3’s Marathon Agent capabilities:
1. Data Profiling: Analyzes your dataset’s schema, distributions, missing values, and statistical properties
2. Baseline Model: Establishes a performance floor with a simple model
3. Iterative Experimentation: Gemini designs experiments, generates Python code, executes training, analyzes results, and proposes next steps — all autonomously
4. Intelligent Termination: Stops when performance plateaus, the time budget expires, or target metrics are achieved
5. Report Generation: Synthesizes findings into a comprehensive narrative report with visualizations
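The five steps above form a design–execute–analyze loop. Here is a minimal sketch of that loop in Python; the component callables and stop-condition parameters are stand-ins for illustration, not the project's actual ExperimentController API:

```python
def run_autopilot(design, execute, analyze, report,
                  profile=None, budget_iters=20, target=None, patience=3):
    """Design -> execute -> analyze until a stop condition fires.

    `design`, `execute`, `analyze`, and `report` are callables standing in
    for the ExperimentDesigner, code execution, ResultsAnalyzer, and
    ReportGenerator components.
    """
    history = []           # analyzed results of every iteration
    best = float("-inf")   # best metric seen so far
    stale = 0              # iterations since the last improvement
    for _ in range(budget_iters):
        spec = design(profile, history)          # next experiment to try
        result = execute(spec)                   # train and score the model
        history.append(analyze(result, history))
        if result["metric"] > best:
            best, stale = result["metric"], 0
        else:
            stale += 1
        if target is not None and best >= target:
            break                                # target metric achieved
        if stale >= patience:
            break                                # performance plateau
    return report(history)                       # final narrative report
```

The loop stops for exactly the reasons listed under Intelligent Termination: the iteration budget, a reached target metric, or a plateau.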

Key Features

Thought Signatures for Reasoning Continuity

All cognitive components share a single Gemini conversation, maintaining context across 100+ API calls. This means Gemini can reference results from iteration 1 when designing iteration 10.
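The idea can be illustrated with a toy session object; this is a stand-in for the real Gemini chat (which would send the accumulated history with every API call), not the project's implementation:

```python
class SharedSession:
    """Stand-in for one long-lived Gemini chat shared by all components.

    Every component appends to the same history, so a later call carries
    all earlier turns (e.g. iteration 10 can see iteration 1's results).
    """
    def __init__(self):
        self.history = []  # (component, text) turns, oldest first

    def send(self, component, text):
        self.history.append((component, text))
        # A real implementation would call the Gemini API here with the
        # full history attached; we just report how much context it carries.
        return f"{component} responded with {len(self.history)} turns of context"

session = SharedSession()
session.send("ExperimentDesigner", "design iteration 1")
session.send("ResultsAnalyzer", "analyze iteration 1 results")
reply = session.send("ExperimentDesigner", "design iteration 2")
# The designer's second call sees both earlier turns.
```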

Four Cognitive Components

| Component | Role | Output |
| --- | --- | --- |
| ExperimentDesigner | Designs next experiment based on data profile, history, and constraints | Structured JSON: model, hyperparameters, preprocessing |
| ResultsAnalyzer | Compares current results against baseline and best | Trend detection, metric comparison, observations |
| HypothesisGenerator | Synthesizes all iterations into ranked next steps | Hypotheses with confidence scores, explore/exploit strategy |
| ReportGenerator | Writes final narrative report | Markdown with executive summary, methodology, insights |
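The ExperimentDesigner's structured JSON might be validated by a schema along these lines. This is an illustrative sketch using stdlib dataclasses (the project itself uses Pydantic), and the field names are assumptions, not the real schema:

```python
import json
from dataclasses import dataclass, field

@dataclass
class ExperimentSpec:
    """Illustrative schema for the ExperimentDesigner's JSON output."""
    model: str                           # e.g. "RandomForestClassifier"
    hyperparameters: dict = field(default_factory=dict)
    preprocessing: list = field(default_factory=list)
    rationale: str = ""                  # Gemini's explanation of the choice

    @classmethod
    def from_json(cls, text: str) -> "ExperimentSpec":
        # Keep only known fields so extra keys in the response don't crash us.
        raw = json.loads(text)
        return cls(**{k: raw[k] for k in raw if k in cls.__dataclass_fields__})

spec = ExperimentSpec.from_json(
    '{"model": "RandomForestClassifier",'
    ' "hyperparameters": {"n_estimators": 200},'
    ' "rationale": "baseline tree ensemble"}'
)
```

Validating the model's JSON at the boundary is what lets the orchestration layer trust the cognitive core's output.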

Supported Models

  • scikit-learn: LinearRegression, LogisticRegression, RandomForest, GradientBoosting, SVM, and more
  • XGBoost: XGBRegressor, XGBClassifier
  • LightGBM: LGBMRegressor, LGBMClassifier
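One way to map these names onto importable classes is a lazy registry. The module paths below are the real scikit-learn, XGBoost, and LightGBM locations, but the registry itself is an assumption about how the autopilot might resolve names, not its actual code:

```python
from importlib import import_module

# Model name -> (module, class). Resolving lazily keeps XGBoost and
# LightGBM optional until an experiment actually asks for them.
MODEL_REGISTRY = {
    "LinearRegression":   ("sklearn.linear_model", "LinearRegression"),
    "LogisticRegression": ("sklearn.linear_model", "LogisticRegression"),
    "RandomForest":       ("sklearn.ensemble", "RandomForestClassifier"),
    "GradientBoosting":   ("sklearn.ensemble", "GradientBoostingClassifier"),
    "SVM":                ("sklearn.svm", "SVC"),
    "XGBRegressor":       ("xgboost", "XGBRegressor"),
    "XGBClassifier":      ("xgboost", "XGBClassifier"),
    "LGBMRegressor":      ("lightgbm", "LGBMRegressor"),
    "LGBMClassifier":     ("lightgbm", "LGBMClassifier"),
}

def resolve_model(name: str):
    """Import and return the estimator class registered under `name`."""
    module, cls = MODEL_REGISTRY[name]
    return getattr(import_module(module), cls)
```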

Automatic Preprocessing

Gemini decides per experiment — no fixed pipeline:
  • Missing value handling (drop, mean, median, mode imputation)
  • Feature scaling (standard, min-max, or none)
  • Categorical encoding (one-hot, ordinal)
  • Target transformations (log, sqrt for skewed distributions)
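Applied to a single numeric column, the spec-driven idea looks roughly like this. It is a pure-Python sketch; a real run would use pandas/scikit-learn transformers, and the spec keys are assumptions for illustration:

```python
import math

def preprocess_column(values, spec):
    """Apply a small preprocessing spec to one numeric column.

    Spec keys (assumptions for this sketch):
      impute:    "mean" | "median" | None  -- fill for missing (None) values
      transform: "log" | "sqrt" | None     -- for skewed distributions
      scale:     "minmax" | "standard" | None
    """
    observed = [v for v in values if v is not None]
    if spec.get("impute") == "mean":
        fill = sum(observed) / len(observed)
    elif spec.get("impute") == "median":
        s = sorted(observed)
        mid = len(s) // 2
        fill = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    else:
        fill = None
    out = [fill if v is None else v for v in values]
    if spec.get("transform") == "log":
        out = [math.log1p(v) for v in out]
    elif spec.get("transform") == "sqrt":
        out = [math.sqrt(v) for v in out]
    if spec.get("scale") == "minmax":
        lo, hi = min(out), max(out)
        out = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in out]
    elif spec.get("scale") == "standard":
        mean = sum(out) / len(out)
        std = (sum((v - mean) ** 2 for v in out) / len(out)) ** 0.5
        out = [(v - mean) / std if std else 0.0 for v in out]
    return out
```

Because the spec is just data, Gemini can emit a different one per experiment and the execution layer applies it mechanically.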

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   ML EXPERIMENT AUTOPILOT                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                 ORCHESTRATION LAYER                    │ │
│  │ ExperimentController — main loop & state machine       │ │
│  │ Pydantic state management — JSON with type validation  │ │
│  └────────────────────────────────────────────────────────┘ │
│                             │                               │
│                             ▼                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │           COGNITIVE CORE  (Gemini 3 Flash)             │ │
│  │  ExperimentDesigner — designs next experiment          │ │
│  │  ResultsAnalyzer — compares results, detects trends    │ │
│  │  HypothesisGenerator — hypotheses with confidence      │ │
│  │  ReportGenerator — publication-ready narrative reports │ │
│  │  Thought Signatures maintain reasoning continuity      │ │
│  └────────────────────────────────────────────────────────┘ │
│                             │                               │
│                             ▼                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                 EXECUTION LAYER                        │ │
│  │  DataProfiler — schema, stats, missing values          │ │
│  │  CodeGenerator — Jinja2 template-based Python scripts  │ │
│  │  ExperimentRunner — subprocess execution with timeout  │ │
│  └────────────────────────────────────────────────────────┘ │
│                             │                               │
│                             ▼                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                  PERSISTENCE LAYER                     │ │
│  │  MLflow tracking — metrics, params, artifacts          │ │
│  │  JSON state files — resumable experiment sessions      │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Why This Qualifies for “The Marathon Agent”

  • Autonomous: Runs 20+ iterations without human intervention
  • Long-Running: Maintains context across multi-hour execution via Thought Signatures
  • Self-Correcting: Learns from failures, adjusts strategy, detects performance plateaus
  • Explainable: Every decision is documented with Gemini’s reasoning
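Plateau detection can be as simple as comparing the recent best score against the earlier best and requiring a minimum improvement. A sketch, where the window and tolerance values are assumptions rather than the project's defaults:

```python
def has_plateaued(metrics, window=3, min_delta=1e-3):
    """True if the last `window` iterations improved on the prior best
    by less than `min_delta` (higher metric = better)."""
    if len(metrics) <= window:
        return False  # not enough history to judge
    best_before = max(metrics[:-window])
    best_recent = max(metrics[-window:])
    return best_recent - best_before < min_delta
```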

Next Steps

Quickstart

Run your first experiment in under 5 minutes

Installation

Complete setup and configuration guide

CLI Reference

All command-line options and arguments

Examples

Real-world examples with regression and classification
