
Prerequisites

Before you begin, ensure you have:
  • Python 3.9 or higher installed
  • A Gemini API key (available free from Google AI Studio)
  • A dataset in CSV or Parquet format
The free tier works, but Tier 1 or higher is recommended for production use to avoid rate limits during long-running experiments.
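If you stay on the free tier and hit rate limits mid-run, retrying with exponential backoff is the usual workaround. The sketch below is illustrative and not part of the repository; in real code you would catch the API client's specific rate-limit exception rather than a bare `Exception`:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff plus jitter.

    Illustrative sketch: replace the bare Exception with the
    client library's rate-limit error in real use.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; jitter avoids retry stampedes.
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```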

Quick Setup

1. Clone the repository

git clone https://github.com/srikar161720/ml-experiment-autopilot.git
cd ml-experiment-autopilot
2. Create a virtual environment

python3 -m venv venv
source venv/bin/activate
3. Install dependencies

pip install -r requirements.txt
This will install:
  • google-generativeai - Gemini 3 API client
  • pandas, numpy - Data processing
  • scikit-learn, xgboost, lightgbm - ML models
  • mlflow - Experiment tracking
  • typer, rich - CLI interface
  • pydantic - Data validation
4. Configure the API key

Copy the example environment file:
cp .env.example .env
Edit .env and add your Gemini API key:
.env
GEMINI_API_KEY=your_actual_gemini_api_key_here
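The project presumably loads this file with a dotenv-style helper. If you are curious what that involves, here is a minimal stdlib sketch of the same idea (real projects typically use the python-dotenv package instead):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines, blank lines and
    '#' comments ignored. Existing environment variables win."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```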

Run Your First Experiment

The repository includes sample datasets to help you get started immediately.

Regression Example

Run an experiment on the California Housing dataset (20,640 samples):
python -m src.main run \
  --data data/sample/california_housing.csv \
  --target MedHouseVal \
  --task regression \
  --max-iterations 3 \
  --verbose
Use --verbose (or -v) to see Gemini’s detailed reasoning at each step. This helps you understand how the agent makes decisions.

Classification Example

Run an experiment on the Bank Marketing dataset (11,162 samples):
python -m src.main run \
  --data data/sample/bank.csv \
  --target deposit \
  --task classification \
  --max-iterations 3 \
  --verbose

Understanding the Output

Console Output

With --verbose enabled, you’ll see Gemini’s reasoning process in real-time:
╔══════════════════════════════════════════════════════════════╗
║  ITERATION 3 - GEMINI'S REASONING                            ║
║  Thought Signature Active | Context: 12 turns                ║
╚══════════════════════════════════════════════════════════════╝

Based on the previous 2 experiments, I've observed that:
- Tree-based models consistently outperform linear models on this dataset
- Iteration 2's log-transform hypothesis improved RMSE by 80%
- Feature distributions suggest boosting may capture residual patterns

For this iteration, I'm testing XGBoost with tuned learning rate
and max_depth to see if gradient boosting further reduces error...

┌─────────────────────────────────────────────────────────────┐
│ RESULTS ANALYSIS                                            │
├─────────────────────────────────────────────────────────────┤
│ Trend: IMPROVING                                            │
│ RMSE: 0.1332   ★ NEW BEST                                   │
│   82.1% better than baseline                                │
│                                                             │
│ Key Observations:                                           │
│   - Boosting provided 10.3% improvement over bagging        │
│   - Log transformation remains critical for this target     │
│   - Diminishing returns suggest stopping after next round   │
└─────────────────────────────────────────────────────────────┘

Generated Outputs

All experiment artifacts are saved to the outputs/ directory:
  • Markdown reports (outputs/reports/): Publication-ready narrative reports
  • Visualizations (outputs/plots/): Metric progression and model comparison charts
  • Generated code (outputs/experiments/): Python scripts for each experiment
  • MLflow data (outputs/mlruns/): Experiment tracking database
  • Saved models (outputs/models/): Serialized best models
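The serialization format of the saved models isn't stated here; joblib is the usual choice for scikit-learn estimators, so a round-trip would look roughly like this (the filename is hypothetical; check outputs/models/ for the actual names and format):

```python
import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Train a toy model, save it, and reload it -- the same round-trip
# applies to whichever estimator the autopilot serializes.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 2.0, 4.0])
model = LinearRegression().fit(X, y)

dump(model, "toy_model.joblib")       # filename is illustrative
reloaded = load("toy_model.joblib")
print(reloaded.predict([[3.0]]))      # close to 6.0 for this toy fit
```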

View Results in MLflow

MLflow provides a web dashboard to explore all experiments:
mlflow ui --backend-store-uri file:./outputs/mlruns
Then open http://127.0.0.1:5000 in your browser. The MLflow UI shows:
  • All experiment runs with metrics
  • Parameter comparisons
  • Artifact downloads (models, plots, code)
  • Run metadata and timing

Experiment Loop

The autopilot follows this iterative process:
1. Data Profiling

Analyzes schema, distributions, missing values, and statistical properties.
# From src/execution/data_profiler.py
profiler = DataProfiler(data_path, target_column, task_type)
profile = profiler.profile()
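The DataProfiler internals aren't shown here, but the kind of profile this step produces can be sketched in a few lines of pandas (field names below are illustrative, not the project's actual schema):

```python
import pandas as pd

def profile_dataframe(df: pd.DataFrame, target: str) -> dict:
    """Illustrative mini-profile: schema, missingness, target summary."""
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_pct": (df.isna().mean() * 100).round(2).to_dict(),
        "target_summary": df[target].describe().to_dict(),
    }

df = pd.DataFrame({"MedInc": [8.3, 7.2, None],
                   "MedHouseVal": [4.5, 3.6, 2.7]})
profile = profile_dataframe(df, target="MedHouseVal")
print(profile["missing_pct"])  # MedInc has one missing value out of three
```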
2. Baseline Model

Establishes a performance floor with a simple model (e.g., mean prediction for regression).
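A mean-prediction baseline like the one described can be built with scikit-learn's DummyRegressor; this sketch shows the idea (the autopilot's actual baseline code may differ):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Mean prediction: every later experiment must beat this RMSE.
y_train = np.array([1.0, 2.0, 3.0, 4.0])
y_test = np.array([2.0, 3.0])

baseline = DummyRegressor(strategy="mean").fit(np.zeros((4, 1)), y_train)
preds = baseline.predict(np.zeros((2, 1)))   # predicts the training mean, 2.5
rmse = mean_squared_error(y_test, preds) ** 0.5
print(rmse)  # 0.5
```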
3. Experiment Design

Gemini designs the next experiment based on:
  • Data profile
  • Previous experiment results
  • User constraints (if provided)
  • Hypothesis to test
4. Code Generation

Generates validated Python code using Jinja2 templates:
# From src/execution/code_generator.py
generator = CodeGenerator()
code = generator.generate(
    spec=experiment_spec,
    data_path=data_path,
    output_dir=output_dir
)
5. Execution

Runs the generated code in a subprocess with timeout protection.
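Subprocess execution with a timeout can be sketched with the stdlib alone; the function below is illustrative (the real runner also captures metrics from the script's output):

```python
import subprocess
import sys

def run_experiment_script(script_path: str, timeout_s: int = 600):
    """Run a generated script in a child process, killing it on timeout.

    Returns (returncode, stdout), or (None, "timed out") if the
    script exceeded the time limit.
    """
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        return None, "timed out"
```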
6. Results Analysis

Gemini analyzes results, compares against baseline and best, and detects trends.
7. Hypothesis Generation

Generates ranked hypotheses for the next iteration with confidence scores.
8. Termination Check

Decides whether to continue based on:
  • Max iterations reached
  • Performance plateau (3 consecutive non-improvements by default)
  • Time budget exceeded
  • Target metric achieved (if specified in constraints)
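The plateau rule (three consecutive non-improvements by default) amounts to a simple check over the metric history. A sketch, assuming a minimized metric such as RMSE (the project's actual implementation may differ):

```python
def should_stop(metric_history, patience=3, minimize=True):
    """Stop when the last `patience` iterations failed to beat the
    best score seen before them."""
    if len(metric_history) <= patience:
        return False
    earlier = metric_history[:-patience]
    best_before = min(earlier) if minimize else max(earlier)
    recent = metric_history[-patience:]
    if minimize:
        return all(m >= best_before for m in recent)
    return all(m <= best_before for m in recent)

# 0.20 was the best; the next three rounds never improved on it.
print(should_stop([0.30, 0.20, 0.21, 0.22, 0.25]))  # True
```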

Using Your Own Dataset

To run the autopilot on your own data:
python -m src.main run \
  --data /path/to/your/dataset.csv \
  --target your_target_column \
  --task regression \
  --max-iterations 10 \
  --verbose
Ensure your target column name is spelled exactly as it appears in the dataset (case-sensitive).

Supported File Formats

  • CSV: .csv files (automatically detected)
  • Parquet: .parquet files (automatically detected)
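The "automatically detected" part presumably means dispatching on the file extension, which in pandas looks roughly like this (a sketch; reading Parquet additionally requires pyarrow or fastparquet):

```python
from pathlib import Path

import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    """Load a dataset by file extension, rejecting unsupported formats."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".parquet":
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported file format: {suffix}")
```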

Task Types

  • regression - For continuous target variables (house prices, temperature, etc.)
  • classification - For categorical target variables (yes/no, spam/ham, etc.)

Adding Constraints

Guide Gemini’s experimentation with natural language preferences:
python -m src.main run \
  --data data/sample/california_housing.csv \
  --target MedHouseVal \
  --task regression \
  --constraints my_constraints.md \
  --max-iterations 5 \
  --verbose
Create a constraints file in Markdown:
my_constraints.md
# Experiment Constraints

## Metrics
- Primary metric: RMSE

## Models
- Prefer tree-based models
- Prefer boosting methods

## Preprocessing
- Log-transform the target variable
- Use median imputation for missing values

## Termination
- Stop if no improvement for 3 iterations
Constraints are natural language guidelines, not hard rules. Gemini will consider them when designing experiments but may deviate if it has good reasoning to do so.

Resuming Interrupted Experiments

If an experiment is interrupted (Ctrl+C or crash), the state is automatically saved:
python -m src.main run \
  --resume outputs/state_20260302_143022.json
The autopilot will continue from where it left off, preserving all conversation context.
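A state file like the one named above is typically just a JSON checkpoint. The sketch below shows the general save/load pattern only; the actual schema of the project's state files (its field names and contents) is defined by the project, not by this example:

```python
import json

def save_state(path, iteration, history, conversation):
    """Checkpoint the run to JSON. Field names here are illustrative."""
    with open(path, "w") as f:
        json.dump({"iteration": iteration,
                   "history": history,
                   "conversation": conversation}, f)

def load_state(path):
    """Reload a checkpoint so a run can resume where it left off."""
    with open(path) as f:
        return json.load(f)
```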

Next Steps

CLI Reference

Complete list of all command-line options

Advanced Constraints

Learn how to write powerful constraint files

Understanding Reports

How to interpret generated Markdown reports

Troubleshooting

Common issues and solutions
