Dataset Overview
The California Housing dataset contains information about housing prices in California.

Features:
- MedInc: Median income in block group
- HouseAge: Median house age in block group
- AveRooms: Average number of rooms per household
- AveBedrms: Average number of bedrooms per household
- Population: Block group population
- AveOccup: Average number of household members
- Latitude: Block group latitude
- Longitude: Block group longitude

Target:
- MedHouseVal: Median house value in block group (in $100,000s)
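Because the target is expressed in units of $100,000, any predicted value (or error metric) must be rescaled before being read as a dollar amount. A minimal sketch (plain Python, the helper name is illustrative):

```python
def target_to_dollars(med_house_val: float) -> float:
    """Convert MedHouseVal (units of $100,000) to US dollars."""
    return med_house_val * 100_000

# A MedHouseVal of 2.5 corresponds to a $250,000 median house value.
print(target_to_dollars(2.5))  # → 250000.0

# The same scaling applies to RMSE: an RMSE of 0.1287 is roughly $12,870.
print(round(target_to_dollars(0.1287), 2))
```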
Basic Usage
Run a regression experiment with default settings.

What Happens
- Data Profiling: Analyzes schema, distributions, and missing values
- Baseline Model: Trains a simple Linear Regression to establish performance floor
- Iteration Loop: Gemini designs, executes, and analyzes experiments
- Report Generation: Creates a narrative Markdown report with insights
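The baseline stage above can be illustrated with a self-contained sketch. This is pure Python on toy numbers, not the tool's actual code: an ordinary-least-squares fit on a single feature, scored with the RMSE metric used throughout this page.

```python
import math

def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature: y ≈ a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

def rmse(y_true, y_pred):
    """Root mean squared error: the performance floor the baseline sets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy stand-in for MedInc vs. MedHouseVal: richer block groups, pricier homes.
med_inc = [2.0, 3.5, 5.0, 6.5, 8.0]
med_val = [1.1, 1.9, 2.8, 3.5, 4.4]
a, b = fit_simple_linear(med_inc, med_val)
preds = [a * x + b for x in med_inc]
print(round(rmse(med_val, preds), 3))  # → 0.04
```

Every later iteration is judged against this kind of floor: a model only survives if its RMSE beats the baseline's.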
Expected Output
With Constraints
Guide the experiment with natural language preferences, supplied in a constraints.md file.
Impact of Constraints
With these constraints, Gemini will:
- Focus on tree-based models (Random Forest, XGBoost, LightGBM)
- Apply log transformation to the target variable
- Use RMSE as the primary optimization metric
- Stop early if performance plateaus
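The target transformation constraint can be pictured as a train-time/predict-time pair. The exact transform the tool applies is not shown on this page; a common pattern for right-skewed targets is log1p at training time and its inverse, expm1, at prediction time, so errors are still scored on the original $100,000s scale:

```python
import math

# Right-skewed targets (e.g., house values): a few large values dominate.
y = [0.8, 1.2, 1.5, 2.0, 5.0]

# Train-time transform: compress the long right tail.
y_log = [math.log1p(v) for v in y]

# Predict-time inverse: expm1 undoes log1p, returning predictions to the
# original scale before RMSE is computed.
y_back = [math.expm1(v) for v in y_log]

print(all(abs(a - b) < 1e-9 for a, b in zip(y, y_back)))  # → True
```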
Advanced Configuration
- Extended Budget: run more iterations with a longer time budget
- Custom Output
- Resume
Interpreting Results
Metric Progression
After running the experiment, you’ll see a progression chart in outputs/plots/metric_progression.png:
- X-axis: Iteration number
- Y-axis: RMSE (lower is better)
- Trend: Ideally decreasing over iterations
Best Model Details
The final report (outputs/reports/experiment_report_TIMESTAMP.md) summarizes the best model’s configuration and the key insights from the run.
Key Insights Example
“Log transformation of the target variable was critical, reducing RMSE by 80%. Gradient boosting methods (XGBoost, LightGBM) consistently outperformed bagging approaches. Geographic features (Latitude, Longitude) showed high feature importance, suggesting spatial patterns in housing prices.”
Viewing in MLflow
Launch the MLflow UI to explore all experiments. From there you can:
- Compare metrics across all iterations
- View hyperparameters for each experiment
- Download saved model artifacts
- Visualize feature importance
- Export results to CSV
Common Results
Typical results for this dataset:

| Iteration | Model | RMSE | Improvement |
|---|---|---|---|
| Baseline | LinearRegression | 0.7445 | - |
| 1 | RandomForest | 0.4823 | 35.2% |
| 2 | RandomForest + log | 0.1489 | 80.0% |
| 3 | XGBRegressor | 0.1332 | 82.1% |
| 4 | LGBMRegressor | 0.1345 | 81.9% |
| 5 | XGBRegressor tuned | 0.1287 | 82.7% |
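The Improvement column is the percentage RMSE reduction relative to the baseline row, and can be reproduced directly (the helper name is illustrative):

```python
def improvement_pct(baseline_rmse: float, model_rmse: float) -> float:
    """Percent RMSE reduction relative to the baseline model."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100

baseline = 0.7445  # LinearRegression baseline from the table

print(round(improvement_pct(baseline, 0.4823), 1))  # → 35.2 (iteration 1)
print(round(improvement_pct(baseline, 0.1287), 1))  # → 82.7 (iteration 5)
```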
Why These Results?
- Tree-based models excel: Capture non-linear relationships between features and target
- Log transformation critical: Target variable (house prices) is right-skewed
- Boosting outperforms bagging: Gradient boosting captures residual patterns
- Geographic features important: Spatial location strongly correlates with price
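The right-skew claim can be checked directly: in a right-skewed sample the mean sits well above the median, and a log transform pulls the two together. A small illustration on synthetic numbers (not the actual dataset):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

# Synthetic right-skewed "prices": most values small, a few very large.
prices = [1.0, 1.1, 1.3, 1.5, 1.8, 2.2, 3.0, 5.0, 9.0]

# Mean minus median: a crude skew indicator (positive = right-skewed).
skew_raw = mean(prices) - median(prices)
skew_log = mean([math.log(p) for p in prices]) - median([math.log(p) for p in prices])

print(skew_raw > skew_log)  # → True: the log transform compresses the tail
```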
Next Steps
- Classification Example: Learn how to run classification experiments
- Advanced Constraints: Explore complex constraint configurations
- CLI Reference: View all available command options
- Interpretation: Deep dive into result interpretation