Overview
The CodeGenerator class generates executable Python ML experiment scripts from Jinja2 templates. It supports multiple model types and automatically handles parameter formatting and code validation.
Class Definition
CodeGenerator
Path to templates directory. Defaults to project’s templates/ folder
Methods
generate()
Generate an experiment script from a specification.
Experiment specification containing:
- experiment_name: Unique name for the experiment
- hypothesis: Hypothesis being tested
- model_type: sklearn model class name (e.g., "RandomForestClassifier")
- model_params: Dictionary of model hyperparameters
- preprocessing: PreprocessingConfig object
- reasoning: Explanation of experiment design
Path to the dataset file
Name of the target column
Type of ML task: 'classification' or 'regression'
Output directory for generated script. Defaults to project’s experiments/ folder
Returns:
Path - Path to the generated Python script
Raises:
CodeGenerationError - If the template is not found, the generated code has syntax errors, or the file write fails
generate_baseline()
Generate a baseline experiment script with default settings.
Path to the dataset file
Name of the target column
Type of ML task: 'classification' or 'regression'
Output directory for generated script
Returns:
Path - Path to the generated baseline script
Baseline models:
- Regression: LinearRegression with standard scaling and median imputation
- Classification: LogisticRegression with standard scaling and median imputation
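The baseline choices above can be pictured as a small lookup keyed by task type. This is a minimal sketch, not the class's actual implementation: the function name `baseline_settings` and the dictionary layout are hypothetical, and the empty `model_params` is an assumption about what "default settings" means.

```python
# Hypothetical lookup mirroring the baseline models listed above.
BASELINE_MODELS = {
    "regression": "LinearRegression",
    "classification": "LogisticRegression",
}


def baseline_settings(task_type: str) -> dict:
    """Return illustrative baseline settings for a task type."""
    if task_type not in BASELINE_MODELS:
        raise ValueError(f"Unknown task type: {task_type!r}")
    return {
        "model_type": BASELINE_MODELS[task_type],
        # Assumption: baselines use the model's default hyperparameters.
        "model_params": {},
        # Both baselines use standard scaling and median imputation.
        "preprocessing": {"scaling": "standard", "missing_values": "median"},
    }
```

Keeping the two baselines in one table makes it easy to see that they differ only in the model class.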
Template Selection
Templates are automatically selected based on model type:
| Model Type Prefix | Template |
|---|---|
| XGB* or xgb* | xgboost_model.py.jinja |
| LGBM* or lgb* | lightgbm_model.py.jinja |
| Regression task | sklearn_regressor.py.jinja |
| Classification task | sklearn_classifier.py.jinja |
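The table above can be read as a prefix check followed by a task-type fallback. The sketch below assumes that ordering (prefix rules first), which the table suggests but does not state. The function name `select_template` is hypothetical, and matching may differ in the real class.

```python
def select_template(model_type: str, task_type: str) -> str:
    """Pick a template file name from the model type and task type."""
    # Assumption: prefix rules are checked before the task-type fallback.
    if model_type.startswith(("XGB", "xgb")):
        return "xgboost_model.py.jinja"
    if model_type.startswith(("LGBM", "lgb")):
        return "lightgbm_model.py.jinja"
    if task_type == "regression":
        return "sklearn_regressor.py.jinja"
    if task_type == "classification":
        return "sklearn_classifier.py.jinja"
    raise ValueError(f"Unknown task type: {task_type!r}")
```

Under this reading, `select_template("XGBRegressor", "regression")` resolves to the XGBoost template rather than the sklearn regressor template.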
Parameter Handling
The generator formats model parameters into a string for insertion into the template. Template-managed parameters are set by the templates themselves and excluded from the parameter string:
- random_state
- n_jobs
- verbose
- max_iter
- probability
- verbosity
- nthread
- num_threads
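One way to picture the exclusion rule is a filter over the hyperparameter dictionary before it is joined into keyword-argument text. This is a minimal sketch: the name `format_params` and the `name=repr(value)` output format are assumptions, not the generator's confirmed behavior.

```python
# Parameters the templates set themselves, as listed above.
TEMPLATE_MANAGED = frozenset({
    "random_state", "n_jobs", "verbose", "max_iter",
    "probability", "verbosity", "nthread", "num_threads",
})


def format_params(model_params: dict) -> str:
    """Format hyperparameters as keyword arguments, skipping managed ones."""
    parts = [
        # repr() keeps strings quoted so the result is valid Python source.
        f"{name}={value!r}"
        for name, value in model_params.items()
        if name not in TEMPLATE_MANAGED
    ]
    return ", ".join(parts)


print(format_params({"n_estimators": 100, "random_state": 42, "criterion": "gini"}))
# prints: n_estimators=100, criterion='gini'
```

Using `repr()` rather than `str()` matters here: a string value such as `gini` must appear quoted in the generated script.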
Code Validation
All generated code is validated using Python’s AST parser. Code with syntax errors raises CodeGenerationError.
Template Context
Templates receive the following context variables:
- Unique experiment identifier
- ISO format timestamp of generation
- Absolute path to the dataset
- Name of the target column
- 'classification' or 'regression'
- sklearn model class name
- Dictionary of model hyperparameters
- Formatted parameter string for template insertion
- Preprocessing configuration:
  - missing_values: 'drop', 'mean', 'median', 'mode', 'constant'
  - scaling: 'standard', 'minmax', 'none'
  - encoding: 'onehot', 'ordinal'
  - target_transform: 'log', 'none', or None
- Hypothesis being tested
- Explanation of experiment design
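The preprocessing options listed above form a small closed vocabulary, so a configuration can be checked before it reaches a template. The sketch below is illustrative only. It treats the configuration as a plain dictionary, whereas the real PreprocessingConfig is an object, and `check_preprocessing` is a hypothetical helper.

```python
# Allowed values per option, taken from the list above.
ALLOWED_PREPROCESSING = {
    "missing_values": {"drop", "mean", "median", "mode", "constant"},
    "scaling": {"standard", "minmax", "none"},
    "encoding": {"onehot", "ordinal"},
    "target_transform": {"log", "none", None},
}


def check_preprocessing(config: dict) -> None:
    """Raise ValueError if an option name or value is not recognized."""
    for key, value in config.items():
        if key not in ALLOWED_PREPROCESSING:
            raise ValueError(f"Unknown preprocessing option: {key!r}")
        allowed = ALLOWED_PREPROCESSING[key]
        if value not in allowed:
            # str() is applied before sorting because None and str do not compare.
            choices = ", ".join(sorted(map(str, allowed)))
            raise ValueError(f"Invalid {key}: {value!r} (expected one of: {choices})")
```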
Complete Example
Helper Functions
create_experiment_from_gemini_response()
Convert Gemini’s JSON response to an ExperimentSpec.
Raises:
ValueError - If the response is missing required fields: experiment_name, hypothesis, model_type
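The required-field check described above might look roughly like the following. This is a sketch under stated assumptions: `ExperimentSpecSketch` is a stand-in dataclass rather than the real ExperimentSpec, `spec_from_response` is a hypothetical name, and taking the response as a JSON string is a guess about the input type.

```python
import json
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("experiment_name", "hypothesis", "model_type")


@dataclass
class ExperimentSpecSketch:
    """Stand-in for ExperimentSpec; the real class may differ."""
    experiment_name: str
    hypothesis: str
    model_type: str
    model_params: dict = field(default_factory=dict)
    reasoning: str = ""


def spec_from_response(response_text: str) -> ExperimentSpecSketch:
    """Parse a JSON response and check the required fields are present."""
    data = json.loads(response_text)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"Response is missing required fields: {', '.join(missing)}")
    return ExperimentSpecSketch(
        experiment_name=data["experiment_name"],
        hypothesis=data["hypothesis"],
        model_type=data["model_type"],
        # Optional fields fall back to empty defaults (an assumption).
        model_params=data.get("model_params", {}),
        reasoning=data.get("reasoning", ""),
    )
```

Collecting every missing field before raising lets one error message report all of them, rather than failing on the first.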
Error Handling
CodeGenerationError
Custom exception raised when code generation fails:
- Template file not found
- Invalid model parameters causing syntax errors
- File system permission issues
- Invalid preprocessing configuration
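The syntax-error case ties back to the AST validation step. A minimal sketch of that check is shown below, using the standard `ast.parse` function. The local `CodeGenerationError` class is a stand-in for the real exception, and `validate_code` is a hypothetical helper name.

```python
import ast


class CodeGenerationError(Exception):
    """Stand-in for the exception described above."""


def validate_code(source: str) -> None:
    """Raise CodeGenerationError if the source is not valid Python syntax."""
    try:
        # ast.parse only checks syntax; it does not execute the code.
        ast.parse(source)
    except SyntaxError as exc:
        raise CodeGenerationError(
            f"Generated code has a syntax error at line {exc.lineno}: {exc.msg}"
        ) from exc
```

Parsing catches malformed output such as `model = RandomForestClassifier(n_estimators=)` before anything is run, but it cannot detect runtime problems like a misspelled import.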
Source Location
~/workspace/source/src/execution/code_generator.py