CodeGenerator API

Overview

The CodeGenerator class generates executable Python ML experiment scripts from Jinja2 templates. It selects a template for scikit-learn, XGBoost, or LightGBM models, formats the model parameters, and validates the generated code.

Class Definition

CodeGenerator

from src.execution.code_generator import CodeGenerator
from pathlib import Path

generator = CodeGenerator(templates_dir=Path("templates/"))

templates_dir

Optional[Path]

Path to the templates directory. Defaults to the project’s templates/ folder.

Methods

generate()

Generate an experiment script from a specification.

script_path = generator.generate(
    spec=experiment_spec,
    data_path=Path("data.csv"),
    target_column="target",
    task_type="classification",
    output_dir=Path("experiments/")
)

spec

ExperimentSpec

required

Experiment specification containing:

experiment_name: Unique name for the experiment
hypothesis: Hypothesis being tested
model_type: sklearn model class name (e.g., "RandomForestClassifier")
model_params: Dictionary of model hyperparameters
preprocessing: PreprocessingConfig object
reasoning: Explanation of experiment design

data_path

Path

required

Path to the dataset file

target_column

str

required

Name of the target column

task_type

str

required

Type of ML task: 'classification' or 'regression'

output_dir

Optional[Path]

Output directory for the generated script. Defaults to the project’s experiments/ folder.

Returns: Path - Path to the generated Python script

Raises:

CodeGenerationError - If template not found, code has syntax errors, or file write fails

generate_baseline()

Generate a baseline experiment script with default settings.

script_path = generator.generate_baseline(
    data_path=Path("data.csv"),
    target_column="target",
    task_type="regression",
    output_dir=Path("experiments/")
)

data_path

Path

required

Path to the dataset file

target_column

str

required

Name of the target column

task_type

str

required

Type of ML task: 'classification' or 'regression'

output_dir

Optional[Path]

Output directory for generated script

Returns: Path - Path to the generated baseline script

Baseline models:

Regression: LinearRegression with standard scaling and median imputation
Classification: LogisticRegression with standard scaling and median imputation

Template Selection

Templates are selected automatically from the model type prefix. Models without a listed prefix use the sklearn template that matches the task type:

Model Type Prefix or Task	Template
`XGB` or `xgb`	`xgboost_model.py.jinja`
`LGBM` or `lgb`	`lightgbm_model.py.jinja`
Regression task	`sklearn_regressor.py.jinja`
Classification task	`sklearn_classifier.py.jinja`
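
The table above can be read as a lookup that checks the library-specific prefixes first and then falls back to the task type. The sketch below illustrates that reading. The function name select_template is hypothetical, and the use of str.startswith and the order of the checks are assumptions, not a copy of the actual implementation.

```python
def select_template(model_type: str, task_type: str) -> str:
    """Return a template filename for a model type and task (hypothetical helper)."""
    # Assumption: prefixes are matched with str.startswith, as listed in the table.
    if model_type.startswith(("XGB", "xgb")):
        return "xgboost_model.py.jinja"
    if model_type.startswith(("LGBM", "lgb")):
        return "lightgbm_model.py.jinja"
    # No library-specific prefix matched, so fall back to the task type.
    if task_type == "regression":
        return "sklearn_regressor.py.jinja"
    if task_type == "classification":
        return "sklearn_classifier.py.jinja"
    raise ValueError(f"Unsupported task type: {task_type!r}")


print(select_template("XGBClassifier", "classification"))
print(select_template("RandomForestClassifier", "classification"))
```

Under these assumptions, an XGBClassifier maps to the XGBoost template regardless of task type, while a RandomForestClassifier falls through to the sklearn classifier template.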

Parameter Handling

The generator splits model parameters into two groups:

Template-managed parameters are automatically handled by templates and excluded from the parameter string:

random_state, n_jobs, verbose, max_iter
probability, verbosity, nthread, num_threads

Custom parameters are formatted and injected into the template:

# Example parameter formatting
params = {
    "n_estimators": 100,
    "max_depth": 5,
    "learning_rate": 0.1,
    "criterion": "gini"
}
# Formatted as: "n_estimators=100, max_depth=5, learning_rate=0.1, criterion='gini', "
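
A minimal sketch of how such a parameter string could be produced is shown below. The names TEMPLATE_MANAGED and format_params are hypothetical, and using repr() to quote string values is an assumption that happens to reproduce the formatted example above.

```python
# Parameters listed in this document as template-managed.
TEMPLATE_MANAGED = {
    "random_state", "n_jobs", "verbose", "max_iter",
    "probability", "verbosity", "nthread", "num_threads",
}


def format_params(params: dict) -> str:
    """Format custom parameters as 'name=value, ' pairs (hypothetical helper)."""
    # repr() quotes strings and leaves numbers as-is; each pair keeps a
    # trailing ", " so the template can append its own arguments after it.
    return "".join(
        f"{name}={value!r}, "
        for name, value in params.items()
        if name not in TEMPLATE_MANAGED
    )


formatted = format_params({
    "n_estimators": 100,
    "max_depth": 5,
    "learning_rate": 0.1,
    "criterion": "gini",
    "random_state": 42,  # template-managed, so it is dropped
})
print(formatted)
```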

Code Validation

All generated code is validated using Python’s AST parser:

import ast
ast.parse(generated_code)  # Raises SyntaxError if invalid

If generated code has syntax errors, CodeGenerationError is raised with the specific line number and error message.
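
The validation step can be sketched as a thin wrapper that converts SyntaxError into the module's exception. The CodeGenerationError class below is a local stand-in so the example runs on its own, and validate_code and the message wording are hypothetical.

```python
import ast


class CodeGenerationError(Exception):
    """Local stand-in for the project's exception, so this sketch runs alone."""


def validate_code(code: str) -> None:
    """Raise CodeGenerationError if the code does not parse (hypothetical helper)."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        # SyntaxError carries the offending line number and a short message.
        raise CodeGenerationError(
            f"Generated code has a syntax error at line {exc.lineno}: {exc.msg}"
        ) from exc


validate_code("x = 1\n")  # valid code passes silently

try:
    validate_code("x = 1\ndef broken(:\n    pass\n")
except CodeGenerationError as err:
    message = str(err)
    print(message)
```

Note that ast.parse only checks syntax. It does not catch runtime problems such as an undefined name or an invalid hyperparameter value.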

Template Context

Templates receive the following context variables:

experiment_name

str

Unique experiment identifier

timestamp

str

ISO format timestamp of generation

data_path

str

Absolute path to the dataset

target_column

str

Name of the target column

task_type

str

'classification' or 'regression'

model_type

str

sklearn model class name

model_params

dict

Dictionary of model hyperparameters

model_params_str

str

Formatted parameter string for template insertion

preprocessing

PreprocessingConfig

Preprocessing configuration:

missing_values: 'drop', 'mean', 'median', 'mode', 'constant'
scaling: 'standard', 'minmax', 'none'
encoding: 'onehot', 'ordinal'
target_transform: 'log', 'none', or None

hypothesis

str

Hypothesis being tested

reasoning

str

Explanation of experiment design
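
To make the shape of the context concrete, the sketch below assembles a dictionary with the variables listed above. The two dataclasses are simplified stand-ins for the project's ExperimentSpec and PreprocessingConfig, their default values are illustrative, and build_context is a hypothetical helper rather than the generator's actual code.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class PreprocessingConfig:  # simplified stand-in
    missing_values: str = "median"
    scaling: str = "standard"
    encoding: str = "onehot"
    target_transform: Optional[str] = None


@dataclass
class ExperimentSpec:  # simplified stand-in
    experiment_name: str
    hypothesis: str
    model_type: str
    model_params: dict
    preprocessing: PreprocessingConfig
    reasoning: str


def build_context(spec, data_path, target_column, task_type, model_params_str):
    """Collect the documented context variables into one dict (hypothetical)."""
    return {
        "experiment_name": spec.experiment_name,
        "timestamp": datetime.now().isoformat(),  # ISO format timestamp
        "data_path": str(data_path.resolve()),    # absolute path as a string
        "target_column": target_column,
        "task_type": task_type,
        "model_type": spec.model_type,
        "model_params": spec.model_params,
        "model_params_str": model_params_str,
        "preprocessing": spec.preprocessing,
        "hypothesis": spec.hypothesis,
        "reasoning": spec.reasoning,
    }


spec = ExperimentSpec(
    experiment_name="rf_tuned_depth",
    hypothesis="Limiting tree depth will reduce overfitting",
    model_type="RandomForestClassifier",
    model_params={"max_depth": 10},
    preprocessing=PreprocessingConfig(),
    reasoning="Previous experiment showed overfitting with unlimited depth",
)
ctx = build_context(spec, Path("data/titanic.csv"), "survived",
                    "classification", "max_depth=10, ")
print(sorted(ctx))
```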

Complete Example

from pathlib import Path
from src.execution.code_generator import CodeGenerator
from src.orchestration.state import ExperimentSpec, PreprocessingConfig

# Create experiment specification
spec = ExperimentSpec(
    experiment_name="rf_tuned_depth",
    hypothesis="Limiting tree depth will reduce overfitting",
    model_type="RandomForestClassifier",
    model_params={
        "n_estimators": 200,
        "max_depth": 10,
        "min_samples_split": 5,
        "class_weight": "balanced"
    },
    preprocessing=PreprocessingConfig(
        missing_values="median",
        scaling="standard",
        encoding="onehot"
    ),
    reasoning="Previous experiment showed overfitting with unlimited depth"
)

# Initialize generator
generator = CodeGenerator()

# Generate script
script_path = generator.generate(
    spec=spec,
    data_path=Path("data/titanic.csv"),
    target_column="survived",
    task_type="classification",
    output_dir=Path("experiments/")
)

print(f"Generated script: {script_path}")
# Output: Generated script: experiments/rf_tuned_depth.py

Helper Functions

create_experiment_from_gemini_response()

Convert Gemini’s JSON response to an ExperimentSpec.

from src.execution.code_generator import create_experiment_from_gemini_response

gemini_response = {
    "experiment_name": "xgboost_tuned",
    "hypothesis": "XGBoost will handle non-linear patterns better",
    "model_type": "XGBClassifier",
    "model_params": {"max_depth": 6, "n_estimators": 100},
    "preprocessing": {
        "missing_values": "median",
        "scaling": "standard",
        "encoding": "onehot"
    },
    "reasoning": "Data shows complex interactions"
}

spec = create_experiment_from_gemini_response(gemini_response)

Raises:

ValueError - If response is missing required fields: experiment_name, hypothesis, model_type
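
The required-field check described above could look roughly like the following. The helper name check_required_fields and the error message text are assumptions; only the three field names and the ValueError come from this page.

```python
REQUIRED_FIELDS = ("experiment_name", "hypothesis", "model_type")


def check_required_fields(response: dict) -> None:
    """Raise ValueError naming any required field that is absent (hypothetical)."""
    missing = [name for name in REQUIRED_FIELDS if name not in response]
    if missing:
        raise ValueError(
            f"Response is missing required fields: {', '.join(missing)}"
        )


try:
    check_required_fields({"experiment_name": "xgboost_tuned"})
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

Reporting every missing field at once, rather than stopping at the first, makes a malformed response easier to debug.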

Error Handling

CodeGenerationError

Custom exception raised when code generation fails:

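# Assumes CodeGenerationError is importable from the same module as CodeGenerator
from src.execution.code_generator import CodeGenerationError

try:
    script_path = generator.generate(spec, data_path, target_column, task_type)
except CodeGenerationError as e:
    print(f"Code generation failed: {e}")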

Common causes:

Template file not found
Invalid model parameters causing syntax errors
File system permission issues
Invalid preprocessing configuration

Source Location

~/workspace/source/src/execution/code_generator.py
