Run MIPROv2, COPRO, SIMBA, GEPA, and Bootstrap Optimizers

DSPy optimizers improve a RAG pipeline by automatically searching over the space of prompt instructions and few-shot demonstrations. Each optimizer script loads a YAML config, initialises all pipeline components, splits the HuggingFace dataset into train and test sets, runs the optimizer to produce a compiled program, saves the compiled state to a JSON file, and then evaluates it using dspy.Evaluate with DeepEval metrics. All scripts follow the same structure — only the optimizer class and its hyperparameters differ.

All optimizer scripts must be run from inside the dataset directory (e.g. src/dspy_opt/freshqa/). Config YAML files are opened with a relative path, so the working directory must match the script location.

What Happens During Optimization

Load the YAML config

The script opens <dataset>_rag_<optimizer>_config.yml from the current directory. This single file controls every configurable aspect of the run — models, Weaviate connection, dataset coordinates, metric thresholds, optimizer hyperparameters, and evaluation settings.
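
The config lookup described above can be sketched with the standard library. This is a minimal sketch, not the scripts' actual code. `resolve_config_path`, `missing_config_keys`, and `REQUIRED_KEYS` are hypothetical helpers. The YAML is assumed to have already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). Only the keys read by the snippets on this page are checked.

```python
from pathlib import Path
from typing import Any, Dict, List, Mapping, Optional, Tuple


def resolve_config_path(dataset: str, optimizer: str, cwd: Optional[Path] = None) -> Path:
    """Return <cwd>/<dataset>_rag_<optimizer>_config.yml, failing early if it is absent."""
    base = Path.cwd() if cwd is None else Path(cwd)
    path = base / f"{dataset}_rag_{optimizer}_config.yml"
    if not path.is_file():
        # The scripts open the config with a relative path, so a wrong cwd shows up here
        raise FileNotFoundError(
            f"{path.name} not found in {base}; run the script from its dataset directory"
        )
    return path


# Hypothetical: only keys used by the snippets on this page (the real config also
# holds models, Weaviate connection details, and metric thresholds)
REQUIRED_KEYS: Dict[str, Tuple[str, ...]] = {
    "dataset": ("name", "subset", "split", "test_size"),
    "optimizer": (),
    "evaluation": ("settings",),
}


def missing_config_keys(config: Mapping[str, Any]) -> List[str]:
    """List dotted paths of required keys absent from an already-parsed config dict."""
    missing: List[str] = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in config[section])
    return missing
```

If a file or key is missing, it is cheaper to fail here than after the LLMs and the Weaviate client have been initialised.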

Initialise LLMs, embedding model, and Weaviate retriever

Two dspy.LM instances are created: an answer_llm configured as the default DSPy LM (dspy.configure(lm=answer_llm)), and a separate extractor_llm for metadata extraction. A SentenceTransformer embedding model and a WeaviateRetriever are also initialised from config values.

Load and split the dataset

The dataset is fetched from HuggingFace and split into train and test sets using the test_size fraction from the config. Both sets are converted into dspy.Example objects, with question as the input field and answer as the label field.

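import dspy
from datasets import load_dataset

# Fetch the dataset from the HuggingFace Hub using coordinates from the config
dataset = load_dataset(
    config["dataset"]["name"],
    config["dataset"]["subset"],
    split=config["dataset"]["split"],
)
# Hold out test_size (a fraction) of the rows for evaluation
dataset = dataset.train_test_split(test_size=config["dataset"]["test_size"])
# Wrap each row as a dspy.Example: "question" is the input, "answer" the label
trainset = [
    dspy.Example(question=question, answer=answer).with_inputs("question")
    for question, answer in zip(dataset["train"]["question"], dataset["train"]["answer"])
]
testset = [
    dspy.Example(question=question, answer=answer).with_inputs("question")
    for question, answer in zip(dataset["test"]["question"], dataset["test"]["answer"])
]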
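
The split above does not pass a seed. Hugging Face's `Dataset.train_test_split` shuffles by default and, without a `seed`, draws a new permutation on every run. If a later run re-creates the split, its test set can therefore differ from the one used during optimization. The sketch below shows a deterministic fractional split in plain Python. `seeded_train_test_split` is a hypothetical helper. Rounding the test count up is this sketch's own choice, not a claim about the `datasets` implementation.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seeded_train_test_split(
    rows: Sequence[T], test_size: float, seed: int
) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically, then hold out ceil(test_size * n) rows for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_test = math.ceil(test_size * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


qa_pairs = [(f"question {i}", f"answer {i}") for i in range(10)]
train, test = seeded_train_test_split(qa_pairs, test_size=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

With the Hugging Face API, the equivalent change is to pass a fixed seed, e.g. `dataset.train_test_split(test_size=config["dataset"]["test_size"], seed=42)`. Where that seed is stored in the config is up to you.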

Compile the pipeline with the optimizer

The optimizer is instantiated with its hyperparameters from the config, and compile() is called with the unoptimized RAG pipeline and the training set. The optimizer explores prompt variants and demo combinations and scores each one with the DeepEval metrics function.

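# Hyperparameters come from the "optimizer" section of the YAML config
optimizer = dspy.MIPROv2(
    metric=metrics_function,
    max_bootstrapped_demos=config["optimizer"]["max_bootstrapped_demos"],
    max_labeled_demos=config["optimizer"]["max_labeled_demos"],
    auto=config["optimizer"]["auto"],  # search budget preset: "light", "medium", or "heavy"
)
# compile() searches over instructions and demos and returns the optimized program
optimized_rag = optimizer.compile(rag_pipeline, trainset=trainset)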
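
Every optimizer on this page drives its search through `metrics_function`. DSPy metrics follow the `(example, pred, trace=None)` convention. DSPy passes a non-None `trace` while bootstrapping demonstrations, and a common idiom is to return a strict boolean in that case and a float otherwise. The sketch below shows that contract under stated assumptions. `make_composite_metric` is hypothetical. The two scorers are toy stand-ins, not DeepEval metrics. `SimpleNamespace` stands in for `dspy.Example` and `dspy.Prediction`.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Optional

# Hypothetical scorer type: (question, expected, predicted) -> score in [0, 1]
Scorer = Callable[[str, str, str], float]


def make_composite_metric(scorers: Dict[str, Scorer], thresholds: Dict[str, float]):
    """Build a DSPy-style metric with the (example, pred, trace=None) calling convention."""

    def metric(example, pred, trace: Optional[object] = None):
        scores = {
            name: scorer(example.question, example.answer, pred.answer)
            for name, scorer in scorers.items()
        }
        passed = all(scores[name] >= thresholds[name] for name in scorers)
        if trace is not None:
            # While bootstrapping, a strict pass/fail keeps only good demos
            return passed
        # During search and evaluation, return the mean score as a float
        return sum(scores.values()) / len(scores)

    return metric


def exact_match(question: str, expected: str, predicted: str) -> float:
    return float(expected.strip().lower() == predicted.strip().lower())


def token_overlap(question: str, expected: str, predicted: str) -> float:
    exp, got = set(expected.lower().split()), set(predicted.lower().split())
    return len(exp & got) / len(exp) if exp else 0.0


metric = make_composite_metric(
    {"exact": exact_match, "overlap": token_overlap},
    {"exact": 1.0, "overlap": 0.5},
)
example = SimpleNamespace(question="Capital of France?", answer="Paris")
print(metric(example, SimpleNamespace(answer="paris")))  # 1.0
print(metric(example, SimpleNamespace(answer="Lyon"), trace=[]))  # False
```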

Save the optimized pipeline

The compiled program is saved to a JSON file in the current directory. This file captures all tuned instructions and selected few-shot demonstrations and can be reloaded without re-running optimization.

optimized_rag.save("optimized_rag_mipro.json")
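
To see what an optimizer actually changed, you can inspect the saved JSON directly. The sketch below assumes a layout in which each predictor is a top-level entry holding a `"demos"` list and a `"signature"` object with `"instructions"`. That layout is an assumption about DSPy's save format and may vary across DSPy versions. `summarize_saved_program` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Tuple


def summarize_saved_program(path: Path) -> Dict[str, Tuple[int, Optional[str]]]:
    """Map each predictor entry in a saved JSON state to (demo count, instructions).

    Assumed layout: per-predictor entries are dicts holding a "demos" list and a
    "signature" dict with "instructions"; any other top-level entries are skipped.
    """
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    summary: Dict[str, Tuple[int, Optional[str]]] = {}
    for name, entry in state.items():
        if isinstance(entry, dict) and isinstance(entry.get("demos"), list):
            signature = entry.get("signature") or {}
            summary[name] = (len(entry["demos"]), signature.get("instructions"))
    return summary
```

From the dataset directory, `summarize_saved_program(Path("optimized_rag_mipro.json"))` would list how many demos each predictor kept and its tuned instruction.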

Evaluate on the test set

dspy.Evaluate runs the compiled pipeline on the held-out test set across multiple threads, reporting per-example scores and an aggregate result using the same DeepEval metrics function.

evaluate = dspy.Evaluate(
    devset=testset,
    num_threads=config["evaluation"]["settings"]["num_threads"],
    display_progress=config["evaluation"]["settings"]["display_progress"],
    display_table=config["evaluation"]["settings"]["display_table"],
    provide_traceback=config["evaluation"]["settings"]["provide_traceback"],
)
results = evaluate(optimized_rag, metric=metrics_function)
print(results)
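
The aggregation that dspy.Evaluate performs can be approximated with a thread pool. The sketch below is a simplified stand-in, not dspy.Evaluate itself. It scores each example in parallel and reports the mean as a percentage, similar to the summary line dspy.Evaluate prints. It omits progress bars, tables, and error handling. `evaluate_threaded`, the toy program, and the toy metric are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace
from typing import Any, Callable, List, Sequence, Tuple


def evaluate_threaded(
    program: Callable[..., Any],
    devset: Sequence[Any],
    metric: Callable[[Any, Any], float],
    num_threads: int = 4,
) -> Tuple[float, List[float]]:
    """Score each example in parallel; return (percentage, per-example scores)."""

    def score_one(example: Any) -> float:
        prediction = program(question=example.question)
        return float(metric(example, prediction))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_example = list(pool.map(score_one, devset))  # map keeps input order
    overall = 100.0 * sum(per_example) / len(per_example) if per_example else 0.0
    return overall, per_example


# Toy usage: a lookup "program" and an exact-match metric stand in for the RAG pipeline
gold = [("2+2", "4"), ("3+3", "6"), ("5+5", "10"), ("1+1", "2")]
devset = [SimpleNamespace(question=q, answer=a) for q, a in gold]
predicted = {"2+2": "4", "3+3": "6", "5+5": "10", "1+1": "3"}  # one wrong answer


def toy_program(question: str) -> SimpleNamespace:
    return SimpleNamespace(answer=predicted[question])


def exact_match(example: Any, prediction: Any) -> float:
    return float(example.answer == prediction.answer)


score, per_example = evaluate_threaded(toy_program, devset, exact_match, num_threads=2)
print(score, per_example)  # 75.0 [1.0, 1.0, 1.0, 0.0]
```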

MIPROv2

MIPROv2 jointly optimises prompt instructions and few-shot demonstrations using Bayesian search. It is the recommended general-purpose optimizer when sufficient search budget is available.

cd src/dspy_opt/freshqa
python freshqa_rag_mipro.py

COPRO

COPRO performs instruction-only optimisation via coordinate ascent. It proposes and evaluates instruction edits on a breadth/depth schedule: breadth candidates per round, for depth rounds. Use it when you want to improve instructions without adding few-shot demonstrations.
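
The breadth/depth schedule can be illustrated with a toy coordinate ascent. Each predictor's instruction is one coordinate, and it is improved while the others stay fixed. This is a simplified sketch, not DSPy's COPRO implementation. In COPRO the proposer is an LLM and the scorer is the metric over the training set. Here both are hypothetical toys (`toy_propose`, `toy_score`), and the predictor names are made up.

```python
from typing import Callable, Dict, List, Tuple

# (predictor name, current instruction, breadth) -> candidate instructions
Proposer = Callable[[str, str, int], List[str]]


def coordinate_ascent(
    instructions: Dict[str, str],
    propose: Proposer,
    score: Callable[[Dict[str, str]], float],
    breadth: int,
    depth: int,
) -> Tuple[Dict[str, str], float]:
    """Improve one predictor's instruction at a time, holding the others fixed."""
    best = dict(instructions)
    best_score = score(best)
    for _ in range(depth):  # depth: number of refinement rounds
        for predictor in list(best):  # each predictor is one "coordinate"
            for candidate in propose(predictor, best[predictor], breadth):
                trial = {**best, predictor: candidate}
                trial_score = score(trial)
                if trial_score > best_score:  # keep strict improvements only
                    best, best_score = trial, trial_score
    return best, best_score


# Toy proposer and scorer standing in for the LLM and the metric on the trainset
def toy_propose(predictor: str, current: str, breadth: int) -> List[str]:
    return [current + "!" * k for k in range(1, breadth + 1)]


def toy_score(assignment: Dict[str, str]) -> float:
    targets = {"rewrite": 3, "answer": 2}  # pretend ideal instruction lengths
    return -sum(abs(len(text) - targets[name]) for name, text in assignment.items())


best, best_score = coordinate_ascent(
    {"rewrite": "a", "answer": "a"}, toy_propose, toy_score, breadth=2, depth=2
)
print(best, best_score)
```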

cd src/dspy_opt/freshqa
python freshqa_rag_copro.py

BootstrapFewShot

This script uses BootstrapFewShotWithRandomSearch, which focuses purely on few-shot demo selection. It bootstraps candidate demonstrations by running the pipeline on training examples, then runs a random search over demo subsets. It is a useful baseline to establish before trying joint optimization.
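
The two phases can be sketched as follows: bootstrap a pool of demos that pass the metric, then random-search over subsets, starting from a zero-shot baseline. This is a simplified illustration, not DSPy's implementation. In DSPy the demos are full traces of the pipeline, and labeled demos from `max_labeled_demos` are also mixed in. `bootstrap_demos`, `random_search`, and the toy program, metric, and scorer are hypothetical.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Demo = Tuple[Any, Any]  # (training example, program output that passed the metric)


def bootstrap_demos(
    program: Callable[[Any], Any],
    trainset: Sequence[Any],
    metric: Callable[[Any, Any], bool],
    max_bootstrapped_demos: int,
) -> List[Demo]:
    """Run the program on training examples and keep outputs that pass the metric."""
    demos: List[Demo] = []
    for example in trainset:
        prediction = program(example)
        if metric(example, prediction):
            demos.append((example, prediction))
            if len(demos) >= max_bootstrapped_demos:
                break
    return demos


def random_search(
    demo_pool: Sequence[Demo],
    score_demos: Callable[[List[Demo]], float],
    num_candidates: int,
    subset_size: int,
    seed: int = 0,
) -> Tuple[List[Demo], float]:
    """Score random demo subsets and keep the best, starting from a zero-shot baseline."""
    rng = random.Random(seed)
    best_subset: List[Demo] = []
    best_score = score_demos(best_subset)
    for _ in range(num_candidates):
        subset = rng.sample(list(demo_pool), k=min(subset_size, len(demo_pool)))
        subset_score = score_demos(subset)
        if subset_score > best_score:
            best_subset, best_score = subset, subset_score
    return best_subset, best_score


# Toy usage: the "program" doubles a number and the metric accepts multiples of 4
pool = bootstrap_demos(lambda x: 2 * x, range(10), lambda ex, out: out % 4 == 0, 3)
subset, subset_score = random_search(pool, lambda demos: sum(ex for ex, _ in demos), 20, 2)
print(pool, subset, subset_score)
```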

cd src/dspy_opt/freshqa
python freshqa_rag_bootstrap_few_shot.py

SIMBA

SIMBA (Stochastic Introspective Mini-Batch Ascent) samples mini-batches from the training set, identifies challenging examples, and uses the LLM to generate self-reflective improvement rules or demonstrations. Each step scores only a mini-batch rather than the full training set, so it tends to be cheaper than full-evaluation search on larger training sets.
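
The "identify challenging examples" step can be sketched as follows. Sample a mini-batch of `bsize` examples, run each one several times, and rank the examples by the spread between their best and worst scores. This is a simplified reading of SIMBA, not its implementation. SIMBA then asks the LLM to write a rule that contrasts a high-scoring and a low-scoring run, or to append a successful run as a demo, and that step is omitted here. `pick_challenging_examples`, `num_samples`, and the toy scorer are hypothetical.

```python
import random
from statistics import mean
from typing import Any, Callable, List, Sequence, Tuple


def pick_challenging_examples(
    trainset: Sequence[Any],
    run_and_score: Callable[[Any, random.Random], float],
    bsize: int,
    num_samples: int,
    top_k: int,
    seed: int = 0,
) -> List[Tuple[Any, float, float]]:
    """Return (example, score spread, mean score) for the most variable mini-batch items."""
    rng = random.Random(seed)
    batch = rng.sample(list(trainset), k=min(bsize, len(trainset)))  # one mini-batch
    ranked: List[Tuple[Any, float, float]] = []
    for example in batch:
        scores = [run_and_score(example, rng) for _ in range(num_samples)]
        # A large max-min gap means the program sometimes succeeds and sometimes fails,
        # so contrasting its good and bad runs is informative
        ranked.append((example, max(scores) - min(scores), mean(scores)))
    ranked.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep batch order
    return ranked[:top_k]


# Toy scorer: multiples of 3 are "flaky" (random pass/fail); everything else always passes
def toy_run_and_score(example: int, rng: random.Random) -> float:
    return float(rng.random() < 0.5) if example % 3 == 0 else 1.0


hard = pick_challenging_examples(range(30), toy_run_and_score, bsize=8, num_samples=4, top_k=3)
print(hard)
```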

cd src/dspy_opt/freshqa
python freshqa_rag_simba.py

GEPA

GEPA (Genetic-Pareto) evolves prompts using a reflection-driven loop. A separate reflection LLM analyses execution traces and textual feedback from the metric function, then proposes improved instructions. Candidates are managed via a Pareto frontier to balance exploration and retention.
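
The Pareto-frontier idea can be illustrated with per-instance scores. As we understand GEPA's selection, it keeps every candidate that achieves the best score on at least one training instance and samples the next parent in proportion to those wins. A best-average rule would keep only one prompt, whereas this rule also retains prompts that excel on a niche subset. This is a simplified sketch: GEPA also prunes dominated candidates, which is omitted here. `pareto_wins`, `sample_parent`, and the candidate names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Sequence


def pareto_wins(scores: Dict[str, Sequence[float]]) -> Counter:
    """Count the training instances on which each candidate ties for the best score."""
    num_instances = len(next(iter(scores.values())))
    wins: Counter = Counter()
    for i in range(num_instances):
        best = max(candidate_scores[i] for candidate_scores in scores.values())
        for name, candidate_scores in scores.items():
            if candidate_scores[i] == best:
                wins[name] += 1
    return wins  # candidates that never win an instance are absent


def sample_parent(scores: Dict[str, Sequence[float]], rng: random.Random) -> str:
    """Pick the next candidate to mutate, weighted by how many instances it wins."""
    wins = pareto_wins(scores)
    names = sorted(wins)
    return rng.choices(names, weights=[wins[name] for name in names], k=1)[0]


# Per-instance scores for four prompt candidates on three training examples
scores = {
    "seed": [0.2, 0.9, 0.1],
    "v1": [0.8, 0.3, 0.1],
    "v2": [0.5, 0.5, 0.5],  # best average, but not best everywhere
    "v3": [0.1, 0.1, 0.1],  # never best on any instance
}
print(pareto_wins(scores))
print(sample_parent(scores, random.Random(0)))
```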

GEPA requires create_gepa_metrics_function() instead of the standard create_metrics_function(). The GEPA metric function returns a dspy.Prediction containing both a numeric score and a per-metric feedback string. GEPA’s reflection LLM consumes this textual feedback to diagnose failures and propose targeted prompt improvements. Additionally, a reflection_llm section is required in the GEPA config file.
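
The shape of a feedback-returning metric can be sketched without DSPy. DSPy's GEPA documentation describes the metric being called as `metric(gold, pred, trace, pred_name, pred_trace)`. create_gepa_metrics_function() presumably builds such a function around DeepEval metrics, and this sketch shows only the return shape. `ScoreWithFeedback` is a hypothetical stand-in for `dspy.Prediction(score=..., feedback=...)` so that the code runs on its own. The two checks are toy substitutes for DeepEval metrics.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, Optional


@dataclass
class ScoreWithFeedback:
    """Hypothetical stand-in for dspy.Prediction(score=..., feedback=...)."""

    score: float
    feedback: str


def gepa_style_metric(
    gold: Any,
    pred: Any,
    trace: Optional[Any] = None,
    pred_name: Optional[str] = None,
    pred_trace: Optional[Any] = None,
) -> ScoreWithFeedback:
    """Return a numeric score plus per-check text the reflection LLM can read."""
    expected = gold.answer.strip().lower()
    answer = pred.answer.strip().lower()
    checks: Dict[str, float] = {
        "correctness": 1.0 if expected in answer else 0.0,  # toy stand-in metric
        "conciseness": 1.0 if len(answer.split()) <= 20 else 0.0,  # toy stand-in metric
    }
    lines = [f"{name}: {'pass' if value else 'fail'} ({value:.1f})" for name, value in checks.items()]
    if checks["correctness"] == 0.0:
        # Concrete, actionable feedback gives the reflection LLM something to fix
        lines.append(f"Expected the answer to mention '{gold.answer}', but got '{pred.answer}'.")
    return ScoreWithFeedback(score=sum(checks.values()) / len(checks), feedback="\n".join(lines))


gold = SimpleNamespace(answer="Paris")
result = gepa_style_metric(gold, SimpleNamespace(answer="Lyon"))
print(result.score)
print(result.feedback)
```

With DSPy installed, the final return would build `dspy.Prediction(score=..., feedback=...)` instead of the stand-in dataclass.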

cd src/dspy_opt/freshqa
python freshqa_rag_gepa.py

Standalone Evaluation

After optimization, you can re-evaluate a saved pipeline state at any time without re-running the optimizer. The evaluation script loads the pipeline from the saved JSON, reconstitutes all components from the evaluation config, and runs dspy.Evaluate on the test set.

cd src/dspy_opt/freshqa
python freshqa_rag_evaluation.py

Saving and Loading Compiled Pipelines

Each optimizer script saves the compiled program to a JSON file immediately after optimization completes:

# Save after optimization
optimized_rag.save("optimized_rag_mipro.json")

To reload a previously compiled pipeline and skip re-optimization:

import dspy
from dspy_opt.freshqa.freshqa_rag_module import FreshQARAG

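# Rebuild the unoptimized pipeline with the same init args used during optimization.
# query_rewriter, sub_query_generator, metadata_extractor, metadata_schema,
# weaviate_retriever, and model must be constructed first, as in the optimizer script.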
rag_pipeline = FreshQARAG(
    query_rewriter=query_rewriter,
    sub_query_generator=sub_query_generator,
    metadata_extractor=metadata_extractor,
    metadata_schema=metadata_schema,
    weaviate_retriever=weaviate_retriever,
    embedding_model=model,
    top_k=5,
)

# Load the saved optimized state
rag_pipeline.load("optimized_rag_mipro.json")

# Run inference (assumes dspy.configure(lm=answer_llm) has already been called,
# as in the optimizer scripts)
result = rag_pipeline(question="What is the capital of France?")
print(result.answer)

Optimizer Comparison

| Optimizer | Script suffix | What it tunes | Key hyperparameters |
|---|---|---|---|
| MIPROv2 | `_mipro.py` | Instructions + few-shot (jointly) | `max_bootstrapped_demos`, `max_labeled_demos`, `auto` |
| COPRO | `_copro.py` | Instructions only | `breadth`, `depth`, `init_temperature` |
| BootstrapFewShot | `_bootstrap_few_shot.py` | Few-shot examples only | `max_bootstrapped_demos`, `max_labeled_demos`, `max_rounds` |
| SIMBA | `_simba.py` | Rules + few-shot (mini-batch) | `bsize`, `num_candidates`, `max_steps`, `max_demos` |
| GEPA | `_gepa.py` | Instructions + few-shot (reflective) | `max_full_evals`, `reflection_minibatch_size`, `candidate_selection_strategy`, `use_merge` |
