DSPy optimizers improve a RAG pipeline by automatically searching over the space of prompt instructions and few-shot demonstrations. Each optimizer script loads a YAML config, initialises all pipeline components, splits the HuggingFace dataset into train and test sets, runs the optimizer to produce a compiled program, saves the compiled state to a JSON file, and then evaluates it using dspy.Evaluate with DeepEval metrics. All scripts follow the same structure; only the optimizer class and its hyperparameters differ.
All optimizer scripts must be run from inside the dataset directory (e.g. src/dspy_opt/freshqa/). Config YAML files are opened with a relative path, so the working directory must match the script location.
What Happens During Optimization
Load the YAML config
The script opens <dataset>_rag_<optimizer>_config.yml from the current directory. This single file controls every configurable aspect of the run: models, Weaviate connection, dataset coordinates, metric thresholds, optimizer hyperparameters, and evaluation settings.
Initialise LLMs, embedding model, and Weaviate retriever
Two dspy.LM instances are created: an answer_llm configured as the default DSPy LM (dspy.configure(lm=answer_llm)), and a separate extractor_llm for metadata extraction. A SentenceTransformer embedding model and a WeaviateRetriever are also initialised from config values.
Load and split the dataset
The dataset is fetched from HuggingFace and split into train and test sets using the test_size fraction from the config. Both sets are converted into dspy.Example objects with a question input field and an answer label field.
Compile the pipeline with the optimizer
The optimizer is instantiated with its hyperparameters from the config, and compile() is called with the uncompiled RAG pipeline and the training set. The optimizer explores prompt variants and demo combinations, evaluating each using the DeepEval metrics function.
Save the optimized pipeline
The compiled program is saved to a JSON file in the current directory. This file captures all tuned instructions and selected few-shot demonstrations and can be reloaded without re-running optimization.
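The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: split_records, to_examples, and compile_and_save are hypothetical helper names, the seeded shuffle stands in for whatever split the scripts perform, and DSPy is imported lazily so the splitting helper runs on its own.

```python
import random


def split_records(records, test_size, seed=0):
    """Shuffle and split records into (train, test) by a test_size fraction."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def to_examples(records):
    """Convert dict records into dspy.Example objects with `question` as input."""
    import dspy  # lazy import: split_records stays usable without DSPy

    return [
        dspy.Example(question=r["question"], answer=r["answer"]).with_inputs("question")
        for r in records
    ]


def compile_and_save(optimizer, program, trainset, path, **compile_kwargs):
    """Run an already-constructed optimizer and persist the compiled state."""
    compiled = optimizer.compile(program, trainset=trainset, **compile_kwargs)
    compiled.save(path)  # writes tuned instructions and demos to the JSON file
    return compiled
```

Some optimizers take extra arguments to compile() (COPRO, for example, expects eval_kwargs); these can be forwarded through compile_kwargs.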
MIPROv2
MIPROv2 jointly optimises prompt instructions and few-shot demonstrations using Bayesian search. It is the recommended general-purpose optimizer when sufficient search budget is available.
COPRO
COPRO performs instruction-only optimisation via coordinate ascent, proposing and evaluating instruction edits across a breadth/depth schedule. Use it when you want fast prompt-only gains.
BootstrapFewShot
BootstrapFewShotWithRandomSearch focuses purely on few-shot demo selection. It bootstraps candidate demonstrations by running the pipeline on training examples, then runs random search over demo subsets. Useful as a baseline before joint optimization.
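To make the idea of random search over demo subsets concrete, here is a toy, library-free sketch. It is not DSPy's implementation: random_search_demos, the candidate pool, and score_fn are all hypothetical, and in the real optimizer the score would come from running the pipeline with the chosen demos against the metric.

```python
import random


def random_search_demos(candidates, score_fn, k, num_trials, seed=0):
    """Try `num_trials` random size-k demo subsets and keep the best-scoring one.

    The empty subset (zero-shot) serves as the starting baseline.
    """
    rng = random.Random(seed)
    best_subset, best_score = [], score_fn([])
    for _ in range(num_trials):
        subset = rng.sample(candidates, k)  # sample without replacement
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical pool: each bootstrapped demo carries a made-up usefulness value.
pool = [("demo_a", 0.1), ("demo_b", 0.4), ("demo_c", 0.3), ("demo_d", 0.2)]
best, score = random_search_demos(
    pool,
    score_fn=lambda demos: sum(value for _, value in demos),
    k=2,
    num_trials=20,
)
```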
SIMBA
SIMBA (Stochastic Introspective Mini-Batch Ascent) samples mini-batches from the training set, identifies challenging examples, and uses the LLM to generate self-reflective improvement rules or demonstrations. It is more efficient than full-eval search on larger training sets.
GEPA
GEPA (Genetic-Pareto) evolves prompts using a reflection-driven loop. A separate reflection LLM analyses execution traces and textual feedback from the metric function, then proposes improved instructions. Candidates are managed via a Pareto frontier to balance exploration and retention.
GEPA requires create_gepa_metrics_function() instead of the standard create_metrics_function(). The GEPA metric function returns a dspy.Prediction containing both a numeric score and a per-metric feedback string. GEPA’s reflection LLM consumes this textual feedback to diagnose failures and propose targeted prompt improvements. Additionally, a reflection_llm section is required in the GEPA config file.
Standalone Evaluation
After optimization, you can re-evaluate a saved pipeline state at any time without re-running the optimizer. The evaluation script loads the pipeline from the saved JSON, reconstitutes all components from the evaluation config, and runs dspy.Evaluate on the test set.
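A standalone evaluation along these lines might look like the sketch below. It assumes the pipeline object has already been rebuilt from the evaluation config; evaluate_saved and its parameters are hypothetical names, while load() and dspy.Evaluate are DSPy's own APIs.

```python
def evaluate_saved(program, state_path, testset, metric, num_threads=4):
    """Load saved optimizer state into `program` and score it on `testset`."""
    import dspy  # lazy import: only needed when an evaluation is actually run

    # Restore tuned instructions and few-shot demos from the JSON state file.
    program.load(state_path)

    evaluator = dspy.Evaluate(
        devset=testset,
        metric=metric,
        num_threads=num_threads,
        display_progress=True,
    )
    return evaluator(program)
```

Because only the state file is read, no optimizer hyperparameters are needed at this stage.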
Saving and Loading Compiled Pipelines
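Each optimizer script saves the compiled program to a JSON file immediately after optimization completes. The file can later be loaded back into a freshly constructed pipeline, as the standalone evaluation script does.
Optimizer Comparison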
| Optimizer | Script suffix | What it tunes | Key hyperparameters |
|---|---|---|---|
| MIPROv2 | _mipro.py | Instructions + few-shot (jointly) | max_bootstrapped_demos, max_labeled_demos, auto |
| COPRO | _copro.py | Instructions only | breadth, depth, init_temperature |
| BootstrapFewShot | _bootstrap_few_shot.py | Few-shot examples only | max_bootstrapped_demos, max_labeled_demos, max_rounds |
| SIMBA | _simba.py | Rules + few-shot (mini-batch) | bsize, num_candidates, max_steps, max_demos |
| GEPA | _gepa.py | Instructions + few-shot (reflective) | max_full_evals, reflection_minibatch_size, candidate_selection_strategy, use_merge |
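
One way to relate the table to code is a small lookup from optimizer name to the hyperparameters it accepts. The sketch below is illustrative only: HYPERPARAMETERS, pick_hyperparameters, build_optimizer, and the flat config dictionary are assumptions, not the project's config schema. The class names are looked up on the dspy package, so their availability depends on the installed DSPy version.

```python
# Hyperparameter names per optimizer, copied from the comparison table.
HYPERPARAMETERS = {
    "MIPROv2": ("max_bootstrapped_demos", "max_labeled_demos", "auto"),
    "COPRO": ("breadth", "depth", "init_temperature"),
    "BootstrapFewShotWithRandomSearch": (
        "max_bootstrapped_demos",
        "max_labeled_demos",
        "max_rounds",
    ),
    "SIMBA": ("bsize", "num_candidates", "max_steps", "max_demos"),
    "GEPA": (
        "max_full_evals",
        "reflection_minibatch_size",
        "candidate_selection_strategy",
        "use_merge",
    ),
}


def pick_hyperparameters(optimizer_name, config):
    """Keep only the config entries that the named optimizer understands."""
    allowed = HYPERPARAMETERS[optimizer_name]
    return {key: config[key] for key in allowed if key in config}


def build_optimizer(optimizer_name, metric, config, **extra):
    """Instantiate the optimizer; `extra` can carry e.g. GEPA's reflection_lm."""
    import dspy  # lazy import so the lookup helpers work without DSPy

    optimizer_cls = getattr(dspy, optimizer_name)
    hyperparameters = pick_hyperparameters(optimizer_name, config)
    return optimizer_cls(metric=metric, **hyperparameters, **extra)
```

Filtering the config first means a shared dictionary can hold settings for several optimizers without passing unknown keyword arguments to any one of them.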