DSPy-Opt supports five optimizers that automatically improve the RAG pipeline by searching over instructions (the prompt text embedded in each DSPy module), few-shot demonstrations (examples selected or bootstrapped from the training set), or both. The optimization objective always comes from the DeepEval metrics loop: every candidate program is scored on Answer Relevancy, Faithfulness, Contextual Precision, Contextual Recall, and Contextual Relevancy before the optimizer decides which direction to explore next. GEPA additionally requires a dedicated reflection LLM and a feedback-producing metric function that returns a textual explanation for each metric alongside the numeric score.
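The objective and the feedback contract GEPA expects can be illustrated with a small, dependency-free sketch. It assumes each DeepEval metric yields a score in [0, 1] and that the five scores are combined by an unweighted mean. The names aggregate_score, feedback_metric, and ScoreWithFeedback are hypothetical and are not the project's create_gepa_metrics_function(). In DSPy itself, a GEPA metric typically returns a dspy.Prediction with score and feedback fields, and the dataclass below only mimics that shape.

```python
from dataclasses import dataclass

# The five DeepEval metrics named above, as hypothetical dictionary keys.
METRICS = (
    "answer_relevancy",
    "faithfulness",
    "contextual_precision",
    "contextual_recall",
    "contextual_relevancy",
)


@dataclass
class ScoreWithFeedback:
    """A numeric score plus textual feedback, the shape a reflective optimizer consumes."""

    score: float
    feedback: str


def aggregate_score(scores: dict) -> float:
    """Collapse per-metric scores in [0, 1] into one objective.

    Assumption: an unweighted mean; the real pipeline may combine metrics differently.
    """
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise KeyError(f"missing metric scores: {missing}")
    return sum(scores[m] for m in METRICS) / len(METRICS)


def feedback_metric(scores: dict, reasons: dict) -> ScoreWithFeedback:
    """Return the aggregate score with one explanation line per metric."""
    lines = [
        f"{m}: {scores[m]:.2f} - {reasons.get(m, 'no explanation given')}"
        for m in METRICS
    ]
    return ScoreWithFeedback(score=aggregate_score(scores), feedback="\n".join(lines))
```

A plain optimizer only needs the number. A reflective optimizer can also read the per-metric lines and see, for example, that faithfulness rather than relevancy is dragging the score down.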
Optimizer Comparison
| Optimizer | What it tunes | Search strategy | Recommended for | Hyperparameters |
|---|---|---|---|---|
| MIPROv2 | Instructions + Few-shot examples (jointly) | Bayesian optimization over candidate prompt/demo sets | Strong general-purpose default; sufficient search budget available | max_bootstrapped_demos, max_labeled_demos, auto |
| COPRO | Instructions only | Coordinate ascent over instruction variants | Quick prompt-only gains; testing whether instruction tuning alone helps | breadth, depth, init_temperature |
| BootstrapFewShotWithRandomSearch | Few-shot examples only | Random search over bootstrapped demo subsets | Measuring demo impact as a baseline before joint optimization | max_bootstrapped_demos, max_labeled_demos, max_rounds |
| SIMBA | Rules/instructions + few-shot examples | Mini-batch iterative ascent with self-reflective rule generation | Efficient batch-based optimization on larger training sets | bsize, num_candidates, max_steps, max_demos |
| GEPA | Instructions + few-shot examples (reflective evolution) | Pareto-based candidate selection with LLM reflection on failures | Reflection-driven improvements with multi-metric trade-offs | max_full_evals, reflection_minibatch_size, candidate_selection_strategy, use_merge |
Optimizer Details
MIPROv2 — Bayesian Joint Optimization
MIPROv2 (Multiprompt Instruction PROposal Optimizer v2) jointly optimizes instructions and few-shot demonstrations using Bayesian search. It operates in three sequential stages:
- Bootstrap demos — runs the uncompiled pipeline on training examples and collects high-scoring traces as candidate demonstrations.
- Propose instructions — generates candidate instruction variants grounded in dataset summaries and code context.
- Search combinations — uses Bayesian optimization with mini-batch evaluation to efficiently explore the space of instruction/demo combinations, converging on the highest-scoring compiled program.
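The third stage can be pictured with the simplified sketch below. It is not MIPROv2's algorithm: plain random proposals stand in for the Bayesian surrogate, and score_fn is a hypothetical callable that scores one (instruction, demo set) combination on one example. What it keeps is the overall shape of the search. Cheap mini-batch scores rank the combinations, and only the most promising few receive a full evaluation.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Combo = Tuple[int, int]  # (instruction index, demo-set index)


def search_combinations(
    instructions: Sequence[str],
    demo_sets: Sequence[Sequence[str]],
    trainset: Sequence[object],
    score_fn: Callable[[str, Sequence[str], object], float],
    num_trials: int = 20,
    minibatch_size: int = 4,
    top_k: int = 3,
    seed: int = 0,
) -> Tuple[Combo, float]:
    """Random-search stand-in for MIPROv2's Bayesian stage (hypothetical helper)."""
    rng = random.Random(seed)
    examples = list(trainset)

    def mean_score(combo: Combo, batch: Sequence[object]) -> float:
        instr, demos = instructions[combo[0]], demo_sets[combo[1]]
        return sum(score_fn(instr, demos, ex) for ex in batch) / len(batch)

    # Cheap phase: score randomly proposed combinations on small mini-batches.
    history: Dict[Combo, List[float]] = {}
    for _ in range(num_trials):
        combo = (rng.randrange(len(instructions)), rng.randrange(len(demo_sets)))
        batch = rng.sample(examples, min(minibatch_size, len(examples)))
        history.setdefault(combo, []).append(mean_score(combo, batch))

    # Expensive phase: fully evaluate only the top_k combinations by mini-batch average.
    ranked = sorted(history, key=lambda c: sum(history[c]) / len(history[c]), reverse=True)
    full_scores = {combo: mean_score(combo, examples) for combo in ranked[:top_k]}
    best = max(full_scores, key=full_scores.get)
    return best, full_scores[best]
```

A real Bayesian optimizer differs only in how it picks the next combination. It builds a model of which instruction and demo choices tend to score well and uses that model to steer proposals, so it needs fewer trials than random sampling.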
Set auto="medium" for a balanced budget, or auto="light" / auto="heavy" to trade speed against thoroughness.
Key parameters are read from freshqa_rag_mipro_config.yml.
COPRO — Coordinate Ascent Instruction Optimization
COPRO performs instruction-only optimization via coordinate ascent. It iteratively proposes instruction edits across a breadth/depth schedule, evaluates each variant against the metric, and keeps the changes that improve performance. Instructions are optimized independently per DSPy module, which makes COPRO fast when few-shot selection is not required.
Use COPRO when you want to measure how much instruction tuning alone can improve the pipeline before committing to a more expensive joint search.
Key parameters are read from freshqa_rag_copro_config.yml.
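The coordinate-ascent loop described above can be sketched as follows. The sketch assumes a hypothetical propose callable that stands in for the LLM rewriting one module's instruction, and an evaluate callable that returns the aggregate metric for a full set of instructions. Each module is treated as one coordinate. Its instruction is varied while the others are held fixed, and a change is kept only if it strictly improves the score. Here, breadth and depth mirror the COPRO parameter names only, and the real schedule may differ.

```python
from typing import Callable, Dict, List, Tuple


def coordinate_ascent(
    instructions: Dict[str, str],
    propose: Callable[[str, str, int], List[str]],
    evaluate: Callable[[Dict[str, str]], float],
    breadth: int = 3,
    depth: int = 2,
) -> Tuple[Dict[str, str], float]:
    """Greedy coordinate ascent over per-module instructions (hypothetical helper)."""
    best = dict(instructions)
    best_score = evaluate(best)
    modules = list(instructions)
    for _ in range(depth):
        for module in modules:  # one coordinate at a time; the others stay fixed
            for candidate in propose(module, best[module], breadth):
                trial = {**best, module: candidate}
                score = evaluate(trial)
                if score > best_score:  # keep only strict improvements
                    best, best_score = trial, score
    return best, best_score
```

Only one coordinate moves at a time, so the loop makes at most 1 + depth × modules × breadth evaluations. That bound is why instruction-only search stays cheap compared with a joint search over instructions and demos.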
BootstrapFewShotWithRandomSearch — Demo Selection Baseline
BootstrapFewShotWithRandomSearch focuses purely on few-shot demonstration selection. It bootstraps candidate demonstrations by running the pipeline on training examples and filtering for high-scoring traces, then runs random search over demo subsets to find the combination that maximizes the metric. No instruction text is modified.
This optimizer is the natural baseline to run before joint optimization: it quantifies how much of the potential gain comes from demonstrations alone rather than from instruction tuning.
Key parameters are read from freshqa_rag_bootstrap_few_shot_config.yml.
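The two phases, bootstrapping and random search, can be sketched as a minimal example under stated assumptions. Here run_pipeline, metric, and evaluate are hypothetical callables. The 0.5 acceptance threshold is arbitrary, and the zero-shot baseline is a design choice of this sketch. Demonstrations are treated as opaque (example, output) pairs.

```python
import random
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[object, object]  # (training example, pipeline output)


def bootstrap_demos(
    trainset: Sequence[object],
    run_pipeline: Callable[[object], object],
    metric: Callable[[object, object], float],
    threshold: float = 0.5,
) -> List[Demo]:
    """Run the unoptimized pipeline and keep traces whose score clears the threshold."""
    demos = []
    for example in trainset:
        output = run_pipeline(example)
        if metric(example, output) >= threshold:
            demos.append((example, output))
    return demos


def random_search_demos(
    demos: List[Demo],
    evaluate: Callable[[List[Demo]], float],
    max_demos: int = 2,
    num_candidates: int = 8,
    seed: int = 0,
) -> Tuple[List[Demo], float]:
    """Score random demo subsets and return the best one; no instruction text changes."""
    rng = random.Random(seed)
    best_subset, best_score = [], evaluate([])  # zero-shot baseline
    if not demos:
        return best_subset, best_score
    for _ in range(num_candidates):
        k = rng.randint(1, min(max_demos, len(demos)))
        subset = rng.sample(demos, k)
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

The instructions are never touched, so comparing this result against the zero-shot baseline isolates the contribution of demonstrations.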
SIMBA — Stochastic Introspective Mini-Batch Ascent
SIMBA samples mini-batches from the training set and identifies challenging examples whose outputs vary most across runs. It then uses the LLM either to introspectively generate self-reflective improvement rules or to add successful examples as demonstrations. This batch-based approach is more computationally efficient than full-evaluation search on larger training sets, because it never needs to score the entire training set in a single pass.
SIMBA jointly tunes both rule-based instructions and few-shot demonstrations, which makes it a strong choice when the training set is large enough that MIPROv2’s full-evaluation passes would be too slow.
Key parameters are read from freshqa_rag_simba_config.yml.
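The selection step that distinguishes SIMBA can be sketched as follows. In this sketch, sample_scores(example, i) is a hypothetical callable that returns the metric score of the i-th sampled run on an example. Variability is measured as the max-min spread of those scores, and the strategies mapping holds stand-ins for rule generation and demo appending. This illustrates the idea and is not DSPy's SIMBA implementation.

```python
import random
from typing import Callable, Dict, List, Sequence


def pick_challenging(
    minibatch: Sequence[object],
    sample_scores: Callable[[object, int], float],
    num_samples: int = 4,
    top_n: int = 2,
) -> List[object]:
    """Rank mini-batch examples by the spread (max - min) of their scores across runs."""
    spreads = []
    for example in minibatch:
        scores = [sample_scores(example, i) for i in range(num_samples)]
        spreads.append((max(scores) - min(scores), example))
    spreads.sort(key=lambda pair: pair[0], reverse=True)  # most variable first
    return [example for _, example in spreads[:top_n]]


def simba_step(
    trainset: Sequence[object],
    sample_scores: Callable[[object, int], float],
    strategies: Dict[str, Callable[[object], object]],
    bsize: int = 4,
    top_n: int = 2,
    seed: int = 0,
) -> List[object]:
    """One mini-batch step: sample, find the most variable examples, apply a strategy to each."""
    rng = random.Random(seed)
    minibatch = rng.sample(list(trainset), min(bsize, len(trainset)))
    actions = []
    for example in pick_challenging(minibatch, sample_scores, top_n=top_n):
        name = rng.choice(sorted(strategies))  # e.g. "append_demo" or "append_rule"
        actions.append(strategies[name](example))
    return actions
```

High spread means the program sometimes succeeds and sometimes fails on the same input. Those examples are the most informative ones to reflect on, because a successful run already exists to learn from.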
GEPA — Genetic-Pareto Reflective Evolution
GEPA (Genetic-Pareto) evolves prompts using a reflection-driven loop. A separate reflection LLM, configured independently of the answer LLM, analyzes execution traces and the textual feedback produced by create_gepa_metrics_function(), then proposes improved instructions. Candidate programs are managed via a Pareto frontier: only programs that achieve the highest score on at least one training instance are retained. This keeps diverse strategies in play instead of converging on a single local optimum. GEPA also supports candidate merging (crossover) across lineages via use_merge=True.
Key parameters are read from freshqa_rag_gepa_config.yml.
Choosing an Optimizer
MIPROv2
Best default choice. Jointly tunes instructions and demonstrations via Bayesian search. Use when you have a moderate training set and want the strongest out-of-the-box results.
COPRO
Fastest prompt-only gain. Only modifies instruction text. Use when you want to quickly validate whether better prompts alone help before running a full joint search.
BootstrapFewShotWithRandomSearch
Demonstration baseline. Only selects few-shot examples. Run this first to understand how much demonstrations contribute before adding instruction optimization.
SIMBA
Efficient on large training sets. Mini-batch iterative ascent avoids expensive full-eval passes. Use when the training set is too large for MIPROv2’s per-candidate full evaluation.
GEPA
Multi-metric reflection. Pareto-based evolution with an LLM reflection loop. Use when you want the optimizer to reason about why specific metrics are failing and adapt accordingly.
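The Pareto-based candidate selection behind GEPA can be sketched in a few lines. Here, scores is a hypothetical mapping from a candidate id to its per-instance scores. A candidate stays on the frontier if it ties for the best score on at least one instance. Parents are sampled in proportion to how many instances they win. That weighting is an assumption of this sketch, not a statement about GEPA's exact rule.

```python
import random
from collections import Counter
from typing import Mapping, Sequence, Set


def instance_wins(scores: Mapping[str, Sequence[float]]) -> Counter:
    """Count, per candidate, the instances on which it ties for the best score."""
    wins: Counter = Counter()
    if not scores:
        return wins
    num_instances = len(next(iter(scores.values())))
    for i in range(num_instances):
        best = max(per_instance[i] for per_instance in scores.values())
        for cand, per_instance in scores.items():
            if per_instance[i] == best:
                wins[cand] += 1
    return wins


def pareto_frontier(scores: Mapping[str, Sequence[float]]) -> Set[str]:
    """Candidates that are best on at least one instance."""
    return set(instance_wins(scores))


def sample_parent(scores: Mapping[str, Sequence[float]], rng: random.Random) -> str:
    """Pick a frontier candidate, weighted by how many instances it wins."""
    wins = instance_wins(scores)
    cands = sorted(wins)
    return rng.choices(cands, weights=[wins[c] for c in cands], k=1)[0]
```

Take A = [0.9, 0.2, 0.5], B = [0.4, 0.8, 0.5], and C = [0.3, 0.1, 0.4]. The frontier is {A, B}. Ranking by mean score would keep only B (about 0.57 versus 0.53 for A), which discards the candidate that handles the first instance best.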
Running an Optimizer
Each optimizer script follows the same pattern: load a YAML config, initialize components, build the pipeline, call the optimizer's compile(), save the result, and evaluate on the test set.
The compiled program is saved to a JSON file (for example, optimized_rag_mipro.json) and can be reloaded with rag_pipeline.load("optimized_rag_mipro.json") for inference or further evaluation.
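The run pattern can be sketched end to end with stdlib stand-ins. ToyPipeline, ToyOptimizer, and run_optimization are hypothetical. Their compile(student, trainset=...), save(path), and load(path) methods mirror the method names that DSPy optimizers and programs expose. A JSON string replaces the YAML config so that no third-party parser is needed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


class ToyPipeline:
    """Hypothetical stand-in for a DSPy RAG program; only its instruction is state."""

    def __init__(self, instruction: str = "Answer the question.") -> None:
        self.instruction = instruction

    def save(self, path: str) -> None:
        Path(path).write_text(json.dumps({"instruction": self.instruction}))

    def load(self, path: str) -> None:
        self.instruction = json.loads(Path(path).read_text())["instruction"]


class ToyOptimizer:
    """Hypothetical optimizer whose compile() returns a new, modified pipeline."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def compile(self, student: ToyPipeline, *, trainset: List[Any]) -> ToyPipeline:
        return ToyPipeline(student.instruction + self.suffix)


def run_optimization(
    config: Dict[str, Any],
    build_pipeline: Callable[[Dict[str, Any]], Any],
    make_optimizer: Callable[[Dict[str, Any]], Any],
    evaluate: Callable[[Any, List[Any]], float],
    trainset: List[Any],
    testset: List[Any],
) -> Tuple[Any, float]:
    """Config -> pipeline and optimizer -> compile() -> save -> evaluate on the test set."""
    pipeline = build_pipeline(config)
    optimizer = make_optimizer(config)
    compiled = optimizer.compile(pipeline, trainset=trainset)
    compiled.save(config["output_path"])
    return compiled, evaluate(compiled, testset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A JSON string stands in for the YAML config so only the stdlib is needed.
        config = json.loads('{"suffix": " Cite the retrieved context."}')
        config["output_path"] = str(Path(tmp) / "optimized_rag_toy.json")
        compiled, score = run_optimization(
            config,
            build_pipeline=lambda cfg: ToyPipeline(),
            make_optimizer=lambda cfg: ToyOptimizer(cfg["suffix"]),
            evaluate=lambda program, data: float("context" in program.instruction),
            trainset=[],
            testset=[],
        )
        reloaded = ToyPipeline()
        reloaded.load(config["output_path"])  # same round trip as rag_pipeline.load(...)
        print(reloaded.instruction, score)
```

Every optimizer plugs in through make_optimizer, so swapping MIPROv2 for COPRO, SIMBA, or GEPA changes only the config and that one factory.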