DSPy-Opt ships five ready-to-use RAG pipeline implementations, each targeting a different question-answering benchmark. Together they cover a range of QA task types: single-hop lookups and false-premise debunking (FreshQA), multi-hop entity chaining (HotpotQA), biomedical abstract reasoning (PubMedQA), trivia factoid recall (TriviaQA), and general-knowledge retrieval over cleaned Wikipedia articles. Every pipeline shares the same five-stage architecture and the same optimizer script naming pattern, so switching datasets or adding a new one requires only minimal changes.

Supported Datasets

| Dataset | HuggingFace ID | Description | Complexity Type |
| --- | --- | --- | --- |
| FreshQA (SealQA) | vtllms/sealqa | Dynamic QA benchmark with diverse question types and false-premise debunking | Single-hop |
| HotpotQA | hotpotqa/hotpot_qa | Multi-hop questions with strong supervision for supporting facts | Multi-hop |
| PubMedQA | qiaojin/PubMedQA | Biomedical QA based on PubMed abstracts | Biomedical |
| TriviaQA | mandarjoshi/trivia_qa | Question-answer-evidence triples authored by trivia enthusiasts | Trivia, factoid |
| Wikipedia | wikimedia/wikipedia + WikiQA | Large-scale cleaned Wikipedia articles with WikiQA pairs for QA | General knowledge |

Shared Five-Stage Pipeline Architecture

Although every dataset uses a different module class and metadata schema, all five pipelines execute the same logical stages in forward(). Each stage is implemented as a reusable component drawn from dspy_opt.utils.
1. Query Rewriting

QueryRewriter expands synonyms, removes conversational noise, and produces a search-optimized version of the original question.

2. Sub-Query Generation

SubQueryGenerator decomposes the rewritten query into multiple self-contained sub-queries that can be executed in parallel, boosting retrieval coverage for complex questions.

3. Metadata Extraction

MetadataExtractor calls an LLM with structured-output generation to parse each query against a dataset-specific JSON schema, producing typed filter values for Weaviate.

4. Hybrid Retrieval

WeaviateRetriever performs hybrid search (vector + keyword) on the main query and all sub-queries, applying the extracted metadata as filters. Results are aggregated and deduplicated.

5. Answer Generation

dspy.ChainOfThought receives the unique retrieved passages and generates a final answer together with an explicit reasoning trace.
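The control flow of the five stages can be sketched in plain Python. This is an illustrative stand-in, not the dspy_opt.utils API: every function below is hypothetical, the LLM-backed stages are replaced by simple string rules, and the "retriever" is a keyword-only search over a three-document in-memory corpus rather than a Weaviate hybrid search. What it aims to show is the ordering of the stages and the deduplication of passages across the main query and its sub-queries.

```python
from dataclasses import dataclass, field

# Words ignored by the toy keyword matcher below.
STOPWORDS = {"what", "is", "the", "of", "and", "where", "in"}


@dataclass
class Passage:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


# Tiny in-memory corpus standing in for a Weaviate collection (illustrative only).
CORPUS = [
    Passage("d1", "Paris is the capital of France.", {"category": "geography"}),
    Passage("d2", "The Eiffel Tower is located in Paris.", {"category": "landmarks"}),
    Passage("d3", "Berlin is the capital of Germany.", {"category": "geography"}),
]


def tokenize(text: str) -> set[str]:
    """Lowercase, drop basic punctuation, and remove stopwords."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return {w for w in words if w not in STOPWORDS}


def rewrite_query(question: str) -> str:
    """Stage 1 stand-in: strip conversational noise from the question."""
    cleaned = question.lower()
    for phrase in ("hey, ", "please ", "can you tell me "):
        cleaned = cleaned.replace(phrase, "")
    return cleaned.strip(" ?")


def generate_sub_queries(query: str) -> list[str]:
    """Stage 2 stand-in: split a compound query on ' and '."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def extract_metadata(query: str) -> dict:
    """Stage 3 stand-in: a real pipeline would call an LLM with a JSON schema."""
    return {"category": "geography"} if "capital" in query else {}


def retrieve(query: str, filters: dict) -> list[Passage]:
    """Stage 4 stand-in: keyword overlap restricted by metadata filters."""
    terms = tokenize(query)
    hits = []
    for passage in CORPUS:
        if any(passage.metadata.get(k) != v for k, v in filters.items()):
            continue  # passage fails at least one metadata filter
        if terms & tokenize(passage.text):
            hits.append(passage)
    return hits


def generate_answer(question: str, passages: list[Passage]) -> dict:
    """Stage 5 stand-in: return a reasoning trace alongside the answer."""
    context = [p.text for p in passages]
    reasoning = f"Found {len(context)} unique passage(s) for: {question}"
    answer = context[0] if context else "No supporting passage found."
    return {"reasoning": reasoning, "answer": answer, "context": context}


def forward(question: str) -> dict:
    """Run the five stages in order, deduplicating passages by doc_id."""
    rewritten = rewrite_query(question)
    sub_queries = generate_sub_queries(rewritten)
    unique: dict[str, Passage] = {}
    for query in [rewritten, *sub_queries]:
        filters = extract_metadata(query)
        for passage in retrieve(query, filters):
            unique.setdefault(passage.doc_id, passage)
    return generate_answer(question, list(unique.values()))
```

Calling `forward("Hey, can you tell me what is the capital of France and where is the Eiffel Tower?")` runs the main query and two sub-queries. The same passage can match more than one of them, which is why the results are keyed by `doc_id` before answer generation.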

Optimizer Scripts Pattern

All five dataset directories follow the same file-naming convention. Each dataset exposes scripts for every supported DSPy optimizer and a dedicated evaluation script:
<dataset>_indexing.py                    # Index documents to Weaviate
<dataset>_rag_module.py                  # Pipeline class definition
<dataset>_rag_mipro.py                   # MIPROv2 optimization
<dataset>_rag_copro.py                   # COPRO optimization
<dataset>_rag_bootstrap_few_shot.py      # BootstrapFewShot optimization
<dataset>_rag_simba.py                   # SIMBA optimization
<dataset>_rag_gepa.py                    # GEPA optimization
<dataset>_rag_evaluation.py              # DeepEval evaluation
Each script reads its hyperparameters from a matching _config.yml file that lives in the same directory. All scripts must be run from inside the dataset subdirectory so that relative config paths resolve correctly.
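As a rough illustration of the naming convention and of why the working directory matters, the sketch below derives the script names for one dataset and resolves a config path relative to a working directory. The helper names, the `hotpotqa` prefix, and the exact `<script stem>_config.yml` mapping are assumptions made for this example; check the repository for the actual file names.

```python
from pathlib import Path

# Optimizer suffixes taken from the file listing above.
OPTIMIZER_SUFFIXES = ["mipro", "copro", "bootstrap_few_shot", "simba", "gepa"]


def script_names(dataset: str) -> list[str]:
    """List the files the naming convention implies for one dataset prefix."""
    names = [f"{dataset}_indexing.py", f"{dataset}_rag_module.py"]
    names += [f"{dataset}_rag_{suffix}.py" for suffix in OPTIMIZER_SUFFIXES]
    names.append(f"{dataset}_rag_evaluation.py")
    return names


def config_name(script: str) -> str:
    """Assumed mapping for this sketch: '<stem>.py' -> '<stem>_config.yml'."""
    return f"{Path(script).stem}_config.yml"


def resolve_config(script: str, working_dir: Path) -> Path:
    """A relative config path is resolved against the working directory."""
    return working_dir / config_name(script)


if __name__ == "__main__":
    script = "hotpotqa_rag_mipro.py"  # hypothetical dataset prefix
    # Run from the dataset directory: the config is found next to the script.
    print(resolve_config(script, Path("hotpotqa")))
    # Run from the repository root: the same relative name points elsewhere.
    print(resolve_config(script, Path(".")))
```

The two printed paths differ, which is the failure mode the note above describes: a script started from the wrong directory looks for its config in a location where it does not exist.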
All pipelines share the same five DeepEval metrics: Answer Relevancy, Faithfulness, Contextual Precision, Contextual Recall, and Contextual Relevancy. Thresholds are configured per-dataset in each _evaluation_config.yml.
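To make the per-dataset threshold idea concrete, here is a minimal sketch that compares metric scores with configured thresholds. It does not call DeepEval. The metric keys, the threshold values, and the rule that a metric passes when its score is at least the threshold are assumptions for illustration, and the real config file layout may differ.

```python
# The five shared metrics, written as hypothetical config keys.
METRICS = [
    "answer_relevancy",
    "faithfulness",
    "contextual_precision",
    "contextual_recall",
    "contextual_relevancy",
]


def check_thresholds(scores: dict, thresholds: dict) -> dict:
    """Return pass/fail per metric; a metric passes when score >= threshold."""
    missing = [m for m in METRICS if m not in scores or m not in thresholds]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return {m: scores[m] >= thresholds[m] for m in METRICS}


if __name__ == "__main__":
    # Hypothetical values, as they might be loaded from an evaluation config.
    thresholds = {m: 0.7 for m in METRICS}
    scores = {
        "answer_relevancy": 0.91,
        "faithfulness": 0.65,
        "contextual_precision": 0.80,
        "contextual_recall": 0.74,
        "contextual_relevancy": 0.70,
    }
    for metric, passed in check_thresholds(scores, thresholds).items():
        print(f"{metric}: {'pass' if passed else 'fail'}")
```

Keeping thresholds in a per-dataset config means that a dataset where faithful answers are harder to produce can use a lower bar without changing the evaluation code.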

Dataset Pages

FreshQA

Single-hop and false-premise debunking on the SealQA longseal benchmark. Uses title and category metadata filters.

HotpotQA

Multi-hop reasoning over distractor documents. Uses title and category metadata filters with entity chaining.

PubMedQA

Biomedical domain QA from PubMed abstracts. Uses a rich six-field metadata schema for high-precision filtering.

TriviaQA

Factoid and trivia questions with typed metadata including an enum content_type and numeric year field.

Wikipedia

General knowledge QA over cleaned Wikipedia articles indexed from wikimedia/wikipedia with WikiQA pairs.
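The dataset-specific metadata schemas mentioned above can be pictured with a small example. The sketch below models a TriviaQA-style schema with an enum content_type and a numeric year, then keeps only the extracted values that conform to it. The enum values, the `to_filters` helper, and the validation rules are hypothetical; the actual schema lives in the repository.

```python
# Hypothetical schema: only the field names content_type and year come from
# the description above; the enum values are invented for illustration.
TRIVIA_SCHEMA = {
    "type": "object",
    "properties": {
        "content_type": {"type": "string", "enum": ["history", "science", "sports"]},
        "year": {"type": "number"},
    },
}


def matches_type(value, type_name: str) -> bool:
    """Check a value against the two JSON types used in this sketch."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False


def to_filters(raw: dict, schema: dict) -> dict:
    """Keep only fields declared in the schema that pass type and enum checks."""
    filters = {}
    for name, spec in schema["properties"].items():
        value = raw.get(name)
        if value is None or not matches_type(value, spec["type"]):
            continue
        if "enum" in spec and value not in spec["enum"]:
            continue
        filters[name] = value
    return filters


if __name__ == "__main__":
    # Simulated LLM output with one undeclared field ("author").
    extracted = {"content_type": "history", "year": 1969, "author": "unknown"}
    print(to_filters(extracted, TRIVIA_SCHEMA))
```

Dropping values that fail validation, rather than passing them through, keeps a malformed extraction from turning into a filter that silently excludes every document.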
