PubMedQA RAG Pipeline: Biomedical Question Answering

The PubMedQA pipeline targets the PubMedQA benchmark, a biomedical question answering dataset built from PubMed research abstracts. Each question requires reasoning over dense scientific text involving diseases, biological entities, study designs, and experimental findings. The pipeline uses a six-field metadata schema (conditions, biological entities, species, study type, findings, and effect direction) to give the WeaviateRetriever specific filter predicates that narrow the candidate set before hybrid search ranks the results.

Dataset

Property	Value
HuggingFace ID	`qiaojin/PubMedQA`
Subset	`pqa_artificial`
Split	`test` (10% held out via `test_size: 0.1`)
Weaviate collection	`PubMedQA`
Complexity type	Biomedical domain QA
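
To make the `test_size: 0.1` setting concrete, here is a minimal standalone sketch of a 10% holdout. The pipeline presumably relies on the dataset library's own splitting; the helper name `holdout_split` and the seed value are assumptions made only for this illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    rows: Sequence[T], test_size: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle indices, then hold out a fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(rows) * test_size)
    test_indices = indices[:n_test]
    held_out = set(test_indices)
    train = [row for i, row in enumerate(rows) if i not in held_out]
    test = [rows[i] for i in test_indices]
    return train, test


train_rows, test_rows = holdout_split(list(range(100)), test_size=0.1)
print(len(train_rows), len(test_rows))
```

With 100 rows and `test_size=0.1`, ten rows land in the held-out set and the remaining ninety stay available for optimization.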

Pipeline Class

The PubMedQARAG class is defined in pubmedqa_rag_module.py and subclasses dspy.Module. It applies the same five-stage architecture as other dataset pipelines but is configured with the biomedical metadata schema for more targeted document filtering.

from dspy_opt.pubmedqa.pubmedqa_rag_module import PubMedQARAG

class PubMedQARAG(dspy.Module):
    def __init__(
        self,
        query_rewriter: QueryRewriter,
        sub_query_generator: SubQueryGenerator,
        metadata_extractor: MetadataExtractor,
        metadata_schema: Dict[str, Any],
        weaviate_retriever: WeaviateRetriever,
        embedding_model: SentenceTransformer,
        top_k: int = 3,
    ): ...

    def forward(self, question: str) -> dspy.Prediction:
        """Execute the complete RAG pipeline."""

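The constructor arguments and return fields suggest a flow of query rewriting, sub-query generation, metadata extraction, filtered retrieval, and answer generation. The sketch below illustrates that flow with plain callables. It is not the actual `forward()` implementation: the function name `run_pipeline_sketch`, the stage ordering, and the inputs each stage receives are assumptions, and the `reasoning` field is omitted for brevity.

```python
from typing import Callable, Dict, List


def run_pipeline_sketch(
    question: str,
    rewrite: Callable[[str], str],
    decompose: Callable[[str], List[str]],
    extract_metadata: Callable[[str], Dict[str, str]],
    retrieve: Callable[[str, Dict[str, str]], List[str]],
    answer: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Hypothetical stand-in for forward(); every stage is a plain callable."""
    rewritten = rewrite(question)            # stage 1: search-optimized query
    sub_queries = decompose(rewritten)       # stage 2: decomposed sub-queries
    filters = extract_metadata(question)     # stage 3: metadata for filtering
    passages: List[str] = []
    for query in [rewritten, *sub_queries]:  # stage 4: retrieve per query
        passages.extend(retrieve(query, filters))
    context = list(dict.fromkeys(passages))  # order-preserving deduplication
    return {
        "question": question,
        "rewritten_query": rewritten,
        "sub_queries": sub_queries,
        "retrieved_context": context,
        "answer": answer(question, context),  # stage 5: answer generation
    }


# Stub stages, used only to show the data flow.
result = run_pipeline_sketch(
    "Does metformin reduce cardiovascular mortality in type 2 diabetes?",
    rewrite=lambda q: q.rstrip("?").lower(),
    decompose=lambda q: [q + " randomized trial", q + " cohort study"],
    extract_metadata=lambda q: {"species": "human"},
    retrieve=lambda query, filters: ["abstract A", "abstract B"],
    answer=lambda q, ctx: f"answer grounded in {len(ctx)} passages",
)
print(result["retrieved_context"])
```

Because every query returns the same two stub passages, deduplication collapses six hits into two, which mirrors the deduplicated `retrieved_context` field.
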
`forward()` Return Fields

forward() returns a dspy.Prediction containing the following fields:

Field	Description
`question`	The original input question (passed through unchanged)
`rewritten_query`	Search-optimized version of the question produced by `QueryRewriter`
`sub_queries`	List of decomposed sub-queries from `SubQueryGenerator`
`retrieved_context`	Deduplicated list of passages returned by `WeaviateRetriever`
`answer`	Concise answer generated by `dspy.ChainOfThought`
`reasoning`	Explanation of how the answer was derived from the biomedical literature

Metadata Schema

PubMedQA uses the most expressive metadata schema of all five pipelines. Its six fields model the key dimensions of biomedical literature and enable fine-grained filter predicates across diseases, biological entities, species, study design, findings, and effect direction.

Field	Type	Description
`diseases_conditions`	string	Diseases, disorders, or medical conditions mentioned in the text
`biological_entities`	string	Genes, proteins, cells, molecules, or biological pathways studied
`species`	string	Species involved in the study (e.g., human, mouse, rat)
`study_type`	string	Type of research study design
`main_findings`	string	Key results or conclusions from the study
`effect_direction`	string	Direction of main effects reported

metadata_schema:
  properties:
    diseases_conditions:
      type: "string"
      description: "Diseases, disorders, or medical conditions mentioned in the text"
    biological_entities:
      type: "string"
      description: "Genes, proteins, cells, molecules, or biological pathways studied"
    species:
      type: "string"
      description: "Species involved in the study (e.g., human, mouse, rat)"
    study_type:
      type: "string"
      description: "Type of research study design"
    main_findings:
      type: "string"
      description: "Key results or conclusions from the study"
    effect_direction:
      type: "string"
      description: "Direction of main effects reported"

The expanded metadata schema is the primary differentiator of the PubMedQA pipeline. When the MetadataExtractor successfully populates several of these fields, the Weaviate filter can exclude large portions of the abstract corpus that are irrelevant to the disease, species, or study design in question.
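
As a rough illustration of how populated fields could become filter predicates, the sketch below turns an extractor result into a list of clauses and skips fields that came back empty. The helper `build_filter_clauses`, the clause dictionary layout, and the choice of an equality operator are all assumptions for this example; the way WeaviateRetriever actually builds its filters is not shown in this page.

```python
from typing import Dict, List, Optional

# The six schema fields documented above.
SCHEMA_FIELDS = (
    "diseases_conditions",
    "biological_entities",
    "species",
    "study_type",
    "main_findings",
    "effect_direction",
)


def build_filter_clauses(extracted: Dict[str, Optional[str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: keep only schema fields the extractor actually filled in."""
    clauses = []
    for field in SCHEMA_FIELDS:
        value = (extracted.get(field) or "").strip()
        if value:  # missing, None, and whitespace-only values add no predicate
            clauses.append({"property": field, "operator": "equal", "value": value})
    return clauses


clauses = build_filter_clauses(
    {"diseases_conditions": None, "species": "human", "study_type": " "}
)
print(clauses)
```

Each populated field adds one predicate, so the more fields the extractor fills in, the smaller the candidate set that reaches hybrid search. With the Weaviate Python client v4, clauses like these could be mapped to `Filter.by_property(...).equal(...)` and combined with `Filter.all_of(...)`, though that mapping is likewise an assumption here.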

Models

Role	Model
Answer LLM	`groq/qwen3-32b`
Extractor LLM	`groq/llama-3.3-70b-versatile`
Embedding	`Qwen/Qwen3-Embedding-0.6B`
Evaluator LLM	`groq/qwen3-32b`

Scripts

Script	Description
`pubmedqa_indexing.py`	Load dataset from HuggingFace, extract metadata, embed, and store in Weaviate
`pubmedqa_rag_module.py`	Pipeline class definition, imported by the optimizer and evaluation scripts
`pubmedqa_rag_mipro.py`	Run MIPROv2 optimization
`pubmedqa_rag_copro.py`	Run COPRO optimization
`pubmedqa_rag_bootstrap_few_shot.py`	Run BootstrapFewShot optimization
`pubmedqa_rag_simba.py`	Run SIMBA optimization
`pubmedqa_rag_gepa.py`	Run GEPA optimization
`pubmedqa_rag_evaluation.py`	Evaluate the optimized pipeline with DeepEval metrics

Configuration Files

File	Description
`pubmedqa_indexing_config.yml`	Indexing parameters: embedding model, metadata schema, collection name
`pubmedqa_rag_mipro_config.yml`	MIPROv2 parameters: `max_bootstrapped_demos`, `max_labeled_demos`, `auto`
`pubmedqa_rag_copro_config.yml`	COPRO parameters: `breadth`, `depth`, `init_temperature`
`pubmedqa_rag_bootstrap_few_shot_config.yml`	BootstrapFewShot parameters: `max_bootstrapped_demos`, `max_rounds`
`pubmedqa_rag_simba_config.yml`	SIMBA parameters: `bsize`, `num_candidates`, `max_steps`, `max_demos`
`pubmedqa_rag_gepa_config.yml`	GEPA parameters: `max_full_evals`, `reflection_minibatch_size`, `candidate_selection_strategy`
`pubmedqa_rag_evaluation_config.yml`	Evaluation settings and DeepEval metric thresholds

MIPROv2 Configuration

answer_llm:
  model: "groq/qwen3-32b"
  api_key_env: "GROQ_API_KEY"

extractor_llm:
  model: "groq/llama-3.3-70b-versatile"
  api_key_env: "GROQ_API_KEY"

embedding:
  embedding_model: "Qwen/Qwen3-Embedding-0.6B"
  tokenizer_kwargs:
    padding_side: "left"

weaviate:
  url_env: "WEAVIATE_URL"
  api_key_env: "WEAVIATE_API_KEY"
  collection_name: "PubMedQA"
  top_k: 5

dataset:
  name: "qiaojin/PubMedQA"
  subset: "pqa_artificial"
  split: "test"
  test_size: 0.1

optimizer:
  max_bootstrapped_demos: 3
  max_labeled_demos: 16
  auto: "medium"
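
The `api_key_env` and `url_env` keys name environment variables instead of holding secrets directly. A minimal sketch of how such keys might be resolved after the YAML has been parsed into a dictionary is shown below. The helper `resolve_env_keys` and the convention of dropping the `_env` suffix are assumptions for illustration, not the project's actual loader.

```python
import os
from typing import Any, Dict


def resolve_env_keys(section: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace each '<name>_env' key with '<name>' read from the environment."""
    resolved: Dict[str, Any] = {}
    for key, value in section.items():
        if key.endswith("_env"):
            env_value = os.environ.get(value)
            if env_value is None:
                raise KeyError(f"environment variable {value!r} is not set")
            resolved[key[: -len("_env")]] = env_value
        else:
            resolved[key] = value  # non-secret settings pass through unchanged
    return resolved


# Example with a placeholder value; real keys should come from your shell or a secrets manager.
os.environ.setdefault("GROQ_API_KEY", "placeholder-key")
answer_llm = resolve_env_keys({"model": "groq/qwen3-32b", "api_key_env": "GROQ_API_KEY"})
print(sorted(answer_llm))
```

Failing loudly on a missing variable surfaces a misconfigured environment before any optimizer run starts.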

Running the Pipeline

All scripts must be run from the pubmedqa/ directory so that relative config file paths resolve correctly. Each command pair below assumes you start from the repository root.

# Index documents into Weaviate
cd src/dspy_opt/pubmedqa
python pubmedqa_indexing.py

# Run MIPROv2 optimization
cd src/dspy_opt/pubmedqa
python pubmedqa_rag_mipro.py

# Run SIMBA optimization
cd src/dspy_opt/pubmedqa
python pubmedqa_rag_simba.py

# Run GEPA optimization
cd src/dspy_opt/pubmedqa
python pubmedqa_rag_gepa.py

# Evaluate the optimized pipeline
cd src/dspy_opt/pubmedqa
python pubmedqa_rag_evaluation.py

Programmatic Usage

import dspy
from sentence_transformers import SentenceTransformer

from dspy_opt.pubmedqa.pubmedqa_rag_module import PubMedQARAG
from dspy_opt.utils.metadata_extractor import MetadataExtractor
from dspy_opt.utils.query_rewriter import QueryRewriter
from dspy_opt.utils.sub_query_generator import SubQueryGenerator
from dspy_opt.utils.weaviate_retriever import WeaviateRetriever

# Configure LLMs
answer_lm = dspy.LM("groq/qwen3-32b", api_key="your-groq-api-key")
extractor_lm = dspy.LM("groq/llama-3.3-70b-versatile", api_key="your-groq-api-key")
dspy.configure(lm=answer_lm)

# Initialize components
query_rewriter = QueryRewriter()
sub_query_generator = SubQueryGenerator()
metadata_extractor = MetadataExtractor(extractor_llm=extractor_lm)
embedding_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

retriever = WeaviateRetriever(
    weaviate_url="your-weaviate-url",
    weaviate_api_key="your-weaviate-api-key",
    collection_name="PubMedQA",
    top_k=5,
)

metadata_schema = {
    "properties": {
        "diseases_conditions": {
            "type": "string",
            "description": "Diseases, disorders, or medical conditions mentioned in the text",
        },
        "biological_entities": {
            "type": "string",
            "description": "Genes, proteins, cells, molecules, or biological pathways studied",
        },
        "species": {
            "type": "string",
            "description": "Species involved in the study (e.g., human, mouse, rat)",
        },
        "study_type": {"type": "string", "description": "Type of research study design"},
        "main_findings": {
            "type": "string",
            "description": "Key results or conclusions from the study",
        },
        "effect_direction": {
            "type": "string",
            "description": "Direction of main effects reported",
        },
    }
}

# Build and run the pipeline
pipeline = PubMedQARAG(
    query_rewriter=query_rewriter,
    sub_query_generator=sub_query_generator,
    metadata_extractor=metadata_extractor,
    metadata_schema=metadata_schema,
    weaviate_retriever=retriever,
    embedding_model=embedding_model,
    top_k=5,
)

result = pipeline("Does metformin reduce cardiovascular mortality in type 2 diabetes?")
print(result.answer)
print(result.reasoning)
