Preparing Datasets for verl RL Post-Training

Before launching a verl training job, your dataset must be converted to Parquet format with a specific set of fields that the RL trainer knows how to consume. The preprocessing step is intentionally separate from training: you run a script once to produce .parquet files, then point your training config at those files. This separation keeps the training loop fast and lets you preprocess on CPU without holding GPU resources.

Required Parquet Schema

Every row in your training (and evaluation) Parquet file must contain the following fields:

Field	Type	Description
`data_source`	`string`	Dataset name used to route the sample to the correct reward function in `RewardManager`
`prompt`	`list[dict]`	The conversation prompt in HuggingFace chat-template format — a list of `{"role": ..., "content": ...}` dicts
`ability`	`string`	Task category (e.g., `"math"`, `"code"`, `"qa"`) — used for logging and filtering
`reward_model`	`dict`	Contains at minimum `{"style": "rule", "ground_truth": "<answer>"}` for rule-based rewards
`extra_info`	`dict`	Arbitrary metadata (split name, index, original question text, etc.)

verl applies the model’s chat template automatically at training time using the tokenizer. The prompt field should be a list of role/content dicts — not a pre-formatted string. The tokenizer’s apply_chat_template is called during data loading inside RLHFDataset, so the model always sees the correct format regardless of which model family you use.
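A quick way to catch schema mistakes before training is to check a few rows against the fields listed above. The following is a minimal sketch, not part of verl: `validate_row` and `REQUIRED_FIELDS` are hypothetical names, and the checks cover only the field names and Python types described in this section.

```python
# Hypothetical helper (not part of verl): sanity-check one row before writing Parquet.
REQUIRED_FIELDS = {
    "data_source": str,
    "prompt": list,
    "ability": str,
    "reward_model": dict,
    "extra_info": dict,
}


def validate_row(row):
    """Return a list of problems found in `row`; an empty list means it looks valid."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")

    # Each prompt turn must be a role/content dict, not a pre-formatted string.
    prompt = row.get("prompt")
    if isinstance(prompt, list):
        for i, turn in enumerate(prompt):
            if not (isinstance(turn, dict) and {"role", "content"} <= set(turn)):
                problems.append(f"prompt[{i}] must be a dict with 'role' and 'content'")

    # Rule-based rewards need a ground truth to compare against.
    reward = row.get("reward_model")
    if isinstance(reward, dict) and "ground_truth" not in reward:
        problems.append("reward_model is missing 'ground_truth'")
    return problems


good_row = {
    "data_source": "openai/gsm8k",
    "prompt": [{"role": "user", "content": "What is 6 * 7?"}],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "42"},
    "extra_info": {"split": "train", "index": 0},
}
# Same row, but with the prompt given as a pre-formatted string.
bad_row = dict(good_row, prompt="What is 6 * 7?")

print(validate_row(good_row))
print(validate_row(bad_row))
```

Running a check like this on the first few mapped rows is cheap and surfaces the most common mistake, a prompt stored as a string, before a training job is launched.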

The `reward_model` Field

The reward_model dict carries the information needed by the reward function:

{
  "style": "rule",
  "ground_truth": "42"
}

The ground_truth value is extracted from this field at reward-computation time and passed directly to your compute_score function. Its format must exactly match what your reward function expects — for GSM8K, for example, it is a plain numeric string with commas removed.
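To illustrate why the format has to match exactly, here is a minimal sketch of a rule-based scorer. It is an assumption for illustration only: the two-argument signature and the exact-match rule are hypothetical and may differ from the scoring function verl ships for GSM8K.

```python
import re


def compute_score(solution_str, ground_truth):
    """Hypothetical rule-based scorer: 1.0 on an exact string match, else 0.0."""
    # Take the last "#### <number>" in the model output as its final answer.
    matches = re.findall(r"#### (-?[0-9.,]+)", solution_str)
    if not matches:
        return 0.0
    predicted = matches[-1].replace(",", "")
    return 1.0 if predicted == ground_truth else 0.0


print(compute_score("6 * 7 = 42\n#### 42", "42"))
print(compute_score("The total is 1,000.\n#### 1,000", "1000"))
print(compute_score("6 * 7 = 42\n#### 42.0", "42"))
print(compute_score("I am not sure.", "42"))
```

The third call scores 0.0 even though the value is numerically correct, because the comparison is on strings. If the ground truth were stored as "42.0" or "1,000", a scorer like this one would reject otherwise correct answers.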

Writing a Preprocessing Script

Preprocessing scripts follow a consistent three-step structure: load the raw dataset with the HuggingFace datasets library, apply a make_map_fn transformation to each row, then write Parquet files to disk.

Load the Raw Dataset

Use datasets.load_dataset to fetch your dataset from HuggingFace Hub or from a local path.

import datasets

data_source = "openai/gsm8k"
dataset = datasets.load_dataset(data_source, "main")

train_dataset = dataset["train"]
test_dataset = dataset["test"]

Implement extract_solution

Write a helper that extracts the clean answer string from the raw label. For GSM8K, the answer appears after ####:

import re

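def extract_solution(solution_str):
    # The GSM8K label ends with "#### <number>"; capture the number,
    # allowing a leading minus sign, decimal points, and thousands separators.
    solution = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    assert solution is not None, "label has no '#### <number>' marker"
    # Drop thousands separators so "1,250" becomes "1250".
    final_solution = solution.group(1).replace(",", "")
    return final_solution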
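It can help to try the helper on a few labels before mapping the whole dataset. The sketch below restates the helper in compact form so that it runs on its own; the sample labels are made up in the GSM8K style and are not taken from the dataset.

```python
import re


def extract_solution(solution_str):
    # Compact restatement of the helper above so this snippet runs on its own.
    match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    assert match is not None, "no '#### <number>' marker found"
    return match.group(1).replace(",", "")


# Made-up labels: reasoning text followed by "#### <answer>".
print(extract_solution("She sold 48 + 24 = 72 clips.\n#### 72"))
print(extract_solution("The balance is 1,250 - 2,500.\n#### -1,250"))

# A label without the marker is rejected instead of producing a wrong ground truth.
try:
    extract_solution("No final answer marker here.")
except AssertionError as err:
    print("rejected:", err)
```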

Implement make_map_fn

make_map_fn takes the split name and returns a process_fn that transforms each raw example into the verl schema:

instruction_following = 'Let\'s think step by step and output the final answer after "####".'

def make_map_fn(split):
    def process_fn(example, idx):
        question_raw = example.pop("question")
        question = question_raw + " " + instruction_following

        answer_raw = example.pop("answer")
        solution = extract_solution(answer_raw)

        data = {
            "data_source": data_source,
            "prompt": [
                {
                    "role": "user",
                    "content": question,
                }
            ],
            "ability": "math",
            "reward_model": {
                "style": "rule",
                "ground_truth": solution,
            },
            "extra_info": {
                "split": split,
                "index": idx,
                "answer": answer_raw,
                "question": question_raw,
            },
        }
        return data

    return process_fn

Apply the Map and Write Parquet

Apply make_map_fn to every row and save the result:

import os
from verl.utils.hdfs_io import copy, makedirs

train_dataset = train_dataset.map(
    function=make_map_fn("train"), with_indices=True
)
test_dataset = test_dataset.map(
    function=make_map_fn("test"), with_indices=True
)

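# args comes from the script's argparse setup (see "Complete GSM8K Example").
# Expand "~" and create the directory so the writes below do not fail.
local_save_dir = os.path.expanduser(args.local_save_dir)  # e.g. ~/data/gsm8k
os.makedirs(local_save_dir, exist_ok=True)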
train_dataset.to_parquet(os.path.join(local_save_dir, "train.parquet"))
test_dataset.to_parquet(os.path.join(local_save_dir, "test.parquet"))

# Optionally copy to HDFS
if args.hdfs_dir is not None:
    makedirs(args.hdfs_dir)
    copy(src=local_save_dir, dst=args.hdfs_dir)

Parquet is a columnar binary format. Compared to JSONL, it typically loads faster during training: values are stored in typed, compressed columns rather than re-parsed from text, and a reader can skip columns it does not need. The HuggingFace datasets library also keeps loaded data as Arrow tables that it can memory-map from its on-disk cache. For large datasets (millions of samples), the difference is significant.

Complete GSM8K Example

The full preprocessing script for GSM8K ships at examples/data_preprocess/gsm8k.py. Run it as:

python examples/data_preprocess/gsm8k.py \
    --local_save_dir ~/data/gsm8k \
    --hdfs_dir hdfs://user/data/gsm8k  # optional

Command-line arguments:

Argument	Default	Description
`--local_save_dir`	`~/data/gsm8k`	Directory where `.parquet` files are written
`--local_dataset_path`	`None`	Path to a locally cached raw dataset (skips HuggingFace download)
`--hdfs_dir`	`None`	HDFS destination; if set, files are copied there after local save
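The arguments in the table could be declared with argparse roughly as follows. This is a sketch that mirrors the table, not a copy of the shipped script; the `parse_args` wrapper and the help strings are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical wrapper declaring the three arguments from the table above."""
    parser = argparse.ArgumentParser(description="Preprocess GSM8K into Parquet files.")
    parser.add_argument(
        "--local_save_dir",
        default="~/data/gsm8k",
        help="Directory where .parquet files are written.",
    )
    parser.add_argument(
        "--local_dataset_path",
        default=None,
        help="Path to a locally cached raw dataset (skips the download).",
    )
    parser.add_argument(
        "--hdfs_dir",
        default=None,
        help="HDFS destination; if set, files are copied there after the local save.",
    )
    args = parser.parse_args(argv)
    # Expand "~" so later os.path.join calls produce a usable path.
    args.local_save_dir = os.path.expanduser(args.local_save_dir)
    return args


args = parse_args(["--local_save_dir", "/tmp/gsm8k"])
print(args.local_save_dir, args.local_dataset_path, args.hdfs_dir)
```

Passing `argv` explicitly keeps the function easy to exercise in a test; calling `parse_args()` with no argument reads the real command line.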

Pre-Built Preprocessing Scripts

verl ships preprocessing scripts for several common datasets under examples/data_preprocess/:

Dataset	Script	Task
GSM8K	`gsm8k.py`	Grade-school math
MATH	`math_dataset.py`	Competition math
HellaSwag	`hellaswag.py`	Common-sense reasoning
Full HH-RLHF	`full_hh_rlhf.py`	Dialogue helpfulness/harmlessness

To prepare a dataset not covered by these scripts, copy the structure of the closest existing script and adjust the extract_solution and make_map_fn implementations for your label format.
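As an illustration of that adaptation, the sketch below targets a hypothetical question-answering dataset. The dataset name, the `query` and `label` field names, and the "Answer: <text>" label format are all made up for this example; substitute the fields of your own data.

```python
# Hypothetical dataset: rows look like {"query": "...", "label": "Answer: <text>"}.
data_source = "my_org/my_qa_dataset"  # made-up name; use the one your reward routing expects


def extract_solution(label):
    # Assumed label format for this sketch: "Answer: <text>".
    prefix = "Answer:"
    if not label.startswith(prefix):
        raise ValueError(f"unexpected label format: {label!r}")
    return label[len(prefix):].strip()


def make_map_fn(split):
    def process_fn(example, idx):
        question = example["query"]
        return {
            "data_source": data_source,
            "prompt": [{"role": "user", "content": question}],
            "ability": "qa",
            "reward_model": {
                "style": "rule",
                "ground_truth": extract_solution(example["label"]),
            },
            "extra_info": {"split": split, "index": idx, "question": question},
        }

    return process_fn


raw_example = {"query": "What is the capital of France?", "label": "Answer: Paris"}
row = make_map_fn("train")(raw_example, 0)
print(row["reward_model"])
```

Only `extract_solution` and the field lookups change between datasets; the shape of the returned dict stays the same.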

HDFS Support

Both local and HDFS paths are supported throughout verl. The verl.utils.hdfs_io module provides copy and makedirs helpers that work transparently with hdfs:// prefixed URIs. Set your training config’s data.train_files and data.val_files to either a local glob pattern or an HDFS path:

data:
  train_files: ~/data/gsm8k/train.parquet
  val_files: ~/data/gsm8k/test.parquet

Filtering Overlong Prompts

Very long prompts can cause out-of-memory errors at the start of training. verl can pre-filter them during data loading:

data:
  filter_overlong_prompts: true
  filter_overlong_prompts_workers: 4   # parallel CPU workers for filtering

When enabled, any prompt whose tokenized length (after the chat template is applied) exceeds the configured maximum prompt length, data.max_prompt_length, is discarded before it reaches the GPU workers.

Filtering is applied at load time, so the effective dataset size may differ from the number of rows in your Parquet files. Log the post-filter dataset size to avoid surprises when estimating training duration.
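The effect of the filter can be estimated offline. The sketch below is a simplified stand-in, not verl's implementation: `count_tokens` splits on whitespace instead of using the model's tokenizer and chat template, and `filter_overlong` and its `max_prompt_length` parameter are hypothetical names.

```python
def count_tokens(prompt):
    # Stand-in tokenizer (assumption): whitespace tokens summed over all turns.
    return sum(len(turn["content"].split()) for turn in prompt)


def filter_overlong(rows, max_prompt_length):
    """Keep rows whose prompt fits within `max_prompt_length` tokens."""
    return [row for row in rows if count_tokens(row["prompt"]) <= max_prompt_length]


rows = [
    {"prompt": [{"role": "user", "content": "What is 6 * 7?"}]},
    {"prompt": [{"role": "user", "content": "word " * 50}]},
]
kept = filter_overlong(rows, max_prompt_length=16)

# Log the post-filter size so training-duration estimates use the right count.
print(f"kept {len(kept)} of {len(rows)} prompts")
```

For a realistic estimate, replace `count_tokens` with a count produced by the tokenizer of the model you plan to train.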

Multi-Turn and Image Inputs

For multi-turn conversations, extend the prompt list with additional {"role": "assistant", ...} and {"role": "user", ...} turns; the chat template handles the rest. For vision-language models, store the image data in an images field on the row itself, alongside prompt rather than inside extra_info (the field name is controlled by data.image_key in the training config); the RLHFDataset class will pass it through to the tokenizer/processor.
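A multi-turn prompt might look like the following. The conversation is made up, and the sketch assumes the list should end with a user turn so that the model is asked to generate the next assistant reply.

```python
# Made-up multi-turn example; only the "prompt" field differs from a single-turn row.
multi_turn_prompt = [
    {"role": "user", "content": "Natalia sold 48 clips in April. What is half of that?"},
    {"role": "assistant", "content": "Half of 48 is 24."},
    {"role": "user", "content": "If she sold that many in May, how many did she sell in total?"},
]

roles = [turn["role"] for turn in multi_turn_prompt]
# Assumption of this sketch: the last turn comes from the user.
assert roles[-1] == "user"
print(roles)
```

The remaining fields (data_source, ability, reward_model, extra_info) are built exactly as in the single-turn case.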
