Before launching a verl training job, your dataset must be converted to Parquet format with a specific set of fields that the RL trainer knows how to consume. The preprocessing step is intentionally separate from training: you run a script once to produce .parquet files, then point your training config at those files. This separation keeps the training loop fast and lets you preprocess on CPU without holding GPU resources.
Required Parquet Schema
Every row in your training (and evaluation) Parquet file must contain the following fields:
| Field | Type | Description |
|---|---|---|
| data_source | string | Dataset name used to route the sample to the correct reward function in RewardManager |
| prompt | list[dict] | The conversation prompt in HuggingFace chat-template format: a list of {"role": ..., "content": ...} dicts |
| ability | string | Task category (e.g., "math", "code", "qa"), used for logging and filtering |
| reward_model | dict | Contains at minimum {"style": "rule", "ground_truth": "<answer>"} for rule-based rewards |
| extra_info | dict | Arbitrary metadata (split name, index, original question text, etc.) |
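To make the schema concrete, here is a minimal sketch of one conforming row as a plain Python dict, together with a small checker. The validate_row helper, the REQUIRED_FIELDS mapping, the data_source value, and the sample question are illustrative assumptions and are not part of verl.

```python
# Expected Python type for each required field (mirrors the table above).
REQUIRED_FIELDS = {
    "data_source": str,
    "prompt": list,
    "ability": str,
    "reward_model": dict,
    "extra_info": dict,
}


def validate_row(row):
    """Return a list of schema problems; an empty list means the row looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    prompt = row.get("prompt")
    if isinstance(prompt, list):
        for i, message in enumerate(prompt):
            # Each chat message needs both a role and its content.
            if not (isinstance(message, dict) and "role" in message and "content" in message):
                problems.append(f"prompt[{i}] needs 'role' and 'content' keys")
    return problems


# Illustrative row; the question and answer are made up for this sketch.
example_row = {
    "data_source": "openai/gsm8k",
    "prompt": [{"role": "user", "content": "What is 2 + 3?"}],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "5"},
    "extra_info": {"split": "train", "index": 0},
}

problems = validate_row(example_row)
```

Running a check like this over a handful of rows before writing the Parquet file can catch a pre-formatted string in prompt or a missing reward_model early.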
verl applies the model’s chat template automatically at training time using the tokenizer. The prompt field should be a list of role/content dicts, not a pre-formatted string. The tokenizer’s apply_chat_template is called during data loading inside RLHFDataset, so the model always sees the correct format regardless of which model family you use.
The reward_model Field
The reward_model dict carries the information needed by the reward function. The ground_truth value is extracted from this field at reward-computation time and passed directly to your compute_score function. Its format must exactly match what your reward function expects; for GSM8K, for example, it is a plain numeric string with commas removed.
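The sketch below illustrates why the format matters. It assumes a simplified, hypothetical compute_score that takes the model output and the ground truth and returns 1.0 on an exact string match of the final answer; real reward functions in verl may use a different signature and scoring rule.

```python
import re

# Illustrative reward_model entry; ground_truth is stored without commas.
reward_model = {"style": "rule", "ground_truth": "1000"}


def compute_score(solution_str, ground_truth):
    """Hypothetical scorer: compare the number after the last '####' to ground_truth."""
    answers = re.findall(r"#### (-?\d[\d,]*(?:\.\d+)?)", solution_str)
    if not answers:
        return 0.0  # no final-answer marker in the model output
    predicted = answers[-1].replace(",", "")  # normalize "1,000" to "1000"
    return 1.0 if predicted == ground_truth else 0.0


response = "10 boxes of 100 items make 1,000 items.\n#### 1,000"

# Ground truth stored in the normalized form the scorer expects.
matching = compute_score(response, reward_model["ground_truth"])

# Same answer, but stored with a comma: the exact string comparison fails.
mismatched = compute_score(response, "1,000")
```

Here the correct response earns no reward in the second call purely because the stored ground truth was not normalized the same way as the prediction.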
Writing a Preprocessing Script
Preprocessing scripts follow a consistent three-step structure: load the raw dataset with the HuggingFace datasets library, apply the function returned by make_map_fn to each row, then write Parquet files to disk.
Load the Raw Dataset
Use datasets.load_dataset to fetch your dataset from HuggingFace Hub or from a local path.
Implement extract_solution
Write a helper that extracts the clean answer string from the raw label. For GSM8K, the final answer appears after the #### marker.
Complete GSM8K Example
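As a rough illustration of the extraction and mapping steps described above, the sketch below builds one schema-conforming row from a GSM8K-style record. It is not the shipped script: the DATA_SOURCE value, the sample record, and the exact contents of extra_info are assumptions made for this example, and it uses only the standard library so it can run without downloading anything.

```python
import re

DATA_SOURCE = "openai/gsm8k"  # assumed name; it must match what your reward routing expects


def extract_solution(answer_text):
    """Return the final answer after '####' with thousands separators removed."""
    match = re.search(r"#### (-?[0-9.,]+)", answer_text)
    if match is None:
        raise ValueError("no '#### <answer>' marker found")
    return match.group(1).replace(",", "")


def make_map_fn(split):
    """Build the per-row transform for one dataset split."""

    def process_fn(example, idx):
        question = example["question"]
        answer_raw = example["answer"]
        return {
            "data_source": DATA_SOURCE,
            "prompt": [{"role": "user", "content": question}],
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": extract_solution(answer_raw)},
            "extra_info": {"split": split, "index": idx, "question": question, "answer": answer_raw},
        }

    return process_fn


# Stand-in for one raw record (illustrative, not taken from the dataset).
raw = {
    "question": "A crate holds 1,200 apples and 34 are removed. How many remain?",
    "answer": "1,200 - 34 = 1,166\n#### 1,166",
}
row = make_map_fn("train")(raw, 0)

# With the datasets library, the same function would typically be applied as
#   dataset.map(function=make_map_fn("train"), with_indices=True)
# and the result written with Dataset.to_parquet(<output path>).
```

Returning a closure from make_map_fn lets the same transform record which split each row came from without relying on a global variable.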
The full preprocessing script for GSM8K ships at examples/data_preprocess/gsm8k.py. Run it with python3 examples/data_preprocess/gsm8k.py; it accepts the following arguments:
| Argument | Default | Description |
|---|---|---|
| --local_save_dir | ~/data/gsm8k | Directory where .parquet files are written |
| --local_dataset_path | None | Path to a locally cached raw dataset (skips HuggingFace download) |
| --hdfs_dir | None | HDFS destination; if set, files are copied there after local save |
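If you write your own preprocessing script, the same three arguments can be declared with the standard argparse module. The sketch below mirrors the table above; the build_parser name, the help strings, and the /tmp/gsm8k path are assumptions for illustration and may differ from the shipped script.

```python
import argparse
import os


def build_parser():
    """Declare the CLI arguments listed in the table above."""
    parser = argparse.ArgumentParser(description="Convert a raw dataset to Parquet")
    parser.add_argument("--local_save_dir", default="~/data/gsm8k",
                        help="Directory where .parquet files are written")
    parser.add_argument("--local_dataset_path", default=None,
                        help="Locally cached raw dataset; skips the download when set")
    parser.add_argument("--hdfs_dir", default=None,
                        help="Optional HDFS destination for a copy of the output")
    return parser


# Parse an explicit argument list here so the sketch runs outside a shell.
args = build_parser().parse_args(["--local_save_dir", "/tmp/gsm8k"])

# Expand "~" so the default also works when the value is used as a real path.
save_dir = os.path.expanduser(args.local_save_dir)
copy_to_hdfs = args.hdfs_dir is not None
```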
Pre-Built Preprocessing Scripts
verl ships preprocessing scripts for several common datasets under examples/data_preprocess/:
| Dataset | Script | Task |
|---|---|---|
| GSM8K | gsm8k.py | Grade-school math |
| MATH | math_dataset.py | Competition math |
| HellaSwag | hellaswag.py | Common-sense reasoning |
| Full HH-RLHF | full_hh_rlhf.py | Dialogue helpfulness/harmlessness |
To support a new dataset, copy one of these scripts and adapt the extract_solution and make_map_fn implementations for your label format.
HDFS Support
Both local and HDFS paths are supported throughout verl. The verl.utils.hdfs_io module provides copy and makedirs helpers that work transparently with hdfs://-prefixed URIs. Set your training config’s data.train_files and data.val_files to either a local glob pattern or an HDFS path.
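As a small sketch of how those two settings might be supplied, the snippet below builds key=value overrides for data.train_files and data.val_files. It assumes the trainer accepts Hydra-style overrides on its command line; the helper names, the file names, and the namenode host are hypothetical.

```python
def is_hdfs_path(path):
    """True when the path uses the hdfs:// scheme described above."""
    return path.startswith("hdfs://")


def data_overrides(train_files, val_files):
    """Build key=value overrides for the two dataset settings."""
    return [f"data.train_files={train_files}", f"data.val_files={val_files}"]


# Hypothetical local files produced by a preprocessing run.
local = data_overrides("/data/gsm8k/train.parquet", "/data/gsm8k/test.parquet")

# Hypothetical HDFS locations; only the path prefix changes.
remote = data_overrides(
    "hdfs://namenode/datasets/gsm8k/train.parquet",
    "hdfs://namenode/datasets/gsm8k/test.parquet",
)
```

The resulting strings would be appended to the training launch command in place of the defaults from the config file.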
Filtering Overlong Prompts
Very long prompts can cause out-of-memory errors at the start of training. verl can pre-filter them during data loading, dropping prompts that exceed the configured maximum prompt length.
Multi-Turn and Image Inputs
For multi-turn conversations, extend the prompt list with additional {"role": "assistant", ...} and {"role": "user", ...} turns; the chat template handles the rest. For vision-language models, add an images key to the extra_info dict with the image data; the RLHFDataset class will pass it through to the tokenizer/processor.
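A multi-turn prompt might look like the following sketch. The conversation content and the roles_alternate helper are made up for illustration; the only structure taken from the text above is the list of role/content dicts.

```python
# Illustrative three-turn prompt; the final user turn is the one the model answers.
prompt = [
    {"role": "user", "content": "What is 12 * 3?"},
    {"role": "assistant", "content": "12 * 3 = 36."},
    {"role": "user", "content": "Now add 4 to that result."},
]


def roles_alternate(messages):
    """Check that no two consecutive messages come from the same role."""
    roles = [message["role"] for message in messages]
    return all(a != b for a, b in zip(roles, roles[1:]))


row = {
    "data_source": "my_multi_turn_dataset",  # hypothetical dataset name
    "prompt": prompt,
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "40"},
    "extra_info": {"split": "train", "index": 0},
}

ok = roles_alternate(row["prompt"]) and row["prompt"][-1]["role"] == "user"
```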