Reinforcement learning fine-tuning runs are long-lived workloads that can span hours or days across large GPU clusters. Hardware failures, preemptions, and transient errors are common realities at scale. verl’s checkpoint system lets you save the full training state — model weights, optimizer state, LR scheduler, and RNG states — and resume seamlessly from any saved step, minimizing wasted compute.

What Gets Saved

Checkpoint contents are controlled by the checkpoint.save_contents field on each model role (actor, critic, reference). The field accepts any subset of four values:
| Value | Description |
| --- | --- |
| model | Framework-native sharded model weights. For FSDP: per-rank shards. For Megatron: HF format via mbridge, or Megatron dist checkpoint when use_dist_checkpointing=True. |
| optimizer | Sharded optimizer state (Adam moments, etc.). |
| extra | LR scheduler state, RNG states, and (for Megatron) the serialized TransformerConfig. |
| hf_model | Full HuggingFace format weights consolidated on rank 0. Suitable for inference without any conversion step. |
For FSDP, the model, optimizer, and extra contents are bound together — they are always saved and loaded as a unit. Always include all three to maintain a consistent and resumable checkpoint. Omitting optimizer means you cannot resume training (only the weights are useful for inference).
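The rules above can be checked before a run starts. The following is a minimal sketch, not part of verl: `check_save_contents` is a hypothetical helper that rejects values outside the four documented ones and reports when a checkpoint would not be resumable.

```python
# Hypothetical pre-flight check; the names below are illustrative, not verl APIs.
VALID_CONTENTS = {"model", "optimizer", "extra", "hf_model"}
RESUMABLE_CORE = {"model", "optimizer", "extra"}


def check_save_contents(save_contents):
    """Return a list of warnings; raise ValueError on unknown values."""
    requested = set(save_contents)
    unknown = requested - VALID_CONTENTS
    if unknown:
        raise ValueError(f"unknown save_contents values: {sorted(unknown)}")
    warnings = []
    missing = RESUMABLE_CORE - requested
    if missing:
        # Without all three core contents the checkpoint cannot resume training.
        warnings.append(f"not resumable: missing {sorted(missing)}")
    return warnings
```

An empty list means the configuration includes everything needed to resume; an export-only list such as `['hf_model']` yields a warning rather than an error, because it is still valid for inference.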

Configuration

Save Frequency

trainer:
  save_freq: 100             # save every 100 training iterations
  default_local_dir: checkpoints/${trainer.project_name}/${trainer.experiment_name}
  default_hdfs_dir: null     # optional: hdfs://path/for/remote/storage

actor_rollout_ref:
  actor:
    checkpoint:
      save_contents: ['model', 'optimizer', 'extra']
      load_contents: ['model', 'optimizer', 'extra']

critic:
  checkpoint:
    save_contents: ['model', 'optimizer', 'extra']
    load_contents: ['model', 'optimizer', 'extra']
trainer.save_freq (int, default: -1)
How often (in training iterations) to write a checkpoint. -1 disables periodic checkpointing; a final checkpoint is written at the end of training only.

trainer.default_local_dir (string)
Root directory for checkpoint storage on local (or NFS-mounted) filesystems. Defaults to checkpoints/{project_name}/{experiment_name}.

trainer.default_hdfs_dir (string)
Optional HDFS path for remote checkpoint storage. When set, checkpoints are also uploaded to HDFS after each local save.

trainer.max_actor_ckpt_to_keep (int)
Maximum number of actor checkpoint steps to retain. Older ones are deleted automatically. null keeps all.
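As an illustration of how save_freq and max_actor_ckpt_to_keep interact, here is a small sketch. Both functions are hypothetical and only mirror the descriptions above; treating every non-positive save_freq like -1, and writing a final checkpoint when save_freq is positive, are assumptions rather than documented behavior.

```python
def should_save(step, save_freq, total_steps):
    """Decide whether a checkpoint is written at this training step."""
    is_last = step == total_steps
    if save_freq <= 0:
        # Assumption: any non-positive value behaves like -1 (final step only).
        return is_last
    # Assumption: the final step is also saved when periodic saving is on.
    return is_last or step % save_freq == 0


def steps_to_prune(saved_steps, max_to_keep):
    """Return the oldest steps to delete; None (null in YAML) keeps all."""
    if max_to_keep is None:
        return []
    ordered = sorted(saved_steps)
    return ordered[: max(0, len(ordered) - max_to_keep)]
```

With save_freq=100 and max_to_keep=2, a run that has saved steps 100 through 400 would keep 300 and 400 under this reading.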

Resuming from a Checkpoint

trainer:
  resume_mode: auto          # auto | disable | resume_path
  resume_from_path: null     # set this when resume_mode: resume_path

actor_rollout_ref:
  actor:
    checkpoint:
      load_contents: ['model', 'optimizer', 'extra']
trainer.resume_mode (string, default: "auto")
Resume behavior:
  • auto: automatically resume from the latest checkpoint in default_local_dir if one exists; start fresh if not
  • disable: always start from scratch, ignoring any existing checkpoints
  • resume_path: resume from the explicit path in resume_from_path

trainer.resume_from_path (string)
Explicit path to resume from when resume_mode=resume_path. Point to a specific global_steps_N directory.
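The three resume modes can be summarized in a few lines of code. This is a sketch under stated assumptions, not verl's loader: `resolve_resume_dir` is a hypothetical function, and it assumes that latest_checkpointed_iteration.txt holds a plain integer step and that step directories are named global_steps_N as in the layouts on this page.

```python
from pathlib import Path

TRACKER_NAME = "latest_checkpointed_iteration.txt"


def resolve_resume_dir(resume_mode, default_local_dir, resume_from_path=None):
    """Return the step directory to resume from, or None to start fresh."""
    if resume_mode == "disable":
        return None
    if resume_mode == "resume_path":
        if not resume_from_path:
            raise ValueError("resume_from_path must be set for resume_mode=resume_path")
        return Path(resume_from_path)
    if resume_mode == "auto":
        tracker = Path(default_local_dir) / TRACKER_NAME
        if not tracker.is_file():
            return None  # no checkpoint yet, so start from scratch
        # Assumption: the tracker file contains only the step number.
        step = int(tracker.read_text().strip())
        return Path(default_local_dir) / f"global_steps_{step}"
    raise ValueError(f"unknown resume_mode: {resume_mode!r}")
```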
To resume only model weights (e.g., after changing optimizer hyperparameters), override load_contents:
python -m verl.trainer.main_ppo \
    trainer.resume_mode=resume_path \
    trainer.resume_from_path=checkpoints/my_project/my_run/global_steps_500 \
    "actor_rollout_ref.actor.checkpoint.load_contents=['model']"

FSDP Checkpoint Structure

For FSDP-based training, checkpoints are sharded per rank and laid out as follows:
checkpoints/{project_name}/{experiment_name}/
├── global_steps_100/
│   ├── actor/
│   │   ├── huggingface/          # config.json, tokenizer files; full HF weights if hf_model is saved
│   │   ├── fsdp_config.json      # world_size and FSDP version metadata
│   │   ├── model_world_size_8_rank_0.pt
│   │   ├── model_world_size_8_rank_1.pt
│   │   ├── ...
│   │   ├── optim_world_size_8_rank_0.pt
│   │   ├── ...
│   │   └── extra_state_world_size_8_rank_0.pt
│   └── critic/
│       ├── huggingface/
│       ├── fsdp_config.json
│       ├── model_world_size_8_rank_0.pt
│       └── ...
└── latest_checkpointed_iteration.txt
All model shards, optimizer states, and extra states are stored in a distributed, sharded format. The latest_checkpointed_iteration.txt file records the most recently saved step and is used by resume_mode=auto.
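Because every rank writes its own files, a partially written checkpoint shows up as missing rank files. The sketch below scans a role directory for gaps. `missing_shards` is a hypothetical helper, and the file-name pattern is inferred from the tree above rather than taken from verl's source.

```python
import re
from pathlib import Path

# Pattern inferred from names such as model_world_size_8_rank_0.pt.
SHARD_RE = re.compile(r"^(model|optim|extra_state)_world_size_(\d+)_rank_(\d+)\.pt$")


def missing_shards(role_dir):
    """Map each shard kind to the sorted list of ranks that have no file."""
    seen = {}
    for path in Path(role_dir).iterdir():
        match = SHARD_RE.match(path.name)
        if match:
            kind, world, rank = match.group(1), int(match.group(2)), int(match.group(3))
            seen.setdefault((kind, world), set()).add(rank)
    gaps = {}
    for (kind, world), ranks in seen.items():
        absent = sorted(set(range(world)) - ranks)
        if absent:
            gaps[kind] = absent
    return gaps
```

An empty dictionary means every shard kind found in the directory has one file per rank. A kind with no files at all is not reported, since it may simply have been left out of save_contents.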

Megatron Checkpoint Structure

Megatron uses a more structured layout (schema v2) with a ckpt_contents.json manifest:
checkpoints/{project_name}/{experiment_name}/
├── global_steps_100/
│   ├── actor/
│   │   ├── ckpt_contents.json        # manifest: maps logical names to on-disk paths
│   │   ├── transformer_config.json   # serialized Megatron TransformerConfig
│   │   ├── model/
│   │   │   ├── huggingface/          # HF weights (requires use_mbridge=True)
│   │   │   └── dist_ckpt/           # Megatron shards (use_dist_checkpointing=True)
│   │   ├── optimizer/
│   │   │   └── dist_ckpt/           # optimizer + LR scheduler shards
│   │   └── extra/
│   │       └── dist_ckpt/           # RNG state shards
│   └── critic/                      # same layout as actor
└── latest_checkpointed_iteration.txt
The ckpt_contents.json manifest is written last during saving, so its presence indicates a fully complete checkpoint. Example manifest:
{
  "schema_version": 2,
  "framework": "megatron",
  "role": "actor",
  "global_step": 100,
  "save_contents": ["model", "optimizer", "extra"],
  "contents": {
    "model":       {"path": "model/huggingface", "format": "huggingface"},
    "optimizer":   {"path": "optimizer/dist_ckpt", "format": "megatron_dist_checkpoint"},
    "lr_scheduler":{"path": "optimizer/dist_ckpt", "format": "megatron_dist_checkpoint"},
    "rng_state":   {"path": "extra/dist_ckpt",     "format": "megatron_dist_checkpoint"}
  }
}
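Since the manifest is written last, a tool that inspects checkpoints can use it as a completeness marker. The following sketch reads the manifest and resolves each logical name to a directory. `load_manifest` and `resolve_content_paths` are hypothetical names, and the field names are taken from the example manifest above.

```python
import json
from pathlib import Path


def load_manifest(role_dir):
    """Return the parsed manifest, or None if the checkpoint is incomplete."""
    manifest_path = Path(role_dir) / "ckpt_contents.json"
    if not manifest_path.is_file():
        return None  # the manifest is written last, so the save did not finish
    manifest = json.loads(manifest_path.read_text())
    if manifest.get("schema_version") != 2:
        raise ValueError(f"unsupported schema_version: {manifest.get('schema_version')}")
    return manifest


def resolve_content_paths(role_dir, manifest):
    """Map logical names (model, optimizer, ...) to absolute on-disk paths."""
    root = Path(role_dir)
    return {name: root / entry["path"] for name, entry in manifest["contents"].items()}
```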

Megatron Backend Options

Megatron model checkpoint behavior is controlled by two flags on actor_rollout_ref.actor.megatron:
actor_rollout_ref.actor.megatron.use_mbridge (boolean, default: True)
When True, the Megatron engine builds an mbridge instance that enables saving and loading model weights in HuggingFace format under model/huggingface/. Required for hf_model in save_contents.

actor_rollout_ref.actor.megatron.use_dist_checkpointing (boolean, default: False)
When True, Megatron's dist_checkpointing writes sharded model weights under model/dist_ckpt/. Can be used alongside use_mbridge=True to save both formats in one step.
The two flags are independent and can be combined. The table below summarizes behavior:
| use_mbridge | use_dist_checkpointing | save_contents | On-disk result |
| --- | --- | --- | --- |
| True | False | model | HF weights at model/huggingface/ |
| True | False | hf_model | Same HF tree (deduplicated) |
| True | False | model + hf_model | Same HF checkpoint saved once (deduplicated) |
| False | True | model | Megatron shards at model/dist_ckpt/ |
| False | True | hf_model | Error: mbridge required |
| True | True | model | Megatron shards at model/dist_ckpt/ only (no HF export) |
| True | True | model + hf_model | Both: model/dist_ckpt/ and model/huggingface/ |
hf_model in save_contents requires use_mbridge=True. Without it, the checkpoint manager will raise an error at save time. If you need to save HF-format weights without mbridge, use the verl.model_merger tool after training instead.
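One way to read the flag rules is as a small decision function. The sketch below is an interpretation of this page, not verl's checkpoint manager: `expected_model_dirs` is hypothetical, and the error raised when model is requested with both flags off is an assumption, since that case is not described here.

```python
def expected_model_dirs(use_mbridge, use_dist_checkpointing, save_contents):
    """Predict which directories under model/ a Megatron save would produce."""
    requested = set(save_contents)
    dirs = set()
    if "hf_model" in requested:
        if not use_mbridge:
            raise ValueError("hf_model in save_contents requires use_mbridge=True")
        dirs.add("model/huggingface")
    if "model" in requested:
        if use_dist_checkpointing:
            dirs.add("model/dist_ckpt")
        elif use_mbridge:
            dirs.add("model/huggingface")
        else:
            # Assumption: with both flags off there is no way to write weights.
            raise ValueError("model requires use_mbridge or use_dist_checkpointing")
    return dirs
```

Returning a set captures the deduplication: requesting both model and hf_model with only use_mbridge enabled yields a single model/huggingface entry.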
Default / production — keep use_mbridge=True and save all three core contents:
actor_rollout_ref:
  actor:
    megatron:
      use_mbridge: True
    checkpoint:
      save_contents: ['model', 'optimizer', 'extra']
HF-only export — only need a deployable checkpoint, not resumable state:
actor_rollout_ref:
  actor:
    checkpoint:
      save_contents: ['hf_model']
Hybrid (resume + HF export) — write both Megatron shards and HF weights in one step:
actor_rollout_ref:
  actor:
    megatron:
      use_mbridge: True
      use_dist_checkpointing: True
    checkpoint:
      save_contents: ['model', 'hf_model', 'optimizer', 'extra']

Exporting to HuggingFace Format

verl provides verl.model_merger to convert sharded FSDP or Megatron checkpoints into a single HuggingFace model directory for inference.

FSDP Checkpoint Merge

python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir checkpoints/my_project/my_run/global_steps_500/actor \
    --target_dir /path/to/merged_hf_model

Megatron Checkpoint Merge (Single Node)

python -m verl.model_merger merge \
    --backend megatron \
    --tie-word-embedding \
    --local_dir checkpoints/my_project/my_run/global_steps_500/actor \
    --target_dir /path/to/merged_hf_model

Megatron Checkpoint Merge (Distributed, Multi-Node)

torchrun --nproc_per_node 1 --nnodes 8 --node_rank ${RANK} \
    -m verl.model_merger merge \
    --backend megatron \
    --tie-word-embedding \
    --local_dir checkpoints/my_project/my_run/global_steps_500/actor \
    --target_dir /path/to/merged_hf_model
Once merged, the target_dir contains a standard HuggingFace model that can be loaded with AutoModelForCausalLM.from_pretrained().
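A quick sanity check before loading can catch an empty or half-written target directory. The sketch below is illustrative: both function names are hypothetical, and the weight-file patterns (`*.safetensors`, `*.bin`) are an assumption about what the merger writes. `transformers` is imported lazily so that the directory check works without it.

```python
from pathlib import Path


def looks_like_hf_model_dir(target_dir):
    """Cheap check: a config.json plus at least one weight file."""
    root = Path(target_dir)
    has_config = (root / "config.json").is_file()
    # Assumption: weights are stored as *.safetensors or *.bin files.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    return has_config and has_weights


def load_merged_model(target_dir):
    """Load the merged checkpoint with HuggingFace Transformers."""
    if not looks_like_hf_model_dir(target_dir):
        raise FileNotFoundError(f"{target_dir} does not look like a merged HF model")
    from transformers import AutoModelForCausalLM  # requires transformers installed

    return AutoModelForCausalLM.from_pretrained(target_dir)
```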

Validate a Merged Checkpoint

python -m verl.model_merger test \
    --backend fsdp \
    --local_dir checkpoints/.../global_steps_500/actor \
    --reference_dir /path/to/original_hf_model
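When merge and validation run from a pipeline script, it can help to build the commands programmatically. This is a sketch that uses only the subcommands and flags shown on this page; `merge_cmd` and `validate_cmd` are hypothetical helpers, and nothing is executed until the result is passed to `subprocess.run`.

```python
import sys


def merge_cmd(backend, local_dir, target_dir, tie_word_embedding=False):
    """Build the argv list for `python -m verl.model_merger merge`."""
    cmd = [sys.executable, "-m", "verl.model_merger", "merge", "--backend", backend]
    if tie_word_embedding:
        cmd.append("--tie-word-embedding")
    cmd += ["--local_dir", local_dir, "--target_dir", target_dir]
    return cmd


def validate_cmd(backend, local_dir, reference_dir):
    """Build the argv list for `python -m verl.model_merger test`."""
    return [
        sys.executable, "-m", "verl.model_merger", "test",
        "--backend", backend,
        "--local_dir", local_dir,
        "--reference_dir", reference_dir,
    ]
```

A caller would run each list with `subprocess.run(cmd, check=True)` so that a failed merge stops the pipeline before validation.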

Migrating Pre-v2 Megatron Checkpoints

Older verl releases produced a flatter checkpoint layout that is incompatible with the current v2 loader. Migrate an existing checkpoint with:
# Migrate a single step
python scripts/migrate_megatron_checkpoint_layout.py \
    --checkpoint /path/to/global_step_100/actor

# Migrate all steps under a run
python scripts/migrate_megatron_checkpoint_layout.py \
    --checkpoint-root /path/to/run \
    --all-steps
The migration uses hard links by default, so it is fast and does not duplicate disk space.
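To see why hard links keep the migration cheap, consider this small demonstration. It is not the migration script itself; `link_into_new_layout` is a hypothetical helper that places an existing file at a new path without copying its data.

```python
import os
from pathlib import Path


def link_into_new_layout(src, dst):
    """Expose src at dst via a hard link, creating parent directories."""
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    os.link(src, dst)  # both names now refer to the same data on disk
    return dst


def shares_storage(path_a, path_b):
    """True when both paths are hard links to the same file."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_ino == b.st_ino and a.st_dev == b.st_dev
```

Hard links only work within a single filesystem, so the old and new layouts need to live on the same volume for this approach to apply.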

Optimizer Checkpoint Format (Megatron)

Megatron optimizer checkpoints support two formats controlled by dist_ckpt_optim_fully_reshardable:
  • False (default, DP-reshardable): Faster and lower memory overhead. Supports resuming with different data parallel sizes.
  • True (fully-reshardable): Slower but supports resuming with arbitrary parallelism configurations.
When dist_ckpt_optim_fully_reshardable=True, optimizer states are temporarily gathered on data-parallel rank 0 before being re-sharded for storage. For large models this intermediate aggregation can cause CPU OOM. Use the default DP-reshardable format unless you specifically need to change parallelism on resume.
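A back-of-envelope estimate shows why the gather step can exhaust host memory. The sketch below rests on loudly stated assumptions that are not verl facts: 12 bytes of optimizer state per parameter (fp32 master weights plus two fp32 Adam moments), and a gather that covers one model-parallel shard, so the total is divided by the tensor-parallel times pipeline-parallel size.

```python
def gathered_optimizer_gib(num_params, bytes_per_param=12, model_parallel_size=1):
    """Rough host memory (GiB) needed on one data-parallel rank 0.

    bytes_per_param=12 is an assumption: fp32 master weights (4 bytes)
    plus two fp32 Adam moments (4 + 4 bytes).
    """
    return num_params * bytes_per_param / model_parallel_size / 2**30
```

Under these assumptions a 7-billion-parameter model needs roughly 78 GiB on a single rank without model parallelism, and about a quarter of that when split across four model-parallel ranks.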
