verl’s worker layer sits between the single-process PPORayTrainer controller and the backend model engines (FSDP, Megatron-LM, etc.). Workers run in SPMD mode across all GPU ranks. The controller never manages individual ranks directly; it calls methods on WorkerGroup objects, and the framework handles data dispatch and result collection. The two core worker classes are ActorRolloutRefWorker, the hybrid worker that co-locates the actor, rollout engine, and optional reference policy, and TrainingWorker, the generic single-engine worker used for the critic, reward model, and standalone SFT/DPO training.
Both classes live in verl/workers/engine_workers.py and are engine-agnostic: FSDP, FSDP2, Megatron-LM, Automodel, VeOmni, and TorchTitan are all wired in through the same entry points.
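To make the SPMD calling convention concrete, here is a minimal, framework-free sketch of the two dispatch styles the controller relies on: broadcasting one call to every rank, and sharding a batch across ranks before concatenating the results. ToyWorker and ToyWorkerGroup are hypothetical stand-ins written for this page, not verl classes, and a real worker group runs on Ray rather than in a single process.

```python
from typing import Any, List


class ToyWorker:
    """Hypothetical stand-in for one GPU rank."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.initialized = False

    def init_model(self) -> int:
        # Every rank runs the same routine (SPMD).
        self.initialized = True
        return self.rank

    def double(self, shard: List[int]) -> List[int]:
        # Each rank only sees its own shard of the batch.
        return [2 * x for x in shard]


class ToyWorkerGroup:
    """Hypothetical stand-in for a WorkerGroup: the controller calls this, never a rank."""

    def __init__(self, world_size: int) -> None:
        self.workers = [ToyWorker(r) for r in range(world_size)]

    def one_to_all(self, method: str, *args: Any) -> List[Any]:
        # Broadcast-style dispatch: the same arguments go to every rank.
        return [getattr(w, method)(*args) for w in self.workers]

    def shard_and_collect(self, method: str, batch: List[int]) -> List[int]:
        # Data-parallel-style dispatch: split evenly, run per rank,
        # then concatenate the per-rank results in rank order.
        n = len(self.workers)
        if len(batch) % n != 0:
            raise ValueError("batch must divide evenly across ranks")
        size = len(batch) // n
        out: List[int] = []
        for i, worker in enumerate(self.workers):
            shard = batch[i * size:(i + 1) * size]
            out.extend(getattr(worker, method)(shard))
        return out


group = ToyWorkerGroup(world_size=4)
ranks = group.one_to_all("init_model")
doubled = group.shard_and_collect("double", list(range(8)))
```

The controller-side code only ever touches group; which rank receives which shard is decided inside the dispatch helper, which is the property the worker layer is built around.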
Class Hierarchy
TrainingWorker is also used standalone for the critic, reference model, reward model, and SFT/DPO training — it is essentially a Ray-wrapped BaseEngine that exposes a Tinker-like API as RPCs to the single controller.
ActorRolloutRefWorker
ActorRolloutRefWorker is the hybrid worker used for PPO / GRPO training. The role argument passed at construction selects which sub-workers are built inside init_model:
| role | What is built inside init_model |
|---|---|
| actor | self.actor (TrainingWorker) + checkpoint engine |
| rollout | self.rollout (BaseRollout) |
| ref | self.ref (TrainingWorker, forward_only engine config) |
| actor_rollout | actor + rollout + checkpoint engine (most common for colocated PPO) |
| actor_rollout_ref | all three |
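The role table can be read as a lookup from a role string to a build plan. The sketch below is a hypothetical helper, not code from verl: build_plan and the component labels are illustrative names. Two things are assumptions: that actor_rollout_ref also builds the checkpoint engine, and the relative order of everything except the rollout engine, which this page describes as being built last.

```python
from typing import Dict, List, Set

# Component labels are descriptive strings for this sketch, not verl attributes.
_ROLE_TABLE: Dict[str, Set[str]] = {
    "actor": {"actor", "checkpoint_engine"},
    "rollout": {"rollout"},
    "ref": {"ref"},
    "actor_rollout": {"actor", "rollout", "checkpoint_engine"},
    # The table says "all three"; including the checkpoint engine is an assumption.
    "actor_rollout_ref": {"actor", "rollout", "ref", "checkpoint_engine"},
}

# Assumed ordering for illustration; only "rollout last" comes from the text.
_BUILD_ORDER: List[str] = ["actor", "ref", "checkpoint_engine", "rollout"]


def build_plan(role: str) -> List[str]:
    """Return the component labels built for `role`, rollout engine last."""
    if role not in _ROLE_TABLE:
        raise ValueError(f"unknown role: {role!r}")
    wanted = _ROLE_TABLE[role]
    return [name for name in _BUILD_ORDER if name in wanted]


colocated_plan = build_plan("actor_rollout")
full_plan = build_plan("actor_rollout_ref")
```

Keeping the mapping in one table makes it easy to see that a role string is only a selector: the sub-workers themselves are the same regardless of which role requested them.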
Key RPCs
ONE_TO_ALL means the driver calls init_model() once and the same routine executes on every GPU worker. It builds the TrainingWorker (which in turn instantiates the BaseEngine via EngineRegistry.new), the rollout engine, and the checkpoint engine used for trainer-to-rollout weight synchronization. The rollout engine is always built last so that vLLM / SGLang can accurately estimate available KV cache memory.

```python
@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
def compute_log_prob(self, data: TensorDict) -> TensorDict:
    output = self.actor.infer_batch(data)
    return output.cpu() if output is not None else None

@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="ref"))
def compute_ref_log_prob(self, data: TensorDict) -> TensorDict:
    output = self.ref.infer_batch(data)
    return output.cpu() if output is not None else None
```
TrainingWorker.infer_batch drives BaseEngine.infer_batch in eval mode with no_grad. The n-dimensional dispatch function is built from the engine’s actual parallel topology, so the Megatron pipeline-parallel dimension is transparently surfaced as an extra data-parallel axis to the single controller, and no backend-specific dispatch logic is required.

```python
@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
def update_actor(self, data: TensorDict) -> TensorDict:
    output = self.actor.train_mini_batch(data=data)
    return output.cpu() if output is not None else None
```
train_mini_batch splits the incoming batch into PPO mini-batches, iterates over the configured number of PPO epochs, and calls TrainingWorker.train_batch for each mini-batch (one optimizer step per mini-batch). The PPO loss or distillation loss is pre-installed via TrainingWorker.set_loss_fn during init_model.

```python
@register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
async def update_weights(self, global_steps: int = None, mode: str = "auto"):
    ...
```
Pushes the latest trainer weights to the rollout engine after each actor update. The mode parameter selects the transfer strategy:

- "naive" (colocated sync): exports per-tensor parameters from the training engine via engine.get_per_tensor_param() and calls rollout.update_weights() directly in-process. For LoRA setups with model.lora.merge=True, adapters are merged into base weights before the sync.
- Checkpoint-engine transfer: sends weights through checkpoint_engine.send_weights(), suitable for configurations where trainer and rollout run on separate node pools.
- "auto": resolves to the backend configured in config.rollout.checkpoint_engine.backend.

```python
@register(dispatch_mode=Dispatch.ONE_TO_ALL)
def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None):
    self.actor.save_checkpoint(local_path, hdfs_path, global_step, max_ckpt_to_keep)

@register(dispatch_mode=Dispatch.ONE_TO_ALL)
def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_load=False):
    self.actor.load_checkpoint(local_path, hdfs_path, del_local_after_load)
```
TrainingWorker
TrainingWorker is the generic single-engine worker. Construction takes a TrainingWorkerConfig that bundles the model_config, engine_config, optimizer_config, checkpoint_config, and profiler_config. The backend engine is selected from engine_config.strategy.
Usage Patterns
- Inside ActorRolloutRefWorker
- Standalone (Critic / Reward / SFT)
TrainingWorker is instantiated internally as self.actor and self.ref; you never construct it directly in this case.
Key RPCs
| RPC | Dispatch | Description |
|---|---|---|
| reset() | ONE_TO_ALL | First call initializes the engine; subsequent calls reload weights and reset optimizer/scheduler state. |
| to(device, model, optimizer, grad) | ONE_TO_ALL | Manual load/offload control. device must be "cpu" or "device" (mapped to the actual accelerator). |
| set_loss_fn(loss_fn) | ONE_TO_ALL | Install the loss closure (PPO loss, distillation loss, or any callable accepting (model_output, batch)). |
| train_mini_batch(data) | n-d compute | Mini-batch + PPO-epoch loop; one optimizer step per mini-batch; allgathers metrics across DP. |
| train_batch(data) | n-d compute | Single mini-batch train step. Usually invoked indirectly via train_mini_batch. |
| infer_batch(data) | n-d compute | Forward-only step for log-prob / value / reward / distillation-teacher computation. Accepts no_lora_adapter=True to temporarily disable the LoRA adapter at inference. |
| save_checkpoint(...) | ONE_TO_ALL | Delegates to BaseEngine.save_checkpoint. |
| load_checkpoint(...) | ONE_TO_ALL | Delegates to BaseEngine.load_checkpoint. |
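The relationship between train_mini_batch and train_batch in the table can be pictured with a plain-Python sketch. This is an illustration under simplifying assumptions, not the verl implementation: train_mini_batch_sketch, mini_batch_size, ppo_epochs, and the callback are hypothetical names, the batch is a list of floats rather than a TensorDict, and shuffling, gradient accumulation, and the cross-DP metric allgather are left out.

```python
from typing import Callable, Dict, List, Sequence


def train_mini_batch_sketch(
    batch: Sequence[float],
    mini_batch_size: int,
    ppo_epochs: int,
    train_batch: Callable[[Sequence[float]], Dict[str, float]],
) -> List[Dict[str, float]]:
    """Call `train_batch` once per mini-batch, for every epoch, and collect metrics."""
    if mini_batch_size <= 0 or len(batch) % mini_batch_size != 0:
        raise ValueError("batch size must be a positive multiple of mini_batch_size")
    metrics: List[Dict[str, float]] = []
    for _epoch in range(ppo_epochs):
        for start in range(0, len(batch), mini_batch_size):
            mini_batch = batch[start:start + mini_batch_size]
            # One call here stands in for one optimizer step.
            metrics.append(train_batch(mini_batch))
    return metrics


step_log: List[int] = []


def fake_train_batch(mini_batch: Sequence[float]) -> Dict[str, float]:
    # Records the mini-batch size instead of touching a real optimizer.
    step_log.append(len(mini_batch))
    return {"loss": sum(mini_batch) / len(mini_batch)}


history = train_mini_batch_sketch(
    batch=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    mini_batch_size=4,
    ppo_epochs=2,
    train_batch=fake_train_batch,
)
```

With 8 samples, a mini-batch size of 4, and 2 epochs, the callback runs 4 times, which matches the "one optimizer step per mini-batch" rule in the table.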
Backend Selection
Set the strategy field on the engine config in your Hydra config file. All roles (actor, ref, critic) can use different backends independently.
EngineRegistry dispatches on (model_type, backend, device) to select the concrete BaseEngine subclass. See the Model Engine page for the full dispatch table.
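The keyed-dispatch idea can be sketched with a small decorator-based registry. ToyEngineRegistry and ToyLMEngine are hypothetical classes written for this page; the real EngineRegistry may differ in signature and defaults, and the key values "language_model", "toy_backend", and "cuda" are placeholders chosen for the example.

```python
from typing import Any, Callable, Dict, Tuple


class ToyEngineRegistry:
    """Hypothetical registry keyed on (model_type, backend, device)."""

    _engines: Dict[Tuple[str, str, str], type] = {}

    @classmethod
    def register(
        cls, model_type: str, backend: str, device: str = "cuda"
    ) -> Callable[[type], type]:
        def decorator(engine_cls: type) -> type:
            # Store the class under its dispatch key and return it unchanged.
            cls._engines[(model_type, backend, device)] = engine_cls
            return engine_cls

        return decorator

    @classmethod
    def new(cls, model_type: str, backend: str, device: str = "cuda", **kwargs: Any) -> Any:
        key = (model_type, backend, device)
        if key not in cls._engines:
            raise KeyError(f"no engine registered for {key}")
        return cls._engines[key](**kwargs)


@ToyEngineRegistry.register(model_type="language_model", backend="toy_backend")
class ToyLMEngine:
    """Placeholder engine; a real engine would wrap a model and optimizer."""

    def __init__(self, hidden_size: int = 8) -> None:
        self.hidden_size = hidden_size


engine = ToyEngineRegistry.new("language_model", "toy_backend", hidden_size=16)
```

Because lookup goes through the key alone, code that calls new never imports a concrete engine class, which is what lets a worker stay backend-agnostic.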
Migrating from Legacy Workers
The legacy verl.workers.fsdp_workers and verl.workers.megatron_workers modules, along with verl.workers.actor, verl.workers.critic, verl.workers.sharding_manager, and verl.workers.legacy, have been removed. Use the unified verl.workers.engine_workers entry points instead:
| Legacy (removed) | Current (verl.workers.engine_workers) |
|---|---|
| fsdp_workers.ActorRolloutRefWorker | ActorRolloutRefWorker (strategy=fsdp/fsdp2) |
| megatron_workers.ActorRolloutRefWorker | ActorRolloutRefWorker (strategy=megatron) |
| fsdp_workers.CriticWorker | TrainingWorker (critic config + value-model engine) |
| megatron_workers.CriticWorker | TrainingWorker (critic config + value-model engine) |
| actor.DataParallelPPOActor | FSDPEngineWithLMHead + TrainingWorker |
| actor.MegatronPPOActor | MegatronEngineWithLMHead + TrainingWorker |
| critic.DataParallelPPOCritic | FSDPEngineWithValueHead + TrainingWorker |
| critic.MegatronPPOCritic | MegatronEngineWithValueHead + TrainingWorker |
| sharding_manager.FSDPUlyssesShardingManager | verl.utils.ulysses.FSDPUlyssesShardingManager |
| Dispatch.MEGATRON_PP_AS_DP_PROTO | make_nd_compute_dataproto_dispatch_fn(mesh_name=...) |
| use_legacy_worker_impl: True | Removed; only the unified engine is available |
Extending with a New Backend
To add a new training backend, implement a BaseEngine subclass under verl/workers/engine/<your_backend>/ and register it with @EngineRegistry.register(model_type=..., backend=...). The worker layer (TrainingWorker / ActorRolloutRefWorker) is already engine-agnostic and will pick up the new backend as soon as engine_config.strategy is set to its name. See Model Engine for the full extension guide and test harness.
Model Engine
Understand the BaseEngine abstraction, FSDP/Megatron backends, and the EngineRegistry dispatch table.
Ray Trainer
See how PPORayTrainer creates WorkerGroups and drives the training loop via these RPCs.