verl’s worker layer sits between the single-process PPORayTrainer controller and the backend model engines (FSDP, Megatron-LM, etc.). Workers run in SPMD mode across all GPU ranks. The controller never manages individual ranks directly; it calls methods on WorkerGroup objects, and the framework handles data dispatch and result collection.

The two core worker classes are ActorRolloutRefWorker, the hybrid worker that co-locates the actor, rollout engine, and optional reference policy, and TrainingWorker, the generic single-engine worker used for the critic, reward model, and standalone SFT/DPO training. Both classes live in verl/workers/engine_workers.py and are engine-agnostic: FSDP, FSDP2, Megatron-LM, Automodel, VeOmni, and TorchTitan are all wired in through the same entry points.
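The dispatch-and-collect pattern described above can be sketched in plain Python. This is a minimal illustration, not verl code: `ToyWorker` and `ToyWorkerGroup` are hypothetical stand-ins, and the even contiguous split is an assumption about one possible dispatch strategy.

```python
from dataclasses import dataclass


@dataclass
class ToyWorker:
    """Hypothetical stand-in for one GPU rank."""
    rank: int

    def double(self, shard: list[int]) -> list[int]:
        return [2 * x for x in shard]


class ToyWorkerGroup:
    """Hypothetical stand-in for a WorkerGroup: split, run on every rank, concatenate."""

    def __init__(self, world_size: int):
        self.workers = [ToyWorker(rank=r) for r in range(world_size)]

    def double(self, batch: list[int]) -> list[int]:
        n = len(self.workers)
        if len(batch) % n != 0:
            raise ValueError("batch size must be divisible by the number of ranks")
        chunk = len(batch) // n
        # Dispatch: give each rank a contiguous shard of the batch.
        shards = [batch[i * chunk:(i + 1) * chunk] for i in range(n)]
        # Execute the same method on every rank (SPMD), then collect in rank order.
        outputs = [w.double(s) for w, s in zip(self.workers, shards)]
        return [x for out in outputs for x in out]


# The controller only talks to the group, never to an individual rank.
group = ToyWorkerGroup(world_size=4)
result = group.double(list(range(8)))
```

The controller code stays the same whether the group holds one rank or many; only the group knows how the batch is sharded.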

Class Hierarchy

ActorRolloutRefWorker          # hybrid: co-locates actor + rollout + optional ref
├── self.actor  : TrainingWorker     (built when role contains "actor")
├── self.ref    : TrainingWorker     (built when role contains "ref")
├── self.rollout: BaseRollout        (vLLM / SGLang, built when role contains "rollout")
└── self.checkpoint_engine           (built when role contains "actor")

TrainingWorker                 # generic: one engine + optimizer + profiler
└── self.engine : BaseEngine         (fsdp / fsdp2 / megatron / automodel / veomni / torchtitan)

TrainingWorker is also used standalone for the critic, reference model, reward model, and SFT/DPO training. It is essentially a Ray-wrapped BaseEngine that exposes a Tinker-like API as RPCs to the single controller.

ActorRolloutRefWorker

ActorRolloutRefWorker is the hybrid worker used for PPO / GRPO training. The role argument passed at construction selects which sub-workers are built inside init_model:

| role | What is built inside init_model |
| --- | --- |
| actor | self.actor (TrainingWorker) + checkpoint engine |
| rollout | self.rollout (BaseRollout) |
| ref | self.ref (TrainingWorker, forward_only engine config) |
| actor_rollout | actor + rollout + checkpoint engine (most common for colocated PPO) |
| actor_rollout_ref | all three |
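The role-to-component mapping can be expressed as a small substring check, mirroring the "built when role contains ..." notes in the class hierarchy. The function below is a hypothetical sketch for illustration; `components_for_role` is not a verl API.

```python
VALID_ROLES = {"actor", "rollout", "ref", "actor_rollout", "actor_rollout_ref"}


def components_for_role(role: str) -> set[str]:
    """Return the sub-components that would be built for a given role string."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    built: set[str] = set()
    if "actor" in role:
        # The checkpoint engine is built alongside the actor.
        built |= {"actor", "checkpoint_engine"}
    if "rollout" in role:
        built.add("rollout")
    if "ref" in role:
        built.add("ref")
    return built


colocated = components_for_role("actor_rollout")
```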

Key RPCs

init_model

    @register(dispatch_mode=Dispatch.ONE_TO_ALL)
    def init_model(self):
        ...

ONE_TO_ALL means the driver calls init_model() once and the same routine executes on every GPU worker. It builds the TrainingWorker (which in turn instantiates the BaseEngine via EngineRegistry.new), the rollout engine, and the checkpoint engine used for trainer-to-rollout weight synchronization. The rollout engine is always built last so that vLLM / SGLang can accurately estimate available KV cache memory.
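To make the ONE_TO_ALL idea concrete, here is a toy decorator that tags a method with a dispatch mode and a helper that fans the call out to every worker. All names (`ToyDispatch`, `toy_register`, `call_on_group`) are hypothetical; verl's real `register` decorator and dispatch machinery are more involved.

```python
from enum import Enum


class ToyDispatch(Enum):
    ONE_TO_ALL = "one_to_all"


def toy_register(dispatch_mode: ToyDispatch):
    """Attach dispatch metadata to a method so a controller can read it later."""
    def decorator(fn):
        fn.dispatch_mode = dispatch_mode
        return fn
    return decorator


class ToyWorker:
    def __init__(self, rank: int):
        self.rank = rank
        self.built: list[str] = []

    @toy_register(dispatch_mode=ToyDispatch.ONE_TO_ALL)
    def init_model(self) -> int:
        # Rollout goes last, following the ordering described above.
        self.built = ["actor", "checkpoint_engine", "rollout"]
        return self.rank


def call_on_group(workers: list[ToyWorker], method_name: str, *args):
    """Invoke one method on every worker with identical arguments."""
    method = getattr(type(workers[0]), method_name)
    if method.dispatch_mode is ToyDispatch.ONE_TO_ALL:
        return [method(w, *args) for w in workers]
    raise NotImplementedError(method.dispatch_mode)


workers = [ToyWorker(r) for r in range(3)]
ranks = call_on_group(workers, "init_model")
```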
compute_log_prob / compute_ref_log_prob

    @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
    def compute_log_prob(self, data: TensorDict) -> TensorDict:
        output = self.actor.infer_batch(data)
        return output.cpu() if output is not None else None

    @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="ref"))
    def compute_ref_log_prob(self, data: TensorDict) -> TensorDict:
        output = self.ref.infer_batch(data)
        return output.cpu() if output is not None else None

TrainingWorker.infer_batch drives BaseEngine.infer_batch in eval mode with no_grad. The n-dimensional dispatch function is built from the engine’s actual parallel topology, so the Megatron pipeline-parallel dimension is transparently surfaced as an extra data-parallel axis to the single controller, and no backend-specific dispatch logic is required.
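The `None` check in the RPCs above suggests that not every rank returns a result. The sketch below shows one way such a dispatch could behave on a toy (data-parallel, tensor-parallel) mesh. The rank layout and the rule that only the first rank of each model-parallel group returns output are assumptions made for illustration, not verl's actual behavior.

```python
def dispatch(batch: list[int], dp_size: int, tp_size: int) -> list[list[int]]:
    """Give every rank the shard of its data-parallel group.

    Assumed layout: tensor-parallel is the fastest-varying axis, so
    rank // tp_size is the data-parallel index.
    """
    chunk = len(batch) // dp_size
    shards = [batch[i * chunk:(i + 1) * chunk] for i in range(dp_size)]
    return [shards[rank // tp_size] for rank in range(dp_size * tp_size)]


def worker_infer(rank: int, shard: list[int], tp_size: int):
    """Toy forward pass; only the first rank of each model-parallel group returns."""
    if rank % tp_size != 0:
        return None
    return [x + 100 for x in shard]


def collect(outputs: list) -> list[int]:
    """Drop the None placeholders and concatenate the rest in rank order."""
    return [x for out in outputs if out is not None for x in out]


per_rank = dispatch([0, 1, 2, 3], dp_size=2, tp_size=2)
outputs = [worker_infer(r, s, tp_size=2) for r, s in enumerate(per_rank)]
merged = collect(outputs)
```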
update_actor

    @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
    def update_actor(self, data: TensorDict) -> TensorDict:
        output = self.actor.train_mini_batch(data=data)
        return output.cpu() if output is not None else None

train_mini_batch splits the incoming batch into PPO mini-batches, iterates over the configured number of PPO epochs, and calls TrainingWorker.train_batch for each mini-batch (one optimizer step per mini-batch). The PPO loss or distillation loss is pre-installed via TrainingWorker.set_loss_fn during init_model.
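The mini-batch and epoch loop can be sketched as follows. This is a simplified illustration: the parameter names, the epoch-outer loop order, and the absence of shuffling are assumptions, and `toy_train_batch` stands in for a real optimizer step.

```python
from typing import Callable


def train_mini_batch(
    batch: list[float],
    mini_batch_size: int,
    ppo_epochs: int,
    train_batch: Callable[[list[float]], dict],
) -> list[dict]:
    """Run one train_batch call (one optimizer step) per mini-batch per epoch."""
    if len(batch) % mini_batch_size != 0:
        raise ValueError("batch size must be divisible by mini_batch_size")
    metrics = []
    for _epoch in range(ppo_epochs):
        for start in range(0, len(batch), mini_batch_size):
            mini = batch[start:start + mini_batch_size]
            metrics.append(train_batch(mini))
    return metrics


optimizer_steps = []


def toy_train_batch(mini: list[float]) -> dict:
    # Record the step and report a fake loss (the mean of the mini-batch).
    optimizer_steps.append(list(mini))
    return {"loss": sum(mini) / len(mini)}


metrics = train_mini_batch([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 4, 2, toy_train_batch)
```

With 8 samples, a mini-batch size of 4, and 2 epochs, the loop performs 4 optimizer steps.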
update_weights

    @register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
    async def update_weights(self, global_steps: int = None, mode: str = "auto"):
        ...

Pushes the latest trainer weights to the rollout engine after each actor update. The mode parameter selects the transfer strategy:

  • "auto" (default): resolves to the backend configured in config.rollout.checkpoint_engine.backend, then follows one of the two paths below.
  • "naive" (colocated sync): exports per-tensor parameters from the training engine via engine.get_per_tensor_param() and calls rollout.update_weights() directly in-process. For LoRA setups with model.lora.merge=True, adapters are merged into base weights before the sync.
  • Any other value (disaggregated async): sends weights through checkpoint_engine.send_weights(), suitable for configurations where trainer and rollout run on separate node pools.
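The mode resolution described in the list above reduces to a two-step decision. The helper below is a hypothetical sketch; the function name, the returned labels, and the example backend name "nccl" are placeholders, not verl identifiers.

```python
def resolve_weight_sync(mode: str, configured_backend: str) -> str:
    """Return which transfer path a given mode would take.

    "auto" defers to the configured checkpoint-engine backend; "naive"
    means an in-process sync; anything else goes through the checkpoint engine.
    """
    backend = configured_backend if mode == "auto" else mode
    return "in_process" if backend == "naive" else "checkpoint_engine"


# "nccl" is only a placeholder backend name for this example.
colocated_path = resolve_weight_sync("auto", configured_backend="naive")
disaggregated_path = resolve_weight_sync("auto", configured_backend="nccl")
```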
save_checkpoint / load_checkpoint

    @register(dispatch_mode=Dispatch.ONE_TO_ALL)
    def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None):
        self.actor.save_checkpoint(local_path, hdfs_path, global_step, max_ckpt_to_keep)

    @register(dispatch_mode=Dispatch.ONE_TO_ALL)
    def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_load=False):
        self.actor.load_checkpoint(local_path, hdfs_path, del_local_after_load)

Both delegate to the actor TrainingWorker, which in turn calls BaseEngine.save_checkpoint / load_checkpoint. The backend engine is responsible for saving sharded model weights, optimizer state, and LR scheduler state, as well as HuggingFace-format export where applicable.

TrainingWorker

TrainingWorker is the generic single-engine worker. Construction takes a TrainingWorkerConfig that bundles the model_config, engine_config, optimizer_config, checkpoint_config, and profiler_config. The backend engine is selected from engine_config.strategy.

Usage Patterns

Inside ActorRolloutRefWorker, TrainingWorker is instantiated internally as self.actor and self.ref; you never construct it directly in this case.
    # Built automatically by ActorRolloutRefWorker.init_model()
    actor_training_config = TrainingWorkerConfig(
        model_type="language_model",
        model_config=actor_config.model_config,
        engine_config=actor_config.engine,
        optimizer_config=actor_config.optim,
        checkpoint_config=actor_config.checkpoint,
    )
    self.actor = TrainingWorker(config=actor_training_config)
    self.actor.reset()
    self.actor.set_loss_fn(ppo_loss_fn)
    

Key RPCs

| RPC | Dispatch | Description |
| --- | --- | --- |
| reset() | ONE_TO_ALL | First call initializes the engine; subsequent calls reload weights and reset optimizer/scheduler state. |
| to(device, model, optimizer, grad) | ONE_TO_ALL | Manual load/offload control. device must be "cpu" or "device" (mapped to the actual accelerator). |
| set_loss_fn(loss_fn) | ONE_TO_ALL | Install the loss closure (PPO loss, distillation loss, or any callable accepting (model_output, batch)). |
| train_mini_batch(data) | n-d compute | Mini-batch + PPO-epoch loop; one optimizer step per mini-batch; allgathers metrics across DP. |
| train_batch(data) | n-d compute | Single mini-batch train step. Usually invoked indirectly via train_mini_batch. |
| infer_batch(data) | n-d compute | Forward-only step for log-prob / value / reward / distillation-teacher computation. Accepts no_lora_adapter=True to temporarily disable the LoRA adapter at inference. |
| save_checkpoint(...) | ONE_TO_ALL | Delegates to BaseEngine.save_checkpoint. |
| load_checkpoint(...) | ONE_TO_ALL | Delegates to BaseEngine.load_checkpoint. |
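As an illustration of a callable with the (model_output, batch) shape that set_loss_fn accepts, here is a pure-Python version of the standard PPO clipped surrogate loss. The dictionary keys, the scalar return value, and the use of Python lists instead of tensors are all assumptions made so the example runs without dependencies; a real loss function would operate on tensors and may return additional metrics.

```python
import math


def make_ppo_clip_loss(clip_ratio: float = 0.2):
    """Build a loss closure over the clip ratio (hypothetical signature)."""

    def loss_fn(model_output: dict, batch: dict) -> float:
        losses = []
        for logp, old_logp, adv in zip(
            model_output["log_probs"], batch["old_log_probs"], batch["advantages"]
        ):
            ratio = math.exp(logp - old_logp)
            clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
            # Clipped surrogate objective, negated so that lower is better.
            losses.append(-min(ratio * adv, clipped * adv))
        return sum(losses) / len(losses)

    return loss_fn


loss_fn = make_ppo_clip_loss(clip_ratio=0.2)
loss = loss_fn(
    {"log_probs": [0.0, 0.0]},
    {"old_log_probs": [0.0, 0.0], "advantages": [1.0, -1.0]},
)
```

Using a closure keeps hyperparameters such as the clip ratio out of the worker, which only needs to know how to call the function.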

Backend Selection

Set the strategy field on the engine config in your Hydra config file. Each role (actor, ref, critic) can use a different backend independently:
    actor_rollout_ref:
      actor:
        strategy: fsdp2          # fsdp | fsdp2 | megatron | automodel | veomni | torchtitan
        engine:
          strategy: fsdp2
          param_offload: false
          optimizer_offload: false
      ref:
        strategy: fsdp2
    critic:
      strategy: fsdp2
    
The EngineRegistry dispatches on (model_type, backend, device) to select the concrete BaseEngine subclass. See the Model Engine page for the full dispatch table.

Migrating from Legacy Workers

The legacy verl.workers.fsdp_workers and verl.workers.megatron_workers modules, along with verl.workers.actor, verl.workers.critic, verl.workers.sharding_manager, and verl.workers.legacy, have been removed. Use the unified verl.workers.engine_workers entry points instead:

| Legacy (removed) | Current (verl.workers.engine_workers) |
| --- | --- |
| fsdp_workers.ActorRolloutRefWorker | ActorRolloutRefWorker (strategy=fsdp/fsdp2) |
| megatron_workers.ActorRolloutRefWorker | ActorRolloutRefWorker (strategy=megatron) |
| fsdp_workers.CriticWorker | TrainingWorker (critic config + value-model engine) |
| megatron_workers.CriticWorker | TrainingWorker (critic config + value-model engine) |
| actor.DataParallelPPOActor | FSDPEngineWithLMHead + TrainingWorker |
| actor.MegatronPPOActor | MegatronEngineWithLMHead + TrainingWorker |
| critic.DataParallelPPOCritic | FSDPEngineWithValueHead + TrainingWorker |
| critic.MegatronPPOCritic | MegatronEngineWithValueHead + TrainingWorker |
| sharding_manager.FSDPUlyssesShardingManager | verl.utils.ulysses.FSDPUlyssesShardingManager |
| Dispatch.MEGATRON_PP_AS_DP_PROTO | make_nd_compute_dataproto_dispatch_fn(mesh_name=...) |
| use_legacy_worker_impl: True | Removed; only the unified engine is available |

Extending with a New Backend

To add a new training backend, implement a BaseEngine subclass under verl/workers/engine/<your_backend>/ and register it with @EngineRegistry.register(model_type=..., backend=...). The worker layer (TrainingWorker / ActorRolloutRefWorker) is already engine-agnostic and will pick up the new backend as soon as engine_config.strategy is set to its name. See Model Engine for the full extension guide and test harness.
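The registration pattern can be sketched with a toy registry. Everything here (`ToyBaseEngine`, `ToyEngineRegistry`, `MyBackendEngine`, and the "my_backend" name) is hypothetical and only mirrors the shape of the decorator shown above. The toy registry keys on (model_type, backend) and omits the device dimension, and the real BaseEngine interface has more methods than the two shown.

```python
from abc import ABC, abstractmethod


class ToyBaseEngine(ABC):
    """Hypothetical, heavily reduced engine interface."""

    @abstractmethod
    def train_batch(self, data): ...

    @abstractmethod
    def infer_batch(self, data): ...


class ToyEngineRegistry:
    """Maps (model_type, backend) to an engine class."""

    _engines: dict = {}

    @classmethod
    def register(cls, model_type: str, backend: str):
        def decorator(engine_cls):
            cls._engines[(model_type, backend)] = engine_cls
            return engine_cls
        return decorator

    @classmethod
    def new(cls, model_type: str, backend: str, **kwargs):
        key = (model_type, backend)
        if key not in cls._engines:
            raise ValueError(f"no engine registered for {key}")
        return cls._engines[key](**kwargs)


@ToyEngineRegistry.register(model_type="language_model", backend="my_backend")
class MyBackendEngine(ToyBaseEngine):
    def train_batch(self, data):
        return {"loss": 0.0}  # placeholder training step

    def infer_batch(self, data):
        return data  # placeholder forward pass


# The worker layer only needs the strategy name to obtain an engine.
engine = ToyEngineRegistry.new("language_model", "my_backend")
```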

Model Engine

Understand the BaseEngine abstraction, FSDP/Megatron backends, and the EngineRegistry dispatch table.

Ray Trainer

See how PPORayTrainer creates WorkerGroups and drives the training loop via these RPCs.
