verl Engine Workers: ActorRolloutRefWorker & TrainingWorker

verl’s worker layer sits between the single-process RayPPOTrainer controller and the backend model engines (FSDP, Megatron-LM, etc.). Workers run in SPMD mode across all GPU ranks. The controller never manages individual ranks directly; it calls methods on WorkerGroup objects, and the framework handles data dispatch and result collection. The two core worker classes are ActorRolloutRefWorker, the hybrid worker that co-locates the actor, rollout engine, and optional reference policy, and TrainingWorker, the generic single-engine worker used for the critic, reward model, and standalone SFT/DPO training. Both classes live in verl/workers/engine_workers.py and are engine-agnostic: FSDP, FSDP2, Megatron-LM, Automodel, VeOmni, and TorchTitan are all wired in through the same entry points.

Class Hierarchy

ActorRolloutRefWorker          # hybrid: co-locates actor + rollout + optional ref
├── self.actor  : TrainingWorker     (built when role contains "actor")
├── self.ref    : TrainingWorker     (built when role contains "ref")
├── self.rollout: BaseRollout        (vLLM / SGLang, built when role contains "rollout")
└── self.checkpoint_engine           (built when role contains "actor")

TrainingWorker                 # generic: one engine + optimizer + profiler
└── self.engine : BaseEngine         (fsdp / fsdp2 / megatron / automodel / veomni / torchtitan)

TrainingWorker is also used standalone for the critic, reference model, reward model, and SFT/DPO training — it is essentially a Ray-wrapped BaseEngine that exposes a Tinker-like API as RPCs to the single controller.

ActorRolloutRefWorker

ActorRolloutRefWorker is the hybrid worker used for PPO / GRPO training. The role argument passed at construction selects which sub-workers are built inside init_model:

| `role` | What is built inside `init_model` |
| --- | --- |
| `actor` | `self.actor` (`TrainingWorker`) + checkpoint engine |
| `rollout` | `self.rollout` (`BaseRollout`) |
| `ref` | `self.ref` (`TrainingWorker`, `forward_only` engine config) |
| `actor_rollout` | actor + rollout + checkpoint engine (most common for colocated PPO) |
| `actor_rollout_ref` | all three |
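
The role-to-component mapping in the table can be sketched in a few lines of plain Python. This is an illustration only, assuming the "role contains ..." substring rule from the class hierarchy above; `components_for_role` is a hypothetical helper, not a verl function.

```python
def components_for_role(role: str) -> set:
    """Return the sub-components the table above associates with a role string."""
    built = set()
    if "actor" in role:
        # The checkpoint engine is built alongside the actor.
        built.update({"actor", "checkpoint_engine"})
    if "rollout" in role:
        built.add("rollout")
    if "ref" in role:
        built.add("ref")
    return built


print(sorted(components_for_role("actor_rollout")))
# ['actor', 'checkpoint_engine', 'rollout']
```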

Key RPCs

init_model

@register(dispatch_mode=Dispatch.ONE_TO_ALL)
def init_model(self):
    ...

ONE_TO_ALL means the driver calls init_model() once and the same routine executes on every GPU worker. It builds the TrainingWorker (which in turn instantiates the BaseEngine via EngineRegistry.new), the rollout engine, and the checkpoint engine used for trainer-to-rollout weight synchronization. The rollout engine is always built last so that vLLM / SGLang can accurately estimate available KV cache memory.
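
As a rough mental model of ONE_TO_ALL, the toy below broadcasts one call with identical arguments to every worker and gathers one result per worker. It runs in a single process and does not use Ray or verl; `ToyWorker` and `one_to_all` are hypothetical names chosen for this sketch.

```python
class ToyWorker:
    """Stand-in for one GPU rank; not a verl class."""

    def __init__(self, rank: int):
        self.rank = rank
        self.initialized = False

    def init_model(self) -> str:
        self.initialized = True
        return f"rank {self.rank} ready"


def one_to_all(workers, method_name, *args, **kwargs):
    """Call the same method with the same arguments on every worker and gather the results."""
    return [getattr(worker, method_name)(*args, **kwargs) for worker in workers]


workers = [ToyWorker(rank) for rank in range(4)]
results = one_to_all(workers, "init_model")
print(results)
```

The caller issues a single call and receives a list with one entry per rank, which is the shape of interaction the controller has with a WorkerGroup.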

compute_log_prob / compute_ref_log_prob

@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
def compute_log_prob(self, data: TensorDict) -> TensorDict:
    output = self.actor.infer_batch(data)
    return output.cpu() if output is not None else None

@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="ref"))
def compute_ref_log_prob(self, data: TensorDict) -> TensorDict:
    output = self.ref.infer_batch(data)
    return output.cpu() if output is not None else None

TrainingWorker.infer_batch drives BaseEngine.infer_batch in eval mode with no_grad. The n-dimensional dispatch function is built from the engine’s actual parallel topology, so the Megatron pipeline-parallel dimension is transparently surfaced as an extra data-parallel axis to the single controller — no backend-specific dispatch logic required.
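
To make the dispatch idea concrete, here is a simplified two-axis sketch: the batch is split across data-parallel groups, every rank inside a group receives the same shard, and one output per group is collected. It is not verl's implementation. The rank layout (`rank = dp_idx * tp_size + tp_idx`), the choice of the `tp_idx == 0` rank as the collector, and the function names are all assumptions made for illustration.

```python
def dispatch(batch, dp_size, tp_size):
    """Give every rank the shard that belongs to its data-parallel group."""
    assert len(batch) % dp_size == 0, "batch must divide evenly across DP groups"
    shard_len = len(batch) // dp_size
    per_rank = []
    for rank in range(dp_size * tp_size):
        dp_idx = rank // tp_size  # assumed layout: ranks of one group are contiguous
        per_rank.append(batch[dp_idx * shard_len:(dp_idx + 1) * shard_len])
    return per_rank


def collect(outputs, tp_size):
    """Keep one output per DP group (the tp_idx == 0 rank) and concatenate in DP order."""
    merged = []
    for rank, out in enumerate(outputs):
        if rank % tp_size == 0:
            merged.extend(out)
    return merged


batch = list(range(8))
shards = dispatch(batch, dp_size=2, tp_size=2)
outputs = [[x * 10 for x in shard] for shard in shards]  # pretend forward pass
print(collect(outputs, tp_size=2))
# [0, 10, 20, 30, 40, 50, 60, 70]
```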

update_actor

@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
def update_actor(self, data: TensorDict) -> TensorDict:
    output = self.actor.train_mini_batch(data=data)
    return output.cpu() if output is not None else None

train_mini_batch splits the incoming batch into PPO mini-batches, iterates over the configured number of PPO epochs, and calls TrainingWorker.train_batch for each mini-batch (one optimizer step per mini-batch). The PPO loss or distillation loss is pre-installed via TrainingWorker.set_loss_fn during init_model.
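
The loop structure described above can be sketched as follows. This is a minimal illustration with plain lists, assuming sequential (unshuffled) mini-batches; `toy_train_mini_batch` and its parameters are hypothetical and do not mirror the real method signature.

```python
def toy_train_mini_batch(batch, mini_batch_size, ppo_epochs, train_batch):
    """Run train_batch once per mini-batch per epoch and return the number of optimizer steps."""
    steps = 0
    for _ in range(ppo_epochs):
        for start in range(0, len(batch), mini_batch_size):
            train_batch(batch[start:start + mini_batch_size])  # one optimizer step
            steps += 1
    return steps


seen = []  # records every mini-batch handed to the (pretend) train step
steps = toy_train_mini_batch(list(range(8)), mini_batch_size=4, ppo_epochs=2, train_batch=seen.append)
print(steps)
# 4
```

With 8 samples, a mini-batch size of 4, and 2 epochs, the sketch performs 2 x 2 = 4 optimizer steps.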

update_weights

@register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
async def update_weights(self, global_steps: int = None, mode: str = "auto"):
    ...

Pushes the latest trainer weights to the rollout engine after each actor update. The mode parameter selects the transfer strategy:

"auto" (the default): resolves to the backend configured in config.rollout.checkpoint_engine.backend, which then follows one of the two paths below.

"naive" (colocated sync): exports per-tensor parameters from the training engine via engine.get_per_tensor_param() and calls rollout.update_weights() directly in-process. For LoRA setups with model.lora.merge=True, adapters are merged into base weights before the sync.

Any other value (disaggregated async): sends weights through checkpoint_engine.send_weights(), which suits configurations where trainer and rollout run on separate node pools.

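The mode handling described above can be sketched as a small resolver. This is an assumption-laden illustration rather than the actual method body: `resolve_transfer_path` and the returned labels are hypothetical, and `"some_backend"` is a placeholder for any non-naive backend name.

```python
def resolve_transfer_path(mode: str, configured_backend: str) -> str:
    """Map an update_weights mode to the transfer path the text describes."""
    if mode == "auto":
        # "auto" defers to config.rollout.checkpoint_engine.backend.
        mode = configured_backend
    if mode == "naive":
        # Colocated: engine.get_per_tensor_param() feeds rollout.update_weights().
        return "in_process"
    # Disaggregated: checkpoint_engine.send_weights().
    return "checkpoint_engine"


print(resolve_transfer_path("auto", configured_backend="naive"))
# in_process
```
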
save_checkpoint / load_checkpoint

@register(dispatch_mode=Dispatch.ONE_TO_ALL)
def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None):
    self.actor.save_checkpoint(local_path, hdfs_path, global_step, max_ckpt_to_keep)

@register(dispatch_mode=Dispatch.ONE_TO_ALL)
def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_load=False):
    self.actor.load_checkpoint(local_path, hdfs_path, del_local_after_load)

Both delegate to the actor TrainingWorker, which in turn calls BaseEngine.save_checkpoint / load_checkpoint. The backend engine is responsible for saving sharded model weights, optimizer state, and LR scheduler state, as well as HuggingFace-format export where applicable.

TrainingWorker

TrainingWorker is the generic single-engine worker. Construction takes a TrainingWorkerConfig that bundles the model_config, engine_config, optimizer_config, checkpoint_config, and profiler_config. The backend engine is selected from engine_config.strategy.

Usage Patterns

Inside ActorRolloutRefWorker

TrainingWorker is instantiated internally as self.actor and self.ref, so you never construct it directly in this case.

# Built automatically by ActorRolloutRefWorker.init_model()
actor_training_config = TrainingWorkerConfig(
    model_type="language_model",
    model_config=actor_config.model_config,
    engine_config=actor_config.engine,
    optimizer_config=actor_config.optim,
    checkpoint_config=actor_config.checkpoint,
)
self.actor = TrainingWorker(config=actor_training_config)
self.actor.reset()
self.actor.set_loss_fn(ppo_loss_fn)

Standalone (Critic / Reward / SFT)

TrainingWorker is used directly when you need a standalone training or inference worker, for example the PPO critic or a reward model.

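from verl.single_controller.ray import RayClassWithInitArgs, RayWorkerGroup
from verl.workers.engine_workers import TrainingWorker

# critic_training_worker_config, resource_pool, and critic_loss_fn are assumed
# to have been built earlier by the trainer.
critic_cls = RayClassWithInitArgs(
    cls=TrainingWorker,
    config=critic_training_worker_config,
)
critic_worker_group = RayWorkerGroup(
    resource_pool=resource_pool,
    ray_cls_with_init=critic_cls,
)
critic_worker_group.reset()                       # first call initializes the engine
critic_worker_group.set_loss_fn(critic_loss_fn)   # install the critic loss on every rank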

Key RPCs

| RPC | Dispatch | Description |
| --- | --- | --- |
| `reset()` | `ONE_TO_ALL` | First call initializes the engine; subsequent calls reload weights and reset optimizer/scheduler state. |
| `to(device, model, optimizer, grad)` | `ONE_TO_ALL` | Manual load/offload control. `device` must be `"cpu"` or `"device"` (mapped to the actual accelerator). |
| `set_loss_fn(loss_fn)` | `ONE_TO_ALL` | Install the loss closure (PPO loss, distillation loss, or any callable accepting `(model_output, batch)`). |
| `train_mini_batch(data)` | n-d compute | Mini-batch + PPO-epoch loop; one optimizer step per mini-batch; allgathers metrics across DP. |
| `train_batch(data)` | n-d compute | Single mini-batch train step. Usually invoked indirectly via `train_mini_batch`. |
| `infer_batch(data)` | n-d compute | Forward-only step for log-prob / value / reward / distillation-teacher computation. Accepts `no_lora_adapter=True` to temporarily disable the LoRA adapter at inference. |
| `save_checkpoint(...)` | `ONE_TO_ALL` | Delegates to `BaseEngine.save_checkpoint`. |
| `load_checkpoint(...)` | `ONE_TO_ALL` | Delegates to `BaseEngine.load_checkpoint`. |
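
For a sense of what a loss closure passed to `set_loss_fn` might look like, the sketch below implements a masked mean-squared value loss on plain Python lists. Only the `(model_output, batch)` calling convention comes from the table above. The dictionary keys (`values`, `returns`, `response_mask`), the metric name, and the `(loss, metrics)` return shape are assumptions for illustration, and a real loss would operate on tensors.

```python
def toy_value_loss(model_output, batch):
    """Masked mean-squared error between predicted values and returns."""
    values = model_output["values"]   # hypothetical key
    returns = batch["returns"]        # hypothetical key
    mask = batch["response_mask"]     # hypothetical key: 1 = count this token, 0 = ignore

    squared = [m * (v - r) ** 2 for v, r, m in zip(values, returns, mask)]
    valid = sum(mask)
    loss = sum(squared) / valid if valid else 0.0
    return loss, {"critic/toy_loss": loss}


loss, metrics = toy_value_loss(
    {"values": [1.0, 2.0, 5.0]},
    {"returns": [1.0, 4.0, 0.0], "response_mask": [1, 1, 0]},
)
print(loss)
# 2.0
```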

Backend Selection

Set the strategy field on the engine config in your Hydra config file. Each role (actor, ref, critic) can use a different backend:

actor_rollout_ref:
  actor:
    strategy: fsdp2          # fsdp | fsdp2 | megatron | automodel | veomni | torchtitan
    engine:
      strategy: fsdp2
      param_offload: false
      optimizer_offload: false
  ref:
    strategy: fsdp2
critic:
  strategy: fsdp2

The EngineRegistry dispatches on (model_type, backend, device) to select the concrete BaseEngine subclass. See the Model Engine page for the full dispatch table.

Migrating from Legacy Workers

The legacy verl.workers.fsdp_workers and verl.workers.megatron_workers modules, along with verl.workers.actor, verl.workers.critic, verl.workers.sharding_manager, and verl.workers.legacy, have been removed. Use the unified verl.workers.engine_workers entry points instead:

| Legacy (removed) | Current (`verl.workers.engine_workers`) |
| --- | --- |
| `fsdp_workers.ActorRolloutRefWorker` | `ActorRolloutRefWorker` (`strategy=fsdp`/`fsdp2`) |
| `megatron_workers.ActorRolloutRefWorker` | `ActorRolloutRefWorker` (`strategy=megatron`) |
| `fsdp_workers.CriticWorker` | `TrainingWorker` (critic config + value-model engine) |
| `megatron_workers.CriticWorker` | `TrainingWorker` (critic config + value-model engine) |
| `actor.DataParallelPPOActor` | `FSDPEngineWithLMHead` + `TrainingWorker` |
| `actor.MegatronPPOActor` | `MegatronEngineWithLMHead` + `TrainingWorker` |
| `critic.DataParallelPPOCritic` | `FSDPEngineWithValueHead` + `TrainingWorker` |
| `critic.MegatronPPOCritic` | `MegatronEngineWithValueHead` + `TrainingWorker` |
| `sharding_manager.FSDPUlyssesShardingManager` | `verl.utils.ulysses.FSDPUlyssesShardingManager` |
| `Dispatch.MEGATRON_PP_AS_DP_PROTO` | `make_nd_compute_dataproto_dispatch_fn(mesh_name=...)` |
| `use_legacy_worker_impl: True` | Removed; only the unified engine is available |

Extending with a New Backend

To add a new training backend, implement a BaseEngine subclass under verl/workers/engine/<your_backend>/ and register it with @EngineRegistry.register(model_type=..., backend=...). The worker layer (TrainingWorker / ActorRolloutRefWorker) is already engine-agnostic and will pick up the new backend as soon as engine_config.strategy is set to its name. See Model Engine for the full extension guide and test harness.
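
The decorator-based registration pattern can be illustrated with a toy registry keyed on `(model_type, backend, device)`. This is a self-contained sketch of the general technique, not verl's EngineRegistry: `ToyEngineRegistry`, `MyBackendEngine`, the `"my_backend"` name, and the default `device="cuda"` are all hypothetical.

```python
class ToyEngineRegistry:
    """Minimal registry mapping (model_type, backend, device) to an engine class."""

    _engines = {}

    @classmethod
    def register(cls, model_type, backend, device="cuda"):
        def decorator(engine_cls):
            cls._engines[(model_type, backend, device)] = engine_cls
            return engine_cls  # leave the decorated class usable as-is
        return decorator

    @classmethod
    def new(cls, model_type, backend, device="cuda", **kwargs):
        key = (model_type, backend, device)
        if key not in cls._engines:
            raise KeyError(f"no engine registered for {key}")
        return cls._engines[key](**kwargs)


@ToyEngineRegistry.register(model_type="language_model", backend="my_backend")
class MyBackendEngine:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


engine = ToyEngineRegistry.new("language_model", "my_backend", lr=1e-5)
print(type(engine).__name__)
# MyBackendEngine
```

Because lookup goes through the registry key, code that calls `new` needs no changes when another backend is registered, which is the property that keeps the worker layer engine-agnostic.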

Model Engine

Understand the BaseEngine abstraction, FSDP/Megatron backends, and the EngineRegistry dispatch table.

Ray Trainer

See how RayPPOTrainer creates WorkerGroups and drives the training loop via these RPCs.
