Overview
The training API provides the main entry points for running RL training loops, creating training models (actor and critic), and managing the training lifecycle.

Main Training Function
train()
The main training loop that orchestrates rollout generation, model training, and evaluation. It performs the following steps:
- Configure logger and allocate GPUs via placement groups
- Initialize tracking (wandb/tensorboard)
- Create rollout manager with inference engines
- Create actor and critic models
- Execute training loop:
  - Generate rollout data
  - Train critic model (if enabled)
  - Train actor model
  - Update weights in inference engines
  - Periodically save checkpoints and run evaluation

Features:
- Supports synchronous and asynchronous training modes
- Optional weight offloading for memory efficiency
- Configurable evaluation intervals
- Checkpoint saving with async support
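The loop above can be sketched in pure Python. This is an illustrative stub only: the function and step names below stand in for slime's real components and are not its actual API.

```python
# Illustrative control flow for the documented training loop.
# Every step is a stand-in; the real loop drives Ray actors and
# inference engines rather than appending tuples to a list.
def run_training_loop(num_rollouts, use_critic=True, eval_interval=5):
    history = []
    for rollout_id in range(num_rollouts):
        history.append(("generate_rollout", rollout_id))
        if use_critic:
            history.append(("train_critic", rollout_id))   # critic first, if enabled
        history.append(("train_actor", rollout_id))        # then the policy
        history.append(("update_weights", rollout_id))     # push weights to inference engines
        if (rollout_id + 1) % eval_interval == 0:
            history.append(("save_and_eval", rollout_id))  # periodic checkpoint + evaluation
    return history

steps = run_training_loop(2, use_critic=False, eval_interval=2)
print(steps)
```

Note the ordering: the critic (when enabled) is trained before the actor each rollout, and weight updates to the inference engines happen every iteration, while checkpointing and evaluation run only at the configured interval.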
Parameters: args - Parsed arguments containing all training configuration. Use parse_args() to create.

train.py:9, train_async.py:10
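The documented entry-point pattern is "parse arguments, then hand them to the training loop." The sketch below mimics that pattern with a minimal local argparse setup; slime's real parse_args() defines far more flags, and the flag subset shown is taken from the Key Arguments section of this document.

```python
import argparse

# Illustrative only: mirrors the documented parse_args() -> train(args)
# entry-point pattern. Not slime's actual parser.
def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("--lr", type=float, default=1e-6)        # default from this doc
    p.add_argument("--use-critic", action="store_true")
    return p.parse_args([])  # empty argv so the demo runs without a CLI

def train(args):
    # Stand-in for the real training loop entry point.
    return f"training with lr={args.lr}, critic={args.use_critic}"

args = parse_args()
print(train(args))
```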
Model Creation
create_training_models()
Creates and initializes actor and critic models with proper placement groups.

Parameters:
- Training arguments
- Dictionary containing placement groups for actor, critic, and rollout
- Initialized rollout manager instance

Returns:
- actor_model: Ray actor group for policy training
- critic_model: Ray actor group for value function training (None if not using critic)
slime/ray/placement_group.py:132
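A minimal sketch of the return shape described above, assuming the documented inputs (arguments, placement-group dictionary, rollout manager). The dictionaries below are stand-ins for real Ray actor groups, and the body is not slime's implementation.

```python
# Hypothetical sketch of create_training_models(): returns the actor
# model always, and the critic model only when a critic is configured.
def create_training_models(args, pgs, rollout_manager):
    actor_model = {"role": "actor", "pg": pgs["actor"]}    # stand-in for a Ray actor group
    critic_model = (
        {"role": "critic", "pg": pgs["critic"]}
        if args.get("use_critic")
        else None                                          # None if not using critic
    )
    return actor_model, critic_model

pgs = {"actor": "pg-actor", "critic": "pg-critic", "rollout": "pg-rollout"}
actor, critic = create_training_models({"use_critic": False}, pgs, rollout_manager=None)
print(critic)  # None when no critic is configured
```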
Training Model Classes
MegatronTrainRayActor
Ray actor for Megatron-based distributed training.

Initialize the training actor with model, optimizer, and optional reference/teacher models.

Parameters:
- args (Namespace): Training configuration
- role (str): Either "actor" or "critic"
- with_ref (bool): Whether to load reference model for KL penalty
- with_opd_teacher (bool): Whether to load teacher model for on-policy distillation
Returns: int - Starting rollout ID

Train the model on rollout data asynchronously.

Parameters:
- rollout_id (int): Current rollout step
- rollout_data_ref (ray.ObjectRef): Reference to rollout data in Ray object store
Save model checkpoint to disk.

Parameters:
- rollout_id (int): Current rollout step
- force_sync (bool): Whether to force synchronous save (default: False)
Update weights in the rollout inference engines from the training model.
Uses efficient weight-transfer mechanisms: tensor-based transfer for colocated placement, distributed transfer for separate placement.
slime/backends/megatron_utils/actor.py:45
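A plain-Python mock of the actor lifecycle described above. There is no Ray here; the method names follow this document's descriptions, but exact signatures in slime may differ.

```python
class MockTrainActor:
    """Stand-in for a training actor; mirrors the documented lifecycle
    (init -> train per rollout -> save_model / update_weights)."""

    def __init__(self, args, role, with_ref=False, with_opd_teacher=False):
        self.role = role                  # "actor" or "critic"
        self.with_ref = with_ref          # reference model for KL penalty
        self.saved = []

    def init(self):
        # Returns the starting rollout ID (e.g. 0, or resumed from a checkpoint).
        return 0

    def train(self, rollout_id, rollout_data_ref):
        # Placeholder metrics; the real method trains on data behind the ref.
        return {"rollout_id": rollout_id, "loss": 0.0}

    def save_model(self, rollout_id, force_sync=False):
        self.saved.append((rollout_id, force_sync))

    def update_weights(self):
        return "weights pushed to inference engines"

actor = MockTrainActor(args={}, role="actor", with_ref=True)
start = actor.init()
actor.train(start, rollout_data_ref=None)
actor.save_model(start, force_sync=True)
```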
FSDPTrainRayActor
Ray actor for FSDP (Fully Sharded Data Parallel) based training using pure HuggingFace models.

- Native HuggingFace model support with FSDP2 wrapping
- Optional CPU offloading for memory efficiency
- Compatible with gradient checkpointing
- Supports AdamW optimizer
slime/backends/fsdp_utils/actor.py:34
Placement Groups
create_placement_groups()
Create Ray placement groups for distributing actor, critic, and rollout engines across GPUs.

Must contain:
- actor_num_nodes: Number of nodes for actor
- actor_num_gpus_per_node: GPUs per node for actor
- critic_num_nodes: Number of nodes for critic (if using)
- critic_num_gpus_per_node: GPUs per node for critic
- rollout_num_gpus: Total GPUs for rollout engines
- colocate: Whether to colocate training and inference
Returns a dictionary with keys:
- "actor": Tuple of (placement_group, bundle_indices, gpu_ids)
- "critic": Tuple of (placement_group, bundle_indices, gpu_ids) or None
- "rollout": Tuple of (placement_group, bundle_indices, gpu_ids)
slime/ray/placement_group.py:79
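The colocate flag changes the GPU accounting: colocated rollout engines share the actor's devices, while separate placement gets its own. The sketch below illustrates that with plain integers as GPU IDs; real placement builds Ray PlacementGroup objects, and this function body is a hypothetical simplification.

```python
# Illustrative GPU accounting for the documented return structure.
# "pg" strings stand in for Ray PlacementGroup objects.
def create_placement_groups(actor_num_nodes, actor_num_gpus_per_node,
                            rollout_num_gpus, colocate=False):
    actor_gpus = list(range(actor_num_nodes * actor_num_gpus_per_node))
    if colocate:
        rollout_gpus = actor_gpus[:rollout_num_gpus]              # share actor devices
    else:
        rollout_gpus = [actor_gpus[-1] + 1 + i                    # fresh devices after actor's
                        for i in range(rollout_num_gpus)]
    return {
        "actor": ("pg", list(range(len(actor_gpus))), actor_gpus),
        "critic": None,  # no critic configured in this sketch
        "rollout": ("pg", list(range(len(rollout_gpus))), rollout_gpus),
    }

pgs = create_placement_groups(1, 4, 4, colocate=True)
print(pgs["rollout"][2])  # [0, 1, 2, 3]: rollout shares the actor GPUs
```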
Training Configuration
Key Arguments
Model Architecture:
- --actor-num-nodes: Number of nodes for actor training
- --actor-num-gpus-per-node: GPUs per node for actor
- --critic-num-nodes: Number of nodes for critic (optional)
- --use-critic: Enable critic model for PPO

Placement and Offloading:
- --colocate: Colocate training and inference on same GPUs
- --offload-train: Offload training model to CPU during rollout
- --offload-rollout: Offload rollout model during training
- --true-on-policy-mode: Enable strict on-policy training

Optimization:
- --lr: Learning rate (default: 1e-6)
- --clip-grad: Gradient clipping value (default: 1.0)
- --global-batch-size: Global batch size across all ranks
- --micro-batch-size: Micro batch size per GPU
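The two batch-size arguments relate through gradient accumulation: the global batch is consumed in micro-batches across the data-parallel ranks. The helper below shows that standard accounting; it is generic data-parallel arithmetic, not slime-specific code.

```python
# Standard data-parallel batch accounting (illustrative, not slime's code):
# each optimizer step consumes global_batch_size samples, processed as
# micro_batch_size samples per GPU across data_parallel_size GPUs.
def grad_accum_steps(global_batch_size, micro_batch_size, data_parallel_size):
    per_step = micro_batch_size * data_parallel_size
    assert global_batch_size % per_step == 0, "global batch must divide evenly"
    return global_batch_size // per_step

print(grad_accum_steps(256, 4, 8))  # 256 / (4 * 8) = 8 accumulation steps
```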
See Also
- Arguments API - Complete argument reference
- Rollout API - Rollout generation
- Backends API - Training backend implementations