
Overview

The training API provides the main entry points for running RL training loops, creating training models (actor and critic), and managing the training lifecycle.

Main Training Function

train()

The main training loop that orchestrates rollout generation, model training, and evaluation.
from slime.train import train
from slime.utils.arguments import parse_args

args = parse_args()
train(args)
Training Loop Steps:
  1. Configure logger and allocate GPUs via placement groups
  2. Initialize tracking (wandb/tensorboard)
  3. Create rollout manager with inference engines
  4. Create actor and critic models
  5. Execute training loop:
    • Generate rollout data
    • Train critic model (if enabled)
    • Train actor model
    • Update weights in inference engines
    • Periodically save checkpoints and run evaluation
Key Features:
  • Supports synchronous and asynchronous training modes
  • Optional weight offloading for memory efficiency
  • Configurable evaluation intervals
  • Checkpoint saving with async support
Parameters:
  • args (Namespace, required): Parsed arguments containing all training configuration. Use parse_args() to create.
Source: train.py:9, train_async.py:10
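The loop structure described above can be sketched in plain Python. This is an illustrative stand-in, not slime's actual implementation: the helper event names ("rollout", "train_actor", etc.) and the interval parameters are assumptions made for the sketch.

```python
# Hedged sketch of the train() control flow: rollout -> optional critic
# update -> actor update -> weight sync -> periodic save/eval.
def train_loop(num_rollouts, use_critic=False, save_interval=5, eval_interval=5):
    history = []
    for rollout_id in range(num_rollouts):
        history.append(("rollout", rollout_id))           # 1. generate rollout data
        if use_critic:
            history.append(("train_critic", rollout_id))  # 2. critic update (if enabled)
        history.append(("train_actor", rollout_id))       # 3. policy update
        history.append(("update_weights", rollout_id))    # 4. push weights to inference engines
        if (rollout_id + 1) % save_interval == 0:
            history.append(("save", rollout_id))          # 5a. periodic checkpoint
        if (rollout_id + 1) % eval_interval == 0:
            history.append(("eval", rollout_id))          # 5b. periodic evaluation
    return history
```

The key ordering constraint is that weights are pushed to the inference engines only after the actor update, so the next rollout samples from the freshest policy.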

Model Creation

create_training_models()

Creates and initializes actor and critic models with proper placement groups.
from slime.ray.placement_group import create_training_models

actor_model, critic_model = create_training_models(args, pgs, rollout_manager)
Parameters:
  • args (Namespace, required): Training arguments
  • pgs (dict, required): Dictionary containing placement groups for actor, critic, and rollout:
{
    "actor": (pg, bundle_indices, gpu_ids),
    "critic": (pg, bundle_indices, gpu_ids),  # if args.use_critic
    "rollout": (pg, bundle_indices, gpu_ids)
}
  • rollout_manager (RolloutManager, required): Initialized rollout manager instance
Returns: tuple[RayTrainGroup, RayTrainGroup | None]
  • actor_model: Ray actor group for policy training
  • critic_model: Ray actor group for value function training (None if not using critic)
Source: slime/ray/placement_group.py:132
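A minimal sketch of the pgs shape the documentation describes, with placeholder strings standing in for real Ray placement group objects. The validate_pgs helper is hypothetical, written here only to make the expected dict contract concrete.

```python
# Illustrative pgs dict matching the documented (pg, bundle_indices, gpu_ids)
# tuple shape; "pg0"/"pg1" are placeholders for Ray PlacementGroup objects.
pgs = {
    "actor": ("pg0", [0, 1, 2, 3], [0, 1, 2, 3]),
    "critic": None,  # when args.use_critic is False
    "rollout": ("pg1", [4, 5], [4, 5]),
}

def validate_pgs(pgs, use_critic):
    """Check that every required placement group is present."""
    required = ["actor", "rollout"] + (["critic"] if use_critic else [])
    return all(pgs.get(key) is not None for key in required)
```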

Training Model Classes

MegatronTrainRayActor

Ray actor for Megatron-based distributed training.
class MegatronTrainRayActor(TrainRayActor):
    def init(self, args, role, with_ref=False, with_opd_teacher=False):
        """Initialize training actor"""
    
    def async_train(self, rollout_id, rollout_data_ref):
        """Asynchronously train on rollout data"""
    
    def save_model(self, rollout_id, force_sync=False):
        """Save model checkpoint"""
    
    def update_weights(self):
        """Update weights in inference engines"""
Key Methods:
init
method
Initialize the training actor with model, optimizer, and optional reference/teacher models.
Parameters:
  • args (Namespace): Training configuration
  • role (str): Either "actor" or "critic"
  • with_ref (bool): Whether to load reference model for KL penalty
  • with_opd_teacher (bool): Whether to load teacher model for on-policy distillation
Returns: int - Starting rollout ID
async_train
method
Train the model on rollout data asynchronously.
Parameters:
  • rollout_id (int): Current rollout step
  • rollout_data_ref (ray.ObjectRef): Reference to rollout data in Ray object store
Returns: Ray ObjectRef for async execution
save_model
method
Save model checkpoint to disk.
Parameters:
  • rollout_id (int): Current rollout step
  • force_sync (bool): Whether to force synchronous save (default: False)
update_weights
method
Update weights in rollout inference engines from training model. Uses efficient weight transfer mechanisms (tensor-based for colocated, distributed for separate).
Source: slime/backends/megatron_utils/actor.py:45
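The actor lifecycle above can be sketched as a plain-Python driver. This stand-in class is hypothetical: in slime these are remote Ray calls returning ObjectRefs, and the real init loads models and optimizer state; here the methods only record what they would do.

```python
# Hedged sketch of the actor lifecycle: init returns a starting rollout id
# (useful when resuming from a checkpoint), then each step trains on rollout
# data and pushes updated weights to the inference engines.
class TrainActorSketch:
    def init(self, start_rollout_id=0):
        self.rollout_id = start_rollout_id
        self.log = []
        return self.rollout_id

    def async_train(self, rollout_id, rollout_data):
        self.log.append(("train", rollout_id, len(rollout_data)))

    def update_weights(self):
        self.log.append(("update_weights",))

actor = TrainActorSketch()
start = actor.init()
for rid in range(start, start + 2):
    actor.async_train(rid, rollout_data=[1, 2, 3])
    actor.update_weights()
```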

FSDPTrainRayActor

Ray actor for FSDP (Fully Sharded Data Parallel) based training using pure HuggingFace models.
class FSDPTrainRayActor(TrainRayActor):
    def init(self, args, role, with_ref=False, with_opd_teacher=False):
        """Initialize FSDP training actor"""
    
    def train(self, rollout_id, rollout_data):
        """Train on rollout data"""
    
    def save(self, rollout_id, force_sync=False):
        """Save FSDP checkpoint"""
Features:
  • Native HuggingFace model support with FSDP2 wrapping
  • Optional CPU offloading for memory efficiency
  • Compatible with gradient checkpointing
  • Supports AdamW optimizer
Source: slime/backends/fsdp_utils/actor.py:34

Placement Groups

create_placement_groups()

Create Ray placement groups for distributing actor, critic, and rollout engines across GPUs.
from slime.ray.placement_group import create_placement_groups

pgs = create_placement_groups(args)
Parameters:
  • args (Namespace, required): Must contain:
  • actor_num_nodes: Number of nodes for actor
  • actor_num_gpus_per_node: GPUs per node for actor
  • critic_num_nodes: Number of nodes for critic (if using)
  • critic_num_gpus_per_node: GPUs per node for critic
  • rollout_num_gpus: Total GPUs for rollout engines
  • colocate: Whether to colocate training and inference
Returns: dict
Dictionary with keys:
  • "actor": Tuple of (placement_group, bundle_indices, gpu_ids)
  • "critic": Tuple of (placement_group, bundle_indices, gpu_ids) or None
  • "rollout": Tuple of (placement_group, bundle_indices, gpu_ids)
Source: slime/ray/placement_group.py:79
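The effect of the colocate flag on GPU layout can be sketched as follows. This mirrors the documented behavior (colocated training and inference share GPUs; otherwise rollout gets its own disjoint range), not slime's exact bundle-assignment logic, and plan_gpu_layout is a hypothetical helper name.

```python
# Sketch: with colocate=True, rollout engines reuse the actor's GPUs;
# with colocate=False, rollout GPUs come after the actor's range.
def plan_gpu_layout(actor_num_nodes, actor_num_gpus_per_node, rollout_num_gpus, colocate):
    actor_gpus = list(range(actor_num_nodes * actor_num_gpus_per_node))
    if colocate:
        rollout_gpus = actor_gpus[:rollout_num_gpus]
    else:
        start = len(actor_gpus)
        rollout_gpus = list(range(start, start + rollout_num_gpus))
    return {"actor": actor_gpus, "rollout": rollout_gpus}
```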

Training Configuration

Key Arguments

Model Architecture:
  • --actor-num-nodes: Number of nodes for actor training
  • --actor-num-gpus-per-node: GPUs per node for actor
  • --critic-num-nodes: Number of nodes for critic (optional)
  • --use-critic: Enable critic model for PPO
Training Strategy:
  • --colocate: Colocate training and inference on same GPUs
  • --offload-train: Offload training model to CPU during rollout
  • --offload-rollout: Offload rollout model during training
  • --true-on-policy-mode: Enable strict on-policy training
Optimization:
  • --lr: Learning rate (default: 1e-6)
  • --clip-grad: Gradient clipping value (default: 1.0)
  • --global-batch-size: Global batch size across all ranks
  • --micro-batch-size: Micro batch size per GPU
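A minimal argparse sketch of the flags listed above. slime's real parse_args() defines many more options and may wire these differently; the defaults shown for --lr and --clip-grad follow the documentation text, and the batch-size defaults are illustrative assumptions.

```python
import argparse

def build_parser():
    # Illustrative subset of slime's training arguments.
    p = argparse.ArgumentParser("slime-train-sketch")
    p.add_argument("--actor-num-nodes", type=int, default=1)
    p.add_argument("--actor-num-gpus-per-node", type=int, default=8)
    p.add_argument("--use-critic", action="store_true")
    p.add_argument("--colocate", action="store_true")
    p.add_argument("--lr", type=float, default=1e-6)
    p.add_argument("--clip-grad", type=float, default=1.0)
    p.add_argument("--global-batch-size", type=int, default=256)
    p.add_argument("--micro-batch-size", type=int, default=1)
    return p

args = build_parser().parse_args(["--colocate", "--lr", "2e-6"])
```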