
Overview

The rollout API manages inference engines, generates model responses, computes rewards, and provides data to the training loop.

Rollout Manager

create_rollout_manager()

Create and initialize the rollout manager with SGLang inference engines.
from slime.ray.placement_group import create_rollout_manager

rollout_manager, num_rollout_per_epoch = create_rollout_manager(args, pg)
Parameters:
  • args (Namespace, required): Training arguments with rollout configuration
  • pg (tuple, required): Placement group tuple (placement_group, bundle_indices, gpu_ids)
Returns: tuple[RolloutManager, int | None]
  • rollout_manager: Ray actor managing inference engines
  • num_rollout_per_epoch: Number of rollout steps per epoch (None if --num-rollout is set)
Source: slime/ray/placement_group.py:181

RolloutManager

Ray actor that manages multiple SGLang inference engines for parallel generation.
@ray.remote
class RolloutManager:
    def generate(self, rollout_id: int) -> ray.ObjectRef:
        """Generate rollout data for training"""
    
    def eval(self, rollout_id: int) -> ray.ObjectRef:
        """Run evaluation on eval datasets"""
    
    def onload_weights(self) -> None:
        """Load weights back to GPU from CPU"""
    
    def offload(self) -> None:
        """Offload weights to CPU to free GPU memory"""
    
    def dispose(self) -> None:
        """Cleanup and shutdown inference engines"""
Key Methods:
generate (method)
Generate rollout data by sampling prompts and running inference.
Parameters:
  • rollout_id (int): Current rollout step ID
Returns: Ray ObjectRef that resolves to a RolloutFnTrainOutput with samples and metrics
Behavior:
  • Fetches prompts from the data source
  • Generates responses via the inference engines
  • Computes rewards using the reward function
  • Applies dynamic filters (if configured)
  • Returns rollout_batch_size valid sample groups
eval (method)
Run evaluation on the configured eval datasets.
Parameters:
  • rollout_id (int): Current rollout step for logging
Returns: Ray ObjectRef that resolves to a RolloutFnEvalOutput with evaluation results
Source: Referenced in train.py:17, implementation in slime/ray/rollout.py
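
The behavior listed for generate can be pictured with a toy loop. This is only a sketch under stated assumptions, not slime's implementation: ToySample, toy_generate_step, and has_mixed_rewards are hypothetical names, the "inference" and "reward" steps are random stand-ins, and the filter (keep a group only if its rewards differ) is just one plausible example of a dynamic filter.

```python
import random
from dataclasses import dataclass


@dataclass
class ToySample:
    """Hypothetical stand-in for a rollout sample."""
    prompt: str
    response: str = ""
    reward: float = 0.0


def has_mixed_rewards(group):
    """Example filter (an assumption): keep groups whose rewards are not all equal."""
    return len({s.reward for s in group}) > 1


def toy_generate_step(prompts, rollout_batch_size, n_samples_per_prompt,
                      keep_group, seed=0):
    """Collect rollout_batch_size groups that pass the filter."""
    rng = random.Random(seed)
    kept = []
    for prompt in prompts:                      # fetch prompts
        if len(kept) == rollout_batch_size:
            break
        group = []
        for i in range(n_samples_per_prompt):   # "generate" responses
            sample = ToySample(prompt=prompt, response=f"{prompt}-answer-{i}")
            sample.reward = float(rng.random() < 0.5)  # "compute" a reward
            group.append(sample)
        if keep_group(group):                   # apply the dynamic filter
            kept.append(group)
    if len(kept) < rollout_batch_size:
        raise RuntimeError("ran out of prompts before filling the batch")
    return kept


groups = toy_generate_step(
    prompts=[f"p{i}" for i in range(1000)],
    rollout_batch_size=4,
    n_samples_per_prompt=4,
    keep_group=has_mixed_rewards,
)
```

The point of the sketch is that filtered-out groups are replaced by drawing further prompts, so the step still returns exactly rollout_batch_size groups.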

Rollout Functions

generate_rollout()

Main rollout generation function that produces training samples.
from slime.rollout.sglang_rollout import generate_rollout

output = generate_rollout(args, rollout_id, data_source, evaluation=False)
Parameters:
  • args (Namespace, required): Training arguments with rollout parameters
  • rollout_id (int, required): Current rollout step ID for deterministic sampling
  • data_source (DataSource, required): Data source for fetching prompts (e.g., RolloutDataSourceWithBuffer)
  • evaluation (bool, default: False): Whether this is an evaluation rollout
Returns: RolloutFnTrainOutput | RolloutFnEvalOutput
For training: RolloutFnTrainOutput containing:
  • samples: List of sample groups (list[list[Sample]])
  • metrics: Optional metrics dictionary
For evaluation: RolloutFnEvalOutput containing:
  • data: Dict mapping dataset names to results
  • metrics: Optional metrics dictionary
Source: slime/rollout/sglang_rollout.py:563

generate()

Generate a single sample using SGLang inference engine.
async def generate(args: Namespace, sample: Sample, sampling_params: dict) -> Sample:
    """Generate response for a single sample"""
Parameters:
  • args (Namespace, required): Training arguments
  • sample (Sample, required): Sample object with prompt and metadata
  • sampling_params (dict, required): Sampling parameters, for example:
{
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": -1,
    "max_new_tokens": 512,
    "stop": ["<|endoftext|>"],
    "stop_token_ids": [128001],
    "skip_special_tokens": False
}
Returns: Sample, updated with:
  • response: Generated text
  • tokens: Full token sequence (prompt + response)
  • response_length: Number of generated tokens
  • rollout_log_probs: Log probabilities for each token
  • status: Sample.Status enum (COMPLETED, TRUNCATED, ABORTED)
Features:
  • Supports multi-turn generation via token continuation
  • Handles multimodal inputs (images, video)
  • Integrates with RadixTree middleware for prefix caching
  • Supports partial rollout with loss masking
Source: slime/rollout/sglang_rollout.py:108
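
As a rough illustration of where the sampling_params dictionary might come from, the sketch below assembles it from a parsed Namespace. The helper name build_sampling_params is hypothetical, and the attribute names are an assumption: they are simply the rollout flags with dashes turned into underscores, as argparse does by default. How slime actually builds this dictionary may differ.

```python
from argparse import Namespace


def build_sampling_params(args: Namespace) -> dict:
    """Hypothetical helper: map rollout arguments onto sampling parameters."""
    return {
        "temperature": args.rollout_temperature,
        "top_p": args.rollout_top_p,
        "top_k": args.rollout_top_k,
        # Assumption: the max response length caps the number of new tokens.
        "max_new_tokens": args.rollout_max_response_len,
        "stop": args.rollout_stop,
        "stop_token_ids": args.rollout_stop_token_ids,
        "skip_special_tokens": False,
    }


# Example values mirror the dictionary shown above.
example_args = Namespace(
    rollout_temperature=1.0,
    rollout_top_p=1.0,
    rollout_top_k=-1,
    rollout_max_response_len=512,
    rollout_stop=["<|endoftext|>"],
    rollout_stop_token_ids=[128001],
)
sampling_params = build_sampling_params(example_args)
```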

Data Structures

RolloutFnTrainOutput

Output type for training rollout functions.
from slime.rollout.base_types import RolloutFnTrainOutput

@dataclass
class RolloutFnTrainOutput:
    samples: list[list[Sample]]  # rollout_batch_size groups of n_samples_per_prompt
    metrics: dict[str, Any] = None  # Optional metrics (e.g., filter stats)
Source: slime/rollout/base_types.py:8
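
Because samples is a list of groups rather than a flat list, consumers usually need to either flatten it or reduce each group. The sketch below shows both on stand-in types; ToySample and ToyTrainOutput are hypothetical mirrors of the dataclass above so the example runs without slime installed, and the reward field on the sample is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToySample:
    """Hypothetical stand-in for Sample."""
    prompt: str
    reward: float


@dataclass
class ToyTrainOutput:
    """Hypothetical mirror of RolloutFnTrainOutput."""
    samples: list[list[ToySample]]
    metrics: Optional[dict[str, Any]] = None


def flatten_groups(output: ToyTrainOutput) -> list[ToySample]:
    """Turn the list of groups into one flat list of samples."""
    return [sample for group in output.samples for sample in group]


def group_mean_rewards(output: ToyTrainOutput) -> list[float]:
    """Mean reward of each group, in group order."""
    return [sum(s.reward for s in group) / len(group) for group in output.samples]


output = ToyTrainOutput(
    samples=[
        [ToySample("a", 1.0), ToySample("a", 0.0)],
        [ToySample("b", 1.0), ToySample("b", 1.0)],
    ]
)
flat = flatten_groups(output)
means = group_mean_rewards(output)
```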

RolloutFnEvalOutput

Output type for evaluation rollout functions.
from slime.rollout.base_types import RolloutFnEvalOutput

@dataclass
class RolloutFnEvalOutput:
    data: dict[str, dict[str, Any]]  # dataset_name -> {"rewards": [...], "samples": [...]}
    metrics: dict[str, Any] = None  # Optional metrics
Example data structure:
{
    "gsm8k": {
        "rewards": [1.0, 0.0, 1.0, ...],
        "truncated": [False, False, True, ...],
        "samples": [sample1, sample2, ...]
    },
    "math": {...}
}
Source: slime/rollout/base_types.py:14
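
A dictionary shaped like the example above can be reduced to per-dataset statistics in a few lines. This is a minimal sketch: summarize_eval is a hypothetical helper, and it assumes only the "rewards" and optional "truncated" keys shown in the example.

```python
def summarize_eval(data: dict) -> dict:
    """Hypothetical helper: mean reward and truncation ratio per dataset."""
    summary = {}
    for name, result in data.items():
        rewards = result["rewards"]
        truncated = result.get("truncated", [])
        summary[name] = {
            "mean_reward": sum(rewards) / len(rewards) if rewards else 0.0,
            "truncated_ratio": sum(truncated) / len(truncated) if truncated else 0.0,
        }
    return summary


example_data = {
    "gsm8k": {
        "rewards": [1.0, 0.0, 1.0, 1.0],
        "truncated": [False, False, True, False],
    },
}
summary = summarize_eval(example_data)
```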

Data Sources

RolloutDataSourceWithBuffer

Data source with buffer support for partial rollout and sample reuse.
from slime.rollout.data_source import RolloutDataSourceWithBuffer

data_source = RolloutDataSourceWithBuffer(args)
samples = data_source.get_samples(num_samples=32)
data_source.add_samples(aborted_samples)  # Add partial samples back to buffer
Key Methods:
get_samples (method)
Retrieve sample groups from the buffer or the dataset.
Parameters:
  • num_samples (int): Number of sample groups to retrieve
Returns: list[list[Sample]] (sample groups)
Behavior:
  1. First tries to get samples from the buffer
  2. If the buffer does not hold enough groups, draws the rest from the dataset
  3. Handles epoch transitions and shuffling
add_samples (method)
Add sample groups back to the buffer (e.g., partial rollout samples).
Parameters:
  • samples (list[list[Sample]]): Sample groups to add
save (method)
Save data source state to checkpoint.
Parameters:
  • rollout_id (int): Current rollout ID
load (method)
Load data source state from checkpoint.
Parameters:
  • rollout_id (int): Rollout ID to load from
Source: slime/rollout/data_source.py:166
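
The buffer-first behavior of get_samples can be sketched with a small in-memory class. ToyBufferedDataSource is a hypothetical illustration, not the real RolloutDataSourceWithBuffer: it represents samples as plain dicts, omits shuffling and checkpointing, and its constructor arguments are assumptions.

```python
from collections import deque


class ToyBufferedDataSource:
    """Hypothetical buffer-first data source (no shuffling, no checkpointing)."""

    def __init__(self, prompts, n_samples_per_prompt):
        self.prompts = list(prompts)
        if not self.prompts:
            raise ValueError("prompts must not be empty")
        self.n_samples_per_prompt = n_samples_per_prompt
        self.buffer = deque()
        self.offset = 0   # position within the current epoch
        self.epoch = 0

    def add_samples(self, groups):
        """Return groups (e.g., partially generated ones) to the buffer."""
        self.buffer.extend(groups)

    def get_samples(self, num_samples):
        groups = []
        # 1. Drain the buffer first.
        while self.buffer and len(groups) < num_samples:
            groups.append(self.buffer.popleft())
        # 2. Fill the remainder from the dataset.
        while len(groups) < num_samples:
            # 3. Wrap around when the epoch is exhausted.
            if self.offset == len(self.prompts):
                self.offset = 0
                self.epoch += 1
            prompt = self.prompts[self.offset]
            self.offset += 1
            groups.append([{"prompt": prompt} for _ in range(self.n_samples_per_prompt)])
        return groups


source = ToyBufferedDataSource(["a", "b", "c"], n_samples_per_prompt=2)
first = source.get_samples(2)       # groups for "a" and "b"
source.add_samples([first[0]])      # pretend the "a" group was aborted
second = source.get_samples(3)      # buffered "a", then "c", then "a" from the next epoch
```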

Rollout Configuration

Key Arguments

Inference Engine:
  • --rollout-num-gpus: Total GPUs for rollout engines
  • --rollout-num-gpus-per-engine: GPUs per engine (tensor parallel size)
  • --hf-checkpoint: HuggingFace checkpoint path
Generation Parameters:
  • --rollout-temperature: Sampling temperature (default: 1.0)
  • --rollout-top-p: Top-p sampling (default: 1.0)
  • --rollout-top-k: Top-k sampling (default: -1)
  • --rollout-max-response-len: Max generation length
  • --rollout-stop: Stop strings
  • --rollout-stop-token-ids: Stop token IDs
Data Configuration:
  • --rollout-batch-size: Prompts (sample groups) per rollout step
  • --n-samples-per-prompt: Responses per prompt
  • --rollout-shuffle: Shuffle prompts
  • --rollout-seed: Random seed
Dynamic Sampling:
  • --over-sampling-batch-size: Granularity for sampling
  • --dynamic-sampling-filter-path: Filter function path
  • --partial-rollout: Enable partial rollout recycling
Custom Functions:
  • --rollout-function-path: Custom rollout function
  • --custom-generate-function-path: Custom generate function
  • --custom-rollout-log-function-path: Custom logging function
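
To show how a few of these flags relate to each other, the sketch below declares them on a standalone argparse parser and derives the number of individual samples per rollout step. The parser is a toy, not slime's argument definition: the argument types, the required marker, and the default for --n-samples-per-prompt are assumptions, while the three generation defaults come from the list above.

```python
import argparse


def build_toy_parser() -> argparse.ArgumentParser:
    """Toy parser covering a handful of the rollout flags listed above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rollout-temperature", type=float, default=1.0)
    parser.add_argument("--rollout-top-p", type=float, default=1.0)
    parser.add_argument("--rollout-top-k", type=int, default=-1)
    parser.add_argument("--rollout-batch-size", type=int, required=True)
    # Assumption: one response per prompt unless stated otherwise.
    parser.add_argument("--n-samples-per-prompt", type=int, default=1)
    return parser


args = build_toy_parser().parse_args(
    ["--rollout-batch-size", "32", "--n-samples-per-prompt", "8"]
)

# argparse exposes "--rollout-batch-size" as args.rollout_batch_size.
# Each rollout step yields rollout_batch_size groups of n_samples_per_prompt samples.
samples_per_step = args.rollout_batch_size * args.n_samples_per_prompt
```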