Overview

The rollout API manages inference engines, generates model responses, computes rewards, and provides data to the training loop.

Rollout Manager

create_rollout_manager()

Create and initialize the rollout manager with SGLang inference engines.
from slime.ray.placement_group import create_rollout_manager

rollout_manager, num_rollout_per_epoch = create_rollout_manager(args, pg)
args
Namespace
required
Training arguments with rollout configuration
pg
tuple
required
Placement group tuple: (placement_group, bundle_indices, gpu_ids)
Returns
tuple[RolloutManager, int | None]
  • rollout_manager: Ray actor managing inference engines
  • num_rollout_per_epoch: Number of rollout steps per epoch (None if --num-rollout is set)
Source: slime/ray/placement_group.py:181

RolloutManager

Ray actor that manages multiple SGLang inference engines for parallel generation.
@ray.remote
class RolloutManager:
    def generate(self, rollout_id: int) -> ray.ObjectRef:
        """Generate rollout data for training"""
    
    def eval(self, rollout_id: int) -> ray.ObjectRef:
        """Run evaluation on eval datasets"""
    
    def onload_weights(self) -> None:
        """Load weights back to GPU from CPU"""
    
    def offload(self) -> None:
        """Offload weights to CPU to free GPU memory"""
    
    def dispose(self) -> None:
        """Cleanup and shutdown inference engines"""
Key Methods:
generate
method
Generate rollout data by sampling prompts and running inference.
Parameters:
  • rollout_id (int): Current rollout step ID
Returns: Ray ObjectRef containing RolloutFnTrainOutput with samples and metrics
Behavior:
  • Fetches prompts from data source
  • Generates responses via inference engines
  • Computes rewards using reward function
  • Applies dynamic filters (if configured)
  • Returns rollout_batch_size valid samples
eval
method
Run evaluation on configured eval datasets.
Parameters:
  • rollout_id (int): Current rollout step for logging
Returns: Ray ObjectRef containing RolloutFnEvalOutput with evaluation results
Source: Referenced in train.py:17, implementation in slime/ray/rollout.py
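
The generate behavior listed above (fetch prompts, generate, score, filter, return rollout_batch_size valid groups) can be pictured as a simple loop. The following is a minimal sketch, not slime's implementation: collect_valid_groups, fetch_prompts, generate_group, and has_reward_variance are hypothetical stand-ins, and samples are plain dicts rather than Sample objects.

```python
import itertools
from typing import Callable

Group = list[dict]  # one prompt's responses; a stand-in for list[Sample]


def collect_valid_groups(
    fetch_prompts: Callable[[int], list[str]],
    generate_group: Callable[[str], Group],
    keep_group: Callable[[Group], bool],
    rollout_batch_size: int,
    over_sampling_batch_size: int,
) -> list[Group]:
    """Keep sampling until rollout_batch_size groups pass the filter."""
    valid: list[Group] = []
    while len(valid) < rollout_batch_size:
        for prompt in fetch_prompts(over_sampling_batch_size):
            group = generate_group(prompt)  # responses plus rewards for one prompt
            if keep_group(group):  # dynamic filter (hypothetical)
                valid.append(group)
    return valid[:rollout_batch_size]


# Hypothetical stand-ins so the sketch runs without inference engines.
_ids = itertools.count()


def fetch_prompts(n: int) -> list[str]:
    return [f"prompt-{next(_ids)}" for _ in range(n)]


def generate_group(prompt: str, n_samples_per_prompt: int = 4) -> Group:
    idx = int(prompt.split("-")[1])
    # Even-indexed prompts get mixed rewards; odd-indexed prompts get all zeros.
    return [
        {
            "prompt": prompt,
            "response": f"response-{i}",
            "reward": 1.0 if idx % 2 == 0 and i % 2 == 0 else 0.0,
        }
        for i in range(n_samples_per_prompt)
    ]


def has_reward_variance(group: Group) -> bool:
    """Drop groups whose rewards are all identical."""
    return len({s["reward"] for s in group}) > 1


groups = collect_valid_groups(
    fetch_prompts,
    generate_group,
    has_reward_variance,
    rollout_batch_size=4,
    over_sampling_batch_size=4,
)
```

In this toy setup half of the groups are filtered out, so two sampling rounds are needed to fill a batch of four.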

Rollout Functions

generate_rollout()

Main rollout generation function that produces training samples.
from slime.rollout.sglang_rollout import generate_rollout

output = generate_rollout(args, rollout_id, data_source, evaluation=False)
args
Namespace
required
Training arguments with rollout parameters
rollout_id
int
required
Current rollout step ID for deterministic sampling
data_source
DataSource
required
Data source for fetching prompts (e.g., RolloutDataSourceWithBuffer)
evaluation
bool
default: False
Whether this is an evaluation rollout
Returns
RolloutFnTrainOutput | RolloutFnEvalOutput
For training: RolloutFnTrainOutput containing:
  • samples: List of sample groups (list[list[Sample]])
  • metrics: Optional metrics dictionary
For evaluation: RolloutFnEvalOutput containing:
  • data: Dict mapping dataset names to results
  • metrics: Optional metrics dictionary
Source: slime/rollout/sglang_rollout.py:563

generate()

Generate a single sample using SGLang inference engine.
async def generate(args: Namespace, sample: Sample, sampling_params: dict) -> Sample:
    """Generate response for a single sample"""
args
Namespace
required
Training arguments
sample
Sample
required
Sample object with prompt and metadata
sampling_params
dict
required
Sampling parameters:
{
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": -1,
    "max_new_tokens": 512,
    "stop": ["<|endoftext|>"],
    "stop_token_ids": [128001],
    "skip_special_tokens": False
}
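
As an illustration, a dictionary of this shape could be assembled from the --rollout-* generation flags documented on this page. This is a sketch under assumptions: build_sampling_params is a hypothetical helper, the attribute names are simply the flag names with dashes converted to underscores (argparse's default), and the mapping of --rollout-max-response-len to max_new_tokens is assumed rather than confirmed.

```python
from argparse import Namespace


def build_sampling_params(args: Namespace) -> dict:
    """Hypothetical helper: map rollout arguments to a sampling_params dict."""
    return {
        "temperature": args.rollout_temperature,
        "top_p": args.rollout_top_p,
        "top_k": args.rollout_top_k,
        "max_new_tokens": args.rollout_max_response_len,  # assumed mapping
        "stop": args.rollout_stop,
        "stop_token_ids": args.rollout_stop_token_ids,
        "skip_special_tokens": False,
    }


# Example values mirror the dictionary shown above.
args = Namespace(
    rollout_temperature=1.0,
    rollout_top_p=1.0,
    rollout_top_k=-1,
    rollout_max_response_len=512,
    rollout_stop=["<|endoftext|>"],
    rollout_stop_token_ids=[128001],
)
sampling_params = build_sampling_params(args)
```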
Returns
Sample
Updated sample with:
  • response: Generated text
  • tokens: Full token sequence (prompt + response)
  • response_length: Number of generated tokens
  • rollout_log_probs: Log probabilities for each token
  • status: Sample.Status enum (COMPLETED, TRUNCATED, ABORTED)
Features:
  • Supports multi-turn generation via token continuation
  • Handles multimodal inputs (images, video)
  • Integrates with RadixTree middleware for prefix caching
  • Supports partial rollout with loss masking
Source: slime/rollout/sglang_rollout.py:108
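
Because tokens holds the prompt followed by the response and response_length counts only the generated tokens, the two parts can be separated by position. The sketch below uses a hypothetical SampleSketch stand-in with just the fields it needs; it is not the real Sample class, and the enum values are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Names follow the statuses listed above; the values are placeholders.
    COMPLETED = "completed"
    TRUNCATED = "truncated"
    ABORTED = "aborted"


@dataclass
class SampleSketch:
    """Hypothetical stand-in for Sample, reduced to three fields."""
    tokens: list[int]
    response_length: int
    status: Status


def split_tokens(sample: SampleSketch) -> tuple[list[int], list[int]]:
    """Return (prompt_tokens, response_tokens)."""
    # Compute the cut from the front so response_length == 0 yields an
    # empty response (tokens[-0:] would return the whole list).
    cut = len(sample.tokens) - sample.response_length
    return sample.tokens[:cut], sample.tokens[cut:]


done = SampleSketch(tokens=[1, 2, 3, 4, 5], response_length=2, status=Status.COMPLETED)
prompt_tokens, response_tokens = split_tokens(done)

aborted = SampleSketch(tokens=[1, 2, 3], response_length=0, status=Status.ABORTED)
aborted_prompt, aborted_response = split_tokens(aborted)
```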

Data Structures

RolloutFnTrainOutput

Output type for training rollout functions.
from slime.rollout.base_types import RolloutFnTrainOutput

@dataclass
class RolloutFnTrainOutput:
    samples: list[list[Sample]]  # rollout_batch_size groups of n_samples_per_prompt
    metrics: dict[str, Any] = None  # Optional metrics (e.g., filter stats)
Source: slime/rollout/base_types.py:8
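
To make the nesting concrete: samples[i][j] is the j-th response to the i-th prompt. The sketch below mirrors the shape with a hypothetical TrainOutputSketch stand-in (strings in place of Sample objects) and flattens the groups into a single list, which is one common way to consume such a structure.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrainOutputSketch:
    """Hypothetical stand-in mirroring the shape of RolloutFnTrainOutput."""
    samples: list[list[Any]]
    metrics: Optional[dict[str, Any]] = None


def flatten(output: TrainOutputSketch) -> list[Any]:
    """Flatten groups in order: all responses for prompt 0, then prompt 1, ..."""
    return [sample for group in output.samples for sample in group]


rollout_batch_size, n_samples_per_prompt = 2, 3
output = TrainOutputSketch(
    samples=[
        [f"p{p}-r{r}" for r in range(n_samples_per_prompt)]
        for p in range(rollout_batch_size)
    ]
)
flat = flatten(output)
```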

RolloutFnEvalOutput

Output type for evaluation rollout functions.
from slime.rollout.base_types import RolloutFnEvalOutput

@dataclass
class RolloutFnEvalOutput:
    data: dict[str, dict[str, Any]]  # dataset_name -> {"rewards": [...], "samples": [...]}
    metrics: dict[str, Any] = None  # Optional metrics
Example data structure:
{
    "gsm8k": {
        "rewards": [1.0, 0.0, 1.0, ...],
        "truncated": [False, False, True, ...],
        "samples": [sample1, sample2, ...]
    },
    "math": {...}
}
Source: slime/rollout/base_types.py:14
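
Given a data dictionary of the shape shown above, per-dataset summaries are straightforward to compute. The following sketch assumes the "rewards" and "truncated" keys from the example; summarize_eval is a hypothetical helper, not part of slime.

```python
from typing import Any


def summarize_eval(data: dict[str, dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Compute mean reward and truncation ratio for each eval dataset."""
    summary: dict[str, dict[str, float]] = {}
    for name, result in data.items():
        rewards = result.get("rewards", [])
        truncated = result.get("truncated", [])
        summary[name] = {
            "mean_reward": sum(rewards) / len(rewards) if rewards else 0.0,
            # True counts as 1 when summed, so this is the truncated fraction.
            "truncated_ratio": sum(truncated) / len(truncated) if truncated else 0.0,
        }
    return summary


eval_data = {
    "gsm8k": {
        "rewards": [1.0, 0.0, 1.0, 1.0],
        "truncated": [False, False, True, False],
        "samples": ["s1", "s2", "s3", "s4"],  # placeholders for Sample objects
    }
}
summary = summarize_eval(eval_data)
```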

Data Sources

RolloutDataSourceWithBuffer

Data source with buffer support for partial rollout and sample reuse.
from slime.rollout.data_source import RolloutDataSourceWithBuffer

data_source = RolloutDataSourceWithBuffer(args)
samples = data_source.get_samples(num_samples=32)
data_source.add_samples(aborted_samples)  # Add partial samples back to buffer
Key Methods:
get_samples
method
Retrieve sample groups from buffer or dataset.
Parameters:
  • num_samples (int): Number of sample groups to retrieve
Returns: list[list[Sample]] - Sample groups
Behavior:
  1. First tries to get samples from buffer
  2. If buffer insufficient, samples from dataset
  3. Handles epoch transitions and shuffling
add_samples
method
Add sample groups back to buffer (e.g., partial rollout samples).
Parameters:
  • samples (list[list[Sample]]): Sample groups to add
save
method
Save data source state to checkpoint.
Parameters:
  • rollout_id (int): Current rollout ID
load
method
Load data source state from checkpoint.
Parameters:
  • rollout_id (int): Rollout ID to load from
Source: slime/rollout/data_source.py:166
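
The buffer-first behavior of get_samples described above can be sketched with a small stand-in class. This is an illustration only: BufferedSourceSketch is hypothetical, holds plain strings instead of Sample objects, wraps each prompt in a single-element group, and omits checkpointing.

```python
import random


class BufferedSourceSketch:
    """Hypothetical stand-in: serve from the buffer first, then the dataset."""

    def __init__(self, prompts: list[str], seed: int = 0, shuffle: bool = True):
        if not prompts:
            raise ValueError("prompts must not be empty")
        self.prompts = list(prompts)
        self.rng = random.Random(seed)
        self.shuffle = shuffle
        self.buffer: list[list[str]] = []
        self.epoch = 0
        self.offset = 0
        self.order = self._new_order()

    def _new_order(self) -> list[int]:
        order = list(range(len(self.prompts)))
        if self.shuffle:
            self.rng.shuffle(order)
        return order

    def add_samples(self, groups: list[list[str]]) -> None:
        self.buffer.extend(groups)

    def get_samples(self, num_samples: int) -> list[list[str]]:
        # 1. Take as many groups as possible from the buffer.
        taken = self.buffer[:num_samples]
        self.buffer = self.buffer[num_samples:]
        # 2. Fill the remainder from the dataset.
        while len(taken) < num_samples:
            # 3. On exhaustion, start a new epoch and reshuffle.
            if self.offset == len(self.order):
                self.epoch += 1
                self.offset = 0
                self.order = self._new_order()
            taken.append([self.prompts[self.order[self.offset]]])
            self.offset += 1
        return taken


source = BufferedSourceSketch(["a", "b", "c"], shuffle=False)
first = source.get_samples(2)
source.add_samples([["x"]])  # e.g. a partial rollout returned to the buffer
second = source.get_samples(3)
```

In the usage above, the second call drains the buffered group first, then finishes the current epoch and wraps into the next one.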

Rollout Configuration

Key Arguments

Inference Engine:
  • --rollout-num-gpus: Total GPUs for rollout engines
  • --rollout-num-gpus-per-engine: GPUs per engine (tensor parallel size)
  • --hf-checkpoint: HuggingFace checkpoint path
Generation Parameters:
  • --rollout-temperature: Sampling temperature (default: 1.0)
  • --rollout-top-p: Top-p sampling (default: 1.0)
  • --rollout-top-k: Top-k sampling (default: -1)
  • --rollout-max-response-len: Max generation length
  • --rollout-stop: Stop strings
  • --rollout-stop-token-ids: Stop token IDs
Data Configuration:
  • --rollout-batch-size: Prompts (sample groups) per rollout step
  • --n-samples-per-prompt: Responses per prompt
  • --rollout-shuffle: Shuffle prompts
  • --rollout-seed: Random seed
Dynamic Sampling:
  • --over-sampling-batch-size: Sampling granularity, i.e. the number of prompts fetched per sampling round
  • --dynamic-sampling-filter-path: Filter function path
  • --partial-rollout: Enable partial rollout recycling
Custom Functions:
  • --rollout-function-path: Custom rollout function
  • --custom-generate-function-path: Custom generate function
  • --custom-rollout-log-function-path: Custom logging function
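
A few of these arguments combine into derived sizes that are worth checking before launching a run. The sketch below parses four of the flags with argparse and computes two quantities. It rests on assumptions: parse_rollout_args and derived_sizes are hypothetical helpers, the int types are assumed, the engine count is taken to be total GPUs divided by GPUs per engine, and the response count follows the RolloutFnTrainOutput shape (rollout_batch_size groups of n_samples_per_prompt).

```python
import argparse


def parse_rollout_args(argv: list[str]) -> argparse.Namespace:
    """Hypothetical parser covering only the four flags used below."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rollout-num-gpus", type=int, required=True)
    parser.add_argument("--rollout-num-gpus-per-engine", type=int, default=1)
    parser.add_argument("--rollout-batch-size", type=int, required=True)
    parser.add_argument("--n-samples-per-prompt", type=int, default=1)
    return parser.parse_args(argv)


def derived_sizes(args: argparse.Namespace) -> dict[str, int]:
    """Compute the engine count and total responses per rollout step."""
    if args.rollout_num_gpus % args.rollout_num_gpus_per_engine != 0:
        raise ValueError("rollout GPUs must divide evenly across engines")
    return {
        # Assumption: each engine uses exactly rollout_num_gpus_per_engine GPUs.
        "num_engines": args.rollout_num_gpus // args.rollout_num_gpus_per_engine,
        "responses_per_rollout": args.rollout_batch_size * args.n_samples_per_prompt,
    }


args = parse_rollout_args(
    [
        "--rollout-num-gpus", "8",
        "--rollout-num-gpus-per-engine", "2",
        "--rollout-batch-size", "32",
        "--n-samples-per-prompt", "8",
    ]
)
sizes = derived_sizes(args)
```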