High-Level Architecture
slime is built around a three-module architecture that separates training, inference, and data management concerns. This design enables efficient RL scaling by connecting Megatron-LM for training with SGLang for high-throughput rollout generation.

Core Modules
Training Module (Megatron)
The training module handles the main RL training process using Megatron-LM as the backend.

Key Responsibilities:
- Parameter updates using actor/critic models
- Data consumption from the Data Buffer
- Weight synchronization to rollout engines
- Checkpoint saving and loading
The training loop is implemented in train.py:64-94. To scale it, Megatron supports the following parallelism strategies:
- Tensor Parallelism (TP): Split model tensors across GPUs
- Pipeline Parallelism (PP): Split model layers across GPUs
- Data Parallelism (DP): Replicate model across GPUs
- Context Parallelism (CP): Split long sequences across GPUs
- Expert Parallelism (EP): For MoE models, split experts across GPUs
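The loop in train.py:64-94 is not reproduced here; as a hedged sketch of the cycle it implements (sample rollouts, update parameters, sync weights, checkpoint), with every function below a hypothetical stand-in rather than slime's actual API:

```python
# Hedged sketch of the RL training cycle; all functions are hypothetical
# stand-ins for illustration, not slime's real interfaces.

def generate_rollouts(step):
    # Stand-in: rollout engines would produce samples here.
    return [{"step": step, "tokens": [1, 2, 3]}]

def train_on(samples):
    # Stand-in: Megatron would run forward/backward and update weights.
    return {"loss": 1.0 / (1 + samples[0]["step"])}

def sync_weights_to_rollout():
    # Stand-in: updated actor weights are pushed to the rollout engines.
    pass

def maybe_save_checkpoint(step, interval=2):
    # Stand-in: periodic checkpoint saving.
    return step % interval == 0

def training_loop(num_steps):
    losses = []
    for step in range(num_steps):
        samples = generate_rollouts(step)   # data from the Data Buffer
        metrics = train_on(samples)         # parameter update
        sync_weights_to_rollout()           # weight synchronization
        maybe_save_checkpoint(step)         # checkpointing
        losses.append(metrics["loss"])
    return losses

print(training_loop(3))
```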
Rollout Module (SGLang + Router)
The rollout module generates new training data by running inference on the current policy.

Key Responsibilities:
- High-throughput text generation using SGLang
- Multi-engine load balancing via sgl-router
- Reward model evaluation
- Dynamic sampling and filtering
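One common form of the filtering mentioned above is to drop prompt groups whose rewards are all identical, since they carry no learning signal. A hedged sketch (the function and data shapes are illustrative, not slime's actual code):

```python
# Hedged illustration of dynamic sampling: keep only prompt groups whose
# rewards vary. Function name and data layout are assumptions.
def filter_groups(groups):
    # groups maps prompt_id -> list of per-sample rewards.
    return {pid: rs for pid, rs in groups.items() if len(set(rs)) > 1}

groups = {"p0": [1.0, 0.0, 1.0], "p1": [1.0, 1.0, 1.0]}
print(sorted(filter_groups(groups)))  # → ['p0']
```

Group "p1" is discarded because every sample earned the same reward, so its advantage estimates would be zero.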
As set up in placement_group.py:79-119, the rollout module allocates its GPUs through Ray placement groups. sgl-router then schedules requests across the multiple SGLang servers:
- DP Size: Calculated as rollout_num_gpus / rollout_num_gpus_per_engine
- Load Balancing: Supports round-robin, consistent hashing, and custom policies
- Session Affinity: Maintains KV cache across multi-turn interactions
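As an illustration of the DP-size formula above, here is a toy calculation; the variable names mirror slime's flags, but the contiguous GPU grouping is an assumption for illustration, not slime's actual layout:

```python
# Toy calculation: router DP size and a per-engine GPU grouping (assumed layout).
rollout_num_gpus = 16            # total GPUs reserved for rollout
rollout_num_gpus_per_engine = 4  # e.g. one TP-4 SGLang engine per group

# DP size as described above: one data-parallel rank per SGLang engine.
dp_size = rollout_num_gpus // rollout_num_gpus_per_engine

# Illustrative assumption: contiguous GPU ids assigned to each engine.
engine_gpus = [
    list(range(e * rollout_num_gpus_per_engine,
               (e + 1) * rollout_num_gpus_per_engine))
    for e in range(dp_size)
]

print(dp_size)         # → 4
print(engine_gpus[0])  # → [0, 1, 2, 3]
```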
Data Buffer
The data buffer acts as a bridge between the rollout and training modules. Data flows in one direction: rollout engines push generated samples into the buffer, and the training module consumes them as batches.

Deployment Modes
Disaggregated Mode
Training and rollout use separate GPU pools:
- Maximum throughput (training and rollout run in parallel)
- Better GPU utilization
- Easier scaling of individual components
Colocated Mode
Training and rollout share the same GPUs:
- Reduced GPU requirements
- Lower memory transfer overhead
- Suitable for smaller models or limited GPU clusters
In colocated mode, set --sglang-mem-fraction-static 0.8 to prevent GPU OOM, as Megatron occupies memory before offloading.

Resource Management
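slime's resource management is built on Ray. As a hedged sketch (illustrative bundle shapes only, not the exact call in placement_group.py), a GPU reservation could be assembled as a list of bundles:

```python
# Illustrative bundle list for a Ray placement group; in real code this list
# would be passed to ray.util.placement_group(bundles, strategy="PACK").
def make_bundles(num_gpus, cpus_per_gpu=1):
    # One bundle per GPU; Ray places each bundle onto a node with capacity.
    return [{"GPU": 1, "CPU": cpus_per_gpu} for _ in range(num_gpus)]

bundles = make_bundles(8)
print(len(bundles), bundles[0])  # → 8 {'GPU': 1, 'CPU': 1}
```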
slime uses Ray placement groups to manage GPU allocation.

Multi-Node Training
For large-scale MoE models, slime supports multi-node distributed training.

Related Topics
- Training Loop: Learn about the Data Sampling → Weight Update cycle
- Rollout & Reward: Understand rollout generation and reward models