Version History
v0.1.0 - Redefining High-Performance RL Training Frameworks
Release Date: January 2025
Blog Post: v0.1.0: Redefining High-Performance RL Training Frameworks

This major release represents a significant milestone for slime as the RL training framework powering GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5 from Z.ai.

Key Features
High-Performance Training
- Efficient training in various modes by connecting Megatron with SGLang
- Support for co-located and decoupled training and inference
- Dynamic batch sizing for optimal GPU utilization
- Data packing and variable-length processing enabled by default
Flexible Data Generation
- Arbitrary training data generation workflows
- Custom data generation interfaces
- Server-based generation engines
- Integration with SGLang for high-throughput rollout
Supported Models
- GLM series (GLM-5, GLM-4.7, GLM-4.6, GLM-4.5)
- Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3)
- Qwen2.5 series
- DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1)
- Llama 3
Other Features
- Speculative decoding in RL - Community collaboration for faster inference
- Low-precision training - fp8 rollout + bf16/fp8 training, int4 rollout + int4 QAT training
- Deterministic training - Reproducible training runs for research
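The dynamic batch sizing and data packing listed above can be illustrated with a small sketch: instead of a fixed sample count, samples are greedily grouped under a per-batch token budget, so batches of short sequences hold more samples and per-batch memory stays roughly constant. This is a hypothetical sketch; pack_batches and its parameters are illustrative, not slime's actual API.

```python
# Hypothetical sketch of dynamic batch sizing: pack variable-length
# samples into batches under a fixed token budget rather than a fixed
# number of samples per batch. Names are illustrative, not slime's API.

def pack_batches(sample_lengths, max_tokens_per_batch):
    """Greedily group sample indices so each batch stays under the token budget."""
    batches, current, current_tokens = [], [], 0
    for idx, length in enumerate(sample_lengths):
        # Start a new batch when adding this sample would exceed the budget.
        if current and current_tokens + length > max_tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += length
    if current:
        batches.append(current)
    return batches

# Short sequences share a batch; long ones get batches of their own.
batches = pack_batches([512, 256, 1024, 128, 2048, 64], max_tokens_per_batch=2048)
# batches == [[0, 1, 2, 3], [4], [5]]
```

An oversized single sample still gets its own batch, since the budget check only triggers when the current batch is non-empty.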
Architecture
slime provides a three-component architecture:
- Training (Megatron) - Main training process that reads from the Data Buffer and synchronizes parameters to the rollout engine
- Rollout (SGLang + router) - Generates new data including rewards/verifier outputs
- Data Buffer - Bridge module managing prompt initialization, custom data, and rollout generation
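The three-component loop can be sketched in miniature, with the Data Buffer bridging rollout generation and training. All class and method names below are illustrative assumptions, not slime's real interfaces.

```python
# A miniature, purely illustrative sketch of the three-component loop
# (Training / Rollout / Data Buffer). Names are assumptions for
# illustration, not slime's real interfaces.
from collections import deque

class DataBuffer:
    """Bridge module: holds prompts going out and generated data coming back."""
    def __init__(self, prompts):
        self.prompts = deque(prompts)
        self.rollouts = deque()

class Rollout:
    """Stands in for SGLang + router: generates responses and rewards."""
    def generate(self, buffer):
        prompt = buffer.prompts.popleft()
        # A real rollout would call the inference engine and a reward/verifier.
        buffer.rollouts.append({"prompt": prompt, "reward": 1.0})

class Trainer:
    """Stands in for Megatron: consumes rollout data from the buffer."""
    def __init__(self):
        self.steps = 0
    def train_step(self, buffer):
        _sample = buffer.rollouts.popleft()
        self.steps += 1  # a gradient update and parameter sync would happen here

buffer = DataBuffer(["prompt-1", "prompt-2"])
rollout, trainer = Rollout(), Trainer()
while buffer.prompts:  # co-located style: alternate generation and training
    rollout.generate(buffer)
    trainer.train_step(buffer)
```

In the decoupled mode mentioned above, generation and training would run concurrently on separate resources rather than alternating in one loop as sketched here.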
Community Projects
Several notable projects have been built upon slime:
- P1 - Physics Olympiad reasoning models trained with RL
- RLVE - Scaling LM RL with adaptive verifiable environments
- TritonForge - Agentic RL training for kernel generation
- APRIL - Accelerating RL training with active partial rollouts
- qqr - Scaling open-ended agents with ArenaRL & MCP
Configuration
slime supports three categories of arguments:
- Megatron arguments - Full support for Megatron configuration
- SGLang arguments - All SGLang arguments, passed with the --sglang- prefix
- slime-specific arguments - Custom parameters for RL training workflows
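The prefix convention can be illustrated with a small routing sketch: arguments carrying the sglang- prefix are stripped and forwarded to the inference engine, while everything else stays with the trainer. Only the prefix convention comes from the notes above; the routing function and the example argument names are assumptions.

```python
# Hypothetical illustration of prefix-based argument routing. Only the
# "sglang-" prefix convention comes from the release notes; this routing
# function and the example arguments are assumptions.

def split_args(args, prefix="sglang-"):
    """Split a flat argument dict into SGLang-bound and remaining arguments."""
    sglang_args, other_args = {}, {}
    for key, value in args.items():
        if key.startswith(prefix):
            sglang_args[key[len(prefix):]] = value  # strip the prefix before forwarding
        else:
            other_args[key] = value
    return sglang_args, other_args

sglang_args, train_args = split_args({
    "sglang-mem-fraction-static": 0.8,  # forwarded to SGLang as mem-fraction-static
    "lr": 1e-6,                         # consumed by the training side
})
```

This keeps the two namespaces from colliding: any SGLang option can be exposed on the slime command line without slime having to declare it explicitly.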
Developer Tools
- Pre-commit hooks for code style consistency
- Comprehensive debugging guide
- Clear contribution guidelines focused on bug fixes and performance optimizations
Acknowledgements
Special thanks to the following projects and communities:
- SGLang
- Megatron-LM
- mbridge
- OpenRLHF
- veRL
- Pai-Megatron-Patch