
Version History

v0.1.0 - Redefining High-Performance RL Training Frameworks

Release Date: January 2025
Blog Post: v0.1.0: Redefining High-Performance RL Training Frameworks

This major release represents a significant milestone for slime as the RL training framework powering GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5 from Z.ai.

Key Features

High-Performance Training
  • Efficient training in various modes by connecting Megatron with SGLang
  • Support for co-located and decoupled training and inference
  • Dynamic batch sizing for optimal GPU utilization
  • Data packing and variable-length processing enabled by default
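Dynamic batch sizing and data packing both follow the same idea: instead of padding every sample to the longest sequence, group variable-length samples into batches under a total-token budget. A minimal greedy sketch of that idea (illustrative only, not slime's actual implementation):

```python
def pack_by_tokens(lengths, max_tokens_per_batch):
    """Greedily pack sample indices into batches under a token budget."""
    batches, current, current_tokens = [], [], 0
    for i, n in enumerate(lengths):
        # flush the current batch if adding this sample would exceed the budget
        if current and current_tokens + n > max_tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Example: five samples of varying length under a 1024-token budget.
batches = pack_by_tokens([512, 300, 700, 100, 900], 1024)
```

Padding-free packing like this keeps GPU utilization high when response lengths vary widely, which is the norm for RL rollouts.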
Flexible Data Generation
  • Arbitrary training data generation workflows
  • Custom data generation interfaces
  • Server-based generation engines
  • Integration with SGLang for high-throughput rollout
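The core of a pluggable data-generation interface is that the trainer accepts any callable that turns prompts into reward-annotated samples. A hedged sketch of that contract, with illustrative names (`Sample`, `rollout`) that are not slime's real API:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str
    response: str
    reward: float

def rollout(prompts, generate_fn, reward_fn):
    """Run any generation engine over prompts and attach verifier rewards."""
    samples = []
    for p in prompts:
        response = generate_fn(p)                       # e.g. an SGLang server call
        samples.append(Sample(p, response, reward_fn(p, response)))
    return samples

# Toy usage with a stub engine and an exact-match verifier.
out = rollout(["2+2="], lambda p: "4", lambda p, r: 1.0 if r == "4" else 0.0)
```

Because `generate_fn` is an arbitrary callable, the same training loop can sit in front of a single-turn completion, a multi-turn agent, or a server-based engine.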
Model Support
  • GLM series (GLM-5, GLM-4.7, GLM-4.6, GLM-4.5)
  • Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3)
  • Qwen2.5 series
  • DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1)
  • Llama 3
Advanced Capabilities
  • Speculative decoding in RL - Community collaboration for faster inference
  • Low-precision training - fp8 rollout + bf16/fp8 training, int4 rollout + int4 quantization-aware training (QAT)
  • Deterministic training - Reproducible training runs for research

Architecture

slime provides a three-component architecture:
  1. Training (Megatron) - Main training process that reads from the Data Buffer and synchronizes parameters to rollout
  2. Rollout (SGLang + router) - Generates new data including rewards/verifier outputs
  3. Data Buffer - Bridge module managing prompt initialization, custom data, and rollout generation
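The three components above can be sketched as one co-located step: the Data Buffer feeds prompts, the rollout engine produces reward-annotated data, and the trainer consumes it before syncing weights back. All class and function names here are illustrative, not slime's real ones:

```python
class DataBuffer:
    """Bridge between prompt initialization, rollout output, and training."""
    def __init__(self, prompts):
        self.prompts = list(prompts)
        self.samples = []

    def next_prompts(self, n):
        batch, self.prompts = self.prompts[:n], self.prompts[n:]
        return batch

    def add(self, samples):
        self.samples.extend(samples)

def train_step(buffer, rollout_engine, trainer, batch_size):
    prompts = buffer.next_prompts(batch_size)   # Data Buffer feeds prompts
    buffer.add(rollout_engine(prompts))         # Rollout generates data + rewards
    loss = trainer(buffer.samples)              # Training consumes the buffer
    buffer.samples.clear()
    # after the optimizer step, the trainer would sync parameters to rollout
    return loss

# Toy usage: stub rollout returns (sample, reward) pairs; stub trainer counts them.
buffer = DataBuffer(["p1", "p2", "p3"])
loss = train_step(buffer, lambda ps: [(p, 1.0) for p in ps], len, batch_size=2)
```

In decoupled mode the rollout call would run on separate inference hardware, but the buffer-mediated data flow is the same.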

Community Projects

Several notable projects have been built upon slime:
  • P1 - Physics Olympiad reasoning models trained with RL
  • RLVE - Scaling LM RL with adaptive verifiable environments
  • TritonForge - Agentic RL training for kernel generation
  • APRIL - Accelerating RL training with active partial rollouts
  • qqr - Scaling open-ended agents with ArenaRL & MCP

Configuration

slime supports three categories of arguments:
  1. Megatron arguments - Full support for Megatron configuration
  2. SGLang arguments - All SGLang arguments with --sglang- prefix
  3. slime-specific arguments - Custom parameters for RL training workflows
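The `--sglang-` prefix convention makes the three-way split mechanical: prefixed flags are stripped and routed to SGLang, everything else passes through to Megatron or slime. A hypothetical sketch of that routing (not slime's actual parser; `--mem-fraction-static` is a real SGLang flag used for illustration):

```python
def route_flag(flag):
    """Route one CLI flag name to the subsystem it configures."""
    if flag.startswith("--sglang-"):
        # strip the prefix before handing the flag to SGLang
        return "sglang", "--" + flag[len("--sglang-"):]
    return "other", flag  # Megatron or slime-specific, passed through as-is

routed = route_flag("--sglang-mem-fraction-static")
```

This keeps the launch script a single flat argument list while letting each engine keep its native flag names.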

Developer Tools

  • Pre-commit hooks for code style consistency
  • Comprehensive debugging guide
  • Clear contribution guidelines focused on bug fixes and performance optimizations

Acknowledgements

Special thanks to the following projects and communities:
  • SGLang
  • Megatron-LM
  • mbridge
  • OpenRLHF
  • veRL
  • Pai-Megatron-Patch

Citation

To cite slime in your research:
@misc{slime_github,
  author       = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
  title        = {slime: An LLM post-training framework for RL Scaling},
  year         = {2025},
  howpublished = {\url{https://github.com/THUDM/slime}},
  note         = {GitHub repository. Corresponding author: Xin Lv},
  urldate      = {2025-06-19}
}

Future Releases

Check the GitHub repository for upcoming releases and development progress.
