
Version History

v0.1.0 - Redefining High-Performance RL Training Frameworks

Release Date: January 2025
Blog Post: v0.1.0: Redefining High-Performance RL Training Frameworks

This major release represents a significant milestone for slime as the RL training framework powering GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5 from Z.ai.

Key Features

High-Performance Training
  • Efficient training in various modes by connecting Megatron with SGLang
  • Support for co-located and decoupled training and inference
  • Dynamic batch sizing for optimal GPU utilization
  • Data packing and variable-length processing enabled by default
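Dynamic batch sizing and data packing both follow the same idea: instead of padding every sample to the longest sequence, group variable-length samples into batches under a total-token budget. A minimal greedy sketch of that idea (illustrative only, not slime's actual implementation):

```python
def pack_by_tokens(lengths, max_tokens_per_batch):
    """Greedily pack sample indices into batches under a token budget."""
    batches, current, current_tokens = [], [], 0
    for i, n in enumerate(lengths):
        # flush the current batch if adding this sample would exceed the budget
        if current and current_tokens + n > max_tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Example: five samples of varying length under a 1024-token budget.
batches = pack_by_tokens([512, 300, 700, 100, 900], 1024)
```

Padding-free packing like this keeps GPU utilization high when response lengths vary widely, which is the norm for RL rollouts.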
Flexible Data Generation
  • Arbitrary training data generation workflows
  • Custom data generation interfaces
  • Server-based generation engines
  • Integration with SGLang for high-throughput rollout
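The core of a pluggable data-generation interface is that the trainer accepts any callable that turns prompts into reward-annotated samples. A hedged sketch of that contract, with illustrative names (`Sample`, `rollout`) that are not slime's real API:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str
    response: str
    reward: float

def rollout(prompts, generate_fn, reward_fn):
    """Run any generation engine over prompts and attach verifier rewards."""
    samples = []
    for p in prompts:
        response = generate_fn(p)                       # e.g. an SGLang server call
        samples.append(Sample(p, response, reward_fn(p, response)))
    return samples

# Toy usage with a stub engine and an exact-match verifier.
out = rollout(["2+2="], lambda p: "4", lambda p, r: 1.0 if r == "4" else 0.0)
```

Because `generate_fn` is an arbitrary callable, the same training loop can sit in front of a single-turn completion, a multi-turn agent, or a server-based engine.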
Model Support
  • GLM series (GLM-5, GLM-4.7, GLM-4.6, GLM-4.5)
  • Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3)
  • Qwen2.5 series
  • DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1)
  • Llama 3
Advanced Capabilities
  • Speculative decoding in RL - Community collaboration for faster inference
  • Low-precision training - fp8 rollout + bf16/fp8 training, int4 rollout + int4 quantization-aware training (QAT)
  • Deterministic training - Reproducible training runs for research

Architecture

slime provides a three-component architecture:
  1. Training (Megatron) - Main training process that reads from the Data Buffer and synchronizes parameters to rollout
  2. Rollout (SGLang + router) - Generates new data including rewards/verifier outputs
  3. Data Buffer - Bridge module managing prompt initialization, custom data, and rollout generation
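The three components above can be sketched as one co-located step: the Data Buffer feeds prompts, the rollout engine produces reward-annotated data, and the trainer consumes it before syncing weights back. All class and function names here are illustrative, not slime's real ones:

```python
class DataBuffer:
    """Bridge between prompt initialization, rollout output, and training."""
    def __init__(self, prompts):
        self.prompts = list(prompts)
        self.samples = []

    def next_prompts(self, n):
        batch, self.prompts = self.prompts[:n], self.prompts[n:]
        return batch

    def add(self, samples):
        self.samples.extend(samples)

def train_step(buffer, rollout_engine, trainer, batch_size):
    prompts = buffer.next_prompts(batch_size)   # Data Buffer feeds prompts
    buffer.add(rollout_engine(prompts))         # Rollout generates data + rewards
    loss = trainer(buffer.samples)              # Training consumes the buffer
    buffer.samples.clear()
    # after the optimizer step, the trainer would sync parameters to rollout
    return loss

# Toy usage: stub rollout returns (sample, reward) pairs; stub trainer counts them.
buffer = DataBuffer(["p1", "p2", "p3"])
loss = train_step(buffer, lambda ps: [(p, 1.0) for p in ps], len, batch_size=2)
```

In decoupled mode the rollout call would run on separate inference hardware, but the buffer-mediated data flow is the same.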

Community Projects

Several notable projects have been built upon slime:
  • P1 - Physics Olympiad reasoning models trained with RL
  • RLVE - Scaling LM RL with adaptive verifiable environments
  • TritonForge - Agentic RL training for kernel generation
  • APRIL - Accelerating RL training with active partial rollouts
  • qqr - Scaling open-ended agents with ArenaRL & MCP

Configuration

slime supports three categories of arguments:
  1. Megatron arguments - Full support for Megatron configuration
  2. SGLang arguments - All SGLang arguments with --sglang- prefix
  3. slime-specific arguments - Custom parameters for RL training workflows
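The `--sglang-` prefix convention makes the three-way split mechanical: prefixed flags are stripped and routed to SGLang, everything else passes through to Megatron or slime. A hypothetical sketch of that routing (not slime's actual parser; `--mem-fraction-static` is a real SGLang flag used for illustration):

```python
def route_flag(flag):
    """Route one CLI flag name to the subsystem it configures."""
    if flag.startswith("--sglang-"):
        # strip the prefix before handing the flag to SGLang
        return "sglang", "--" + flag[len("--sglang-"):]
    return "other", flag  # Megatron or slime-specific, passed through as-is

routed = route_flag("--sglang-mem-fraction-static")
```

This keeps the launch script a single flat argument list while letting each engine keep its native flag names.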

Developer Tools

  • Pre-commit hooks for code style consistency
  • Comprehensive debugging guide
  • Clear contribution guidelines focused on bug fixes and performance optimizations

Acknowledgements

Special thanks to the following projects and communities:
  • SGLang
  • Megatron-LM
  • mbridge
  • OpenRLHF
  • veRL
  • Pai-Megatron-Patch

Citation

To cite slime in your research:
@misc{slime_github,
  author       = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
  title        = {slime: An LLM post-training framework for RL Scaling},
  year         = {2025},
  howpublished = {\url{https://github.com/THUDM/slime}},
  note         = {GitHub repository. Corresponding author: Xin Lv},
  urldate      = {2025-06-19}
}

Future Releases

Check the GitHub repository for upcoming releases and development progress.
