- High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang
- Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines
Proven at Scale
slime is the RL framework behind production models including GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5.
Supported Models
Apart from models from Z.ai, slime supports:
- Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3) and the Qwen2.5 series
- DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1)
- Llama 3
Architecture Overview
slime consists of three core modules:
Training (Megatron)
Handles the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module
Rollout (SGLang + router)
Generates new data including rewards and verifier outputs, stores results in the Data Buffer
Data Buffer
Bridge module that manages prompt initialization, custom data, and rollout generation methods
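To make the data flow between the three modules concrete, here is a minimal, self-contained sketch of the loop they form. This is illustrative pseudocode in Python, not slime's actual API: `DataBuffer`, `rollout`, and `train_step` are hypothetical stand-ins for the real Megatron- and SGLang-backed components.

```python
from collections import deque

class DataBuffer:
    """Hypothetical bridge module: holds prompts on one side and
    generated samples (with rewards) on the other."""
    def __init__(self, prompts):
        self.prompts = deque(prompts)
        self.samples = deque()

    def next_prompt(self):
        return self.prompts.popleft()

    def put(self, sample):
        self.samples.append(sample)

    def get_batch(self, n):
        return [self.samples.popleft() for _ in range(min(n, len(self.samples)))]

def rollout(prompt):
    # Stand-in for SGLang generation plus reward/verifier scoring.
    return {"prompt": prompt, "response": prompt[::-1], "reward": 1.0}

def train_step(batch):
    # Stand-in for a Megatron optimizer step; after this, the real
    # system would synchronize parameters back to the rollout engine.
    return len(batch)

buffer = DataBuffer(["p1", "p2", "p3", "p4"])
for _ in range(4):
    buffer.put(rollout(buffer.next_prompt()))
print(train_step(buffer.get_batch(4)))  # → 4
```

The key design point the sketch captures: training never talks to the inference engine directly; both sides read from and write to the Data Buffer, which is what makes custom data-generation workflows pluggable.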
Get Started
Installation
Set up slime using Docker or conda in minutes
Quick Start
Train your first model with a working example
Usage Guide
Learn about command-line parameters and configuration
API Reference
Explore the full API documentation
Key Features
Multiple RL Algorithms
slime supports various reinforcement learning algorithms:
- GRPO (Group Relative Policy Optimization)
- GSPO (Group Sequence Policy Optimization)
- Reinforce++ and Reinforce++ Baseline
- PPO (Proximal Policy Optimization)
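The group-based algorithms above share one core idea: instead of a learned value baseline, each sampled completion is scored relative to the other completions in its group. A minimal sketch of that advantage computation (the normalization details, e.g. population vs. sample standard deviation, vary by implementation):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each completion's reward
    by the mean and standard deviation of its group's rewards."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards division by zero
    eps = 1e-8
    return [(r - mean) / (std + eps) for r in rewards]

# A group of 4 completions sampled for the same prompt:
print([round(a, 6) for a in grpo_advantages([1.0, 0.0, 1.0, 0.0])])
# → [1.0, -1.0, 1.0, -1.0]
```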
Advanced Training Capabilities
- Dynamic Batching: Intelligently pack samples of varying lengths to maximize GPU utilization
- Colocated Training: Deploy training and inference on the same GPUs
- Multi-Node Support: Scale to hundreds of GPUs for large MoE models
- Mixed Precision: bf16 training with fp8 inference support
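Dynamic batching amounts to a bin-packing problem: group samples of varying token lengths so each batch stays under a token budget while wasting as little padding as possible. The following first-fit-decreasing sketch illustrates the idea; it is not slime's actual scheduler.

```python
def pack_samples(lengths, max_tokens):
    """Greedy first-fit-decreasing packing: place each sample (longest
    first) into the first batch whose token total still has room."""
    batches = []  # each entry: [total_tokens, [sample lengths]]
    for n in sorted(lengths, reverse=True):
        for b in batches:
            if b[0] + n <= max_tokens:
                b[0] += n
                b[1].append(n)
                break
        else:
            batches.append([n, [n]])
    return [b[1] for b in batches]

print(pack_samples([900, 300, 500, 200, 100], 1000))
# → [[900, 100], [500, 300, 200]]
```

Packed this way, five samples fit in two full batches instead of the five padded batches a naive fixed-size scheme might produce, which is where the GPU-utilization gain comes from.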
Flexible Data Generation
- Dynamic Sampling: Advanced sampling strategies for improved data diversity
- Partial Rollout: Cache partially generated samples and resume them in a later rollout
- Custom Functions: Write custom generation and reward functions for complex scenarios
- Multi-Turn Support: Built-in support for agent scenarios with tool calling
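As an example of what a custom reward function might look like, here is a verifiable-reward sketch for math problems: score 1.0 when the model's boxed answer matches the reference, else 0.0. The sample-dict shape (`response`, `answer` keys) and the function name are assumptions for illustration, not slime's actual interface.

```python
import re

def math_reward(sample):
    """Hypothetical custom reward: extract the model's \\boxed{...}
    answer and compare it to the reference answer string."""
    m = re.search(r"\\boxed\{([^}]*)\}", sample["response"])
    predicted = m.group(1).strip() if m else None
    return 1.0 if predicted == sample["answer"] else 0.0

print(math_reward({"response": r"... so the result is \boxed{42}.", "answer": "42"}))
# → 1.0
```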
Projects Built with slime
slime has powered several novel research projects and production systems:
P1: Mastering Physics Olympiads
P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning, delivering breakthrough performance in physics reasoning.
RLVE: Scaling with Verifiable Environments
RLVE uses verifiable environments that procedurally generate problems with algorithmically verifiable rewards to scale RL for language models.
TritonForge: Kernel Generation
TritonForge leverages slime's capabilities to train LLMs that automatically generate optimized GPU kernels.
APRIL: Accelerating RL Training
APRIL introduces system-level optimizations that integrate with slime to accelerate the rollout generation phase.
Community and Support
GitHub
View source code and contribute
Documentation
Browse the full documentation
Contributions are welcome! Submit issues or pull requests to help improve slime.