slime is an LLM post-training framework for RL scaling, providing two core capabilities:
  1. High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang
  2. Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines

Proven at Scale

slime is the RL framework behind production models including GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5.

Supported Models

Beyond Z.ai's GLM models, slime supports:
  • Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series
  • DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1)
  • Llama 3

Architecture Overview

slime consists of three core modules:

Training (Megatron)

Handles the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module

Rollout (SGLang + router)

Generates new data including rewards and verifier outputs, stores results in the Data Buffer

Data Buffer

Bridge module that manages prompt initialization, custom data, and rollout generation methods
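The loop between the three modules can be sketched in plain Python. This is a hypothetical illustration of the data flow only; the class and method names are illustrative, not slime's actual API:

```python
from collections import deque

class DataBuffer:
    """Bridges rollout and training: holds prompts and generated samples."""
    def __init__(self, prompts):
        self.prompts = deque(prompts)
        self.samples = deque()

    def put(self, sample):
        self.samples.append(sample)

    def get_batch(self, n):
        return [self.samples.popleft() for _ in range(min(n, len(self.samples)))]

class Rollout:
    """Stands in for SGLang + router: generates responses with rewards."""
    def generate(self, prompt):
        response = prompt[::-1]            # placeholder "generation"
        reward = float(len(response) > 0)  # placeholder verifier/reward
        return {"prompt": prompt, "response": response, "reward": reward}

class Trainer:
    """Stands in for Megatron: consumes batches, then syncs weights back."""
    def __init__(self):
        self.steps = 0

    def train_step(self, batch):
        self.steps += 1
        return sum(s["reward"] for s in batch) / len(batch)

# One iteration: rollout fills the buffer, training drains it.
buffer = DataBuffer(["hello", "world"])
rollout, trainer = Rollout(), Trainer()
for prompt in list(buffer.prompts):
    buffer.put(rollout.generate(prompt))
mean_reward = trainer.train_step(buffer.get_batch(2))
```

In the real system, the rollout side runs on SGLang servers and the trainer periodically pushes updated parameters back to them; the buffer is what decouples the two.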

Get Started

Installation

Set up slime using Docker or conda in minutes

Quick Start

Train your first model with a working example

Usage Guide

Learn about command-line parameters and configuration

API Reference

Explore the full API documentation

Key Features

Multiple RL Algorithms

slime supports various reinforcement learning algorithms:
  • GRPO (Group Relative Policy Optimization)
  • GSPO (Group Sequence Policy Optimization)
  • Reinforce++ and Reinforce++ Baseline
  • PPO (Proximal Policy Optimization)
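As a flavor of the group-based algorithms above, here is a minimal sketch of GRPO-style advantage computation: rewards for a group of responses to the same prompt are normalized by the group mean and standard deviation, so no learned value function is needed. This is an illustrative reduction, not slime's implementation:

```python
def grpo_advantages(rewards, eps=1e-6):
    """Normalize a group of rewards to zero mean, unit variance."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against a zero std when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt, scored 0/1 by a verifier:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses get positive advantages and incorrect ones negative, and the advantages of each group sum to zero.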

Advanced Training Capabilities

slime packs variable-length samples together and strictly guarantees correct per-sample and per-token loss: enabling dynamic batch size does not change the computed loss.
  • Dynamic Batching: Intelligently pack samples of varying lengths to maximize GPU utilization
  • Colocated Training: Deploy training and inference on the same GPUs
  • Multi-Node Support: Scale to hundreds of GPUs for large MoE models
  • Mixed Precision: bf16 training with fp8 inference support
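The loss-correctness guarantee above matters because packing samples into uneven micro-batches makes the naive "mean of per-batch means" biased. A small sketch with made-up numbers shows the pitfall and the fix (normalize by total token count, not batch count):

```python
def naive_packed_loss(batches):
    """Wrong: mean of per-batch means, biased by uneven batch sizes."""
    return sum(sum(b) / len(b) for b in batches) / len(batches)

def token_weighted_loss(batches):
    """Correct: global token-loss sum divided by total token count."""
    total = sum(sum(b) for b in batches)
    count = sum(len(b) for b in batches)
    return total / count

# Per-token losses for two samples, packed unevenly by dynamic batching:
packed = [[2.0, 2.0, 2.0], [4.0]]
flat = [2.0, 2.0, 2.0, 4.0]  # the same tokens, unpacked

biased = naive_packed_loss(packed)           # 3.0: overweights the short pack
correct = token_weighted_loss(packed)        # 2.5: matches the unpacked mean
unpacked_mean = sum(flat) / len(flat)        # 2.5
```

Token-weighted normalization is what keeps the loss identical regardless of how samples are packed; the same principle applies per-sample when sample-level averaging is requested instead.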

Flexible Data Generation

  • Dynamic Sampling: Advanced sampling strategies for improved data diversity
  • Partial Rollout: Cache and resume partially generated samples
  • Custom Functions: Write custom generation and reward functions for complex scenarios
  • Multi-Turn Support: Built-in support for agent scenarios with tool calling
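A custom generation function of the kind these interfaces enable typically takes a prompt, calls a generation engine, and scores the result with a reward function. The sketch below is hypothetical; the function names and signatures are illustrative, not slime's actual API:

```python
def exact_match_reward(response, answer):
    """A simple verifier-style reward: 1.0 on exact match, else 0.0."""
    return 1.0 if response.strip() == answer.strip() else 0.0

def generate_rollout(prompt, answer, engine):
    """Custom rollout: generate a response, score it, emit a training sample.

    `engine` stands in for a call to a server-based generation engine.
    """
    response = engine(prompt)
    return {
        "prompt": prompt,
        "response": response,
        "reward": exact_match_reward(response, answer),
    }

# A stub engine that always answers "4", for illustration:
sample = generate_rollout("2 + 2 = ?", "4", engine=lambda p: "4")
```

Because the generation function owns the whole loop, it can just as easily do multi-turn tool calling, dynamic sampling, or resume a partial rollout before returning samples to the Data Buffer.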

Projects Built with slime

slime has powered several novel research projects and production systems:

P1: Mastering Physics Olympiads

P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning, delivering breakthrough performance in physics reasoning.

RLVE: Scaling with Verifiable Environments

RLVE uses verifiable environments that procedurally generate problems with algorithmically verifiable rewards to scale RL for language models.

TritonForge: Kernel Generation

TritonForge leverages slime’s capabilities to train LLMs that automatically generate optimized GPU kernels.

APRIL: Accelerating RL Training

APRIL introduces system-level optimizations that integrate with slime to accelerate the rollout generation phase.

Community and Support

GitHub

View source code and contribute

Documentation

Browse the full documentation
Contributions are welcome! Submit issues or pull requests to help improve slime.
