slime is an LLM post-training framework for RL scaling, providing two core capabilities:
  1. High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang
  2. Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines

Proven at Scale

slime is the RL framework behind production models including GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5.

Supported Models

Beyond Z.ai's GLM models, slime supports:
  • Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series
  • DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1)
  • Llama 3

Architecture Overview

slime consists of three core modules:

Training (Megatron)

Handles the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module

Rollout (SGLang + router)

Generates new data including rewards and verifier outputs, stores results in the Data Buffer

Data Buffer

Bridge module that manages prompt initialization, custom data, and rollout generation methods
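The loop between the three modules can be sketched in plain Python. This is a hypothetical illustration of the data flow only; the class and method names are illustrative, not slime's actual API:

```python
from collections import deque

class DataBuffer:
    """Bridges rollout and training: holds prompts and generated samples."""
    def __init__(self, prompts):
        self.prompts = deque(prompts)
        self.samples = deque()

    def put(self, sample):
        self.samples.append(sample)

    def get_batch(self, n):
        return [self.samples.popleft() for _ in range(min(n, len(self.samples)))]

class Rollout:
    """Stands in for SGLang + router: generates responses with rewards."""
    def generate(self, prompt):
        response = prompt[::-1]            # placeholder "generation"
        reward = float(len(response) > 0)  # placeholder verifier/reward
        return {"prompt": prompt, "response": response, "reward": reward}

class Trainer:
    """Stands in for Megatron: consumes batches, then syncs weights back."""
    def __init__(self):
        self.steps = 0

    def train_step(self, batch):
        self.steps += 1
        return sum(s["reward"] for s in batch) / len(batch)

# One iteration: rollout fills the buffer, training drains it.
buffer = DataBuffer(["hello", "world"])
rollout, trainer = Rollout(), Trainer()
for prompt in list(buffer.prompts):
    buffer.put(rollout.generate(prompt))
mean_reward = trainer.train_step(buffer.get_batch(2))
```

In the real system, the rollout side runs on SGLang servers and the trainer periodically pushes updated parameters back to them; the buffer is what decouples the two.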

Get Started

Installation

Set up slime using Docker or conda in minutes

Quick Start

Train your first model with a working example

Usage Guide

Learn about command-line parameters and configuration

API Reference

Explore the full API documentation

Key Features

Multiple RL Algorithms

slime supports various reinforcement learning algorithms:
  • GRPO (Group Relative Policy Optimization)
  • GSPO (Group Sequence Policy Optimization)
  • Reinforce++ and Reinforce++ Baseline
  • PPO (Proximal Policy Optimization)
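As a flavor of the group-based algorithms above, here is a minimal sketch of GRPO-style advantage computation: rewards for a group of responses to the same prompt are normalized by the group mean and standard deviation, so no learned value function is needed. This is an illustrative reduction, not slime's implementation:

```python
def grpo_advantages(rewards, eps=1e-6):
    """Normalize a group of rewards to zero mean, unit variance."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against a zero std when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt, scored 0/1 by a verifier:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses get positive advantages and incorrect ones negative, and the advantages of each group sum to zero.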

Advanced Training Capabilities

slime packs variable-length samples together and strictly guarantees correct per-sample and per-token loss: enabling dynamic batch size does not change the computed loss.
  • Dynamic Batching: Intelligently pack samples of varying lengths to maximize GPU utilization
  • Colocated Training: Deploy training and inference on the same GPUs
  • Multi-Node Support: Scale to hundreds of GPUs for large MoE models
  • Mixed Precision: bf16 training with fp8 inference support
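The loss-correctness guarantee above matters because packing samples into uneven micro-batches makes the naive "mean of per-batch means" biased. A small sketch with made-up numbers shows the pitfall and the fix (normalize by total token count, not batch count):

```python
def naive_packed_loss(batches):
    """Wrong: mean of per-batch means, biased by uneven batch sizes."""
    return sum(sum(b) / len(b) for b in batches) / len(batches)

def token_weighted_loss(batches):
    """Correct: global token-loss sum divided by total token count."""
    total = sum(sum(b) for b in batches)
    count = sum(len(b) for b in batches)
    return total / count

# Per-token losses for two samples, packed unevenly by dynamic batching:
packed = [[2.0, 2.0, 2.0], [4.0]]
flat = [2.0, 2.0, 2.0, 4.0]  # the same tokens, unpacked

biased = naive_packed_loss(packed)           # 3.0: overweights the short pack
correct = token_weighted_loss(packed)        # 2.5: matches the unpacked mean
unpacked_mean = sum(flat) / len(flat)        # 2.5
```

Token-weighted normalization is what keeps the loss identical regardless of how samples are packed; the same principle applies per-sample when sample-level averaging is requested instead.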

Flexible Data Generation

  • Dynamic Sampling: Advanced sampling strategies for improved data diversity
  • Partial Rollout: Cache and resume partially generated samples
  • Custom Functions: Write custom generation and reward functions for complex scenarios
  • Multi-Turn Support: Built-in support for agent scenarios with tool calling
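A custom generation function of the kind these interfaces enable typically takes a prompt, calls a generation engine, and scores the result with a reward function. The sketch below is hypothetical; the function names and signatures are illustrative, not slime's actual API:

```python
def exact_match_reward(response, answer):
    """A simple verifier-style reward: 1.0 on exact match, else 0.0."""
    return 1.0 if response.strip() == answer.strip() else 0.0

def generate_rollout(prompt, answer, engine):
    """Custom rollout: generate a response, score it, emit a training sample.

    `engine` stands in for a call to a server-based generation engine.
    """
    response = engine(prompt)
    return {
        "prompt": prompt,
        "response": response,
        "reward": exact_match_reward(response, answer),
    }

# A stub engine that always answers "4", for illustration:
sample = generate_rollout("2 + 2 = ?", "4", engine=lambda p: "4")
```

Because the generation function owns the whole loop, it can just as easily do multi-turn tool calling, dynamic sampling, or resume a partial rollout before returning samples to the Data Buffer.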

Projects Built with slime

slime has powered several novel research projects and production systems:

P1: Mastering Physics Olympiads

P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning, delivering breakthrough performance in physics reasoning.

RLVE: Scaling with Verifiable Environments

RLVE uses verifiable environments that procedurally generate problems with algorithmically verifiable rewards to scale RL for language models.

TritonForge: Kernel Generation

TritonForge leverages slime’s capabilities to train LLMs that automatically generate optimized GPU kernels.

APRIL: Accelerating RL Training

APRIL introduces system-level optimizations that integrate with slime to accelerate the rollout generation phase.

Community and Support

GitHub

View source code and contribute

Documentation

Browse the full documentation
Contributions are welcome! Submit issues or pull requests to help improve slime.
