Featured Projects
P1: Physics Olympiad Mastery
Open-source physics reasoning models trained entirely through RL
RLVE: Verifiable Environments
Scaling LM RL with adaptive verifiable environments
TritonForge: Kernel Generation
Agentic RL training for optimized GPU kernel generation
APRIL: Accelerated Rollouts
System-level optimization for faster RL training
⚛️ P1: Mastering Physics Olympiads with Reinforcement Learning
P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning.
Key Features
- Multi-stage RL training algorithm with progressive reasoning enhancement
- Adaptive learnability adjustment for optimal training dynamics
- Stabilization mechanisms to ensure consistent performance
- Breakthrough performance in open-source physics reasoning
How it Uses slime
P1 leverages slime as the RL post-training framework, using its high-performance training stack to implement a multi-stage training algorithm that progressively enhances reasoning ability.
P1 demonstrates slime’s capability to train specialized reasoning models through pure RL, achieving state-of-the-art open-source results in physics problem-solving.
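The "adaptive learnability adjustment" idea can be sketched in miniature. This is a hypothetical illustration, not P1's actual code: problems the policy always or never solves carry no gradient signal, so each stage keeps only the in-between ones. The function name, the `(problem_id, pass_rate)` input shape, and the band thresholds are all assumptions for illustration.

```python
def select_learnable(problems, low=0.1, high=0.9):
    """Adaptive learnability filtering (illustrative sketch).

    `problems` is a list of (problem_id, pass_rate) pairs, where
    pass_rate is the fraction of recent rollouts that solved the
    problem. Problems solved always (rate > high) or never
    (rate < low) contribute no useful gradient signal, so only
    the in-between ones are kept for the next RL stage.
    """
    return [pid for pid, rate in problems if low <= rate <= high]
```

A multi-stage schedule would re-estimate pass rates between stages and re-filter, so the training pool tracks the model's current frontier of ability.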
📈 RLVE: Scaling LM RL with Adaptive Verifiable Environments
RLVE scales RL for language models with verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards.
Key Features
- 400 verifiable environments for joint training
- Procedural problem generation with algorithmic verification
- Dynamic difficulty adaptation based on policy capabilities
- Automatic reward verification without human annotation
How it Uses slime
RLVE builds upon slime’s flexible data generation capabilities to implement procedurally generated problems across hundreds of verifiable environments, with each environment dynamically adapting its problem difficulty distribution as training progresses.
RLVE showcases slime’s ability to scale RL training across diverse environments with automatic reward verification.
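A toy verifiable environment makes the three ingredients concrete: procedural generation, algorithmic verification, and difficulty adaptation. This is a hypothetical sketch in the spirit of RLVE, not the project's API; the class name, the arithmetic task, and the adaptation rule (double the operand range once the recent success rate exceeds 80%) are all invented for illustration.

```python
import random

class ArithmeticEnv:
    """Illustrative verifiable environment (not RLVE's actual code)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.max_operand = 10   # current difficulty knob
        self.recent = []        # rolling record of verified outcomes

    def generate(self):
        """Procedurally generate a problem with a known ground truth."""
        a = self.rng.randint(1, self.max_operand)
        b = self.rng.randint(1, self.max_operand)
        return {"prompt": f"{a} + {b} = ?", "answer": a + b}

    def reward(self, problem, response):
        """Algorithmic verification: 1.0 iff the answer is exactly right.

        No human annotation is needed, and the environment hardens
        itself once the policy's recent success rate climbs past 80%.
        """
        ok = response.strip() == str(problem["answer"])
        self.recent = (self.recent + [ok])[-100:]
        if len(self.recent) >= 20 and sum(self.recent) / len(self.recent) > 0.8:
            self.max_operand *= 2   # policy is winning: make it harder
            self.recent = []
        return 1.0 if ok else 0.0
```

Because generation and verification are both programmatic, hundreds of such environments can be trained jointly without any labeling pipeline.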
⚡ TritonForge: Agentic RL Training Framework for Kernel Generation
TritonForge leverages slime’s SFT and RL capabilities to train LLMs that automatically generate optimized GPU kernels.
Key Features
- Two-stage training approach (SFT followed by RL)
- Multi-turn compilation feedback for iterative improvement
- Automatic kernel optimization from PyTorch operations
- High-performance Triton kernel generation
How it Uses slime
TritonForge utilizes both slime’s supervised fine-tuning and reinforcement learning capabilities. The framework employs multi-turn compilation feedback as the reward signal, enabling LLMs to learn from compilation results and generate increasingly optimized GPU kernels.
TritonForge demonstrates slime’s versatility in code generation tasks, particularly for performance-critical GPU kernel optimization.
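Compilation feedback as a reward can be shown in miniature. This hypothetical sketch stands in Python's built-in `compile()` for the Triton compiler that TritonForge actually targets: the reward is 1.0 for code that compiles, 0.0 otherwise, and the error text becomes the next turn's context so the model can repair its own output.

```python
def compilation_reward(source: str) -> tuple[float, str]:
    """Illustrative compilation-feedback reward (not TritonForge's code).

    Returns (reward, feedback). In a multi-turn setup, the feedback
    string is appended to the conversation so the next generation
    attempt can react to the compiler's complaint.
    """
    try:
        compile(source, "<generated>", "exec")
        return 1.0, "ok"
    except SyntaxError as e:
        return 0.0, f"SyntaxError: {e.msg} (line {e.lineno})"
```

A real pipeline would replace the syntax check with actual kernel compilation and add a correctness-plus-speedup term to the reward; the multi-turn structure stays the same.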
🚀 APRIL: Accelerating RL Training with Active Partial Rollouts
APRIL introduces a system-level optimization that seamlessly integrates with slime to accelerate the rollout generation phase in RL training.
Key Features
- Active partial rollout management for efficiency
- Intelligent request over-provisioning to reduce latency
- Long-tail generation optimization addressing the 90%+ bottleneck
- Seamless slime integration without code changes
How it Uses slime
APRIL integrates at the system level with slime’s rollout generation phase, intelligently managing partial completions to address the long-tail generation bottleneck that typically consumes over 90% of RL training time.
APRIL achieves significant speedups in RL training by optimizing slime’s rollout generation phase, demonstrating the framework’s extensibility for system-level optimizations.
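The partial-rollout idea can be sketched as a batching policy. This is a hypothetical illustration, not APRIL's implementation: more generation requests are launched than the batch needs (over-provisioning), the batch returns as soon as enough finish, and unfinished generations are carried over so the next batch resumes them instead of waiting on the long tail or restarting from scratch.

```python
def collect_batch(requests, batch_size, step_fn):
    """Illustrative active-partial-rollout batching (not APRIL's API).

    `step_fn(req)` advances one generation request and returns True
    when it finishes. Returns (finished, carry): the completed batch
    plus everything partial or not-yet-started, to be resumed next step.
    """
    finished, carry = [], []
    for i, req in enumerate(requests):
        if step_fn(req):
            finished.append(req)
            if len(finished) == batch_size:
                carry.extend(requests[i + 1:])  # not yet started: defer
                break
        else:
            carry.append(req)                    # partial: resume later
    return finished, carry
```

A real system would run the requests concurrently against the inference engine; the key property is the same: slow, long-tail generations never block the training step, they are simply resumed later.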
🏟️ qqr: Scaling Open-Ended Agents with ArenaRL & MCP
qqr (also known as hilichurl) is a lightweight extension for slime designed to evolve open-ended agents through tournament-based training.
Key Features
- ArenaRL algorithm with tournament-based relative ranking
- Model Context Protocol (MCP) integration for standardized tools
- Multiple tournament formats (Seeded Single-Elimination, Round-Robin)
- Discriminative collapse prevention through competitive evaluation
- Decoupled tool environments for scalable agent evolution
How it Uses slime
qqr extends slime with lightweight modifications to implement the ArenaRL algorithm, leveraging slime’s high-throughput training capabilities to enable scalable, distributed evolution of agents in standardized tool environments.
qqr showcases slime’s extensibility for agentic training scenarios, combining tournament-based RL with standardized tool protocols for open-ended agent evolution.
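Tournament-based relative ranking can be sketched with a single-elimination bracket. This is a hypothetical illustration in the spirit of ArenaRL, not qqr's actual code: a pairwise judge `beats(a, b)` decides each match, and an entry's reward is the round it survives to, yielding a relative ranking rather than an absolute score (which helps avoid the discriminative collapse of a fixed scalar judge).

```python
def single_elimination(entries, beats):
    """Illustrative seeded single-elimination bracket (not qqr's code).

    `beats(a, b)` returns True if entry a wins the head-to-head match.
    Returns {entry: rounds_survived}; higher means a stronger agent
    relative to this field of competitors.
    """
    rounds = {e: 0 for e in entries}
    current, depth = list(entries), 0
    while len(current) > 1:
        depth += 1
        nxt = []
        for a, b in zip(current[::2], current[1::2]):
            winner = a if beats(a, b) else b
            rounds[winner] = depth
            nxt.append(winner)
        if len(current) % 2:            # odd entry out gets a bye
            rounds[current[-1]] = depth
            nxt.append(current[-1])
        current = nxt
    return rounds
```

Rounds survived can then serve as the relative reward for RL, and the same scaffold covers round-robin by scoring every pairing instead of a bracket.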
What These Projects Demonstrate
These projects showcase slime’s versatility across diverse domains:
- Physics Reasoning (P1): Complex multi-stage RL training
- Environment Scaling (RLVE): Adaptive multi-environment training
- Code Generation (TritonForge): SFT + RL for kernel optimization
- System Optimization (APRIL): Integration-friendly architecture
- Agentic Training (qqr): Open-ended agent evolution with MCP