slime has powered several novel research projects and production systems, demonstrating its versatility across physics reasoning, code generation, and system optimization.

  • P1 (Physics Olympiad Mastery): open-source physics reasoning models trained entirely through RL
  • RLVE (Verifiable Environments): scaling LM RL with adaptive verifiable environments
  • TritonForge (Kernel Generation): agentic RL training for optimized GPU kernel generation
  • APRIL (Accelerated Rollouts): system-level optimization for faster RL training
  • qqr (Open-Ended Agents): tournament-based ArenaRL training with MCP tool integration

⚛️ P1: Mastering Physics Olympiads with Reinforcement Learning

P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning.

Key Features

  • Multi-stage RL training algorithm with progressive reasoning enhancement
  • Adaptive learnability adjustment for optimal training dynamics
  • Stabilization mechanisms to ensure consistent performance
  • Breakthrough performance in open-source physics reasoning

How it Uses slime

P1 uses slime as its RL post-training framework, relying on slime’s high-performance training capabilities to implement a multi-stage algorithm that progressively strengthens reasoning ability.
P1 demonstrates slime’s capability to train specialized reasoning models through pure RL, achieving state-of-the-art results in physics problem-solving.
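As a minimal sketch of what “adaptive learnability adjustment” can look like in practice, the idea is to train only on problems the current policy sometimes, but not always, solves: fully solved or fully unsolved problems yield little gradient signal. The function name, thresholds, and data below are illustrative assumptions, not taken from P1 or slime:

```python
def learnability_filter(problems, pass_rates, low=0.1, high=0.9):
    """Keep problems whose current pass rate is neither trivial nor
    hopeless -- a hypothetical stand-in for an adaptive learnability
    adjustment (names and thresholds are illustrative)."""
    return [p for p in problems if low <= pass_rates[p] <= high]

problems = ["mechanics-01", "optics-07", "thermo-03", "em-12"]
pass_rates = {
    "mechanics-01": 1.0,   # always solved -> no learning signal left
    "optics-07": 0.45,     # partially solved -> most learnable
    "thermo-03": 0.0,      # unsolvable at the current ability level
    "em-12": 0.8,
}

batch = learnability_filter(problems, pass_rates)
```

In a real multi-stage schedule, the pass rates would be re-estimated from recent rollouts so the training distribution tracks the policy as it improves.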

📈 RLVE: Scaling LM RL with Adaptive Verifiable Environments

RLVE introduces an approach using verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards to scale up RL for language models.

Key Features

  • 400 verifiable environments for joint training
  • Procedural problem generation with algorithmic verification
  • Dynamic difficulty adaptation based on policy capabilities
  • Automatic reward verification without human annotation

How it Uses slime

RLVE builds upon slime’s flexible data generation capabilities to implement procedurally generated problems across hundreds of verifiable environments, with each environment dynamically adapting its problem difficulty distribution as training progresses.
RLVE showcases slime’s ability to scale RL training across diverse environments with automatic reward verification.
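To picture what a verifiable environment might look like, here is a toy sketch: problems are generated procedurally, answers are checked algorithmically (no human annotation), and difficulty rises with the policy’s recent success rate. The class, its interface, and the adaptation rule are assumptions for illustration, not RLVE’s actual code:

```python
import random

class ArithmeticEnv:
    """Toy verifiable environment: procedurally generates arithmetic
    problems and verifies answers exactly. Difficulty adapts upward when
    the policy solves most recent problems (the adaptation rule here is
    an illustrative guess, not RLVE's actual schedule)."""

    def __init__(self, difficulty=1):
        self.difficulty = difficulty
        self.recent = []

    def generate(self, rng):
        hi = 10 ** self.difficulty
        a, b = rng.randrange(hi), rng.randrange(hi)
        return f"{a} + {b} = ?", a + b

    def verify(self, answer, reference):
        reward = 1.0 if answer == reference else 0.0
        self.recent.append(reward)
        # Adapt: get harder once the policy solves >80% of the last 20.
        if len(self.recent) >= 20 and sum(self.recent[-20:]) / 20 > 0.8:
            self.difficulty += 1
            self.recent.clear()
        return reward

rng = random.Random(0)
env = ArithmeticEnv()
prompt, reference = env.generate(rng)
```

Hundreds of such environments, each tracking its own difficulty distribution, could then be sampled jointly during rollout generation.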

⚡ TritonForge: Agentic RL Training Framework for Kernel Generation

TritonForge leverages slime’s SFT & RL capabilities to train LLMs that automatically generate optimized GPU kernels.

Key Features

  • Two-stage training approach (SFT followed by RL)
  • Multi-turn compilation feedback for iterative improvement
  • Automatic kernel optimization from PyTorch operations
  • High-performance Triton kernel generation

How it Uses slime

TritonForge utilizes both slime’s supervised fine-tuning and reinforcement learning capabilities. The framework employs multi-turn compilation feedback as the reward signal, enabling LLMs to learn from compilation results and generate increasingly optimized GPU kernels.
TritonForge demonstrates slime’s versatility in code generation tasks, particularly for performance-critical GPU kernel optimization.
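The shape of a multi-turn compilation-feedback reward can be sketched roughly as follows: compile the candidate kernel, reward success, and feed the error message back to the model for the next turn. TritonForge presumably compiles and benchmarks Triton kernels on GPU; here plain Python compilation stands in purely for illustration, and all names are hypothetical:

```python
def compile_feedback(source: str):
    """Hypothetical multi-turn reward: try to compile the candidate and
    return (reward, feedback). A real kernel-generation setup would also
    check numerical correctness and measure speedup over a baseline."""
    try:
        compile(source, "<candidate>", "exec")
        return 1.0, "ok"
    except SyntaxError as e:
        # The error text becomes the observation for the next turn,
        # letting the model iteratively repair its own kernel.
        return 0.0, f"line {e.lineno}: {e.msg}"

reward, feedback = compile_feedback("def kernel(x):\n    return x * 2\n")
```

Chaining this over several turns gives the agentic loop: generate, compile, read the feedback, regenerate.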

🚀 APRIL: Accelerating RL Training with Active Partial Rollouts

APRIL introduces a system-level optimization that seamlessly integrates with slime to accelerate the rollout generation phase in RL training.

Key Features

  • Active partial rollout management for efficiency
  • Intelligent request over-provisioning to reduce latency
  • Long-tail generation optimization addressing the 90%+ bottleneck
  • Seamless slime integration without code changes

How it Uses slime

APRIL integrates at the system level with slime’s rollout generation phase, intelligently managing partial completions to address the long-tail generation bottleneck that typically consumes over 90% of RL training time.
APRIL achieves significant speedups in RL training by optimizing slime’s rollout generation phase, demonstrating the framework’s extensibility for system-level optimizations.
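A toy model of the idea behind active partial rollouts: over-provision more generation requests than the batch needs, take the first ones that finish within a decode budget, and save the long-tail requests as partial completions to resume in the next iteration instead of waiting on them. The function, its parameters, and the data are illustrative; APRIL’s actual scheduler is more sophisticated:

```python
def collect_rollouts(lengths, batch_size, budget):
    """Split over-provisioned requests into (finished, partial).
    `lengths` maps request id -> total tokens the request would need;
    requests exceeding `budget` are cut off and resumed later.
    (Names and logic are a hypothetical sketch, not APRIL's code.)"""
    finished, partial = [], []
    for req_id, length in lengths.items():
        if length <= budget and len(finished) < batch_size:
            finished.append(req_id)
        else:
            # Long-tail request: keep the tokens generated so far.
            partial.append((req_id, min(length, budget)))
    return finished, partial

lengths = {"a": 120, "b": 4000, "c": 300, "d": 250, "e": 8000}
done, resumable = collect_rollouts(lengths, batch_size=3, budget=1024)
```

The batch trains on `done` immediately, while `resumable` carries its partial token prefixes into the next rollout phase, so the slowest generations never stall the whole step.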

🏟️ qqr: Scaling Open-Ended Agents with ArenaRL & MCP

qqr (also known as hilichurl) is a lightweight extension for slime designed to evolve open-ended agents through tournament-based training.

Key Features

  • ArenaRL algorithm with tournament-based relative ranking
  • Model Context Protocol (MCP) integration for standardized tools
  • Multiple tournament formats (Seeded Single-Elimination, Round-Robin)
  • Discriminative collapse prevention through competitive evaluation
  • Decoupled tool environments for scalable agent evolution

How it Uses slime

qqr extends slime with lightweight modifications to implement the ArenaRL algorithm, leveraging slime’s high-throughput training capabilities to enable scalable, distributed evolution of agents in standardized tool environments.
qqr showcases slime’s extensibility for agentic training scenarios, combining tournament-based RL with standardized tool protocols for open-ended agent evolution.
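A tournament-based relative ranking in the ArenaRL spirit can be sketched with a round-robin: every pair of agent outputs is compared by a judge, and each agent’s reward is its win rate rather than an absolute score, which is one way to avoid discriminative collapse. The `judge(a, b)` interface and the toy judge below are assumptions for illustration, not qqr’s actual API:

```python
from itertools import combinations

def round_robin_rewards(agents, judge):
    """Play every pairing once; reward = fraction of games won.
    Relative ranking means rewards stay informative even when all
    agents improve together. (Hypothetical sketch, not qqr's code.)"""
    wins = {a: 0 for a in agents}
    for a, b in combinations(agents, 2):
        wins[judge(a, b)] += 1
    n = len(agents) - 1  # games each agent plays
    return {a: wins[a] / n for a in agents}

# Toy judge: longer answer wins (stand-in for a learned evaluator).
rewards = round_robin_rewards(
    ["short", "a medium answer", "a much more detailed answer"],
    judge=lambda a, b: max(a, b, key=len),
)
```

A seeded single-elimination bracket would replace the full pairing loop with log-depth rounds, trading ranking precision for fewer judge calls.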

What These Projects Demonstrate

These projects showcase slime’s versatility across diverse domains:
  • Physics Reasoning (P1): Complex multi-stage RL training
  • Environment Scaling (RLVE): Adaptive multi-environment training
  • Code Generation (TritonForge): SFT + RL for kernel optimization
  • System Optimization (APRIL): Integration-friendly architecture
  • Agentic Training (qqr): Open-ended agent evolution with MCP
Whether you’re building research prototypes or production systems, slime provides a powerful foundation for RL-based post-training of language models.
Interested in adding your project to this list? Check out the contributing guide to learn how to contribute.
