Featured Projects
P1: Physics Olympiad Mastery
Open-source physics reasoning models trained entirely through RL
RLVE: Verifiable Environments
Scaling LM RL with adaptive verifiable environments
TritonForge: Kernel Generation
Agentic RL training for optimized GPU kernel generation
APRIL: Accelerated Rollouts
System-level optimization for faster RL training
⚛️ P1: Mastering Physics Olympiads with Reinforcement Learning
P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning.
Key Features
- Multi-stage RL training algorithm with progressive reasoning enhancement
- Adaptive learnability adjustment for optimal training dynamics
- Stabilization mechanisms to ensure consistent performance
- Breakthrough performance in open-source physics reasoning
How it Uses slime
P1 leverages slime as the RL post-training framework, using its high-performance training stack to implement a multi-stage training algorithm that progressively enhances reasoning ability.
P1 demonstrates slime’s capability to train specialized reasoning models through pure RL, achieving state-of-the-art open-source results in physics problem-solving.
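The "adaptive learnability adjustment" idea can be sketched in miniature. This is a hypothetical illustration, not P1's actual code: problems the policy always or never solves carry no gradient signal, so each stage keeps only the in-between ones. The function name, the `(problem_id, pass_rate)` input shape, and the band thresholds are all assumptions for illustration.

```python
def select_learnable(problems, low=0.1, high=0.9):
    """Adaptive learnability filtering (illustrative sketch).

    `problems` is a list of (problem_id, pass_rate) pairs, where
    pass_rate is the fraction of recent rollouts that solved the
    problem. Problems solved always (rate > high) or never
    (rate < low) contribute no useful gradient signal, so only
    the in-between ones are kept for the next RL stage.
    """
    return [pid for pid, rate in problems if low <= rate <= high]
```

A multi-stage schedule would re-estimate pass rates between stages and re-filter, so the training pool tracks the model's current frontier of ability.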
📈 RLVE: Scaling LM RL with Adaptive Verifiable Environments
RLVE scales RL for language models with verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards.
Key Features
- 400 verifiable environments for joint training
- Procedural problem generation with algorithmic verification
- Dynamic difficulty adaptation based on policy capabilities
- Automatic reward verification without human annotation
How it Uses slime
RLVE builds upon slime’s flexible data generation capabilities to implement procedurally generated problems across hundreds of verifiable environments, with each environment dynamically adapting its problem difficulty distribution as training progresses.
RLVE showcases slime’s ability to scale RL training across diverse environments with automatic reward verification.
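A toy verifiable environment makes the three ingredients concrete: procedural generation, algorithmic verification, and difficulty adaptation. This is a hypothetical sketch in the spirit of RLVE, not the project's API; the class name, the arithmetic task, and the adaptation rule (double the operand range once the recent success rate exceeds 80%) are all invented for illustration.

```python
import random

class ArithmeticEnv:
    """Illustrative verifiable environment (not RLVE's actual code)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.max_operand = 10   # current difficulty knob
        self.recent = []        # rolling record of verified outcomes

    def generate(self):
        """Procedurally generate a problem with a known ground truth."""
        a = self.rng.randint(1, self.max_operand)
        b = self.rng.randint(1, self.max_operand)
        return {"prompt": f"{a} + {b} = ?", "answer": a + b}

    def reward(self, problem, response):
        """Algorithmic verification: 1.0 iff the answer is exactly right.

        No human annotation is needed, and the environment hardens
        itself once the policy's recent success rate climbs past 80%.
        """
        ok = response.strip() == str(problem["answer"])
        self.recent = (self.recent + [ok])[-100:]
        if len(self.recent) >= 20 and sum(self.recent) / len(self.recent) > 0.8:
            self.max_operand *= 2   # policy is winning: make it harder
            self.recent = []
        return 1.0 if ok else 0.0
```

Because generation and verification are both programmatic, hundreds of such environments can be trained jointly without any labeling pipeline.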
⚡ TritonForge: Agentic RL Training Framework for Kernel Generation
TritonForge leverages slime’s SFT and RL capabilities to train LLMs that automatically generate optimized GPU kernels.
Key Features
- Two-stage training approach (SFT followed by RL)
- Multi-turn compilation feedback for iterative improvement
- Automatic kernel optimization from PyTorch operations
- High-performance Triton kernel generation
How it Uses slime
TritonForge utilizes both slime’s supervised fine-tuning and reinforcement learning capabilities. The framework employs multi-turn compilation feedback as the reward signal, enabling LLMs to learn from compilation results and generate increasingly optimized GPU kernels.
TritonForge demonstrates slime’s versatility in code generation tasks, particularly for performance-critical GPU kernel optimization.
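Compilation feedback as a reward can be shown in miniature. This hypothetical sketch stands in Python's built-in `compile()` for the Triton compiler that TritonForge actually targets: the reward is 1.0 for code that compiles, 0.0 otherwise, and the error text becomes the next turn's context so the model can repair its own output.

```python
def compilation_reward(source: str) -> tuple[float, str]:
    """Illustrative compilation-feedback reward (not TritonForge's code).

    Returns (reward, feedback). In a multi-turn setup, the feedback
    string is appended to the conversation so the next generation
    attempt can react to the compiler's complaint.
    """
    try:
        compile(source, "<generated>", "exec")
        return 1.0, "ok"
    except SyntaxError as e:
        return 0.0, f"SyntaxError: {e.msg} (line {e.lineno})"
```

A real pipeline would replace the syntax check with actual kernel compilation and add a correctness-plus-speedup term to the reward; the multi-turn structure stays the same.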
🚀 APRIL: Accelerating RL Training with Active Partial Rollouts
APRIL introduces a system-level optimization that seamlessly integrates with slime to accelerate the rollout generation phase in RL training.
Key Features
- Active partial rollout management for efficiency
- Intelligent request over-provisioning to reduce latency
- Long-tail generation optimization addressing the 90%+ bottleneck
- Seamless slime integration without code changes
How it Uses slime
APRIL integrates at the system level with slime’s rollout generation phase, intelligently managing partial completions to address the long-tail generation bottleneck that typically consumes over 90% of RL training time.
APRIL achieves significant speedups in RL training by optimizing slime’s rollout generation phase, demonstrating the framework’s extensibility for system-level optimizations.
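The partial-rollout idea can be sketched as a batching policy. This is a hypothetical illustration, not APRIL's implementation: more generation requests are launched than the batch needs (over-provisioning), the batch returns as soon as enough finish, and unfinished generations are carried over so the next batch resumes them instead of waiting on the long tail or restarting from scratch.

```python
def collect_batch(requests, batch_size, step_fn):
    """Illustrative active-partial-rollout batching (not APRIL's API).

    `step_fn(req)` advances one generation request and returns True
    when it finishes. Returns (finished, carry): the completed batch
    plus everything partial or not-yet-started, to be resumed next step.
    """
    finished, carry = [], []
    for i, req in enumerate(requests):
        if step_fn(req):
            finished.append(req)
            if len(finished) == batch_size:
                carry.extend(requests[i + 1:])  # not yet started: defer
                break
        else:
            carry.append(req)                    # partial: resume later
    return finished, carry
```

A real system would run the requests concurrently against the inference engine; the key property is the same: slow, long-tail generations never block the training step, they are simply resumed later.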
🏟️ qqr: Scaling Open-Ended Agents with ArenaRL & MCP
qqr (also known as hilichurl) is a lightweight extension for slime designed to evolve open-ended agents through tournament-based training.
Key Features
- ArenaRL algorithm with tournament-based relative ranking
- Model Context Protocol (MCP) integration for standardized tools
- Multiple tournament formats (Seeded Single-Elimination, Round-Robin)
- Discriminative collapse prevention through competitive evaluation
- Decoupled tool environments for scalable agent evolution
How it Uses slime
qqr extends slime with lightweight modifications to implement the ArenaRL algorithm, leveraging slime’s high-throughput training capabilities to enable scalable, distributed evolution of agents in standardized tool environments.
qqr showcases slime’s extensibility for agentic training scenarios, combining tournament-based RL with standardized tool protocols for open-ended agent evolution.
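Tournament-based relative ranking can be sketched with a single-elimination bracket. This is a hypothetical illustration in the spirit of ArenaRL, not qqr's actual code: a pairwise judge `beats(a, b)` decides each match, and an entry's reward is the round it survives to, yielding a relative ranking rather than an absolute score (which helps avoid the discriminative collapse of a fixed scalar judge).

```python
def single_elimination(entries, beats):
    """Illustrative seeded single-elimination bracket (not qqr's code).

    `beats(a, b)` returns True if entry a wins the head-to-head match.
    Returns {entry: rounds_survived}; higher means a stronger agent
    relative to this field of competitors.
    """
    rounds = {e: 0 for e in entries}
    current, depth = list(entries), 0
    while len(current) > 1:
        depth += 1
        nxt = []
        for a, b in zip(current[::2], current[1::2]):
            winner = a if beats(a, b) else b
            rounds[winner] = depth
            nxt.append(winner)
        if len(current) % 2:            # odd entry out gets a bye
            rounds[current[-1]] = depth
            nxt.append(current[-1])
        current = nxt
    return rounds
```

Rounds survived can then serve as the relative reward for RL, and the same scaffold covers round-robin by scoring every pairing instead of a bracket.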
What These Projects Demonstrate
These projects showcase slime’s versatility across diverse domains:
- Physics Reasoning (P1): Complex multi-stage RL training
- Environment Scaling (RLVE): Adaptive multi-environment training
- Code Generation (TritonForge): SFT + RL for kernel optimization
- System Optimization (APRIL): Integration-friendly architecture
- Agentic Training (qqr): Open-ended agent evolution with MCP