Hardware Requirements
slime supports multiple NVIDIA GPU hardware platforms:
- H-Series (H100/H200): Official support with comprehensive CI testing and stable performance
- B200 Series: Fully supported, with setup steps identical to the H-series GPUs

The latest Docker images are compatible with both B-series and H-series GPUs without additional configuration.
Docker Installation (Recommended)

Docker offers several benefits:
- Pre-configured environment: all dependencies, including SGLang and the Megatron patches, come pre-installed
- No conflicts: the isolated environment prevents version conflicts with system packages
- Quick setup: get started in minutes without installing dependencies manually
- Consistent behavior: guaranteed compatibility across different host systems
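As a sketch, pulling the image and starting a container might look like the following. The image name and tag are placeholders; use the image published in slime's README:

```bash
# Placeholder image name -- substitute the image from slime's README.
docker pull <slime-image>:<latest-tag>

# --gpus all exposes the NVIDIA GPUs; a large shared-memory segment is
# needed by NCCL and the data loaders.
docker run --gpus all --ipc=host --shm-size=16g \
  -it <slime-image>:<latest-tag> /bin/bash
```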
Conda Installation
For scenarios where Docker is not convenient, you can install using conda.

The official build script provides a reference for conda installation: it includes all the steps needed to set up SGLang, Megatron, and the other dependencies. You may need to adjust paths and versions for your environment.
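A hypothetical outline of what such a script does (package names, versions, and the repository path below are assumptions; defer to the official build script):

```bash
# Assumed outline only -- the official build script is authoritative.
conda create -n slime python=3.10 -y
conda activate slime

# Install a PyTorch wheel matching your CUDA toolkit, then SGLang.
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install "sglang[all]"

# Install slime itself in editable mode (repository path is a placeholder).
cd /path/to/slime
pip install -e .
```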
Multi-Node Setup
For large-scale training with multiple nodes, you need to set up a Ray cluster.

Start the Ray head node

On the first node (node 0), start the Ray head, replacing ${MASTER_ADDR} with the IP address of node 0.

Network Configuration
In complex network environments (Docker, SLURM), you may need to specify network interfaces:
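For example, the head node can be started with the interface pinned explicitly. The interface name eth0 is an assumption; pick the NIC that carries ${MASTER_ADDR}:

```bash
# Pin NCCL/Gloo traffic to a specific interface (eth0 is an example).
export NCCL_SOCKET_IFNAME=eth0
export GLOO_SOCKET_IFNAME=eth0

# Node 0: start the Ray head on the shared address.
ray start --head --node-ip-address ${MASTER_ADDR} --port 6379

# Every other node: join the head.
ray start --address ${MASTER_ADDR}:6379
```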
AMD GPU Support
slime also supports AMD GPUs. For installation instructions specific to AMD hardware, see the AMD Usage Tutorial, which covers the complete AMD setup.
Verify Installation
After installation, verify that slime is working correctly:
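A minimal smoke test, assuming a typical slime install (the module list is an assumption; adjust it to your setup):

```shell
# Report which of the expected top-level modules are importable.
for mod in slime sglang megatron ray torch; do
  if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$mod') else 1)"; then
    echo "$mod: found"
  else
    echo "$mod: MISSING"
  fi
done
```

Any module reported MISSING indicates an incomplete environment.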
If you plan to contribute to slime, set up pre-commit hooks (install pre-commit, then run pre-commit install in the repository).

Troubleshooting
CUDA out of memory during co-located training
When running training and inference on the same GPUs, reduce SGLang’s memory usage; Megatron will offload after initialization to free memory for SGLang.
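For instance, you can lower the fraction of GPU memory that SGLang pre-allocates. The flag spelling below mirrors SGLang's --mem-fraction-static option but is an assumption; verify it against your slime version's argument list:

```bash
# Assumed flag names -- verify against `python train.py --help`.
python train.py \
  --colocate \
  --sglang-mem-fraction-static 0.5
```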
Wrong network interface selected
Explicitly set network interfaces using environment variables:
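A sketch, assuming the relevant backends are NCCL and Gloo and that eth0 is the correct interface on your hosts:

```shell
# Route NCCL and Gloo traffic over a specific NIC (eth0 is an example).
export NCCL_SOCKET_IFNAME=eth0
export GLOO_SOCKET_IFNAME=eth0
```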
Megatron checkpoint conversion fails
Ensure you’re using the correct model configuration. For models with custom vocab sizes, manually set --vocab-size during conversion.
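As an illustration only, a conversion command might look like the following. The script name and the other flags are hypothetical; --vocab-size is the documented override:

```bash
# Hypothetical invocation -- script name and paths are placeholders.
python tools/convert_checkpoint.py \
  --hf-checkpoint /path/to/hf_model \
  --save /path/to/megatron_ckpt \
  --vocab-size 151936   # example value; use your model's actual vocab size
```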
Ray cluster connection issues
Verify that the head node is accessible, and check that firewall rules allow ports 6379 (Redis) and 8265 (Dashboard).
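From a worker node, a quick reachability check might look like the following (ports taken from above; the availability of nc is an assumption):

```bash
# Can we reach the head at all?
ping -c 1 ${MASTER_ADDR}

# Are the Ray ports open? 6379 (Redis) and 8265 (Dashboard).
nc -zv ${MASTER_ADDR} 6379
nc -zv ${MASTER_ADDR} 8265

# Does Ray itself see the cluster?
ray status --address ${MASTER_ADDR}:6379
```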
Next Steps
- Quick Start: run your first training job
- Usage Guide: learn about configuration and parameters