verl is designed to run on multiple hardware platforms through a unified plugin architecture. The primary development target is NVIDIA CUDA, with production-quality support for AMD ROCm and community-maintained support for Huawei Ascend NPUs. Additional platforms (Intel XPU, Cambricon MLU, MetaX) are supported via the external verl-hardware-plugin package as reference implementations.

NVIDIA GPUs (Primary Platform)

NVIDIA is verl’s primary development and testing platform. All features, backends, and algorithms are fully supported.

Supported hardware: Any NVIDIA GPU with CUDA compute capability supported by CUDA ≥ 12.8

Backend | Status
FSDP | ✅ Full support
FSDP2 | ✅ Full support
Megatron-LM | ✅ Full support
vLLM rollout | ✅ Full support
SGLang rollout | ✅ Full support
Multi-turn / Agentic | ✅ Full support
LoRA (PEFT) | ✅ Full support
Expert parallelism (MoE) | ✅ Full support

Quick Start

Follow the Installation guide for the standard installation. All example scripts in examples/ run on NVIDIA hardware without modification.

Large-Scale Multi-Node Training

verl supports multi-node training via Ray clusters for models up to 671B parameters (DeepSeek-671B, Qwen3-235B) using expert parallelism and pipeline parallelism. Coordinate nodes by initializing a Ray head node and joining worker nodes before launching the training script.
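As a rough sketch of the node coordination described above, the helper below builds the `ray start` commands for a head node and for a worker joining it. The function names, the example IP address, and the default port 6379 are illustrative assumptions, not part of verl; only the `ray start --head`, `--port`, and `--address` flags come from the Ray CLI.

```python
import shlex
from typing import List


def ray_head_command(port: int = 6379) -> List[str]:
    """Build the command that starts a Ray head node (hypothetical helper)."""
    return ["ray", "start", "--head", f"--port={port}"]


def ray_worker_command(head_ip: str, port: int = 6379) -> List[str]:
    """Build the command a worker node runs to join the head (hypothetical helper)."""
    return ["ray", "start", f"--address={head_ip}:{port}"]


if __name__ == "__main__":
    # Run the first command on the head node, the second on every worker,
    # then launch the training script from the head node.
    print(shlex.join(ray_head_command()))
    print(shlex.join(ray_worker_command("10.0.0.1")))  # example address only
```

Returning argument lists rather than shell strings keeps the commands safe to pass to `subprocess.run` without quoting issues.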

Multi-Chip Plugin Architecture

verl uses a two-layer plugin system to abstract hardware differences.

Platform Plugin System (verl.plugin.platform), hardware abstraction with auto-detection:
PlatformRegistry
  ├─ "nvidia"    → PlatformCUDA      (built-in)
  ├─ "huawei"    → PlatformNPU       (built-in)
  ├─ "intel"     → PlatformXPU       (verl-hardware-plugin)
  ├─ "cambricon" → PlatformMLU       (verl-hardware-plugin)
  └─ "metax"     → PlatformMetaX     (verl-hardware-plugin)

Engine Plugin System (verl.workers.engine.base), chip-specific training engines:
EngineRegistry  (device, vendor) → Engine class
  ├─ ("cuda", None)        → FSDPEngineWithLMHead
  ├─ ("npu", None)         → FSDPNPUEngineWithLMHead
  ├─ ("cuda", "metax")     → FSDPMetaXEngineWithLMHead
  ├─ ("xpu", "intel")      → FSDPXPUEngineWithLMHead
  └─ ("mlu", "cambricon")  → FSDPMLUEngineWithLMHead
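
To make the (device, vendor) mapping above concrete, here is a toy registry keyed the same way. It is a minimal sketch, not verl's EngineRegistry: the class `ToyEngineRegistry`, the two engine classes, and in particular the fallback from an unknown vendor to the `(device, None)` entry are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


class ToyEngineRegistry:
    """Illustrative (device, vendor) -> engine class map; not verl's implementation."""

    _engines: Dict[Tuple[str, Optional[str]], type] = {}

    @classmethod
    def register(cls, device: str, vendor: Optional[str] = None):
        """Return a class decorator that stores the class under (device, vendor)."""
        def decorator(engine_cls: type) -> type:
            cls._engines[(device, vendor)] = engine_cls
            return engine_cls
        return decorator

    @classmethod
    def get(cls, device: str, vendor: Optional[str] = None) -> type:
        """Look up an engine, preferring an exact vendor match."""
        if (device, vendor) in cls._engines:
            return cls._engines[(device, vendor)]
        # Assumed behaviour: fall back to the vendor-neutral entry for the device.
        if (device, None) in cls._engines:
            return cls._engines[(device, None)]
        raise KeyError(f"no engine registered for device={device!r}, vendor={vendor!r}")


@ToyEngineRegistry.register("cuda")
class CudaEngine:
    """Stand-in for the default engine on a device."""


@ToyEngineRegistry.register("cuda", vendor="metax")
class MetaXEngine(CudaEngine):
    """Stand-in for a vendor-specific engine that reuses the same device type."""
```

Keying on the pair rather than on the device alone is what lets a vendor such as MetaX share the "cuda" device type while still selecting its own engine class.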

Auto-Detection and Override

Platform is auto-detected by probing is_available() on each registered platform. Override manually:
export VERL_PLATFORM=nvidia  # or "huawei", "intel", "cambricon", "metax"
export VERL_ENGINE_DEVICE=cuda
export VERL_ENGINE_VENDOR=metax
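
The detection order described above (explicit override first, then probing each registered platform) can be sketched as follows. This is an illustration under stated assumptions, not verl's code: `resolve_platform` and the `probes` mapping are hypothetical, and the probe callables stand in for each platform's is_available() check.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def resolve_platform(
    probes: Dict[str, Callable[[], bool]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a platform name: VERL_PLATFORM override first, then first available probe."""
    env = os.environ if env is None else env

    override = env.get("VERL_PLATFORM")
    if override:
        if override not in probes:
            raise ValueError(f"unknown platform override: {override!r}")
        return override  # the override wins even if the probe would fail

    # Probe in registration order and return the first platform that reports available.
    for name, is_available in probes.items():
        if is_available():
            return name
    raise RuntimeError("no registered platform is available")


if __name__ == "__main__":
    # Fake probes for demonstration; real probes would query the hardware runtime.
    fake_probes = {"nvidia": lambda: False, "huawei": lambda: True}
    print(resolve_platform(fake_probes, env={}))
    print(resolve_platform(fake_probes, env={"VERL_PLATFORM": "nvidia"}))
```

Passing `env` explicitly keeps the function easy to test without mutating the process environment.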

Loading Plugins

Plugins are discovered through two mechanisms:
# Option 1: setuptools entry_points (after pip install)
# Auto-discovered from "verl.plugins" entry_points group

# Option 2: environment variable for development
export VERL_USE_EXTERNAL_MODULES=verl_hardware_plugin

Adding a New Hardware Platform

Register a platform class with the @PlatformRegistry.register decorator:
from verl.plugin.platform import PlatformRegistry, PlatformBase

@PlatformRegistry.register(platform="my_vendor")
class PlatformMyDevice(PlatformBase):
    @property
    def device_name(self) -> str:
        return "my_device"  # torch device type string

    @property
    def vendor_name(self) -> str:
        return "my_vendor"

Register a corresponding engine:
from verl.workers.engine.base import EngineRegistry
from verl.workers.engine.fsdp_engine import FSDPEngineWithLMHead

@EngineRegistry.register(
    model_type="language_model",
    backend=["fsdp", "fsdp2"],
    device="my_device",
    vendor="my_vendor",
)
class FSDPMyVendorEngineWithLMHead(FSDPEngineWithLMHead):
    def initialize(self):
        super().initialize()
        # vendor-specific initialization

For a complete step-by-step guide, see the verl-hardware-plugin development guide.
