verl supports a flexible set of training and inference backends, letting you pick the right combination for your workload — from rapid prototyping on a single node to scaled multi-node production runs. This page covers system requirements, the recommended Docker-based setup, pip installation from source, and a guide to choosing the right backends.

Requirements

verl requires Python >= 3.10 and CUDA >= 12.8. Older CUDA versions are not supported by the pre-built images or the stable install path.
Dependency    Minimum version
Python        3.10
CUDA          12.8
cuDNN         9.10.0
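
The minimums above can be checked before installing. The following is a minimal sketch, not part of verl: the helper name version_ge is hypothetical, it assumes a sort that supports -V (GNU coreutils), and it reads the CUDA toolkit version from nvcc, which may differ from the version your driver supports. cuDNN is not checked.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Python: print MAJOR.MINOR.PATCH and compare with the 3.10 minimum.
if command -v python3 >/dev/null 2>&1; then
  py_ver=$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:3])))')
  if version_ge "$py_ver" 3.10; then
    echo "Python $py_ver: OK"
  else
    echo "Python $py_ver: too old, need >= 3.10"
  fi
else
  echo "python3 not found on PATH"
fi

# CUDA toolkit: nvcc prints a line containing "release X.Y,".
if command -v nvcc >/dev/null 2>&1; then
  cuda_ver=$(nvcc --version | sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p')
  if [ -z "$cuda_ver" ]; then
    echo "Could not parse the CUDA version from nvcc output"
  elif version_ge "$cuda_ver" 12.8; then
    echo "CUDA $cuda_ver: OK"
  else
    echo "CUDA $cuda_ver: too old, need >= 12.8"
  fi
else
  echo "nvcc not found on PATH; CUDA toolkit version not checked"
fi
```

Inside the pre-built Docker images this check should be unnecessary; it is mainly useful for a pip install on a host you manage yourself.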

Choosing Your Backends

Before installing, decide which training and inference backends you need. The choice affects which Docker image or pip extras you select.

Training backends

  • FSDP / FSDP2 — The recommended backend for research and prototyping. Works with any model supported by Hugging Face Transformers. To use FSDP2, set strategy=fsdp2 in your Hydra config.
  • Megatron-LM — Recommended when you need maximum scalability across many nodes and GPUs. verl currently supports Megatron-LM v0.13.1. Both backends share the same unified worker layer.
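
As a rough illustration of the FSDP2 setting, Hydra options are usually passed as key=value overrides on the command line. The sketch below only prints a command instead of running it. The entry point verl.trainer.main_ppo and the key paths in front of strategy are assumptions not stated on this page; check them against the config files in your verl checkout.

```shell
#!/bin/sh
# ASSUMPTION: the key paths below are illustrative. Only "strategy=fsdp2"
# comes from this page; the prefixes must match your own Hydra config.
FSDP2_OVERRIDES="actor_rollout_ref.actor.strategy=fsdp2 actor_rollout_ref.ref.strategy=fsdp2 critic.strategy=fsdp2"

# Dry run: print the command rather than starting a training job.
# ASSUMPTION: verl.trainer.main_ppo is used here as an example entry point.
echo "python3 -m verl.trainer.main_ppo $FSDP2_OVERRIDES"
```

Printing the command first makes it easy to review the overrides before committing GPU time to a run.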

Inference backends

  • vLLM — Stable and well-tested (vLLM 0.8.3 and later). Set VLLM_USE_V1=1 for optimal performance.
  • SGLang — Under extensive development; recommended for advanced multi-turn and agentic features. Refer to the SGLang Backend documentation for detailed setup steps.
  • HuggingFace TGI — Suitable for debugging and single-GPU exploration only.
Warning: vLLM 0.7.x releases have known instability issues with verl. Use vLLM 0.8.3 or a later release.
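
The vLLM guidance above can be applied in the shell that launches training. This is a minimal sketch: it assumes python3 points at the environment where vLLM is installed and that sort supports -V; it reads the installed version from package metadata rather than importing vLLM.

```shell
#!/bin/sh
# Enable the vLLM V1 engine for processes started from this shell.
export VLLM_USE_V1=1

# Read the installed vLLM version from package metadata (empty if absent).
vllm_ver=$(python3 -c 'from importlib.metadata import version; print(version("vllm"))' 2>/dev/null)

if [ -z "$vllm_ver" ]; then
  echo "vLLM is not installed in this environment"
else
  # The smaller of the two versions sorts first; it must be 0.8.3.
  lowest=$(printf '%s\n%s\n' "$vllm_ver" 0.8.3 | sort -V | head -n 1)
  if [ "$lowest" = "0.8.3" ]; then
    echo "vLLM $vllm_ver: OK (VLLM_USE_V1=$VLLM_USE_V1)"
  else
    echo "vLLM $vllm_ver: older than 0.8.3, upgrade recommended"
  fi
fi
```

The export only affects the current shell and its children, so place it in the same script or session that starts the job.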

Installation


AMD GPU Support (ROCm)

For AMD MI300 GPUs with the ROCm platform, use the dedicated ROCm Dockerfile.
Build the image:
docker build -f docker/Dockerfile.rocm -t verl-rocm .
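
The container is given the host's /dev/kfd and /dev/dri device nodes, so it can help to confirm they exist on the host first. The following is a small preflight sketch; the function name missing_devices is hypothetical and not part of verl or ROCm.

```shell
#!/bin/sh
# Hypothetical helper: print each given path that does not exist on this host.
missing_devices() {
  for dev in "$@"; do
    [ -e "$dev" ] || echo "$dev"
  done
}

# The two device nodes that the container launch passes through.
missing=$(missing_devices /dev/kfd /dev/dri)

if [ -n "$missing" ]; then
  echo "Missing ROCm device nodes on this host:"
  echo "$missing"
else
  echo "ROCm device nodes found: /dev/kfd and /dev/dri"
fi
```

If either path is reported missing, the ROCm driver is probably not installed or not loaded on the host.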
Launch the container:
docker run --rm -it \
  --device /dev/dri \
  --device /dev/kfd \
  -p 8265:8265 \
  --group-add video \
  --cap-add SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --privileged \
  -v $HOME/.ssh:/root/.ssh \
  -v $HOME:$HOME \
  --shm-size 128G \
  -w $PWD \
  verl-rocm \
  /bin/bash
If you need to run as a non-root user, add -e HOST_UID=$(id -u) and -e HOST_GID=$(id -g) to the launch command.
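
For the non-root case, the two extra flags can be placed directly after docker run. The sketch below prints the resulting command as a dry run instead of executing it, with $HOME and $PWD expanded to their current values; the variable name USER_FLAGS is only for illustration.

```shell
#!/bin/sh
# Build the extra flags from the current host user and group IDs.
USER_FLAGS="-e HOST_UID=$(id -u) -e HOST_GID=$(id -g)"

# Dry run: print the full launch command so it can be reviewed first.
cat <<EOF
docker run --rm -it \\
  $USER_FLAGS \\
  --device /dev/dri \\
  --device /dev/kfd \\
  -p 8265:8265 \\
  --group-add video \\
  --cap-add SYS_PTRACE \\
  --security-opt seccomp=unconfined \\
  --privileged \\
  -v $HOME/.ssh:/root/.ssh \\
  -v $HOME:$HOME \\
  --shm-size 128G \\
  -w $PWD \\
  verl-rocm \\
  /bin/bash
EOF
```

Copy the printed command into a terminal once the values look right.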
AMD GPU support currently covers FSDP as the training engine, with vLLM and SGLang as inference engines. Megatron-LM support for AMD is planned for a future release.
