Installing verl: Docker, pip, and Backend Setup

verl supports a flexible set of training and inference backends, letting you pick the right combination for your workload — from rapid prototyping on a single node to scaled multi-node production runs. This page covers system requirements, the recommended Docker-based setup, pip installation from source, and a guide to choosing the right backends.

Requirements

verl requires Python 3.10 or newer and CUDA 12.8 or newer. Older CUDA versions are not supported by the pre-built images or the stable install path.

Dependency	Minimum version
Python	3.10
CUDA	12.8
cuDNN	9.10.0

Choosing Your Backends

Before installing, decide which training and inference backends you need. The choice affects which Docker image or pip extras you select.

Training backends

FSDP / FSDP2 — The recommended backend for research and prototyping. Works with any model supported by Hugging Face Transformers. To use FSDP2, set strategy=fsdp2 in your Hydra config.
Megatron-LM — Recommended when you need maximum scalability across many nodes and GPUs. verl currently supports Megatron-LM v0.13.1. Both backends share the same unified worker layer.
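As a rough sketch of how the FSDP2 switch might be passed on the command line, the snippet below assembles Hydra-style key=value overrides. This page only says "strategy=fsdp2", so the config paths (actor_rollout_ref.actor.strategy and the others) and the verl.trainer.main_ppo entry point are assumptions; check them against the configs shipped with your verl version.

```python
import shlex

# Hypothetical config paths: this page only states "strategy=fsdp2".
# Verify the exact keys against the YAML configs in your verl checkout.
ASSUMED_STRATEGY_KEYS = [
    "actor_rollout_ref.actor.strategy",
    "actor_rollout_ref.ref.strategy",
    "critic.strategy",
]


def fsdp2_overrides(keys=ASSUMED_STRATEGY_KEYS):
    """Return Hydra command-line overrides that select the fsdp2 strategy."""
    return [f"{key}=fsdp2" for key in keys]


def build_command(entry_point="verl.trainer.main_ppo"):
    """Build an illustrative launch command; the entry point is an assumption."""
    return ["python3", "-m", entry_point, *fsdp2_overrides()]


if __name__ == "__main__":
    # Print the command instead of running it, so the overrides can be reviewed first.
    print(shlex.join(build_command()))
```

Printing rather than executing keeps the sketch safe to run on a machine without verl installed.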

Inference backends

vLLM — Stable and well-tested (vLLM 0.8.3 and later). Set VLLM_USE_V1=1 for optimal performance.
SGLang — Under extensive development; recommended for advanced multi-turn and agentic features. Refer to the SGLang Backend documentation for detailed setup steps.
HuggingFace TGI — Suitable for debugging and single-GPU exploration only.

vLLM 0.7.x releases have known instability issues with verl. Use vLLM 0.8.3 or a later release. Set the environment variable VLLM_USE_V1=1 for the best performance with supported models.
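If you launch training from a Python entry point rather than a shell, one possible way to apply the VLLM_USE_V1 setting is to put it into the process environment early. The helper name below is illustrative, and setting the variable before anything imports vLLM is a cautious ordering rather than a documented requirement.

```python
import os


def enable_vllm_v1(env=os.environ):
    """Set VLLM_USE_V1=1 unless the caller has already chosen a value."""
    env.setdefault("VLLM_USE_V1", "1")
    return env["VLLM_USE_V1"]


if __name__ == "__main__":
    # Run this before any module imports vLLM, so the value is in place early.
    print("VLLM_USE_V1 =", enable_vllm_v1())
```

Using setdefault means an explicit VLLM_USE_V1=0 exported in the shell is left untouched.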

Installation

Two installation paths are available: Docker (recommended) and pip from source.

Docker (Recommended)

Docker is the fastest and most reliable way to get a fully working verl environment. Starting from v0.6.0, verl publishes application images on top of the official vLLM and SGLang base images. The application images add:

flash_attn
Megatron-LM
Apex
TransformerEngine
DeepEP

All pre-built images are available on Docker Hub at verlai/verl.

Available image tags

Tag	Inference backend
`verlai/verl:vllm011.latest`	vLLM
`verlai/verl:sgl055.latest`	SGLang

The latest images used in CI are tracked in the GitHub workflows, and the images are built from the Dockerfiles in the repository.

Pull the image

Choose the image that matches your preferred inference backend:

docker pull verlai/verl:vllm011.latest

Create and start the container

Replace <image:tag> with the image you pulled:

docker create --runtime=nvidia --gpus all --net=host --shm-size="10g" \
    --cap-add=SYS_ADMIN -v .:/workspace/verl \
    --name verl <image:tag> sleep infinity
docker start verl
docker exec -it verl bash

The flags do the following:

--runtime=nvidia --gpus all — expose all NVIDIA GPUs to the container
--net=host — use host networking (needed for multi-node Ray communication)
--shm-size="10g" — increase shared memory for distributed workloads
--cap-add=SYS_ADMIN — grant elevated privileges needed by some GPU drivers and profiling tools inside the container
-v .:/workspace/verl — mount your local checkout into the container
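Once inside the container, a quick sanity check along these lines can confirm that the shared-memory and GPU flags took effect. It is a sketch that assumes a Linux container with /dev/shm mounted and treats PyTorch as optional; the function names are illustrative.

```python
import shutil

GIB = 1024 ** 3


def shm_size_gib(path="/dev/shm"):
    """Return the total size of the filesystem at `path` in GiB."""
    return shutil.disk_usage(path).total / GIB


def visible_gpu_count():
    """Return the number of GPUs PyTorch can see, or None if torch is unavailable."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.cuda.device_count()


if __name__ == "__main__":
    try:
        print(f"/dev/shm size: {shm_size_gib():.1f} GiB (the command above requests 10g)")
    except FileNotFoundError:
        print("/dev/shm not found; this check expects a Linux container")
    count = visible_gpu_count()
    print("torch not available" if count is None else f"visible GPUs: {count}")
```

A GPU count of 0 usually points at the --runtime=nvidia --gpus all flags or the host driver rather than at verl itself.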

Install verl inside the container

The pre-built images already include all heavy dependencies. You only need to install verl itself:

git clone https://github.com/verl-project/verl && cd verl
pip3 install --no-deps -e .

If you need to switch between inference frameworks within the same container, install with optional extras instead:

# for vLLM support
pip3 install -e ".[vllm]"
# for SGLang support
pip3 install -e ".[sglang]"

pip from source

If you cannot use the Docker images, you can install verl in a custom Python environment. The Dockerfiles contain more detail than the instructions below; consult Dockerfile.stable.vllm for reference.

Install CUDA 12.8+ and cuDNN 9.10+

CUDA 12.8 is the minimum supported version. Install it from NVIDIA’s CUDA archive:

wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda-repo-ubuntu2204-12-8-local_12.8.1-570.124.06-1_amd64.deb
dpkg -i cuda-repo-ubuntu2204-12-8-local_12.8.1-570.124.06-1_amd64.deb
cp /var/cuda-repo-ubuntu2204-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cuda-toolkit-12-8
update-alternatives --set cuda /usr/local/cuda-12.8

Install cuDNN from NVIDIA’s cuDNN archive:

wget https://developer.download.nvidia.com/compute/cudnn/9.10.2/local_installers/cudnn-local-repo-ubuntu2204-9.10.2_1.0-1_amd64.deb
dpkg -i cudnn-local-repo-ubuntu2204-9.10.2_1.0-1_amd64.deb
cp /var/cudnn-local-repo-ubuntu2204-9.10.2/cudnn-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cudnn-cuda-12
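After the toolkit is installed, it can help to confirm which CUDA release is actually on PATH before building anything against it. The sketch below parses the output of nvcc --version; it assumes nvcc is reachable on PATH, and the helper names are illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its (major, minor) release."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("Could not determine the CUDA release (is nvcc on PATH?)")
    elif release < (12, 8):
        print(f"CUDA {release[0]}.{release[1]} is older than the required 12.8")
    else:
        print(f"CUDA {release[0]}.{release[1]} meets the 12.8 minimum")
```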

Create a conda environment

Use a fresh conda environment to avoid dependency conflicts. Note that inference frameworks like vLLM may override your PyTorch version if you are not careful.

conda create -n verl python=3.12
conda activate verl

Install inference and training dependencies

Use the provided install script. It installs vLLM, SGLang, and Megatron-LM core together with matching PyTorch versions:

# With Megatron-LM support
bash scripts/install_vllm_sglang_mcore.sh

# FSDP only (skips Megatron-LM)
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh

If errors occur, inspect the script and run the steps manually. The inference frameworks can silently downgrade PyTorch — install them first, then install verl.

Optional: NVIDIA Apex (needed only for Megatron-LM):

git clone https://github.com/NVIDIA/apex.git && cd apex
MAX_JOBS=32 pip install -v --disable-pip-version-check --no-cache-dir \
    --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" ./

Apex compilation is slow. Set MAX_JOBS to a reasonable value for your machine — too high a value can exhaust memory and hang the build.

Install verl

Clone the repository and install in editable mode:

git clone https://github.com/verl-project/verl.git
cd verl
pip install --no-deps -e .

Or with optional extras for your chosen inference backend:

pip install -e ".[vllm]"    # for vLLM
pip install -e ".[sglang]"  # for SGLang

Verify post-installation package versions

Some packages can be silently downgraded during installation. After setup, verify the following are at the expected versions:

torch and the other packages in the torch family
vLLM or SGLang
pyarrow (>= 19.0.0 required)
tensordict (>= 0.8.0, <= 0.10.0, not 0.9.0)
nvidia-cudnn-cu12 (for Megatron-LM backend)

If any are outdated, reinstall the affected package explicitly.
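The checks above can be scripted. The following is a minimal sketch that reads installed versions with importlib.metadata and applies the pyarrow and tensordict bounds from the list; the distribution names passed to metadata.version are assumed to match the PyPI names, and the other packages are only printed because this page gives no bounds for them.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '0.10.0' or '2.8.0+cu128' into a 3-part integer tuple, e.g. (0, 10, 0)."""
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '19.0'
    return tuple(parts)


def pyarrow_ok(version):
    """pyarrow must be at least 19.0.0."""
    return version_tuple(version) >= (19, 0, 0)


def tensordict_ok(version):
    """tensordict must be within 0.8.0..0.10.0 and must not be 0.9.0."""
    v = version_tuple(version)
    return (0, 8, 0) <= v <= (0, 10, 0) and v != (0, 9, 0)


CHECKS = {"pyarrow": pyarrow_ok, "tensordict": tensordict_ok}
PACKAGES = ("torch", "vllm", "sglang", "pyarrow", "tensordict", "nvidia-cudnn-cu12")

if __name__ == "__main__":
    for name in PACKAGES:
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        check = CHECKS.get(name)
        status = "" if check is None else (" OK" if check(found) else " OUT OF RANGE")
        print(f"{name}: {found}{status}")
```

Running it once before and once after installing verl makes a silent downgrade easy to spot by comparing the two outputs.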

AMD GPU Support (ROCm)

For AMD MI300 GPUs with the ROCm platform, use the dedicated ROCm Dockerfile:

docker/Dockerfile.rocm

Build the image:

docker build -f docker/Dockerfile.rocm -t verl-rocm .

Launch the container:

docker run --rm -it \
  --device /dev/dri \
  --device /dev/kfd \
  -p 8265:8265 \
  --group-add video \
  --cap-add SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --privileged \
  -v $HOME/.ssh:/root/.ssh \
  -v $HOME:$HOME \
  --shm-size 128G \
  -w $PWD \
  verl-rocm \
  /bin/bash

If you need to run as a non-root user, add -e HOST_UID=$(id -u) and -e HOST_GID=$(id -g) to the launch command.

AMD GPU support currently covers FSDP as the training engine, with vLLM and SGLang as inference engines. Megatron-LM support for AMD is planned for a future release.
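Inside the ROCm container, one way to confirm that the installed PyTorch is a ROCm build rather than a CUDA or CPU-only build is to inspect torch.version. This is a sketch with an illustrative helper name, and it assumes PyTorch is present in the image.

```python
def describe_build(cuda_version, hip_version):
    """Classify a PyTorch build from its torch.version.cuda / torch.version.hip values."""
    if hip_version is not None:
        return f"ROCm build (HIP {hip_version})"
    if cuda_version is not None:
        return f"CUDA build (CUDA {cuda_version})"
    return "CPU-only build"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # torch.version.hip is None on CUDA builds and a version string on ROCm builds.
        print(describe_build(torch.version.cuda, torch.version.hip))
```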
