verl: RL Post-Training Framework for LLMs

verl (Volcano Engine Reinforcement Learning) is an open-source RL training library for large language models (LLMs), implementing the HybridFlow paper. It combines a hybrid programming model with state-of-the-art training and inference backends to make RL post-training both accessible and production-ready.

Installation

Set up verl with Docker or pip, and choose your training and inference backends.

Quickstart

Run your first PPO training job on GSM8K in minutes.

HybridFlow Concepts

Understand the programming model that powers verl’s flexibility.

Algorithm Reference

Explore PPO, GRPO, DAPO, RLOO, and the full algorithm library.

Why verl?

verl is designed around three principles: flexibility, efficiency, and production-readiness.

Flexible by design. The HybridFlow programming model lets you express complex RL training dataflows (rollout, advantage computation, policy updates) in a few lines of Python. Algorithms like PPO, GRPO, DAPO, RLOO, and ReMax are all first-class citizens. Extending to a new algorithm means implementing a new dataflow, not rewriting the infrastructure.

High throughput. verl integrates state-of-the-art training backends (FSDP, FSDP2, Megatron-LM) with state-of-the-art inference backends (vLLM, SGLang). The 3D-HybridEngine reshards actor model weights between training and generation phases to eliminate memory redundancy and reduce communication overhead.

Production-ready. verl scales from single-GPU experiments to clusters of hundreds of GPUs, supports models up to 671B parameters with expert parallelism, and runs on NVIDIA, AMD (ROCm), and Ascend hardware.
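To make the "dataflow" idea under "Flexible by design" concrete, here is a toy sketch of one RL step as a chain of stages: rollout, reward scoring, advantage computation, and policy update. Every function name here (rollout, score, compute_advantages, update_policy, train_step) is a hypothetical stand-in chosen for illustration, not a verl API, and the "policy" is a random chooser rather than a model.

```python
import random
from statistics import mean

# Toy illustration only: all names below are hypothetical, not verl APIs.

def rollout(policy, prompts, n_samples, rng):
    """Generation stage: sample n_samples responses for each prompt."""
    return {p: [policy(p, rng) for _ in range(n_samples)] for p in prompts}

def score(responses, reward_fn):
    """Reward stage: score every sampled response."""
    return {p: [reward_fn(p, r) for r in rs] for p, rs in responses.items()}

def compute_advantages(rewards):
    """Advantage stage: subtract each prompt's mean reward as a baseline."""
    advantages = {}
    for prompt, rs in rewards.items():
        baseline = mean(rs)
        advantages[prompt] = [r - baseline for r in rs]
    return advantages

def update_policy(advantages):
    """Update stage: a real trainer would take a gradient step here.
    This placeholder only summarizes the signal it would train on."""
    flat = [a for adv in advantages.values() for a in adv]
    return {
        "num_samples": len(flat),
        "mean_abs_advantage": mean([abs(a) for a in flat]),
    }

def train_step(policy, prompts, reward_fn, rng, n_samples=4):
    """One pass through the dataflow: rollout, score, advantages, update."""
    responses = rollout(policy, prompts, n_samples, rng)
    rewards = score(responses, reward_fn)
    advantages = compute_advantages(rewards)
    return update_policy(advantages)

rng = random.Random(0)
toy_policy = lambda prompt, rng: rng.choice(["4", "5"])
toy_reward = lambda prompt, response: 1.0 if response == "4" else 0.0
stats = train_step(toy_policy, ["What is 2 + 2?"], toy_reward, rng)
print(stats)
```

In this picture, moving to a different algorithm amounts to swapping one stage (for example, a different compute_advantages) while the rest of the chain stays the same.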

Key Features

Multiple RL Algorithms

PPO, GRPO, DAPO, RLOO, ReMax, REINFORCE++, SPIN, SPPO, and more — all configurable via YAML.

Flexible Training Backends

FSDP, FSDP2, and Megatron-LM for training, with automatic weight resharding between phases.

Best-in-Class Inference

vLLM and SGLang rollout backends with tensor parallelism, paged attention, and continuous batching.

Multi-turn & Agentic RL

Server-based async rollout, tool calling, LangGraph integration, and multi-turn conversation support.

VLM Support

Vision-language model RL with Qwen2.5-VL, Kimi-VL, and multi-modal reward functions.

Broad Hardware Support

NVIDIA GPUs, AMD ROCm (MI300X/MI325X/MI355X), and Ascend NPUs are all supported.

Supported Algorithms

verl provides first-class implementations of the following RL algorithms:

Algorithm	Description
PPO	Proximal Policy Optimization with GAE, critic model, KL control
GRPO	Group Relative Policy Optimization — critic-free, group-based advantage
DAPO	Decoupled Clip and Dynamic Sampling Policy Optimization — SOTA open-source RL
RLOO	REINFORCE Leave-One-Out baseline
ReMax	Reward-maximizing baseline with greedy rollouts
REINFORCE++	Improved REINFORCE with variance reduction
SPIN	Self-play fine-tuning
SPPO	Self-play preference optimization
DrGRPO	GRPO variant eliminating length bias
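The two critic-free entries in the table, GRPO and RLOO, differ mainly in how they turn a group of rewards for the same prompt into advantages. The sketch below shows the textbook form of each in plain Python. It is an illustration, not verl's implementation: the function names, the epsilon term, and the use of the population standard deviation are assumptions made here.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: center each reward on the group mean
    and scale by the group standard deviation (population std assumed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def rloo_advantages(rewards):
    """Leave-one-out advantage: the baseline for sample i is the mean
    reward of the other samples in the same group."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

# Four sampled responses to one prompt, scored 1.0 (correct) or 0.0 (wrong).
group_rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(group_rewards))
print(rloo_advantages(group_rewards))
```

Both estimators give positive advantages to above-average responses and negative ones to the rest, with no learned critic involved.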

Getting Started

Install verl

Pull the official Docker image or install from source. See Installation.

Prepare your dataset

Convert your dataset to Parquet format with prompt/answer fields. See Data Preparation.
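As a rough sketch of the conversion step, the snippet below builds records with prompt and answer fields and writes them to Parquet with pandas. The helper names (to_records, write_parquet) and the flat two-field layout are assumptions for illustration; the exact schema verl expects is described in Data Preparation and may include more fields.

```python
def to_records(rows):
    """Turn (question, answer) pairs into dicts with prompt/answer fields."""
    records = []
    for question, answer in rows:
        records.append({
            "prompt": question.strip(),
            "answer": str(answer).strip(),
        })
    return records

def write_parquet(records, path):
    """Write the records to a Parquet file.
    Requires pandas plus a Parquet engine such as pyarrow."""
    import pandas as pd  # imported lazily so to_records works without pandas
    pd.DataFrame.from_records(records).to_parquet(path, index=False)

records = to_records([(" What is 2 + 2? ", 4)])
print(records)
# write_parquet(records, "train.parquet")  # hypothetical output path
```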

Implement a reward function

Write a scoring function for your task, or use a pre-built one. See Reward Functions.
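A scoring function can be as small as an exact-match check on the final answer. The sketch below assumes a GSM8K-style response that ends with "#### <number>" and assumes a signature of (data_source, solution_str, ground_truth, extra_info); check Reward Functions for the signature your verl version actually expects.

```python
import re

# Matches a GSM8K-style final answer such as "#### 42" or "#### 1,000".
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")

def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 if the last '#### <number>' in the response matches
    the ground truth exactly, otherwise 0.0.
    The signature is an assumption; data_source and extra_info are unused here."""
    matches = _FINAL_ANSWER.findall(solution_str)
    if not matches:
        return 0.0  # no parsable final answer
    predicted = matches[-1].replace(",", "")
    expected = str(ground_truth).replace(",", "").strip()
    return 1.0 if predicted == expected else 0.0

print(compute_score("gsm8k", "2 + 2 = 4\n#### 4", "4"))
print(compute_score("gsm8k", "I am not sure.", "4"))
```

Exact string match is deliberately strict; a task with equivalent answer forms (for example "4" and "4.0") would need normalization before comparison.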

Launch training

Run PPO or GRPO with a YAML config. See Quickstart.
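One way to picture the launch step is as a nested config flattened into dotted key=value overrides for the trainer entry point. The sketch below only builds and prints such a command. The entry point (verl.trainer.main_ppo), the config keys, and the file paths are assumptions for illustration and should be checked against Quickstart before use.

```python
import shlex

def flatten(config, prefix=""):
    """Flatten a nested dict into dotted 'key=value' override strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(flatten(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides

def build_command(config, entrypoint="verl.trainer.main_ppo"):
    """Assemble the launch command as an argument list (entry point assumed)."""
    return ["python3", "-m", entrypoint, *flatten(config)]

# Assumed keys and hypothetical paths, for illustration only.
config = {
    "algorithm": {"adv_estimator": "grpo"},
    "data": {"train_files": "data/train.parquet", "val_files": "data/test.parquet"},
    "trainer": {"n_gpus_per_node": 1},
}
command = build_command(config)
print(shlex.join(command))
```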

Community & Citation

verl is developed by the ByteDance Seed team and maintained by the verl community. It has been adopted by researchers at Alibaba, NVIDIA, UC Berkeley, Tsinghua University, and many others. If you use verl in your research, please cite HybridFlow: A Flexible and Efficient RLHF Framework. Join the community on GitHub, Slack, or WeChat.
