TorchRL: PyTorch-Native Reinforcement Learning Library

TorchRL is an open-source reinforcement learning library built on PyTorch. It provides composable, reusable primitives for building RL systems — from local prototypes to distributed, multi-agent, and model-based workflows — all unified by the TensorDict data model.

Introduction

Learn TorchRL’s core philosophy, the TensorDict data model, and how the library fits together.

Quickstart

Train your first RL agent in minutes with a working end-to-end example.

Installation

Install TorchRL via pip, conda, or from source with optional CUDA wheels.

API Reference

Explore the full API — environments, collectors, buffers, modules, and objectives.

Core Building Blocks

TorchRL structures every RL interaction as a TensorDict passing through independent, swappable modules:

Environments

Native envs, third-party wrappers, vectorized containers, and transform pipelines.

Collectors

Single-process to distributed trajectory collectors with async weight sync.

Replay Buffers

Modular storage, prioritized sampling, HER, memmap, and offline datasets.

Modules & Policies

Actors, critics, recurrent modules, exploration strategies, and distributions.

Objectives & Losses

PPO, SAC, DQN, TD3, MAPPO, Dreamer, GRPO, and 15+ more loss modules.

TensorDict Model

The shared data model that makes every component composable.
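
The composition idea behind these building blocks can be shown without TorchRL at all. The sketch below is a plain-Python stand-in, not the library's API: `KeyedModule` and `run_pipeline` are hypothetical names, and an ordinary dict plays the role of the TensorDict. Each module declares which keys it reads and which it writes, so modules can be swapped or reordered as long as the key names line up.

```python
from typing import Callable, Dict, List, Sequence


class KeyedModule:
    """Hypothetical stand-in for a module that reads and writes named entries."""

    def __init__(self, fn: Callable, in_keys: Sequence[str], out_keys: Sequence[str]):
        self.fn = fn
        self.in_keys = list(in_keys)
        self.out_keys = list(out_keys)

    def __call__(self, data: Dict[str, object]) -> Dict[str, object]:
        # Read the declared inputs from the shared container.
        outputs = self.fn(*(data[key] for key in self.in_keys))
        if len(self.out_keys) == 1:
            outputs = (outputs,)
        # Write the results back under the declared output keys.
        for key, value in zip(self.out_keys, outputs):
            data[key] = value
        return data


def run_pipeline(modules: List[KeyedModule], data: Dict[str, object]) -> Dict[str, object]:
    """Pass one shared container through every module in order."""
    for module in modules:
        data = module(data)
    return data


# A toy "policy" and "value" module that only agree on key names.
policy = KeyedModule(lambda obs: -0.5 * obs, in_keys=["observation"], out_keys=["action"])
value = KeyedModule(
    lambda obs, act: obs * obs + act * act,
    in_keys=["observation", "action"],
    out_keys=["state_action_value"],
)

result = run_pipeline([policy, value], {"observation": 2.0})
```

Neither toy module knows about the other. The only shared contract is the set of key names, which is the property the `in_keys` and `out_keys` arguments express in the quickstart steps.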

Get Started in 4 Steps

Install TorchRL

pip install torchrl

Create an environment

from torchrl.envs import GymEnv, TransformedEnv, StepCounter

env = TransformedEnv(GymEnv("Pendulum-v1"), StepCounter(max_steps=200))
env.check_env_specs()

Build a policy and collect data

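from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor
from torch import nn
from torchrl.collectors import Collector
from torchrl.modules import ProbabilisticActor, TanhNormal

# Network outputs the mean ("loc") and standard deviation ("scale") of the action.
actor_net = TensorDictModule(
    nn.Sequential(nn.LazyLinear(64), nn.Tanh(), nn.Linear(64, 2), NormalParamExtractor()),
    in_keys=["observation"], out_keys=["loc", "scale"],
)
# PPO needs a stochastic policy that records the log-probability of each action.
policy = ProbabilisticActor(
    actor_net, in_keys=["loc", "scale"],
    distribution_class=TanhNormal, return_log_prob=True,
)
policy(env.reset())  # one forward pass materializes the lazy layer
collector = Collector(env, policy, frames_per_batch=1000, total_frames=50_000)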

Train with a loss module

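from torch import nn
from torchrl.modules import ValueOperator
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

# The critic estimates state values; ValueOperator writes them to "state_value".
critic = ValueOperator(
    nn.Sequential(nn.LazyLinear(64), nn.Tanh(), nn.Linear(64, 1)),
    in_keys=["observation"],
)
critic(env.reset())  # one forward pass materializes the lazy layer
advantage = GAE(value_network=critic, gamma=0.99, lmbda=0.95)
loss = ClipPPOLoss(actor_network=policy, critic_network=critic)

for batch in collector:
    batch = advantage(batch)  # adds "advantage" and "value_target"
    loss_vals = loss(batch)   # TensorDict of loss terms such as "loss_objective"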
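
To make the last step concrete, here is a plain-Python sketch of the two quantities it relies on: generalized advantage estimation with the same `gamma=0.99` and `lmbda=0.95`, and the clipped PPO surrogate. It is illustrative only and is not TorchRL's implementation. The function names `gae_advantages` and `clipped_surrogate`, the clip range of 0.2, and the toy numbers are assumptions, and episode termination handling is left out.

```python
import math
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    next_values: List[float],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Generalized advantage estimation over one trajectory without early termination."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * next_values[t] - values[t]
        running = td_error + gamma * lmbda * running
        advantages[t] = running
    return advantages


def clipped_surrogate(
    new_log_prob: float,
    old_log_prob: float,
    advantage: float,
    clip_epsilon: float = 0.2,  # assumed value, a common choice
) -> float:
    """PPO clipped objective for one sample, returned as a loss (negated objective)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = min(max(ratio, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
    return -min(ratio * advantage, clipped_ratio * advantage)


# Toy three-step trajectory with a zero value estimate everywhere.
adv = gae_advantages(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    next_values=[0.0, 0.0, 0.0],
)
# A probability ratio of 1.5 with a positive advantage is clipped at 1.2.
loss_value = clipped_surrogate(new_log_prob=math.log(1.5), old_log_prob=0.0, advantage=2.0)
```

In the toy call, the ratio of 1.5 falls outside the clip range, so the objective is capped at 1.2 times the advantage. This cap is what keeps a single PPO update from moving the policy too far from the one that collected the data.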

Tutorials

First Training Loop

Build a complete PPO training loop from scratch, step by step.

Custom Environment

Wrap any simulation as a TorchRL environment with specs and transforms.

Multi-Agent Training

Train cooperative and competitive multi-agent systems with MAPPO and IPPO.

Recurrent Policies

Use LSTM and GRU modules with the recurrent state lifecycle.

Distributed Training

Scale data collection across multiple processes and machines.

LLM Post-Training

Fine-tune language models with GRPO using TorchRL’s LLM stack.
