TorchRL is an open-source reinforcement learning library built on PyTorch. It provides composable, reusable primitives for building RL systems — from local prototypes to distributed, multi-agent, and model-based workflows — all unified by the TensorDict data model.

Introduction

Learn TorchRL’s core philosophy, the TensorDict data model, and how the library fits together.

Quickstart

Train your first RL agent in minutes with a working end-to-end example.

Installation

Install TorchRL via pip, conda, or from source with optional CUDA wheels.

API Reference

Explore the full API — environments, collectors, buffers, modules, and objectives.

Core Building Blocks

TorchRL structures every RL interaction as a TensorDict passing through independent, swappable modules:

Environments

Native envs, third-party wrappers, vectorized containers, and transform pipelines.

Collectors

Single-process to distributed trajectory collectors with async weight sync.

Replay Buffers

Modular storage, prioritized sampling, HER, memmap, and offline datasets.

Modules & Policies

Actors, critics, recurrent modules, exploration strategies, and distributions.

Objectives & Losses

PPO, SAC, DQN, TD3, MAPPO, Dreamer, GRPO, and 15+ more loss modules.

TensorDict Model

The shared data model that makes every component composable.

Get Started in 4 Steps

1. Install TorchRL

pip install torchrl

2. Create an environment

from torchrl.envs import GymEnv, TransformedEnv, StepCounter

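# Cap episodes at 200 steps; StepCounter also adds a "step_count" entry.
env = TransformedEnv(GymEnv("Pendulum-v1"), StepCounter(max_steps=200))
# Sanity-check that the environment's outputs match its declared specs.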
env.check_env_specs()

3. Build a policy and collect data

from tensordict.nn import TensorDictModule
from torch import nn
from torchrl.collectors import Collector

# LazyLinear infers the observation size on the first forward pass.
policy = TensorDictModule(
    nn.Sequential(nn.LazyLinear(64), nn.Tanh(), nn.Linear(64, 1)),
    in_keys=["observation"], out_keys=["action"],
)
collector = Collector(env, policy, frames_per_batch=1000, total_frames=50_000)
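Conceptually, a collector is an iterator that runs the policy in the environment and yields batches of `frames_per_batch` transitions until `total_frames` have been gathered. The plain-Python sketch below mimics that contract only; `toy_collector`, its one-dimensional dynamics, and its reward are hypothetical and are not TorchRL's implementation.

```python
from typing import Callable, Iterator, List, Tuple

Transition = Tuple[float, float, float]  # (observation, action, reward)


def toy_collector(
    policy: Callable[[float], float],
    frames_per_batch: int,
    total_frames: int,
) -> Iterator[List[Transition]]:
    """Yield fixed-size batches of transitions until total_frames is reached."""
    obs = 0.0  # hypothetical 1-D environment state
    collected = 0
    while collected < total_frames:
        batch: List[Transition] = []
        for _ in range(frames_per_batch):
            action = policy(obs)
            next_obs = obs + action      # toy dynamics, for illustration only
            reward = -abs(next_obs)      # reward is highest near zero
            batch.append((obs, action, reward))
            obs = next_obs
        collected += len(batch)
        yield batch


batches = list(
    toy_collector(lambda o: -0.5 * o + 0.1, frames_per_batch=1000, total_frames=50_000)
)
print(len(batches), len(batches[0]))
```

With the same settings as the quickstart (1000 frames per batch, 50,000 frames in total), the loop produces 50 batches.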

4. Train with a loss module

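from torch import nn
from torchrl.modules import ValueOperator
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

# Value network shared by GAE and the PPO loss; it writes "state_value".
critic = ValueOperator(
    nn.Sequential(nn.LazyLinear(64), nn.Tanh(), nn.Linear(64, 1)),
    in_keys=["observation"],
)

advantage = GAE(value_network=critic, gamma=0.99, lmbda=0.95)
# ClipPPOLoss needs a stochastic actor that reports action log-probabilities
# (e.g. torchrl.modules.ProbabilisticActor), not a plain deterministic module.
loss = ClipPPOLoss(actor_network=policy, critic_network=critic)

for batch in collector:
    batch = advantage(batch)  # adds "advantage" and "value_target"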
    loss_vals = loss(batch)
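The advantage step in the loop above uses generalized advantage estimation with `gamma=0.99` and `lmbda=0.95`. As a worked illustration of that recursion, here is a simplified plain-Python sketch; the `gae` helper is hypothetical, treats every `done` flag as a true terminal state, and is not TorchRL's `GAE` class.

```python
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Compute advantages backwards in time over one trajectory."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        # Exponentially weighted sum of future TD errors, reset at episode ends.
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages


# Three steps with reward 1 and a critic that predicts 0 everywhere.
adv = gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    next_values=[0.0, 0.0, 0.0],
    dones=[False, False, True],
)
print(adv)
```

Earlier steps receive larger advantages here because each one accumulates the discounted TD errors of the steps that follow it, weighted by `gamma * lmbda`.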

Tutorials

First Training Loop

Build a complete PPO training loop from scratch, step by step.

Custom Environment

Wrap any simulation as a TorchRL environment with specs and transforms.

Multi-Agent Training

Train cooperative and competitive multi-agent systems with MAPPO and IPPO.

Recurrent Policies

Use LSTM and GRU modules with the recurrent state lifecycle.

Distributed Training

Scale data collection across multiple processes and machines.

LLM Post-Training

Fine-tune language models with GRPO using TorchRL’s LLM stack.
