TorchRL is an open-source reinforcement learning library built on PyTorch. It provides composable, reusable primitives for building RL systems, from local prototypes to distributed, multi-agent, and model-based workflows, all unified by the TensorDict data model.
Introduction
Learn TorchRL’s core philosophy, the TensorDict data model, and how the library fits together.
Quickstart
Train your first RL agent in minutes with a working end-to-end example.
Installation
Install TorchRL via pip, conda, or from source with optional CUDA wheels.
API Reference
Explore the full API — environments, collectors, buffers, modules, and objectives.
Core Building Blocks
TorchRL structures every RL interaction as a TensorDict passing through independent, swappable modules:
Environments
Native envs, third-party wrappers, vectorized containers, and transform pipelines.
Collectors
Single-process to distributed trajectory collectors with async weight sync.
Replay Buffers
Modular storage, prioritized sampling, HER, memmap, and offline datasets.
Modules & Policies
Actors, critics, recurrent modules, exploration strategies, and distributions.
Objectives & Losses
PPO, SAC, DQN, TD3, MAPPO, Dreamer, GRPO, and 15+ more loss modules.
TensorDict Model
The shared data model that makes every component composable.
Get Started in 4 Steps
Tutorials
First Training Loop
Build a complete PPO training loop from scratch, step by step.
Custom Environment
Wrap any simulation as a TorchRL environment with specs and transforms.
Multi-Agent Training
Train cooperative and competitive multi-agent systems with MAPPO and IPPO.
Recurrent Policies
Use LSTM and GRU modules with the recurrent state lifecycle.
Distributed Training
Scale data collection across multiple processes and machines.
LLM Post-Training
Fine-tune language models with GRPO using TorchRL’s LLM stack.