TorchRL is built on a single unifying idea: every piece of data in the training loop (observations, actions, rewards, recurrent states, priorities, agent groupings) lives inside a TensorDict. TensorDict is a dictionary-like tensor container that supports PyTorch operations, device transfers, shared-memory storage, memmaps, lazy views, and nn.Module wrappers. Rather than passing parallel Python lists or positional tuples between components, TorchRL threads one structured object through the entire pipeline so that environments, collectors, replay buffers, and loss modules can all consume and produce the same type without any glue code.
What is TensorDict?
A TensorDict is essentially a dict[str | tuple[str, ...], Tensor] that knows its own batch dimensions and device. Every value shares the same leading batch shape; individual tensors may have additional trailing dimensions. The container ships with a full suite of PyTorch-like operations so that code that used to work on raw tensors works on TensorDicts with almost no changes.
TensorDict is a separate PyTorch library (tensordict) that TorchRL depends on. It is maintained at github.com/pytorch/tensordict and can be used independently of TorchRL.
Key operations
Because TensorDict mirrors the PyTorch tensor API, operations such as indexing, reshaping, stacking, and device transfer preserve the internal field structure and batch dimensions automatically.
Nested keys and structured data
TorchRL uses nested keys (tuples of strings) to represent structured sub-fields. This single convention handles multi-agent data, recurrent hidden states, and next-step observations without any schema changes or special-casing in component code.
Environments and collectors write ("next", "observation"), ("next", "reward"), and ("next", "done") as a matter of convention. Loss modules read those same keys. No component needs to be told what shape is coming: the TensorDict carries it.
next state convention
Next observations and rewards always live under the "next" sub-key, making multi-step transitions, value bootstrapping, and n-step returns unambiguous.
agent grouping
Multi-agent environments place per-agent data under a group key such as "agents". Losses and modules can target the right sub-tree without changing any other code.
recurrent states
Hidden states for GRUs and LSTMs are stored by name so they survive replay buffer round-trips and can be properly zero-initialised at episode starts.
custom fields
Any algorithm can attach task-specific tensors (e.g. "advantage", "td_error", "goal") and they flow through unchanged unless a transform or loss removes them.
TensorDict as the composability backbone
The reason TorchRL components compose so naturally is that each one only cares about the keys it declared; everything else is passed through untouched. Data moves from environments through collectors and replay buffers to loss modules, and it is a TensorDict at every stage.
A complete rollout example
The TorchRL README includes a snippet that performs a full rollout from a TransformedEnv into a batched TensorDict, with no unpacking required.
rollout is a single TensorDict with batch size [32]. Every step’s observation, action, reward, done flag, and step count are aligned at the same index. You can index, slice, or stack it just like a tensor.
Batch operations on collected data
When a collector returns data, the same tensor-level operations (indexing, slicing, reshaping, device transfer) work on the entire TensorDict.
TensorDictModule: modules with explicit key contracts
TensorDictModule wraps any nn.Module with explicit in_keys and out_keys. This makes the data contract of every network layer visible at construction time rather than buried in a forward signature.
Higher-level components such as ProbabilisticActor, ValueOperator, and loss modules follow this same pattern. See the Modules & Policies page for details.