Every network in TorchRL is a standard PyTorch nn.Module wrapped with an explicit declaration of which TensorDict keys it reads and which it writes. This key contract makes the data flow of the entire pipeline visible at construction time, lets components be reconfigured without editing network code, and enables collectors, replay buffers, and loss modules to compose with any policy without knowing its architecture. The tensordict.nn package provides TensorDictModule, TensorDictSequential, and TensorDictModuleBase as the building blocks; TorchRL layers such as ProbabilisticActor and ValueOperator inherit from these.
TensorDictModule wraps any nn.Module with in_keys and out_keys. It reads the listed in_keys from an input TensorDict, passes them positionally to the wrapped module's forward, and writes the returned tensors back under the out_keys, in order.
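```python
import torch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule
from torch import nn

# A simple MLP that reads "observation" and writes "action".
net = TensorDictModule(
    nn.Sequential(nn.LazyLinear(256), nn.Tanh(), nn.Linear(256, 2)),
    in_keys=["observation"],
    out_keys=["action"],
)

# A batch of B = 32 observations of size 8 (illustrative sizes).
td = TensorDict({"observation": torch.randn(32, 8)}, batch_size=[32])

# Forward: td["action"] is written in place and the same TensorDict is returned.
td = net(td)
print(td["action"].shape)  # torch.Size([32, 2]), i.e. [B, 2]
```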
ProbabilisticActor combines a parameter network with a distribution class to produce stochastic actions. It is the standard way to build actors for PPO, SAC, REINFORCE, and any other algorithm that needs log-probabilities.
```python
import torch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor
from torch import nn
from torchrl.modules import ProbabilisticActor, TanhNormal

action_dim = 2  # illustrative action size

# Step 1: a network that produces distribution parameters.
params_net = TensorDictModule(
    nn.Sequential(
        nn.LazyLinear(256),
        nn.Tanh(),
        nn.Linear(256, 2 * action_dim),  # first half -> loc, second half -> raw scale
        NormalParamExtractor(),  # splits the last dim into "loc" and a positive "scale"
    ),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)

# Step 2: wrap with ProbabilisticActor.
actor = ProbabilisticActor(
    params_net,
    in_keys=["loc", "scale"],
    out_keys=["action"],
    distribution_class=TanhNormal,
    distribution_kwargs={"low": -1.0, "high": 1.0},
    return_log_prob=True,  # writes "sample_log_prob" to the TensorDict
)

# Forward: produces an action and writes its log-probability.
td = TensorDict({"observation": torch.randn(32, 8)}, batch_size=[32])
td = actor(td)
print(td["action"].shape)  # torch.Size([32, 2]), i.e. [B, action_dim]
print(td["sample_log_prob"].shape)  # torch.Size([32]), i.e. [B]
```
| Distribution | Typical use |
|---|---|
| TanhNormal | Continuous actions in a bounded range (e.g., SAC, PPO) |
| IndependentNormal | Unbounded continuous actions |
| TruncatedNormal | Bounded continuous actions from a normal truncated to [low, high], without tanh squashing |
| TanhDelta | Deterministic continuous actions squashed into a bounded range (deterministic policies such as DDPG or TD3) |
| OneHotCategorical | Discrete actions (one-hot encoded) |
| MaskedCategorical | Discrete actions (integer indices) with an action mask |
| MaskedOneHotCategorical | Discrete actions (one-hot encoded) with an action mask |
| Delta | Deterministic action (Dirac delta) |
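NormalParamExtractor splits the last output dimension in half: the first half becomes loc, and the second half is passed through a positive mapping (a biased softplus by default) to become scale. This avoids building two separate output heads manually, but it means the preceding layer must output 2 * action_dim features.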
ValueOperator wraps a critic nn.Module with the conventional key contract for value functions. By default it reads "observation" and writes "state_value".
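To package an actor and a critic as a single module, with or without shared parameters (e.g., an A2C agent whose actor and critic share a feature extractor), TorchRL provides helper wrappers:

- ActorCriticWrapper: independent actor and critic with no shared parameters.
- ActorValueOperator: a common operator (shared backbone) followed by a policy head and a state-value head.
- ActorCriticOperator: a common operator followed by a policy head and a state-action value head that reads the action written by the policy head.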
```python
from torchrl.modules import ActorCriticWrapper

# Independent actor and critic: no shared parameters.
# `actor` is e.g. a ProbabilisticActor and `critic` a ValueOperator, built as above.
actor_critic = ActorCriticWrapper(actor, critic)
td = actor_critic(td)
# td["action"], td["sample_log_prob"], and td["state_value"] are all written.
```
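```python
from torchrl.modules import ActorCriticOperator

# Backbone shared between the actor and a state-action (Q-value) critic.
# Runs the backbone once, then the actor head and the critic head in sequence.
# `backbone`, `actor_head`, and `qvalue_head` are TensorDictModules defined elsewhere;
# `qvalue_head` reads the "action" written by `actor_head`.
operator = ActorCriticOperator(
    common_operator=backbone,
    policy_operator=actor_head,
    value_operator=qvalue_head,
)
```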
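TorchRL provides GRUModule and LSTMModule as TensorDictModuleBase subclasses. They read and write their hidden states under named TensorDict keys, so the states are collected and stored in replay buffers like any other entry. The environment still has to provide those keys at reset: each module's make_tensordict_primer() returns a TensorDictPrimer transform for this, and the modules also read the "is_init" flag written by the InitTracker transform.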
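Exploration modules are appended after a policy (typically with TensorDictSequential) to inject noise or randomness at data-collection time. They only act when the exploration type is RANDOM, which collectors use by default, so selecting another type with set_exploration_type disables them for evaluation.

- EGreedyModule: ε-greedy exploration for discrete actions.
- AdditiveGaussianModule: Gaussian noise added to continuous actions.
- OrnsteinUhlenbeckProcessModule: temporally correlated Ornstein-Uhlenbeck noise for continuous actions.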
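```python
from tensordict.nn import TensorDictSequential
from torchrl.modules import EGreedyModule

# ε-greedy for discrete actions.
# `env` and `qvalue_actor` (e.g. a QValueActor) are assumed to be defined already.
explorer = EGreedyModule(
    spec=env.action_spec,
    annealing_num_steps=100_000,
    eps_init=1.0,
    eps_end=0.05,
)
policy_explore = TensorDictSequential(qvalue_actor, explorer)
# Call explorer.step(n_frames) after each collected batch to anneal epsilon.
```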
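```python
from tensordict.nn import TensorDictSequential
from torchrl.modules import OrnsteinUhlenbeckProcessModule

# Temporally correlated OU noise (classic DDPG exploration).
# `env` and a deterministic `actor` are assumed to be defined already.
ou = OrnsteinUhlenbeckProcessModule(
    spec=env.action_spec,
    theta=0.15,
    sigma=0.2,
    dt=0.01,
)
policy_explore = TensorDictSequential(actor, ou)
```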
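Wrap evaluation code in set_exploration_type(ExplorationType.DETERMINISTIC) (or MEAN / MODE), which works as a context manager or decorator. The setting applies globally while it is active, so every module in the policy reads it: exploration modules stop adding noise, and probabilistic actors return a deterministic action instead of a random sample.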