TorchRL Environments and Transform Pipelines Guide
TorchRL environments use typed specs for observations, actions, and rewards. Learn EnvBase, TransformedEnv, vectorized containers, and the transform API.
Every environment in TorchRL derives from EnvBase, a PyTorch nn.Module subclass that communicates via TensorDict rather than raw tuples. Observations, actions, rewards, and done signals all live under named keys, and each field is described by a spec — a typed, bounded, or discrete descriptor that validates shapes, dtypes, and value ranges before a long training job starts. The same API works whether you are running a single-instance Pendulum environment, a 64-worker MuJoCo farm, or a custom game-engine simulator.
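rollout() is a convenience method that runs a trajectory of up to max_steps steps, optionally driven by a policy; when no policy is given, actions are sampled at random from the action spec. It returns a single TensorDict whose last batch dimension is time, so for an unbatched environment time is the first dimension. With auto_reset=True (the default), the environment is reset before the rollout begins.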
```python
from torchrl.envs import check_env_specs

check_env_specs(env)
# Raises if shapes, dtypes, or bounds in the specs do not match
# what the environment actually produces at runtime.
```
check_env_specs() runs a short rollout, compares every tensor it produces against the registered specs, and reports any mismatch. Run this once after building a custom environment or adding transforms.
Specs tell TorchRL (and your code) exactly what to expect from an environment before a single step is taken. They are used to pre-allocate storage, validate outputs, and initialise policy networks lazily.
```python
import torch
from torchrl.data import Bounded, Categorical, Composite, Unbounded

# Continuous action in [-1, 1]^2
action_spec = Bounded(low=-1.0, high=1.0, shape=(2,), dtype=torch.float32)

# Discrete action: one of 4 choices
discrete_spec = Categorical(n=4)

# Composite groups multiple specs under named keys.
obs_spec = Composite(
    observation=Unbounded(shape=(8,), dtype=torch.float32),
    pixels=Bounded(low=0, high=255, shape=(3, 84, 84), dtype=torch.uint8),
)
```
Every EnvBase subclass exposes four spec properties:
| Property | What it describes |
| --- | --- |
| observation_spec | All observation fields (maps to full_observation_spec) |
| action_spec | The action field(s) the environment expects |
| reward_spec | The reward scalar or vector |
| done_spec | Done, terminated, and truncated flags |
full_observation_spec, full_action_spec, full_reward_spec, and full_done_spec are the canonical Composite specs. The short-hand properties observation_spec, action_spec, reward_spec, and done_spec link to the leaf spec inside the composite for single-key environments.
TransformedEnv wraps any EnvBase with a stack of Transform objects. Transforms can preprocess observations, post-process rewards, convert dtypes, or inject priors. They are applied in order on every step() and reset() call, and each one participates in the spec system — adding or modifying specs so that downstream components always see the correct shapes.
Access env.transform to inspect or modify the transform stack. Individual transforms can be inserted or removed without re-wrapping the base environment.
Reward transforms
- RewardScaling: multiply/shift reward
- RewardClipping: clip to a range
- RewardSum: cumulative reward tracking
- BinarizeReward: convert to

Action transforms
- ActionScaling: map to a different range
- ActionDiscretizer: discretize continuous actions
- FlattenAction: flatten multi-head actions
- ActionMask: mask out illegal actions

Episode & timing
- StepCounter: count steps per episode
- TrajCounter: count completed trajectories
- FrameSkipTransform: repeat actions
- AutoResetTransform: auto-reset on done
Always call check_env_specs(env) after implementing a custom environment. Mismatches between what _step() returns and what the specs declare are among the most common sources of silent training bugs.
SerialEnv runs N environments sequentially in a single process. Useful for testing or when environment stepping is cheap. The batch dimension of all returned TensorDicts gains a leading [N] axis.
ParallelEnv runs N environments in separate worker processes. It exposes the same API as SerialEnv but uses multiprocessing to execute environment steps in parallel, which is valuable when simulation is the throughput bottleneck.
ParallelEnv serializes the create_env_fn callable and sends it to worker processes. Lambdas that close over unpicklable objects (GPU tensors, open file handles) should be replaced with a picklable callable or wrapped in EnvCreator.
TorchRL includes ModelBasedEnvBase and DreamerEnv / WorldModelEnv for model-based RL workflows where the environment is a learned neural network. They share the same EnvBase API so policies and collectors work without changes.
```python
from torchrl.envs import GymEnv, WorldModelEnv
from torchrl.modules import WorldModel

# WorldModelEnv wraps a WorldModel and a reference env as an EnvBase.
# base_env is only used for its specs (action, reward, done); it is not stepped.
# world_model and actor are assumed to be defined elsewhere.
base_env = GymEnv("Pendulum-v1")
imagined_env = WorldModelEnv(
    world_model=world_model,  # a torchrl.modules.WorldModel instance
    base_env=base_env,  # reference env for specs
    batch_size=[4],
)
rollout = imagined_env.rollout(max_steps=15, policy=actor)
```