TorchRL models every reinforcement-learning environment as a subclass of EnvBase, which extends torch.nn.Module and exposes a consistent TensorDict-in / TensorDict-out interface. All observations, actions, rewards, and done flags are packed into TensorDict objects, making it straightforward to move data between devices, batch across environments, and compose transforms without any environment-specific boilerplate. This page covers the full public API of EnvBase, the companion spec classes that declare data shapes and dtypes, the GymLikeEnv adapter for gym-compatible backends, and the EnvCreator / EnvMetaData utilities used by vectorised environments.
Every method documented here lives in torchrl.envs. You can import any symbol directly from the top-level package: from torchrl.envs import EnvBase, GymEnv, check_env_specs.
EnvBase
EnvBase is the abstract base class for all TorchRL environments. It inherits from torch.nn.Module, so parameters and buffers follow standard PyTorch conventions. Concrete environments implement the private _reset and _step methods; the public reset / step wrappers add validation, spec-locking, and housekeeping that should never be overridden.
Constructor
Core Methods
reset(tensordict=None, *, set_state=None, **kwargs) → TensorDictBase
Resets the environment and returns a TensorDict populated with initial observations. The public reset should not be overridden — implement _reset in subclasses instead.
step(tensordict) → TensorDictBase
Executes one environment step. The input tensordict must contain the action under the key(s) declared by env.action_spec. Results land in the "next" sub-TensorDict of the returned object.
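The reset / step contract can be pictured with a toy in plain Python. This is only a sketch: ordinary dicts stand in for TensorDicts, and ToyCounterEnv is a hypothetical class invented for illustration, not part of TorchRL. It shows the one point that matters here: the action is read from the input container, and the step results are written under "next" while the root entries are left in place.

```python
# Toy stand-in: plain dicts play the role of TensorDicts. Not TorchRL code.


class ToyCounterEnv:
    """Adds `action` to a counter each step; done once the counter reaches `limit`."""

    def __init__(self, limit=3):
        self.limit = limit
        self._count = 0

    def reset(self):
        # reset returns the initial observation and done flags at the root.
        self._count = 0
        return {"observation": 0, "done": False, "terminated": False}

    def step(self, data):
        # The action is read from the input container.
        self._count += data["action"]
        done = self._count >= self.limit
        # Results are written under "next"; a copy of the root entries is kept.
        out = dict(data)
        out["next"] = {
            "observation": self._count,
            "reward": 1.0,
            "done": done,
            "terminated": done,
        }
        return out


env = ToyCounterEnv(limit=3)
data = env.reset()
data["action"] = 2
data = env.step(data)
print(data["observation"], data["next"]["observation"], data["next"]["done"])
```

After the call, the root still holds the pre-step observation and the action, and everything produced by the step sits under "next".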
rollout(max_steps, policy=None, *, auto_reset=True, break_when_any_done=True, ...) → TensorDictBase
Runs a full trajectory (up to max_steps) and returns the collected data as a single stacked TensorDict with a time dimension.
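As a rough mental model of what a rollout does, the following plain-Python sketch loops policy and step up to max_steps, stops at the first done flag when break_when_any_done is set, and gathers the per-step data along a time axis. The helpers toy_reset, toy_step, toy_rollout, and always_one are hypothetical names made up for this illustration, dicts and lists stand in for TensorDicts, and the toy does not model auto_reset or batched environments.

```python
# Toy stand-in for the rollout loop. Not TorchRL code.


def toy_reset():
    return {"observation": 0, "done": False}


def toy_step(data):
    # Results go under "next", mirroring the step contract described above.
    obs = data["observation"] + data["action"]
    return {**data, "next": {"observation": obs, "reward": 1.0, "done": obs >= 3}}


def toy_rollout(max_steps, policy, break_when_any_done=True):
    data = toy_reset()
    steps = []
    for _ in range(max_steps):
        data = toy_step(policy(data))
        steps.append(data)
        if break_when_any_done and data["next"]["done"]:
            break
        # Promote "next" to the root so it becomes the input of the following step.
        data = {k: v for k, v in data["next"].items() if k != "reward"}
    # "Stack" along time: one list per key, indexed by time step.
    return {
        "observation": [s["observation"] for s in steps],
        "action": [s["action"] for s in steps],
        "next_reward": [s["next"]["reward"] for s in steps],
        "next_done": [s["next"]["done"] for s in steps],
    }


def always_one(data):
    # A trivial policy: write the action into the container and return it.
    return {**data, "action": 1}


traj = toy_rollout(max_steps=10, policy=always_one)
print(len(traj["observation"]), traj["next_done"])
```

The trajectory is shorter than max_steps because the toy episode ends on the third step.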
rand_step(tensordict=None) → TensorDictBase
Samples a random action from action_spec and executes one step.
check_env_specs(*args, **kwargs)
Verifies that the environment’s specs are internally consistent and that actual reset / step outputs match the declared specs. Raises an error on any mismatch.
close(*, raise_if_closed=True)
Releases all resources held by the environment (file handles, subprocess workers, GPU memory). After close is called, the environment is marked as closed and subsequent calls to step or reset raise an error.
fake_tensordict() → TensorDictBase
Returns a zero-filled TensorDict whose structure exactly matches what reset / step would produce. Useful for spec validation and pre-allocating buffers.
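The idea behind spec checking and zero-filled placeholders can be sketched without any library. In the toy below, a "spec" is just a dict mapping each key to a (length, element type) pair; SPEC, fake_data, and check_against_spec are hypothetical names for this illustration and do not reflect how TorchRL implements either feature.

```python
# Toy stand-in for spec validation and zero-filled placeholders. Not TorchRL code.

# Each key declares (number of elements, element type).
SPEC = {"observation": (3, float), "reward": (1, float), "done": (1, bool)}


def fake_data(spec):
    # Zero-filled placeholder with the declared length and element type per key.
    return {key: [typ(0)] * length for key, (length, typ) in spec.items()}


def check_against_spec(spec, data):
    # Keys must match exactly, then each value must match length and type.
    if set(data) != set(spec):
        raise KeyError(f"key mismatch: {sorted(set(data) ^ set(spec))}")
    for key, (length, typ) in spec.items():
        value = data[key]
        if len(value) != length:
            raise ValueError(f"{key}: expected length {length}, got {len(value)}")
        if not all(type(x) is typ for x in value):
            raise TypeError(f"{key}: expected elements of type {typ.__name__}")


placeholder = fake_data(SPEC)
check_against_spec(SPEC, placeholder)  # passes silently
print(placeholder)
```

A placeholder built from the declared spec passes the check by construction, which is why such an object is convenient for pre-allocating buffers.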
Spec Attributes
TorchRL uses specs to declare the shape, dtype, domain, and device of every tensor in the environment interface. Specs are TensorSpec instances stored on the environment and locked after construction.
| Attribute | Type | Description |
|---|---|---|
| observation_spec | Composite | Full specification of all observation tensors. Alias for full_observation_spec. |
| action_spec | TensorSpec | Leaf spec when there is a single action tensor; otherwise full_action_spec. |
| reward_spec | TensorSpec | Leaf spec when there is a single reward; otherwise full_reward_spec. |
| done_spec | Composite | Composite spec containing at minimum "done" and "terminated" leaves. |
| state_spec | Composite | Inputs that are not actions (e.g., hidden states). |
| full_action_spec | Composite | Complete composite of all action entries. |
| full_observation_spec | Composite | Complete composite of all observation entries. |
| full_reward_spec | Composite | Complete composite of all reward entries. |
| full_done_spec | Composite | Complete composite of all done entries. |
| full_state_spec | Composite | Complete composite of all state inputs. |
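The "leaf when single, otherwise full" rule for action_spec and reward_spec in the table can be expressed in a few lines. The sketch below is a plain-Python illustration under simplifying assumptions: leaf_or_full is a hypothetical helper, the composite is a flat dict, and strings stand in for real spec objects.

```python
# Toy illustration of the shortcut rule. Not TorchRL code.


def leaf_or_full(full_spec):
    """Return the single leaf if there is exactly one entry, else the whole mapping."""
    if len(full_spec) == 1:
        return next(iter(full_spec.values()))
    return full_spec


single = {"action": "Bounded(shape=(1,))"}
multi = {"move": "Categorical(n=4)", "jump": "Categorical(n=2)"}

print(leaf_or_full(single))  # the lone leaf
print(leaf_or_full(multi))   # the full mapping, unchanged
```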
Spec Classes
Specs describe the domain of each tensor in the environment. They live in torchrl.data but are re-exported through torchrl.envs.
Composite
A dictionary-like container that groups multiple named specs. Mirrors TensorDict for specs.
Bounded
A continuous or discrete spec with explicit lower and upper bounds.
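To make "explicit lower and upper bounds" concrete, here is a scalar toy written in plain Python. ToyBounded and its methods is_in, project, and rand are this sketch's own hypothetical names, chosen for illustration; the toy works on single floats rather than tensors and makes no claim about the real class's signature.

```python
# Toy scalar stand-in for a bounded spec. Not TorchRL code.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBounded:
    low: float
    high: float

    def is_in(self, value):
        # True when the value lies inside the closed interval [low, high].
        return self.low <= value <= self.high

    def project(self, value):
        # Clamp an out-of-range value back onto the nearest bound.
        return min(max(value, self.low), self.high)

    def rand(self, rng):
        # Draw a sample uniformly between the bounds.
        return rng.uniform(self.low, self.high)


spec = ToyBounded(low=-2.0, high=2.0)
sample = spec.rand(random.Random(0))
print(spec.is_in(sample), spec.is_in(5.0), spec.project(5.0))
```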
Unbounded
A continuous spec with no range constraint. Used for most observations and rewards.
Categorical
An integer-valued spec representing a categorical action or observation with n possible values.
OneHot
Like Categorical but samples are one-hot encoded boolean tensors of shape (..., n).
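The difference between the two encodings is easiest to see side by side. The following plain-Python sketch converts between an integer category and a one-hot boolean vector of length n; to_one_hot and to_categorical are hypothetical helper names for this illustration, and lists stand in for tensors.

```python
# Toy conversion between categorical and one-hot encodings. Not TorchRL code.


def to_one_hot(index, n):
    # An integer in [0, n) becomes a length-n boolean vector with a single True.
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside [0, {n})")
    return [i == index for i in range(n)]


def to_categorical(one_hot):
    # A valid one-hot vector has exactly one True; its position is the category.
    if sum(one_hot) != 1:
        raise ValueError("expected exactly one True entry")
    return one_hot.index(True)


encoded = to_one_hot(2, n=4)
print(encoded, to_categorical(encoded))
```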
GymLikeEnv
GymLikeEnv is an intermediate abstract class that sits between EnvBase and concrete wrappers around gym-style backends (Gymnasium, DM Control, etc.). It standardises how info dictionaries from the underlying environment are ingested.
- Implements _step by calling the underlying _gym_step and unpacking (obs, reward, terminated, truncated, info) tuples.
- Supports set_info_dict_reader(info_dict_reader) to attach a custom default_info_dict_reader that maps info keys into the output TensorDict.
- The frame_skip parameter repeats actions and accumulates rewards automatically.
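The frame_skip behaviour can be sketched in plain Python as a loop over the backend's step function. This is an illustration, not the library's code: ToyBackend and step_with_frame_skip are hypothetical names, and stopping early when the episode ends mid-skip is an assumption of this sketch.

```python
# Toy illustration of frame skipping with reward accumulation. Not TorchRL code.


class ToyBackend:
    """Gym-style backend: reward 1.0 per step, terminates after `horizon` steps."""

    def __init__(self, horizon=5):
        self.t = 0
        self.horizon = horizon

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 1.0, terminated, False, {"t": self.t}


def step_with_frame_skip(raw_step, action, frame_skip=1):
    total_reward = 0.0
    for _ in range(frame_skip):
        # Repeat the same action and unpack the 5-tuple each time.
        obs, reward, terminated, truncated, info = raw_step(action)
        total_reward += reward
        # Assumption: stop repeating once the episode has ended.
        if terminated or truncated:
            break
    # Only the last observation and flags are returned; rewards are summed.
    return obs, total_reward, terminated, truncated, info


backend = ToyBackend(horizon=5)
first = step_with_frame_skip(backend.step, action=0, frame_skip=4)
second = step_with_frame_skip(backend.step, action=0, frame_skip=4)
print(first)
print(second)
```

The first call consumes four backend steps and returns their summed reward; the second ends after a single backend step because the toy episode terminates.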
default_info_dict_reader
A callable that maps selected keys from the environment info dict into the output TensorDict.
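Conceptually, an info reader is a small callable that copies chosen keys from the backend's info dict into the step output. The plain-Python sketch below shows that shape; make_info_reader is a hypothetical factory written for this illustration, dicts stand in for TensorDicts, and skipping missing keys is an assumption of the sketch.

```python
# Toy info-dict reader. Not TorchRL code.


def make_info_reader(keys):
    def read(info, output):
        # Copy only the selected keys; everything else in `info` is ignored.
        for key in keys:
            if key in info:  # assumption: silently skip keys that are absent
                output[key] = info[key]
        return output

    return read


reader = make_info_reader(["lives"])
out = reader({"lives": 3, "debug": "x"}, {"observation": 0})
print(out)
```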
EnvMetaData
EnvMetaData is a lightweight serialisable snapshot of an environment’s specs, batch size, device, and a sample TensorDict. It is used internally by ParallelEnv and SerialEnv to propagate environment metadata to worker processes without instantiating the full environment in the main process.
Its fields are tensordict, specs, batch_size, device, batch_locked, and supports_set_state.
EnvCreator and get_env_metadata
EnvCreator wraps a callable environment factory so that it can be safely pickled and sent to subprocess workers. When the factory uses a VecNorm transform, EnvCreator also wires up the shared-memory pointers so all workers stay synchronised.
get_env_metadata(env_fn, **kwargs) constructs EnvMetaData from a factory without requiring EnvCreator.