TorchRL makes it straightforward to run many environment instances at once. All vectorised environment classes derive from BatchedEnvBase, which itself extends EnvBase. As a result, they expose the same reset / step / rollout interface as a single environment and can be validated with check_env_specs. Observations, actions, rewards, and done flags are stacked along a leading batch dimension and returned as a single TensorDict, so downstream policy and training code does not need to know whether it is talking to one environment or a thousand.
All classes documented on this page are importable from torchrl.envs.
The create_env_fn Pattern
Every BatchedEnvBase subclass accepts a create_env_fn argument — a callable (or list of callables) that returns a new EnvBase instance each time it is invoked. Workers call this factory to instantiate their private copy of the environment without sharing state.
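The factory contract can be illustrated without TorchRL. In this minimal sketch, `CounterEnv` and `spawn_workers` are hypothetical stand-ins for an environment and for the worker start-up logic. Because the factory builds a new object on every call, each worker owns independent state. With TorchRL you would pass the callable itself (`make_env`, not `make_env()`) as create_env_fn.

```python
from typing import Callable, List


class CounterEnv:
    """Hypothetical toy environment whose only state is a step counter."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> int:
        self.steps += 1
        return self.steps


def make_env() -> CounterEnv:
    # The factory: every call returns a brand-new, independent instance.
    return CounterEnv()


def spawn_workers(create_env_fn: Callable[[], CounterEnv], num_workers: int) -> List[CounterEnv]:
    # Each worker invokes the factory itself, so no state is shared between them.
    return [create_env_fn() for _ in range(num_workers)]


workers = spawn_workers(make_env, num_workers=3)
workers[0].step()
workers[0].step()
print([w.steps for w in workers])  # [2, 0, 0]
```

Passing `lambda: shared_env` instead would give every worker the same object, which is exactly the shared state the factory pattern is meant to avoid.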
BatchedEnvBase
BatchedEnvBase is the abstract parent of SerialEnv and ParallelEnv. It initialises shared infrastructure — spec negotiation across workers, shared memory or memory-map buffers, and metadata caching — before the workers are started.
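The spec-negotiation and buffer-allocation steps can be sketched in plain Python. This is an illustration under stated assumptions, not TorchRL's implementation. Each worker reports a hypothetical spec dictionary mapping entry names to per-environment shapes. The parent refuses to continue unless all specs agree, and then allocates one fixed-size buffer per entry with a leading worker dimension. `negotiate_spec` and `allocate_buffers` are illustrative names only.

```python
import math
from typing import Any, Dict, List, Tuple

Spec = Dict[str, Tuple[int, ...]]  # entry name -> per-environment shape


def negotiate_spec(worker_specs: List[Spec]) -> Spec:
    # Every worker must agree; a mismatch is reported here, before any buffer exists.
    reference = worker_specs[0]
    for idx, spec in enumerate(worker_specs[1:], start=1):
        if spec != reference:
            raise ValueError(f"worker {idx} spec {spec} != worker 0 spec {reference}")
    return reference


def allocate_buffers(spec: Spec, num_workers: int) -> Dict[str, Dict[str, Any]]:
    # One flat, fixed-size float buffer per entry, logically shaped [num_workers, *shape].
    return {
        key: {
            "shape": (num_workers, *shape),
            "data": [0.0] * (num_workers * math.prod(shape)),
        }
        for key, shape in spec.items()
    }


specs = [{"observation": (3,), "reward": (1,)} for _ in range(4)]
buffers = allocate_buffers(negotiate_spec(specs), num_workers=4)
print(buffers["observation"]["shape"])  # (4, 3)
```

Because the buffer sizes are fixed here, a worker that later produced a differently shaped observation would have nowhere valid to write it. This is why checking specs up front matters.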
Constructor Parameters
SerialEnv
SerialEnv creates and steps all environment instances sequentially within the same process. It shares the BatchedEnvBase interface but incurs no IPC overhead, making it the right choice for lightweight environments, debugging, or when the GIL prevents true parallelism anyway.
Every TensorDict it returns has a leading dimension of num_workers.
Key Characteristics
- No serialisation cost — environments live in the same process.
- Easy to debug — standard Python breakpoints and profilers work.
- No shared memory required — each env writes to its own tensor.
- Sequential execution — one environment completes before the next starts; no speedup on multi-core machines.
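In practice, sequential execution looks like the following plain-Python sketch. A single loop steps each environment in turn and then stacks the results along a leading worker dimension. `ToyEnv` and `serial_step` are hypothetical stand-ins, not TorchRL code, and Python lists play the role of tensors.

```python
from typing import Dict, List


class ToyEnv:
    """Hypothetical environment: the observation is the running sum of actions."""

    def __init__(self) -> None:
        self.total = 0.0

    def step(self, action: float) -> Dict[str, float]:
        self.total += action
        return {"observation": self.total, "reward": -abs(self.total)}


def serial_step(envs: List[ToyEnv], actions: List[float]) -> Dict[str, List[float]]:
    # One environment finishes its step before the next starts: no processes, no IPC.
    results = [env.step(a) for env, a in zip(envs, actions)]
    # "Stack" each key so that position i along the leading dimension belongs to env i.
    return {key: [r[key] for r in results] for key in results[0]}


envs = [ToyEnv() for _ in range(3)]
out = serial_step(envs, [1.0, 2.0, 3.0])
print(out["observation"])  # [1.0, 2.0, 3.0]
print(out["reward"])       # [-1.0, -2.0, -3.0]
```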
ParallelEnv
ParallelEnv spawns one subprocess per worker and exchanges data via shared memory (shared_memory=True) or memory-mapped files (memmap=True). The main process sends action TensorDicts to all workers simultaneously, waits for results, and stacks them.
Start Method
By default, TorchRL selects "spawn" on macOS / Windows and "fork" on Linux. You can override this with the mp_start_method argument.
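The default rule above can be expressed as a small helper. `default_start_method` is hypothetical and only mirrors the rule as stated on this page. `multiprocessing.get_context` is the standard-library call that turns a method name into a usable context.

```python
import multiprocessing as mp
import sys


def default_start_method(platform: str = sys.platform) -> str:
    # Mirrors the rule above: "fork" on Linux, "spawn" on macOS ("darwin") and Windows ("win32").
    return "fork" if platform.startswith("linux") else "spawn"


method = default_start_method()
ctx = mp.get_context(method)  # raises ValueError if the method is not available here
print(method, ctx.get_start_method())
```

With TorchRL, the chosen string is what you would pass as `ParallelEnv(..., mp_start_method=method)`.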
Worker Timeout
Workers that are idle for more than BATCHED_PIPE_TIMEOUT seconds are considered dead and raise an error. You can control the timeout through the BATCHED_PIPE_TIMEOUT environment variable.
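The following standard-library sketch shows how a pipe timeout of this kind behaves. The fallback of 10.0 seconds is an assumption for illustration, not TorchRL's actual default. The sketch also reads the variable at call time, but a library may read it only once at import, so setting it before importing TorchRL is the safer habit. `pipe_timeout` and `recv_or_raise` are hypothetical helpers.

```python
import os
from multiprocessing import Pipe


def pipe_timeout(default: float = 10.0) -> float:
    # Read the override from the environment; the default value is illustrative only.
    return float(os.environ.get("BATCHED_PIPE_TIMEOUT", default))


def recv_or_raise(conn, timeout: float):
    # poll() waits at most `timeout` seconds and returns False if nothing arrived.
    if not conn.poll(timeout):
        raise TimeoutError(f"worker silent for more than {timeout} s")
    return conn.recv()


parent, child = Pipe()
child.send({"done": False})
print(recv_or_raise(parent, pipe_timeout()))  # {'done': False}

os.environ["BATCHED_PIPE_TIMEOUT"] = "0.1"
try:
    recv_or_raise(parent, pipe_timeout())  # nothing was sent this time
except TimeoutError as err:
    print(err)  # worker silent for more than 0.1 s
```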
Configuring Parallel Execution After Construction
Use configure_parallel to adjust worker parameters before the environment is started, that is, before the first reset or step.
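The "configure before start" rule is an instance of a lazy-initialisation guard. The class below is a hypothetical stand-in, not TorchRL's implementation, and its keyword names (such as `num_threads`) are invented for illustration. It accepts settings only until the first reset launches the workers.

```python
from typing import Any, Dict


class LazyBatchedEnv:
    """Hypothetical wrapper: workers are only 'launched' on the first reset()."""

    def __init__(self, num_workers: int) -> None:
        self.config: Dict[str, Any] = {"num_workers": num_workers}
        self.started = False

    def configure(self, **kwargs: Any) -> None:
        if self.started:
            # Workers already exist with the old settings, so refuse the change.
            raise RuntimeError("configure() must be called before the first reset/step")
        self.config.update(kwargs)

    def reset(self) -> Dict[str, Any]:
        if not self.started:
            self.started = True  # a real implementation would launch workers here using self.config
        return dict(self.config)


env = LazyBatchedEnv(num_workers=4)
env.configure(num_threads=2)  # accepted: nothing has started yet
print(env.reset())            # {'num_workers': 4, 'num_threads': 2}
```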
EnvCreator
EnvCreator wraps an arbitrary callable so it can be safely pickled and sent to worker subprocesses. It is the recommended replacement for lambdas in multiprocessing contexts. When the factory builds a TransformedEnv with VecNorm, EnvCreator also wires up the shared-memory pointers so all workers share the same running statistics.
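You can see why lambdas are a problem with the standard library alone. `pickle` serialises a function by reference to an importable name, and a lambda has no such name. `is_picklable` is a hypothetical helper. EnvCreator's own mechanism is not shown here.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    # Returns False when the default pickle module cannot serialise obj.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_env() -> dict:
    # Module-level factory: pickled by reference as "<module>.make_env".
    return {"name": "toy-env"}


print(is_picklable(lambda: {"name": "toy-env"}))  # False
print(is_picklable(make_env))  # True when this file is run as a script
```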
get_env_metadata
get_env_metadata constructs an EnvMetaData snapshot from a factory function without keeping the environment alive. Useful when you want to inspect specs before committing to launching workers.
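The idea behind this (build the environment, copy its specs, discard it) can be sketched without TorchRL. `EnvMeta`, `ToyEnv`, and `snapshot_metadata` are hypothetical. The point is that the returned object holds plain data with no reference to the environment, which is closed before the function returns.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class EnvMeta:
    specs: Dict[str, Tuple[int, ...]]
    batch_size: Tuple[int, ...]


class ToyEnv:
    """Hypothetical environment with static specs and a close() method."""

    def __init__(self) -> None:
        self.specs = {"observation": (3,), "action": (1,)}
        self.batch_size = ()
        self.closed = False

    def close(self) -> None:
        self.closed = True


def snapshot_metadata(create_env_fn: Callable[[], ToyEnv]) -> EnvMeta:
    env = create_env_fn()
    try:
        # Copy plain data out so the metadata keeps no reference to env.
        return EnvMeta(specs=dict(env.specs), batch_size=tuple(env.batch_size))
    finally:
        env.close()  # the environment does not outlive this call


meta = snapshot_metadata(ToyEnv)
print(meta.specs["observation"])  # (3,)
```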
Async Environment Pools
For use cases that need even finer control over execution scheduling, TorchRL provides three async pool classes.
AsyncEnvPool
Abstract base for asynchronous environment pools. Manages a pool of workers that accept step requests without blocking and deliver results when ready.
ProcessorAsyncEnvPool
An AsyncEnvPool backed by a multiprocessing.Pool. Each worker runs in its own process.
ThreadingAsyncEnvPool
An AsyncEnvPool backed by a ThreadPoolExecutor. All workers run in the same process using threads. Best suited for I/O-bound or GIL-releasing environments (e.g., environments with native C extensions).
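Here is a minimal sketch of the non-blocking step pattern behind a thread-backed pool, using only `concurrent.futures`. `ToyEnv` is hypothetical. TorchRL's ThreadingAsyncEnvPool has its own API, which this sketch does not reproduce.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Dict, List


class ToyEnv:
    """Hypothetical environment whose step takes a configurable amount of time."""

    def __init__(self, env_id: int, delay: float) -> None:
        self.env_id = env_id
        self.delay = delay

    def step(self, action: int) -> Dict[str, int]:
        time.sleep(self.delay)  # stands in for I/O or GIL-releasing native code
        return {"env_id": self.env_id, "obs": action * 10}


envs = [ToyEnv(0, 0.05), ToyEnv(1, 0.0), ToyEnv(2, 0.02)]

with ThreadPoolExecutor(max_workers=len(envs)) as pool:
    # submit() returns immediately, so the caller is not blocked by slow environments.
    futures: List[Future] = [pool.submit(env.step, i) for i, env in enumerate(envs)]
    # Results are consumed in completion order, not submission order.
    finished = [f.result() for f in as_completed(futures)]

print(sorted(r["env_id"] for r in finished))  # [0, 1, 2]
```

Because `time.sleep` releases the GIL, the three steps overlap in time. Pure-Python CPU-bound steps would not, which is why threads suit I/O-bound or GIL-releasing environments.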
TensorDict Structure from Vectorised Envs
When num_workers=N, every TensorDict returned by reset or step has a leading batch dimension of N. Rollouts add a time dimension after it, producing shape [N, T].
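The shape bookkeeping can be sketched with nested lists as a library-free illustration. `rollout_layout` is a hypothetical helper, not a TorchRL function. Each step yields N values, one per worker, and the rollout is arranged with the worker index first and time last, giving [N, T].

```python
from typing import List, Tuple


def rollout_layout(per_step: List[List[float]]) -> List[List[float]]:
    # per_step[t][i] is worker i at step t; transpose to [worker][time].
    return [list(worker_series) for worker_series in zip(*per_step)]


def shape(nested: List[List[float]]) -> Tuple[int, int]:
    return (len(nested), len(nested[0]))


N, T = 4, 5
per_step = [[float(10 * t + i) for i in range(N)] for t in range(T)]  # T lists of N values
rollout = rollout_layout(per_step)
dim_names = (None, "time")  # only the last (time) dim carries a name in this sketch
print(shape(rollout))   # (4, 5)
print(rollout[2][3])    # worker 2 at step 3 -> 32.0
```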
The "time" dimension name is attached to the last rollout dimension, which enables named-dimension operations on rollout data.
Partial Resets
When break_when_any_done=False is passed to rollout, done environments are reset automatically while the others continue stepping. This partial-reset mode is the standard choice for on-policy data collection with vectorised environments.
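The partial-reset bookkeeping can be sketched in plain Python. After each step, only the environments whose done flag is set are reset, while the rest keep their state, so every step still produces N entries. `EpisodeEnv` and `step_with_partial_reset` are hypothetical names, not TorchRL code.

```python
from typing import List


class EpisodeEnv:
    """Hypothetical environment that ends its episode after `length` steps."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.t = 0
        self.resets = 0

    def reset(self) -> int:
        self.t = 0
        self.resets += 1
        return self.t

    def step(self) -> bool:
        self.t += 1
        return self.t >= self.length  # done flag


def step_with_partial_reset(envs: List[EpisodeEnv]) -> List[bool]:
    dones = [env.step() for env in envs]
    for env, done in zip(envs, dones):
        if done:
            env.reset()  # only finished environments restart; the others continue
    return dones


envs = [EpisodeEnv(2), EpisodeEnv(3)]
history = [step_with_partial_reset(envs) for _ in range(6)]  # always 6 steps x 2 envs
print(history)
print([env.resets for env in envs])  # [3, 2]
```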
Code Examples
- SerialEnv
- ParallelEnv
- Mixed Environments
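For the mixed-environment case, create_env_fn can be a list of callables, as noted earlier. Here is a library-free sketch of that dispatch, in which worker i calls factory i. `PendulumLike` and `build_workers` are hypothetical. The workers differ only in a physics parameter, and the sketch still checks that all specs match, in line with the spec negotiation described for BatchedEnvBase.

```python
from functools import partial
from typing import Callable, List, Sequence, Union


class PendulumLike:
    """Hypothetical environment; only the gravity parameter differs between workers."""

    spec = {"observation": (3,), "action": (1,)}

    def __init__(self, gravity: float) -> None:
        self.gravity = gravity


Factory = Callable[[], PendulumLike]


def build_workers(create_env_fn: Union[Factory, Sequence[Factory]], num_workers: int) -> List[PendulumLike]:
    # A single callable is reused for every worker; a list provides one callable per worker.
    if isinstance(create_env_fn, (list, tuple)):
        fns = list(create_env_fn)
    else:
        fns = [create_env_fn] * num_workers
    if len(fns) != num_workers:
        raise ValueError("need exactly one factory per worker")
    envs = [fn() for fn in fns]
    if any(env.spec != envs[0].spec for env in envs):
        raise ValueError("mixed environments must expose matching specs")
    return envs


# functools.partial of a module-level class stays picklable, unlike a lambda.
factories = [partial(PendulumLike, gravity=g) for g in (9.81, 3.71, 1.62)]
envs = build_workers(factories, num_workers=3)
print([env.gravity for env in envs])  # [9.81, 3.71, 1.62]
```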
Tips and Common Pitfalls
Always check specs first
Run check_env_specs(make_env()) on a single environment before wrapping it in ParallelEnv. Shared-memory buffer sizes are fixed at construction, so a spec mismatch causes obscure worker crashes.
Use EnvCreator for lambdas
Lambdas cannot be pickled by the default pickle module. Wrap them in EnvCreator or define the factory at module level so multiprocessing can serialise it.
fork vs spawn
On Linux, mp_start_method="fork" is fastest. On macOS and Windows, use "spawn". Never use "fork" with CUDA: once CUDA has been initialised in the parent process, a forked child cannot re-initialise it, so GPU work in the workers fails.
Partial resets for data collection
Pass break_when_any_done=False to rollout so that individual done environments auto-reset while the others continue. This gives you a guaranteed [N, T]-shaped batch every time.