A collector owns the execution loop that runs a policy inside one or more environments and returns batches of trajectory data as TensorDicts. Rather than writing your own rollout loop, you hand a collector a policy, an environment constructor, and a batch size; it handles stepping, resetting, device movement, weight synchronization, and trajectory packaging. The result is an iterable that emits one TensorDict per iteration, ready to be sent to a replay buffer or consumed directly by an on-policy loss.
Why collectors exist
The alternative to collectors is a hand-rolled loop: call env.step(), accumulate tensors, move them to the right device, handle episode boundaries, and repeat across multiple processes. That loop is easy to get wrong through mismatched devices, missing done-flag handling, or synchronization bugs in multiprocess code. Collectors encapsulate all of that so your training loop can focus on the learning update.
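The following library-free sketch shows the kind of loop a collector replaces. It uses plain Python lists instead of TensorDicts and a hypothetical `CountdownEnv` toy environment. It models the responsibilities listed above (stepping, resetting on done, packaging fixed-size batches) and is not TorchRL's implementation.

```python
from typing import Callable, Dict, Iterator, List, Tuple


class CountdownEnv:
    """Hypothetical toy environment: every episode lasts `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done


def rollout_batches(
    env: CountdownEnv,
    policy: Callable[[int], int],
    frames_per_batch: int,
    total_frames: int,
) -> Iterator[Dict[str, List]]:
    """Yield dicts holding `frames_per_batch` transitions until `total_frames` is reached."""
    obs = env.reset()
    emitted = 0
    while emitted < total_frames:
        batch: Dict[str, List] = {"obs": [], "action": [], "reward": [], "done": []}
        for _ in range(frames_per_batch):
            action = policy(obs)
            next_obs, reward, done = env.step(action)
            batch["obs"].append(obs)
            batch["action"].append(action)
            batch["reward"].append(reward)
            batch["done"].append(done)
            # Episode boundary: reset instead of stepping a finished environment.
            obs = env.reset() if done else next_obs
        emitted += frames_per_batch
        yield batch


batches = list(rollout_batches(CountdownEnv(horizon=3), lambda o: 1, frames_per_batch=4, total_frames=8))
print(len(batches), batches[0]["obs"], batches[0]["done"])  # 2 [0, 1, 2, 0] [False, False, True, False]
```

Even this toy version has to track episode boundaries across batch edges. In the run above, the second batch starts mid-episode at observation 1. A collector does this bookkeeping for you.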
The Collector class
Collector is the single-process, single-environment collector. It is the simplest entry point and suitable for local development and on-policy algorithms.
total_frames must be divisible by frames_per_batch. Pass total_frames=-1 to create an endless collector that you break out of manually.
Constructor arguments
| Argument | Description |
|---|---|
| `create_env_fn` | Callable that returns an EnvBase instance, or an existing env |
| `policy` | A TensorDictModule or any callable that accepts a TensorDict |
| `frames_per_batch` | Number of transitions emitted per `__next__` call |
| `total_frames` | Total transitions before the collector is exhausted (`-1` for infinite) |
| `device` | Convenience device for both env and policy; overridden by `env_device` / `policy_device` |
| `env_device` | Device on which environment steps are executed |
| `policy_device` | Device on which the policy forward pass runs |
| `storing_device` | Device on which the emitted TensorDict is stored |
| `max_frames_per_traj` | Truncate episodes at this many steps |
| `compile_policy` | Pass `True` or a dict of kwargs to `torch.compile` the policy |
| `cudagraph_policy` | Wrap the policy in CUDA graphs for faster inference |
| `auto_register_policy_transforms` | Register any env transforms on the policy automatically |
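The hypothetical `CollectorConfig` dataclass below summarizes how `total_frames`, `frames_per_batch`, and the device arguments interact. It models only the rules stated on this page: divisibility, `-1` for an endless collector, and `device` acting as a fallback for `env_device` and `policy_device`. It is not a TorchRL class. `storing_device` is left out because the page does not describe its default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectorConfig:
    """Hypothetical model of a few Collector arguments (not a TorchRL class)."""

    frames_per_batch: int
    total_frames: int = -1
    device: Optional[str] = None
    env_device: Optional[str] = None
    policy_device: Optional[str] = None

    def __post_init__(self) -> None:
        if self.frames_per_batch <= 0:
            raise ValueError("frames_per_batch must be positive")
        endless = self.total_frames == -1
        if not endless and self.total_frames % self.frames_per_batch != 0:
            raise ValueError("total_frames must be divisible by frames_per_batch")
        # `device` is only a fallback: explicitly passed env/policy devices win.
        self.env_device = self.env_device or self.device
        self.policy_device = self.policy_device or self.device

    @property
    def num_batches(self) -> Optional[int]:
        """Number of iterations before exhaustion; None means the collector is endless."""
        if self.total_frames == -1:
            return None
        return self.total_frames // self.frames_per_batch


cfg = CollectorConfig(frames_per_batch=200, total_frames=10_000, device="cpu", policy_device="cuda:0")
print(cfg.num_batches, cfg.env_device, cfg.policy_device)  # 50 cpu cuda:0
```

With `frames_per_batch=200` and `total_frames=10_000`, the loop runs 50 times. Passing `total_frames=1_000` with `frames_per_batch=300` would be rejected.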
Using Collector in an on-policy loop
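This dependency-free sketch shows the shape of an on-policy training loop around a collector. `ToyCollector` is a hypothetical stand-in with two behaviors described on this page:

- iterating yields fixed-size batches;
- new weights arrive through a method named after `update_policy_weights_()`.

The "learning update" is a placeholder that nudges a single scalar parameter. With TorchRL you would build a real `Collector` from the arguments in the table above and compute a loss on each TensorDict instead.

```python
from typing import Dict, Iterator, List


class ToyCollector:
    """Hypothetical stand-in for a collector: yields batches and accepts new weights."""

    def __init__(self, frames_per_batch: int, total_frames: int) -> None:
        self.frames_per_batch = frames_per_batch
        self.total_frames = total_frames
        self.policy_weights: Dict[str, float] = {"w": 0.0}
        self.weights_seen: List[float] = []  # the weight value used for each batch

    def __iter__(self) -> Iterator[Dict[str, List[float]]]:
        for _ in range(self.total_frames // self.frames_per_batch):
            w = self.policy_weights["w"]  # read fresh weights at the start of every batch
            self.weights_seen.append(w)
            # Pretend the reward depends on how close the parameter is to 1.0.
            yield {"reward": [-(1.0 - w) ** 2] * self.frames_per_batch}

    def update_policy_weights_(self, weights: Dict[str, float]) -> None:
        # Copy, so later learner updates only reach the collector through an explicit sync.
        self.policy_weights = dict(weights)


learner_weights = {"w": 0.0}
history: List[float] = []
collector = ToyCollector(frames_per_batch=4, total_frames=12)
for batch in collector:
    history.append(sum(batch["reward"]) / len(batch["reward"]))
    # Placeholder "learning update": move w halfway toward 1.0.
    learner_weights["w"] += 0.5 * (1.0 - learner_weights["w"])
    # On-policy: the next batch must be collected with the freshly updated policy.
    collector.update_policy_weights_(learner_weights)

print(collector.weights_seen, history)  # [0.0, 0.5, 0.75] [-1.0, -0.25, -0.0625]
```

The key ordering is collect, update, sync, and then collect again. Skipping the sync would make every batch come from the initial policy.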
AsyncCollector
AsyncCollector runs the environment loop in a background thread while the main thread processes the previous batch. This overlaps simulation and learning for a modest throughput gain in single-environment settings.
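The standard library can illustrate the overlap that AsyncCollector provides. A background thread produces batches into a bounded `queue.Queue` while the main thread consumes them. The `produce_batches` function and the one-slot queue are assumptions made for this sketch. It shows the producer/consumer pattern, not TorchRL's implementation.

```python
import queue
import threading
from typing import List

_SENTINEL = None  # marks the end of collection


def produce_batches(out: "queue.Queue", num_batches: int, frames_per_batch: int) -> None:
    """Background 'simulation' thread: builds batches and hands them to the learner."""
    for i in range(num_batches):
        batch = [i * frames_per_batch + k for k in range(frames_per_batch)]
        out.put(batch)  # blocks while the queue already holds an unconsumed batch
    out.put(_SENTINEL)


def train_async(num_batches: int, frames_per_batch: int) -> List[int]:
    # maxsize=1: at most one finished batch waits while the learner is busy.
    q: "queue.Queue" = queue.Queue(maxsize=1)
    worker = threading.Thread(
        target=produce_batches, args=(q, num_batches, frames_per_batch), daemon=True
    )
    worker.start()
    processed = []
    while (batch := q.get()) is not _SENTINEL:
        # While this "learning step" runs, the producer is already building the next batch.
        processed.append(sum(batch))
    worker.join()
    return processed


print(train_async(num_batches=3, frames_per_batch=2))  # [1, 5, 9]
```

The bounded queue keeps the gap between simulation and learning small. Each batch was produced with weights at most a step or two old, which is why the gain is described as modest rather than unbounded.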
MultiSyncCollector
MultiSyncCollector spawns N worker processes, each running its own copy of the environment. The main process waits for all workers to return a batch before yielding, then broadcasts updated policy weights back. This is the right choice for synchronous on-policy training with large batch sizes.
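A synchronous multi-worker step can be modeled with `concurrent.futures`:

1. every worker returns its slice;
2. the main process waits for all of them and concatenates the results into one batch;
3. the main process then broadcasts new weights.

Threads stand in for the worker processes to keep the sketch short and portable, and `worker_rollout` is a hypothetical function. The point is the barrier-then-broadcast ordering, not the transport.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

NUM_WORKERS = 3
FRAMES_PER_WORKER = 2


def worker_rollout(worker_id: int, weights: Dict[str, int]) -> List[Tuple[int, int]]:
    """Hypothetical per-worker rollout; each frame is tagged with (worker_id, weight version)."""
    return [(worker_id, weights["version"]) for _ in range(FRAMES_PER_WORKER)]


def multi_sync_collect(num_iters: int) -> List[List[Tuple[int, int]]]:
    weights = {"version": 0}
    batches = []
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        for _ in range(num_iters):
            snapshot = dict(weights)  # every worker receives the same weights this round
            # Barrier: list(pool.map(...)) blocks until every worker has returned.
            per_worker = list(pool.map(lambda wid: worker_rollout(wid, snapshot), range(NUM_WORKERS)))
            batches.append([frame for frames in per_worker for frame in frames])
            # Learner update; the next round broadcasts the new version to all workers.
            weights["version"] += 1
    return batches


batches = multi_sync_collect(num_iters=2)
print(len(batches[0]), {v for _, v in batches[1]})  # 6 {1}
```

Every frame in a batch shares one weight version, which is what synchronous on-policy methods such as PPO expect. The cost is that each round waits for the slowest worker.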
MultiAsyncCollector
MultiAsyncCollector also runs N worker processes but does not synchronize: workers return batches independently as soon as they are ready. The main process yields the first available batch, making off-policy training pipelines more efficient when simulation is slow and heterogeneous.
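The asynchronous variant differs in how results are consumed. Instead of waiting for every worker, the main loop takes whichever batch finishes first and immediately relaunches that worker. The sketch below uses `concurrent.futures.wait` with `FIRST_COMPLETED`, again with threads in place of processes. `slow_rollout` and its per-worker delays are hypothetical, chosen to make the workers heterogeneous.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import List, Tuple

DELAYS = {0: 0.01, 1: 0.05}  # hypothetical per-worker rollout times, in seconds


def slow_rollout(worker_id: int) -> Tuple[int, List[int]]:
    time.sleep(DELAYS[worker_id])
    return worker_id, [worker_id] * 2


def multi_async_collect(num_batches: int) -> List[int]:
    """Return the worker id behind each yielded batch, in arrival order."""
    order: List[int] = []
    with ThreadPoolExecutor(max_workers=len(DELAYS)) as pool:
        pending = {pool.submit(slow_rollout, wid) for wid in DELAYS}
        while len(order) < num_batches:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                worker_id, _batch = fut.result()
                order.append(worker_id)
                # Relaunch only the worker that finished; the others keep running.
                pending.add(pool.submit(slow_rollout, worker_id))
    return order


order = multi_async_collect(num_batches=6)
print(order)
```

The faster worker typically contributes most of the batches. Nobody waits on the slow one, and that is the efficiency gain in off-policy pipelines with heterogeneous simulation.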
AsyncBatchedCollector
AsyncBatchedCollector is similar to MultiAsyncCollector but batches the outputs of multiple environments together before yielding, reducing overhead when each individual env is cheap.
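One way to picture batching the outputs of multiple environments before yielding is a consumer that drains single-step results from many cheap environments. It yields only once it has accumulated `batch_size` of them, so the per-yield overhead is paid once per batch instead of once per step. This hypothetical sketch uses a plain `queue.Queue`. It does not reproduce AsyncBatchedCollector's internals.

```python
import queue
import threading
from typing import Iterator, List, Tuple

Transition = Tuple[int, int]  # (env_id, step_index)


def cheap_env(env_id: int, steps: int, out: "queue.Queue") -> None:
    """Hypothetical cheap environment: emits one transition per step."""
    for t in range(steps):
        out.put((env_id, t))


def batched_stream(num_envs: int, steps_per_env: int, batch_size: int) -> Iterator[List[Transition]]:
    q: "queue.Queue" = queue.Queue()
    threads = [threading.Thread(target=cheap_env, args=(i, steps_per_env, q)) for i in range(num_envs)]
    for th in threads:
        th.start()
    total = num_envs * steps_per_env
    buffer: List[Transition] = []
    for _ in range(total):
        buffer.append(q.get())
        if len(buffer) == batch_size:
            yield buffer  # one yield covers `batch_size` environment steps
            buffer = []
    if buffer:
        yield buffer  # flush the remainder
    for th in threads:
        th.join()


batches = list(batched_stream(num_envs=4, steps_per_env=3, batch_size=5))
print([len(b) for b in batches])  # [5, 5, 2]
```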
Weight synchronization
All multi-worker collectors expose update_policy_weights_(). Internally, TorchRL uses a WeightUpdaterBase to copy parameters from the learner process to the worker processes. Several updater implementations are available:
VanillaWeightUpdater
Copies parameters using shared memory or pickle. The default for MultiSyncCollector and MultiAsyncCollector.
MultiProcessedWeightUpdater
Uses a shared-memory TensorDict to broadcast weights without serialization overhead. Good for large models.
RayWeightUpdater
Syncs weights across Ray workers for distributed training on a Ray cluster.
RemoteModuleWeightUpdater
Pushes weights to a remote nn.Module over RPC, useful for parameter-server training styles.
Evaluator
Evaluator is a companion class that runs periodic evaluation rollouts (without exploration) in a separate process, reporting metrics without interrupting the main training loop.
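A single-worker executor can sketch evaluation that does not interrupt training. Every few iterations the loop submits an evaluation job on a snapshot of the weights and continues immediately, collecting the results later. `evaluate` is a hypothetical placeholder for a no-exploration rollout, and a thread stands in for the separate process the page describes. None of these names are part of the Evaluator API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Tuple


def evaluate(weights: Dict[str, float]) -> float:
    """Hypothetical evaluation rollout: greedy (no exploration), so the score is deterministic."""
    return weights["w"]


def train_with_periodic_eval(num_iters: int, eval_every: int) -> List[Tuple[int, float]]:
    weights = {"w": 0.0}
    pending: List[Tuple[int, Future]] = []
    with ThreadPoolExecutor(max_workers=1) as evaluator:
        for it in range(1, num_iters + 1):
            weights["w"] += 1.0  # placeholder training update
            if it % eval_every == 0:
                # Snapshot the weights so later updates cannot leak into this evaluation.
                pending.append((it, evaluator.submit(evaluate, dict(weights))))
        # Gather metrics at the end; a real loop could poll future.done() instead.
        return [(it, fut.result()) for it, fut in pending]


print(train_with_periodic_eval(num_iters=6, eval_every=2))  # [(2, 2.0), (4, 4.0), (6, 6.0)]
```

Snapshotting is the important detail. Without `dict(weights)`, an evaluation that starts late could report metrics for weights from a later iteration than the one it was submitted at.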
Profiling collector workers
ProfileConfig lets you attach a PyTorch profiler to one or more collector workers, saving trace files for performance analysis. Call collector.enable_profile() after construction to activate profiling.
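The page does not list ProfileConfig's fields, so the following is only an analogy. It profiles a chosen subset of hypothetical workers and writes one stats file per profiled worker, mirroring the idea of attaching a profiler to selected workers and saving traces. It uses the standard-library `cProfile`, not the PyTorch profiler. `rollout` and `run_worker` are illustrative names.

```python
import cProfile
import os
import pstats
import tempfile
from typing import Iterable


def rollout(num_steps: int) -> int:
    """Hypothetical worker rollout: CPU work standing in for environment steps."""
    return sum(i * i for i in range(num_steps))


def run_worker(worker_id: int, profiled: Iterable[int], out_dir: str) -> int:
    if worker_id not in set(profiled):
        return rollout(10_000)  # unprofiled workers run at full speed
    profiler = cProfile.Profile()
    result = profiler.runcall(rollout, 10_000)
    # One stats file per profiled worker, for offline analysis.
    profiler.dump_stats(os.path.join(out_dir, f"worker_{worker_id}.prof"))
    return result


out_dir = tempfile.mkdtemp()
for wid in range(4):
    run_worker(wid, profiled=[0, 2], out_dir=out_dir)

files = sorted(os.listdir(out_dir))
print(files)  # ['worker_0.prof', 'worker_2.prof']
pstats.Stats(os.path.join(out_dir, files[0])).sort_stats("cumulative").print_stats(3)
```

Profiling only a few workers keeps the overhead off the rest of the pool while still producing representative traces.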
Choosing the right collector
Multi-worker, synchronous, on-policy
Use MultiSyncCollector when you need a large synchronized batch from many environments (PPO at scale).
Multi-worker, asynchronous, off-policy
Use MultiAsyncCollector for SAC / TD3 / DQN, where workers keep running while the learner updates.