BatchSizeTransform¶
- class torchrl.envs.transforms.BatchSizeTransform(*, batch_size: torch.Size | None = None, reshape_fn: Callable[[TensorDictBase], TensorDictBase] | None = None, reset_func: Callable[[TensorDictBase, TensorDictBase], TensorDictBase] | None = None, env_kwarg: bool = False)[source]¶
A transform to modify the batch-size of an environment.
This transform has two distinct usages: it can be used to set the batch-size of a non batch-locked (e.g. stateless) environment to enable data collection with a data collector. It can also be used to modify the batch-size of an environment (e.g. squeeze, unsqueeze or reshape).
This transform modifies the environment batch-size to match the one provided. It expects the parent environment batch-size to be expandable to the provided one.
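As a quick illustration (a minimal sketch, not part of the original documentation; the variable names are illustrative), the two usages correspond to two mutually exclusive constructor arguments:

from torchrl.envs.transforms import BatchSizeTransform

# Usage 1: declare a fixed batch-size for a non batch-locked (e.g. stateless) environment
fixed_batch = BatchSizeTransform(batch_size=[5])

# Usage 2: reshape the batch dimensions of the data flowing through the environment
reshaped_batch = BatchSizeTransform(reshape_fn=lambda td: td.reshape(1, 1))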
- Keyword Arguments:
  - batch_size (torch.Size or equivalent, optional) – the new batch-size of the environment. Exclusive with reshape_fn.
  - reshape_fn (callable, optional) – a callable to modify the environment batch-size. Exclusive with batch_size.
    Note: currently, transformations involving reshape, flatten, unflatten, squeeze and unsqueeze are supported. If another reshape operation is required, please submit a feature request on the TorchRL github.
  - reset_func (callable, optional) – a function that produces a reset tensordict. The signature must match Callable[[TensorDictBase, TensorDictBase], TensorDictBase], where the first input argument is the optional tensordict passed to the environment during the call to reset() and the second is the output of TransformedEnv.base_env.reset. If env_kwarg=True, it can also support an optional env keyword argument.
  - env_kwarg (bool, optional) – if True, reset_func must support an env keyword argument. Defaults to False. The env passed will be the env accompanied by its transform.
Examples
>>> # Changing the batch-size with a function
>>> from torchrl.envs import GymEnv
>>> base_env = GymEnv("CartPole-v1")
>>> env = TransformedEnv(base_env, BatchSizeTransform(reshape_fn=lambda data: data.reshape(1, 1)))
>>> env.rollout(4)
>>> # Setting the shape of a stateless environment
>>> class MyEnv(EnvBase):
...     batch_locked = False
...     def __init__(self):
...         super().__init__()
...         self.observation_spec = CompositeSpec(observation=UnboundedContinuousTensorSpec(3))
...         self.reward_spec = UnboundedContinuousTensorSpec(1)
...         self.action_spec = UnboundedContinuousTensorSpec(1)
...
...     def _reset(self, tensordict: TensorDictBase, **kwargs) -> TensorDictBase:
...         tensordict_batch_size = tensordict.batch_size if tensordict is not None else torch.Size([])
...         result = self.observation_spec.rand(tensordict_batch_size)
...         result.update(self.full_done_spec.zero(tensordict_batch_size))
...         return result
...
...     def _step(
...         self,
...         tensordict: TensorDictBase,
...     ) -> TensorDictBase:
...         result = self.observation_spec.rand(tensordict.batch_size)
...         result.update(self.full_done_spec.zero(tensordict.batch_size))
...         result.update(self.full_reward_spec.zero(tensordict.batch_size))
...         return result
...
...     def _set_seed(self, seed: Optional[int]):
...         pass
...
>>> env = TransformedEnv(MyEnv(), BatchSizeTransform([5]))
>>> assert env.batch_size == torch.Size([5])
>>> assert env.rollout(10).shape == torch.Size([5, 10])
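The note above also lists flatten, unflatten, squeeze and unsqueeze as supported reshape operations. Below is a hedged sketch of an unsqueeze-style reshape_fn (the printed shape is an expectation based on the reshape example above, not verified output):

from torchrl.envs import GymEnv, TransformedEnv
from torchrl.envs.transforms import BatchSizeTransform

base_env = GymEnv("CartPole-v1")  # batch_size == torch.Size([])
# add a leading singleton batch dimension instead of calling reshape(1, 1)
env = TransformedEnv(base_env, BatchSizeTransform(reshape_fn=lambda td: td.unsqueeze(0)))
rollout = env.rollout(3)
print(rollout.batch_size)  # expected: torch.Size([1, 3])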
The reset_func can create a tensordict with the desired batch-size, allowing for fine-grained reset calls:
>>> def reset_func(tensordict, tensordict_reset, env):
...     result = env.observation_spec.rand()
...     result.update(env.full_done_spec.zero())
...     assert result.batch_size != torch.Size([])
...     return result
>>> env = TransformedEnv(MyEnv(), BatchSizeTransform([5], reset_func=reset_func, env_kwarg=True))
>>> print(env.rollout(2))
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([5, 2]),
            device=None,
            is_shared=False),
        observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([5, 2]),
    device=None,
    is_shared=False)
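For comparison, when env_kwarg is left at its default (False), reset_func only receives the two tensordicts from the signature documented above. A minimal sketch (assuming the MyEnv class from the examples; the expand/clone approach is an illustrative choice, not the library's internal implementation):

def reset_func(tensordict, tensordict_reset):
    # tensordict: the optional tensordict passed to reset()
    # tensordict_reset: the output of TransformedEnv.base_env.reset
    # broadcast the base reset output to the desired batch-size
    return tensordict_reset.expand(5).clone()

env = TransformedEnv(MyEnv(), BatchSizeTransform([5], reset_func=reset_func))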
This transform can be used to deploy non batch-locked environments within data collectors:
>>> from torchrl.collectors import SyncDataCollector
>>> collector = SyncDataCollector(env, lambda td: env.rand_action(td), frames_per_batch=10, total_frames=-1)
>>> for data in collector:
...     print(data)
...     break
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        collector: TensorDict(
            fields={
                traj_ids: Tensor(shape=torch.Size([5, 2]), device=cpu, dtype=torch.int64, is_shared=False)},
            batch_size=torch.Size([5, 2]),
            device=None,
            is_shared=False),
        done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([5, 2]),
            device=None,
            is_shared=False),
        observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([5, 2]),
    device=None,
    is_shared=False)
>>> collector.shutdown()
- forward(tensordict: TensorDictBase) TensorDictBase ¶
Reads the input tensordict, and for the selected keys, applies the transform.
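When the transform is used on its own (outside of a TransformedEnv), calling it invokes forward. Below is a sketch of what this might look like with a reshape_fn, under the assumption that forward applies the same batch-size change as it does inside an environment:

import torch
from tensordict import TensorDict
from torchrl.envs.transforms import BatchSizeTransform

t = BatchSizeTransform(reshape_fn=lambda td: td.reshape(2, 3))
data = TensorDict({"observation": torch.randn(6, 4)}, batch_size=[6])
out = t(data)
print(out.batch_size)  # expected under the assumption above: torch.Size([2, 3])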
- transform_input_spec(input_spec: CompositeSpec) CompositeSpec [source]¶
Transforms the input spec such that the resulting spec matches the transform mapping.
- Parameters:
input_spec (TensorSpec) – the spec before the transform
- Returns:
the expected spec after the transform
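To make the effect on the input spec concrete, here is a small hedged sketch (reusing the imports and MyEnv class from the examples above; the printed shapes are expectations, not verified output):

base = MyEnv()                 # batch_size == torch.Size([])
print(base.action_spec.shape)  # torch.Size([1])
env = TransformedEnv(base, BatchSizeTransform([5]))
print(env.action_spec.shape)   # expected: torch.Size([5, 1])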
- transform_output_spec(output_spec: CompositeSpec) CompositeSpec [source]¶
Transforms the output spec such that the resulting spec matches the transform mapping.
This method should generally not be overwritten. Changes should be implemented using transform_observation_spec(), transform_reward_spec() and transform_full_done_spec() instead.
- Parameters:
output_spec (TensorSpec) – the spec before the transform
- Returns:
the expected spec after the transform
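Similarly, on the output side, a hedged sketch (again reusing MyEnv from the examples above; the printed shapes are expectations, not verified output):

env = TransformedEnv(MyEnv(), BatchSizeTransform([5]))
print(env.observation_spec["observation"].shape)  # expected: torch.Size([5, 3])
print(env.reward_spec.shape)                      # expected: torch.Size([5, 1])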