JumanjiEnv
- torchrl.envs.JumanjiEnv(*args, **kwargs)
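Jumanji environment wrapper built from the environment name.
Jumanji offers a vectorized simulation framework based on Jax. TorchRL's wrapper incurs some overhead for the Jax-to-Torch conversion, but computational graphs can still be built on top of the simulated trajectories, which allows backpropagation through the rollout.
GitHub: https://github.com/instadeepai/jumanji
Docs: https://instadeepai.github.io/jumanji/
Paper: https://arxiv.org/abs/2306.09884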
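- Parameters:
env_name (str) – the name of the environment to wrap. Must be part of available_envs.
categorical_action_encoding (bool, optional) – if True, categorical specs are converted to the TorchRL equivalent (torchrl.data.DiscreteTensorSpec); otherwise a one-hot encoding is used (torchrl.data.OneHotTensorSpec). Defaults to False.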
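- Keyword Arguments:
from_pixels (bool, optional) – not yet supported.
frame_skip (int, optional) – if provided, the number of steps for which the same action is repeated. The observation returned is the last observation of the sequence, and the reward is the sum of the rewards across those steps.
device (torch.device, optional) – if provided, the device onto which the data is cast. Defaults to torch.device("cpu").
batch_size (torch.Size, optional) – the batch size of the environment. With jumanji, this is the number of vectorized environments. Defaults to torch.Size([]).
allow_done_after_reset (bool, optional) – if True, the environment is allowed to be in a done state immediately after reset() is called. Defaults to False.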
- Variables:
available_envs – the environments available to build
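The frame_skip behavior described above (repeat one action, keep the last observation, sum the rewards) can be sketched in plain Python. This is an illustration only, not TorchRL's implementation: `step_with_frame_skip` and `toy_step` are hypothetical names, and stopping early once the episode is done is an assumption of this sketch.

```python
def step_with_frame_skip(step_fn, state, action, frame_skip):
    """Repeat `action` for up to `frame_skip` steps of `step_fn`.

    `step_fn(state, action)` is a hypothetical step function returning
    (next_state, observation, reward, done). The result carries the last
    observation seen and the sum of the rewards collected.
    """
    total_reward = 0.0
    observation = None
    done = False
    for _ in range(frame_skip):
        state, observation, reward, done = step_fn(state, action)
        total_reward += reward
        if done:
            # Assumption of this sketch: stop repeating once the episode ends.
            break
    return state, observation, total_reward, done


def toy_step(state, action):
    """Toy counter environment: the reward equals the action, done at state 5."""
    next_state = state + 1
    return next_state, next_state, float(action), next_state >= 5


state, obs, reward, done = step_with_frame_skip(toy_step, 0, 2, frame_skip=3)
print(obs, reward, done)  # prints: 3 6.0 False
```

Three repetitions of action 2 yield the observation of the third step and a summed reward of 6.0, which is the aggregation the frame_skip description specifies.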
Examples
>>> from torchrl.envs import JumanjiEnv
>>> env = JumanjiEnv("Snake-v1")
>>> env.set_seed(0)
>>> td = env.reset()
>>> td["action"] = env.action_spec.rand()
>>> td = env.step(td)
>>> print(td)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
        action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
        done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
        grid: Tensor(shape=torch.Size([12, 12, 5]), device=cpu, dtype=torch.float32, is_shared=False),
        next: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                grid: Tensor(shape=torch.Size([12, 12, 5]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
                state: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
                        body: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False),
                        body_state: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.int32, is_shared=False),
                        fruit_position: TensorDict(
                            fields={
                                col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                                row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                            batch_size=torch.Size([]),
                            device=cpu,
                            is_shared=False),
                        head_position: TensorDict(
                            fields={
                                col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                                row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                            batch_size=torch.Size([]),
                            device=cpu,
                            is_shared=False),
                        key: Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.int32, is_shared=False),
                        length: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        tail: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=cpu,
                    is_shared=False),
                step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=cpu,
            is_shared=False),
        state: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
                body: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False),
                body_state: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.int32, is_shared=False),
                fruit_position: TensorDict(
                    fields={
                        col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=cpu,
                    is_shared=False),
                head_position: TensorDict(
                    fields={
                        col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=cpu,
                    is_shared=False),
                key: Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.int32, is_shared=False),
                length: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                tail: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=cpu,
            is_shared=False),
        step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
        terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([]),
    device=cpu,
    is_shared=False)
>>> print(env.available_envs)
['Game2048-v1', 'Maze-v0', 'Cleaner-v0', 'CVRP-v1', 'MultiCVRP-v0', 'Minesweeper-v0', 'RubiksCube-v0', 'Knapsack-v1', 'Sudoku-v0', 'Snake-v1', 'TSP-v1', 'Connector-v2', 'MMST-v0', 'GraphColoring-v0', 'RubiksCube-partly-scrambled-v0', 'RobotWarehouse-v0', 'Tetris-v0', 'BinPack-v2', 'Sudoku-very-easy-v0', 'JobShop-v0']
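The categorical_action_encoding parameter chooses between two representations of the same discrete action. The sketch below shows how they relate for a 4-action space (the length of action_mask in the Snake-v1 output). It uses plain Python lists rather than TorchRL specs, and `to_one_hot` and `from_one_hot` are hypothetical helper names, not part of the TorchRL API.

```python
def to_one_hot(index, n_actions):
    """Encode a categorical action index as a one-hot list of length n_actions."""
    if not 0 <= index < n_actions:
        raise ValueError(f"action index {index} outside [0, {n_actions})")
    return [1 if i == index else 0 for i in range(n_actions)]


def from_one_hot(vector):
    """Decode a one-hot list back into its categorical action index."""
    if any(v not in (0, 1) for v in vector) or sum(vector) != 1:
        raise ValueError("expected exactly one entry equal to 1")
    return vector.index(1)


categorical_action = 2                            # a single integer index
one_hot_action = to_one_hot(categorical_action, 4)
print(one_hot_action)                             # prints: [0, 0, 1, 0]
print(from_one_hot(one_hot_action))               # prints: 2
```

The categorical form stores one integer per action, while the one-hot form stores one entry per possible action; both carry the same information.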
To take advantage of Jumanji, one usually executes multiple environments at the same time.
>>> from torchrl.envs import JumanjiEnv
>>> env = JumanjiEnv("Snake-v1", batch_size=[10])
>>> env.set_seed(0)
>>> td = env.reset()
>>> td["action"] = env.action_spec.rand()
>>> td = env.step(td)
In the following example, we iterate over several batch sizes and report the execution time of a short rollout for each:
Examples
>>> from torch.utils.benchmark import Timer
>>> for batch_size in [4, 16, 128]:
...     timer = Timer(
...     '''
... env.rollout(100)
... ''',
...     setup=f'''
... from torchrl.envs import JumanjiEnv
... env = JumanjiEnv('Snake-v1', batch_size=[{batch_size}])
... env.set_seed(0)
... env.rollout(2)
... ''')
...     print(batch_size, timer.timeit(number=10))
4 <torch.utils.benchmark.utils.common.Measurement object at 0x1fca91910>
env.rollout(100)
setup: [...]
  Median: 122.40 ms
  2 measurements, 1 runs per measurement, 1 thread
16 <torch.utils.benchmark.utils.common.Measurement object at 0x1ff9baee0>
env.rollout(100)
setup: [...]
  Median: 134.39 ms
  2 measurements, 1 runs per measurement, 1 thread
128 <torch.utils.benchmark.utils.common.Measurement object at 0x1ff9ba7c0>
env.rollout(100)
setup: [...]
  Median: 172.31 ms
  2 measurements, 1 runs per measurement, 1 thread
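One way to read these timings is to divide each median by its batch size, which gives an amortized cost per environment for one rollout call. The sketch below does that arithmetic on the three medians reported in the example; `per_env_ms` is a hypothetical helper name. It assumes the rollouts in all three runs are of comparable length, which may not hold if a rollout stops early when an environment terminates.

```python
# Median wall-clock times (ms) copied from the benchmark output, keyed by batch size.
median_ms = {4: 122.40, 16: 134.39, 128: 172.31}


def per_env_ms(medians):
    """Amortized time per environment: median rollout time divided by batch size."""
    return {batch: ms / batch for batch, ms in medians.items()}


amortized = per_env_ms(median_ms)
for batch, ms in amortized.items():
    print(f"batch_size={batch:>3}: {ms:.2f} ms per environment")

# How much the total time grows when the batch grows from 4 to 128 (32x more environments).
slowdown = median_ms[128] / median_ms[4]
print(f"32x more environments cost {slowdown:.2f}x the time")
```

Under that assumption, the total time grows far more slowly than the number of environments, which is the benefit of running a vectorized batch.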