OpenSpielWrapper
- torchrl.envs.OpenSpielWrapper(*args, **kwargs)[source]
Google DeepMind OpenSpiel environment wrapper.
GitHub: https://github.com/google-deepmind/open_spiel
Documentation: https://openspiel.readthedocs.io/en/latest/index.html
- Parameters:
env (pyspiel.State) – the game to wrap.
- Keyword Arguments:
device (torch.device, optional) – if provided, the data will be cast to this device. Defaults to None.
batch_size (torch.Size, optional) – the batch size of the environment. Defaults to torch.Size([]).
allow_done_after_reset (bool, optional) – if True, the environment is allowed to be done immediately after reset() is called. Defaults to False.
group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. See MarlGroupMapType for more info. Defaults to ALL_IN_ONE_GROUP.
categorical_actions (bool, optional) – if True, categorical action specs will be converted to the equivalent TorchRL type (torchrl.data.Categorical); otherwise a one-hot encoding will be used (torchrl.data.OneHot). Defaults to False.
return_state (bool, optional) – if True, the outputs of reset() and step() will contain "state". That state can be passed to reset() to reset the environment to that particular state, rather than to the initial state. Defaults to False.
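The categorical_actions flag only changes how a discrete action is encoded, not which actions exist: a categorical spec carries a single integer index, while a one-hot spec carries a 0/1 vector with exactly one entry set. The sketch below illustrates the equivalence in plain Python (no torchrl or pyspiel dependency; the helper names are invented for illustration, not part of the API):

```python
# Illustrative sketch of the two discrete-action encodings that
# categorical_actions toggles between (hypothetical helpers, not TorchRL API).

def categorical_to_one_hot(index: int, n_actions: int) -> list[int]:
    """Encode an integer action index as a one-hot 0/1 vector."""
    return [1 if i == index else 0 for i in range(n_actions)]

def one_hot_to_categorical(one_hot: list[int]) -> int:
    """Recover the integer action index from a one-hot vector."""
    return one_hot.index(1)

# The same action, in both encodings:
action_one_hot = categorical_to_one_hot(3, 5)   # [0, 0, 0, 1, 0]
assert one_hot_to_categorical(action_one_hot) == 3
```

With categorical_actions=False (the default), sampled actions look like the one-hot vector above; with categorical_actions=True, they are plain integer indices.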
- Variables:
available_envs – the environments available to build.
Examples
>>> import pyspiel
>>> from torchrl.envs import OpenSpielWrapper
>>> from tensordict import TensorDict
>>> base_env = pyspiel.load_game('chess').new_initial_state()
>>> env = OpenSpielWrapper(base_env, return_state=True)
>>> td = env.reset()
>>> td = env.step(env.full_action_spec.rand())
>>> print(td)
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([2, 4672]), device=cpu, dtype=torch.int64, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        next: TensorDict(
            fields={
                agents: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([2, 20, 8, 8]), device=cpu, dtype=torch.float32, is_shared=False),
                        reward: Tensor(shape=torch.Size([2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([2]),
                    device=None,
                    is_shared=False),
                current_player: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                state: NonTensorData(data=FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
                3009
                , batch_size=torch.Size([]), device=None),
                terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
>>> print(env.available_envs)
['2048', 'add_noise', 'amazons', 'backgammon', ...]
reset() can restore a specific state, rather than the initial state, as long as return_state=True.
>>> import pyspiel
>>> from torchrl.envs import OpenSpielWrapper
>>> from tensordict import TensorDict
>>> base_env = pyspiel.load_game('chess').new_initial_state()
>>> env = OpenSpielWrapper(base_env, return_state=True)
>>> td = env.reset()
>>> td = env.step(env.full_action_spec.rand())
>>> td_restore = td["next"]
>>> td = env.step(env.full_action_spec.rand())
>>> # Current state is not equal to `td_restore`
>>> (td["next"] == td_restore).all()
False
>>> td = env.reset(td_restore)
>>> # After resetting, now the current state is equal to `td_restore`
>>> (td == td_restore).all()
True