DdpgMlpQNet
- class torchrl.modules.DdpgMlpQNet(mlp_net_kwargs_net1: dict | None = None, mlp_net_kwargs_net2: dict | None = None, device: DEVICE_TYPING | None = None)
DDPG Q-value MLP class.
Presented in "CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING", https://arxiv.org/pdf/1509.02971.pdf
The DDPG Q-value network takes an observation and an action as input and returns a scalar from them. Because the action is integrated after the observation, two networks are created.
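Conceptually, the first MLP embeds the observation alone, the action is then concatenated to that embedding, and the second MLP maps the result to a scalar value. The snippet below is a minimal standalone sketch of this composition using plain torch.nn layers, with hidden sizes mirroring the defaults listed further down; it illustrates the idea and is not the module's actual implementation:

>>> import torch
>>> from torch import nn
>>> obs_dim, action_dim = 32, 4
>>> mlp1 = nn.Sequential(nn.Linear(obs_dim, 400), nn.ELU())  # observation-only branch
>>> mlp2 = nn.Sequential(nn.Linear(400 + action_dim, 300), nn.ELU(), nn.Linear(300, 1))
>>> obs, action = torch.zeros(1, obs_dim), torch.zeros(1, action_dim)
>>> value = mlp2(torch.cat([mlp1(obs), action], dim=-1))  # action joins after the observation
>>> print(value.shape)
torch.Size([1, 1])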
- Parameters:
mlp_net_kwargs_net1 (dict, optional) –
kwargs for the first MLP (a customization sketch follows the Examples below). Defaults to
>>> {
...     'in_features': None,
...     'out_features': 400,
...     'depth': 0,
...     'num_cells': [],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
...     'activate_last_layer': True,
... }
mlp_net_kwargs_net2 (dict, optional) –
kwargs for the second MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': 1,
...     'depth': 1,
...     'num_cells': [300, ],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
... }
device (torch.device, optional) – device on which the module is created.
Examples
>>> import torch
>>> from torchrl.modules import DdpgMlpQNet
>>> net = DdpgMlpQNet()
>>> print(net)
DdpgMlpQNet(
  (mlp1): MLP(
    (0): LazyLinear(in_features=0, out_features=400, bias=True)
    (1): ELU(alpha=1.0)
  )
  (mlp2): MLP(
    (0): LazyLinear(in_features=0, out_features=300, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=300, out_features=1, bias=True)
  )
)
>>> obs = torch.zeros(1, 32)
>>> action = torch.zeros(1, 4)
>>> value = net(obs, action)
>>> print(value.shape)
torch.Size([1, 1])
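The two kwargs dictionaries described in the parameters above can be used to override the defaults. The snippet below is a hedged sketch: it assumes that keys not supplied keep their default values (if the module replaces rather than merges the dictionaries, pass the complete dictionaries instead), and the chosen sizes and activation class are purely illustrative:

>>> import torch
>>> from torch import nn
>>> from torchrl.modules import DdpgMlpQNet
>>> net = DdpgMlpQNet(
...     mlp_net_kwargs_net1={"out_features": 256, "activation_class": nn.Tanh},
...     mlp_net_kwargs_net2={"num_cells": [128], "activation_class": nn.Tanh},
... )
>>> value = net(torch.zeros(1, 8), torch.zeros(1, 2))
>>> print(value.shape)
torch.Size([1, 1])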