DdpgCnnQNet
- class torchrl.modules.DdpgCnnQNet(conv_net_kwargs: dict | None = None, mlp_net_kwargs: dict | None = None, use_avg_pooling: bool = True, device: DEVICE_TYPING | None = None)
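DDPG convolutional Q-value class.
Presented in "Continuous control with deep reinforcement learning", https://arxiv.org/pdf/1509.02971.pdf
The DDPG Q-value network takes an observation and an action as input and returns a scalar value computed from them.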
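To make the data flow concrete, here is a minimal plain-PyTorch sketch of a network with the same default layer sizes. It is not the TorchRL implementation. The class name `ConvQNetSketch` and its `action_dim` and `in_channels` arguments are hypothetical. Fixed `nn.Conv2d`/`nn.Linear` layers replace the lazy ones, and `nn.Flatten` stands in for `Squeeze2dLayer`. The sketch also assumes that the action is concatenated to the pooled convolutional features before they enter the MLP.

```python
import torch
from torch import nn


class ConvQNetSketch(nn.Module):
    """Hypothetical re-implementation of the default DdpgCnnQNet layout.

    Assumption (not taken from the TorchRL source): the action is
    concatenated to the pooled conv features before the MLP.
    """

    def __init__(self, action_dim: int = 4, in_channels: int = 3):
        super().__init__()
        # Conv stack mirroring the documented conv_net_kwargs defaults:
        # num_cells=[32, 64, 128], kernel_sizes=[8, 4, 3],
        # strides=[4, 2, 1], paddings=[0, 0, 1], ELU activations.
        self.convnet = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4, padding=0),
            nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=0),
            nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ELU(),
            # Average-pool each channel to 1x1, then drop the spatial dims.
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
        )
        # MLP mirroring mlp_net_kwargs: depth=2 hidden layers of 200 units,
        # ELU activations, and a single output unit.
        self.mlp = nn.Sequential(
            nn.Linear(128 + action_dim, 200),
            nn.ELU(),
            nn.Linear(200, 200),
            nn.ELU(),
            nn.Linear(200, 1),
        )

    def forward(self, observation: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        features = self.convnet(observation)            # (batch, 128)
        hidden = torch.cat([features, action], dim=-1)  # (batch, 128 + action_dim)
        return self.mlp(hidden)                          # (batch, 1)


net = ConvQNetSketch()
value = net(torch.zeros(1, 3, 64, 64), torch.zeros(1, 4))
print(value.shape)  # torch.Size([1, 1])
```

Using eager layers with explicit sizes makes every shape visible in the code. The library version relies on lazy layers instead, so it can adapt to the observation and action sizes on the first forward call.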
- Parameters:
conv_net_kwargs (dict, optional) –
keyword arguments for the convolutional network. Defaults to
>>> {
...     'in_features': None,
...     "num_cells": [32, 64, 128],
...     "kernel_sizes": [8, 4, 3],
...     "strides": [4, 2, 1],
...     "paddings": [0, 0, 1],
...     'activation_class': nn.ELU,
...     'norm_class': None,
...     'aggregator_class': nn.AdaptiveAvgPool2d,
...     'aggregator_kwargs': {},
...     'squeeze_output': True,
... }
mlp_net_kwargs (dict, optional) –
keyword arguments for the MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': 1,
...     'depth': 2,
...     'num_cells': 200,
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
... }
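use_avg_pooling (bool, optional) – if True, an average-pooling layer (nn.AdaptiveAvgPool2d in the defaults above) is used to aggregate the convolutional output. Defaults to True.
device (torch.device, optional) – device on which to create the module.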
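The default convolution settings determine how far the feature maps shrink before pooling. The sketch below uses two hypothetical helpers, `conv_out` and `trace_default_convnet`. It applies the standard Conv2d output-size formula (dilation 1) to the documented `kernel_sizes`, `strides` and `paddings`.

```python
from typing import List


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    # Standard Conv2d output size with dilation 1:
    # floor((size + 2 * padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1


def trace_default_convnet(size: int) -> List[int]:
    # Documented defaults for conv_net_kwargs.
    kernel_sizes = [8, 4, 3]
    strides = [4, 2, 1]
    paddings = [0, 0, 1]
    sizes = [size]
    for kernel, stride, padding in zip(kernel_sizes, strides, paddings):
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    return sizes


print(trace_default_convnet(64))  # [64, 15, 6, 6]
```

For a 64x64 observation, the maps shrink to 15x15 and then 6x6. Average pooling then reduces each of the 128 channels to a single value. If, as assumed above, a 4-dimensional action is concatenated after pooling, the first (lazy) MLP layer would infer 128 + 4 = 132 input features.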
Examples
>>> from torchrl.modules import DdpgCnnQNet
>>> import torch
>>> net = DdpgCnnQNet()
>>> print(net)
DdpgCnnQNet(
  (convnet): ConvNet(
    (0): LazyConv2d(0, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ELU(alpha=1.0)
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ELU(alpha=1.0)
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ELU(alpha=1.0)
    (6): AdaptiveAvgPool2d(output_size=(1, 1))
    (7): Squeeze2dLayer()
  )
  (mlp): MLP(
    (0): LazyLinear(in_features=0, out_features=200, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=200, out_features=200, bias=True)
    (3): ELU(alpha=1.0)
    (4): Linear(in_features=200, out_features=1, bias=True)
  )
)
>>> obs = torch.zeros(1, 3, 64, 64)
>>> action = torch.zeros(1, 4)
>>> value = net(obs, action)
>>> print(value.shape)
torch.Size([1, 1])