DdpgCnnActor

class torchrl.modules.DdpgCnnActor(action_dim: int, conv_net_kwargs: dict | None = None, mlp_net_kwargs: dict | None = None, use_avg_pooling: bool = False, device: DEVICE_TYPING | None = None)

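DDPG Convolutional Actor class.

Presented in "CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING", https://arxiv.org/pdf/1509.02971.pdf

The DDPG Convolutional Actor takes an observation as input (the observed pixels after some simple transformation) and returns an action vector, together with an observation embedding that can be reused for value estimation. It should be trained to maximise the value returned by the DDPG Q-value network.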
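The training objective described above can be sketched in plain PyTorch. This is a minimal illustration, not TorchRL's loss module: the small `actor` and `critic` networks, the flat 8-dimensional observations, and the learning rate are all hypothetical stand-ins chosen to keep the example short.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins on flat observations; a pixel-based setup
# would use a convolutional actor in place of `actor`.
obs_dim, action_dim = 8, 4
actor = nn.Sequential(
    nn.Linear(obs_dim, 32), nn.ELU(), nn.Linear(32, action_dim), nn.Tanh()
)
critic = nn.Sequential(
    nn.Linear(obs_dim + action_dim, 32), nn.ELU(), nn.Linear(32, 1)
)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(16, obs_dim)

# The actor proposes actions and the critic scores them.
action = actor(obs)
q_value = critic(torch.cat([obs, action], dim=-1))

# Maximising Q is the same as minimising its negation.
actor_loss = -q_value.mean()

actor_optim.zero_grad()
actor_loss.backward()
# Only the actor's parameters are stepped here; the critic would be
# trained separately with its own loss and optimizer.
actor_optim.step()
```

The gradient reaches the actor through the critic's action input, which is why the action must stay attached to the computation graph when it is passed to the critic.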

Parameters:

action_dim (int) – length of the action vector.

conv_net_kwargs (dict or list of dicts, optional) –

kwargs for the ConvNet. Defaults to

>>> {
...     'in_features': None,
...     "num_cells": [32, 64, 64],
...     "kernel_sizes": [8, 4, 3],
...     "strides": [4, 2, 1],
...     "paddings": [0, 0, 1],
...     'activation_class': torch.nn.ELU,
...     'norm_class': None,
...     'aggregator_class': SquashDims,
...     'aggregator_kwargs': {"ndims_in": 3},
...     'squeeze_output': True,
... }

mlp_net_kwargs (dict, optional) –

kwargs for the MLP. Defaults to

>>> {
...     'in_features': None,
...     'out_features': action_dim,
...     'depth': 2,
...     'num_cells': 200,
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
... }

use_avg_pooling (bool, optional) – if True, an AvgPooling layer is used to aggregate the output. Defaults to False.
device (torch.device, optional) – device on which the module is created.

Examples

>>> import torch
>>> from torchrl.modules import DdpgCnnActor
>>> actor = DdpgCnnActor(action_dim=4)
>>> print(actor)
DdpgCnnActor(
  (convnet): ConvNet(
    (0): LazyConv2d(0, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ELU(alpha=1.0)
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ELU(alpha=1.0)
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ELU(alpha=1.0)
    (6): SquashDims()
  )
  (mlp): MLP(
    (0): LazyLinear(in_features=0, out_features=200, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=200, out_features=200, bias=True)
    (3): ELU(alpha=1.0)
    (4): Linear(in_features=200, out_features=4, bias=True)
  )
)
>>> obs = torch.randn(10, 3, 64, 64)
>>> action, hidden = actor(obs)
>>> print(action.shape)
torch.Size([10, 4])
>>> print(hidden.shape)
torch.Size([10, 2304])
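The hidden size of 2304 in the example above can be checked by hand. The sketch below applies the standard convolution output-size formula (dilation 1) to the default kernel sizes, strides and paddings, and assumes that SquashDims with ndims_in=3 flattens the channel, height and width dimensions into one. The helper `conv_out_size` is a name introduced here for illustration and is not part of TorchRL.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a 2D convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Default ConvNet settings listed in the parameter description.
num_cells = [32, 64, 64]
kernel_sizes = [8, 4, 3]
strides = [4, 2, 1]
paddings = [0, 0, 1]

size = 64  # height and width of the example observation
for kernel, stride, padding in zip(kernel_sizes, strides, paddings):
    size = conv_out_size(size, kernel, stride, padding)

# Flattening (channels, height, width) gives the embedding length.
hidden_dim = num_cells[-1] * size * size
print(size, hidden_dim)  # 6 2304
```

The spatial size goes 64 → 15 → 6 → 6, so the flattened embedding has 64 × 6 × 6 = 2304 entries per observation.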

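forward(observation: Tensor) → Tuple[Tensor, Tensor]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the logic of the forward pass must be defined within this function, you should call the Module instance afterwards rather than this function directly: the former runs the registered hooks, while the latter silently ignores them.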
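The difference described in the note can be observed with any `torch.nn.Module`. The sketch below uses a plain `nn.Linear` layer rather than the actor to keep it small; the hook `log_hook` and the `calls` list are names made up for this illustration.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)
calls = []


def log_hook(module, inputs, output):
    # Record the output shape each time the hook fires.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(log_hook)
x = torch.randn(5, 3)

layer(x)  # calling the instance runs the hook
n_after_call = len(calls)

layer.forward(x)  # calling forward() directly skips the hook
n_after_forward = len(calls)

handle.remove()
print(n_after_call, n_after_forward)  # 1 1
```

The count stays at 1 after the direct `forward()` call, which shows that the hook was skipped.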
