
Using pretrained models

This tutorial explains how to use pretrained models in TorchRL.

By the end of this tutorial, you will be able to use pretrained models for efficient image representation and to fine-tune them.

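TorchRL provides pretrained models that can be used either as transforms or as components of the policy. Because the semantics are the same in both cases, they can be used interchangeably in either context. In this tutorial we will use R3M (https://arxiv.org/abs/2203.12601), but other models (e.g. VIP) work equally well.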

import multiprocessing

import torch.cuda
from tensordict.nn import TensorDictSequential
from torch import nn
from torchrl.envs import R3MTransform, TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.modules import Actor

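# CUDA cannot be re-initialized in a forked subprocess, so fall back to CPU
# when the multiprocessing start method is "fork".
is_fork = multiprocessing.get_start_method() == "fork"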
device = (
    torch.device(0)
    if torch.cuda.is_available() and not is_fork
    else torch.device("cpu")
)

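Let us first create an environment. For simplicity, we will use a common gym environment. In practice, the same approach carries over to more challenging embodied-AI settings (see, for example, our Habitat wrappers).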

base_env = GymEnv("Ant-v4", from_pixels=True, device=device)

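Let us fetch our pretrained model. We request the pretrained version of the model through the download=True flag, which is disabled by default. Next, we append the transform to the environment. In practice, every batch of collected data goes through the transform and is mapped onto an "r3m_vec" entry in the output tensordict. Our policy, a small MLP with a single hidden layer, then reads this vector and computes the corresponding action.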

r3m = R3MTransform(
    "resnet50",
    in_keys=["pixels"],
    download=True,
)
env_transformed = TransformedEnv(base_env, r3m)
net = nn.Sequential(
    nn.LazyLinear(128, device=device),
    nn.Tanh(),
    nn.Linear(128, base_env.action_spec.shape[-1], device=device),
)
policy = Actor(net, in_keys=["r3m_vec"])
Downloading: "https://pytorch.s3.amazonaws.com/models/rl/r3m/r3m_50.pt" to /root/.cache/torch/hub/checkpoints/r3m_50.pt

100%|██████████| 374M/374M [00:05<00:00, 68.9MB/s]

Let us check the number of parameters of the policy:

print("number of params:", len(list(policy.parameters())))
number of params: 4
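
The count above is the number of parameter tensors (one weight and one bias per linear layer), not the number of scalar values. The following is a minimal sketch of the difference using plain PyTorch. It assumes an input width of 2048 and an action width of 8, taken from the "r3m_vec" and "action" shapes printed by the rollouts in this tutorial, and it uses nn.Linear in place of nn.LazyLinear so that every shape is known up front.

```python
from torch import nn

# Same layout as the tutorial's policy network, with the input size made
# explicit (2048 is the width of "r3m_vec", 8 the width of "action").
net = nn.Sequential(
    nn.Linear(2048, 128),
    nn.Tanh(),
    nn.Linear(128, 8),
)

# Number of parameter tensors: weight and bias for each of the two linear layers.
n_tensors = len(list(net.parameters()))

# Number of scalar parameters: total element count over all tensors.
n_elements = sum(p.numel() for p in net.parameters())

print("parameter tensors:", n_tensors)
print("scalar parameters:", n_elements)
```

Summing numel() is usually the more informative figure when comparing model sizes.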

We collect a rollout of 32 steps and print its output:

rollout = env_transformed.rollout(32, policy)
print("rollout with transform:", rollout)
rollout with transform: TensorDict(
    fields={
        action: Tensor(shape=torch.Size([32, 8]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([32]),
            device=cpu,
            is_shared=False),
        r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([32]),
    device=cpu,
    is_shared=False)

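To fine-tune the model, we integrate the transform into the policy after making its parameters trainable. In practice, it may be wise to restrict training to a subset of the parameters (say, the last layer of the MLP).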

r3m.train()
policy = TensorDictSequential(r3m, policy)
print("number of params after r3m is integrated:", len(list(policy.parameters())))
number of params after r3m is integrated: 163
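
Restricting training to a subset of the parameters, as suggested above, can be sketched with plain PyTorch. This is an illustration under stated assumptions rather than the tutorial's own recipe: `backbone` is a hypothetical small stand-in for the pretrained encoder (so that nothing needs to be downloaded), and the learning rate is arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained encoder (NOT the real R3M network).
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
# MLP head with the same layout as the tutorial's policy network.
head = nn.Sequential(nn.Linear(32, 128), nn.Tanh(), nn.Linear(128, 8))
model = nn.Sequential(backbone, head)

# Freeze everything, then re-enable gradients for the last layer only.
for p in model.parameters():
    p.requires_grad_(False)
for p in head[-1].parameters():
    p.requires_grad_(True)

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

n_trainable = sum(p.numel() for p in trainable)
print("trainable tensors:", len(trainable), "trainable scalars:", n_trainable)
```

Freezing first and then unfreezing the chosen layer keeps the selection explicit, which makes it easy to widen the trainable subset later.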

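Again, we collect a rollout with R3M. The structure of the output has changed slightly, because the environment now returns pixels rather than an embedding. The embedding "r3m_vec" is an intermediate result of our policy.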

rollout = base_env.rollout(32, policy)
print("rollout, fine tuning:", rollout)
rollout, fine tuning: TensorDict(
    fields={
        action: Tensor(shape=torch.Size([32, 8]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                pixels: Tensor(shape=torch.Size([32, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
                reward: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([32]),
            device=cpu,
            is_shared=False),
        r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([32]),
    device=cpu,
    is_shared=False)

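We could move the transform from the environment to the policy so easily because both behave like a TensorDictModule: each has a set of "in_keys" and "out_keys", which makes it easy to read inputs and write outputs in different contexts.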
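
The key-based contract can be illustrated with a dependency-free toy. The `KeyedModule` class and `chain` helper below are hypothetical names invented for this sketch and are not part of TorchRL or tensordict. They only mimic the idea of reading "in_keys" from a dictionary and writing "out_keys" back, which is what lets the same component sit on the environment side or inside the policy.

```python
class KeyedModule:
    """Hypothetical stand-in: reads in_keys from a dict, writes out_keys back."""

    def __init__(self, fn, in_keys, out_keys):
        self.fn = fn
        self.in_keys = list(in_keys)
        self.out_keys = list(out_keys)

    def __call__(self, data):
        outputs = self.fn(*(data[key] for key in self.in_keys))
        if len(self.out_keys) == 1:
            outputs = (outputs,)
        data.update(zip(self.out_keys, outputs))
        return data


def chain(*modules):
    """Run keyed modules one after the other on the same dict."""
    def run(data):
        for module in modules:
            data = module(data)
        return data
    return run


# Toy "encoder" (mean of the pixel values) and toy "actor" (doubles the embedding).
encoder = KeyedModule(lambda px: sum(px) / len(px), ["pixels"], ["r3m_vec"])
actor = KeyedModule(lambda vec: 2.0 * vec, ["r3m_vec"], ["action"])

# Placement 1: the encoder runs on the environment side, the actor reads the embedding.
env_output = encoder({"pixels": [0.0, 1.0, 2.0]})
out_env_side = actor(dict(env_output))

# Placement 2: the encoder is part of the policy, which reads raw pixels.
policy = chain(encoder, actor)
out_policy_side = policy({"pixels": [0.0, 1.0, 2.0]})

print(out_env_side["action"], out_policy_side["action"])
```

Both placements produce the same action because each component depends only on the keys it reads and writes, not on where it is called from.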

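To conclude this tutorial, let us look at how R3M can be used to read images stored in a replay buffer (e.g. in an offline RL context). First, let us build our dataset: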

from torchrl.data import LazyMemmapStorage, ReplayBuffer

storage = LazyMemmapStorage(1000)
rb = ReplayBuffer(storage=storage, transform=r3m)

We can now collect data (random rollouts are sufficient for our purpose) and fill the replay buffer with it:

total = 0
while total < 1000:
    tensordict = base_env.rollout(1000)
    rb.extend(tensordict)
    total += tensordict.numel()

Let us check what the replay buffer storage looks like. It should not contain the "r3m_vec" entry, since the transform has not been used yet:

print("stored data:", storage._storage)
stored data: TensorDict(
    fields={
        action: MemoryMappedTensor(shape=torch.Size([1000, 8]), device=cpu, dtype=torch.float32, is_shared=False),
        done: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                pixels: MemoryMappedTensor(shape=torch.Size([1000, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
                reward: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([1000]),
            device=cpu,
            is_shared=False),
        pixels: MemoryMappedTensor(shape=torch.Size([1000, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
        terminated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([1000]),
    device=cpu,
    is_shared=False)

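When sampling, the data goes through the R3M transform, giving us the processed data we wanted. In this way, we can train an algorithm offline on a dataset made of images: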

batch = rb.sample(32)
print("data after sampling:", batch)
data after sampling: TensorDict(
    fields={
        action: Tensor(shape=torch.Size([32, 8]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                pixels: Tensor(shape=torch.Size([32, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
                reward: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([32]),
            device=cpu,
            is_shared=False),
        r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([32]),
    device=cpu,
    is_shared=False)
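
The behaviour seen above, where raw data sits in storage and the transform runs only when a batch is sampled, can be mimicked with a small dependency-free toy. `ToyReplayBuffer` and `embed` are hypothetical names invented for this sketch. They are not TorchRL's implementation and only illustrate the order of operations.

```python
import random


class ToyReplayBuffer:
    """Hypothetical toy buffer: stores raw items, applies `transform` on sample."""

    def __init__(self, capacity, transform=None):
        self.capacity = capacity
        self.transform = transform
        self.storage = []

    def extend(self, items):
        # Items are stored untouched; no transform is applied at write time.
        for item in items:
            if len(self.storage) < self.capacity:
                self.storage.append(item)

    def sample(self, batch_size, rng=random):
        # Copy each sampled item so the stored data stays raw.
        batch = [dict(rng.choice(self.storage)) for _ in range(batch_size)]
        if self.transform is not None:
            batch = [self.transform(item) for item in batch]
        return batch


def embed(item):
    """Toy 'encoder': adds an 'embedding' entry computed from 'pixels'."""
    item["embedding"] = sum(item["pixels"]) / len(item["pixels"])
    return item


rb = ToyReplayBuffer(capacity=1000, transform=embed)
rb.extend({"pixels": [i, i + 1, i + 2]} for i in range(10))
batch = rb.sample(4, rng=random.Random(0))

print("stored keys:", sorted(rb.storage[0]))
print("sampled keys:", sorted(batch[0]))
```

Deferring the transform to sampling time keeps the stored dataset independent of the encoder, so the same raw images can later be read with a different or fine-tuned model.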

Total running time of the script: (0 minutes 52.259 seconds)

Estimated memory usage: 2602 MB
