
Introduction || Tensors || Autograd || Building Models || TensorBoard Support || Training Models || Model Understanding

PyTorch TensorBoard Support

Follow along with the video below or on YouTube.

Before You Start

To run this tutorial, you'll need to install PyTorch, TorchVision, Matplotlib, and TensorBoard.

With conda:

conda install pytorch torchvision -c pytorch
conda install matplotlib tensorboard

With pip:

pip install torch torchvision matplotlib tensorboard

Once the dependencies are installed, restart this notebook in the Python environment where you installed them.
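If you'd like to double-check that everything is available in the current environment, the optional snippet below (our own addition, not part of the original workflow) simply imports each package and prints its version:

# Optional sanity check: make sure each package imports and report its version
import torch
import torchvision
import matplotlib
import tensorboard

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("matplotlib:", matplotlib.__version__)
print("tensorboard:", tensorboard.__version__)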

Introduction

In this notebook, we'll be training a variant of LeNet-5 against the Fashion-MNIST dataset. Fashion-MNIST is a set of image tiles depicting various garments, with ten class labels indicating the type of garment depicted.

# PyTorch model and training necessities
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Image datasets and image manipulation
import torchvision
import torchvision.transforms as transforms

# Image display
import matplotlib.pyplot as plt
import numpy as np

# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter

# In case you are using an environment that has TensorFlow installed,
# such as Google Colab, uncomment the following code to avoid
# a bug with saving embeddings to your TensorBoard directory

# import tensorflow as tf
# import tensorboard as tb
# tf.io.gfile = tb.compat.tensorflow_stub.io.gfile

Showing Images in TensorBoard

Let's start by adding sample images from our dataset to TensorBoard:

# Gather datasets and prepare them for consumption
transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

# Store separate training and validation splits in ./data
training_set = torchvision.datasets.FashionMNIST('./data',
    download=True,
    train=True,
    transform=transform)
validation_set = torchvision.datasets.FashionMNIST('./data',
    download=True,
    train=False,
    transform=transform)

training_loader = torch.utils.data.DataLoader(training_set,
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=2)


validation_loader = torch.utils.data.DataLoader(validation_set,
                                                batch_size=4,
                                                shuffle=False,
                                                num_workers=2)

# Class labels
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

# Helper function for inline image display
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))

# Extract a batch of 4 images
dataiter = iter(training_loader)
images, labels = next(dataiter)

# Create a grid from the images and show them
img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)
[Figure: a grid of four Fashion-MNIST sample images rendered by matplotlib]
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz

  0%|          | 0.00/26.4M [00:00<?, ?B/s]
  0%|          | 65.5k/26.4M [00:00<01:12, 361kB/s]
  1%|          | 197k/26.4M [00:00<00:45, 575kB/s]
  3%|3         | 852k/26.4M [00:00<00:13, 1.96MB/s]
 13%|#2        | 3.38M/26.4M [00:00<00:03, 6.68MB/s]
 36%|###6      | 9.57M/26.4M [00:00<00:01, 16.6MB/s]
 60%|#####9    | 15.7M/26.4M [00:01<00:00, 22.5MB/s]
 83%|########2 | 21.9M/26.4M [00:01<00:00, 26.2MB/s]
100%|##########| 26.4M/26.4M [00:01<00:00, 19.3MB/s]
Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz

  0%|          | 0.00/29.5k [00:00<?, ?B/s]
100%|##########| 29.5k/29.5k [00:00<00:00, 326kB/s]
Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz

  0%|          | 0.00/4.42M [00:00<?, ?B/s]
  1%|1         | 65.5k/4.42M [00:00<00:12, 361kB/s]
  5%|5         | 229k/4.42M [00:00<00:06, 678kB/s]
 13%|#3        | 590k/4.42M [00:00<00:03, 1.27MB/s]
 21%|##1       | 950k/4.42M [00:00<00:02, 1.55MB/s]
 31%|###1      | 1.38M/4.42M [00:00<00:01, 1.83MB/s]
 42%|####2     | 1.87M/4.42M [00:01<00:01, 2.12MB/s]
 54%|#####4    | 2.39M/4.42M [00:01<00:00, 2.37MB/s]
 67%|######6   | 2.95M/4.42M [00:01<00:00, 2.59MB/s]
 81%|########  | 3.57M/4.42M [00:01<00:00, 2.85MB/s]
 96%|#########6| 4.26M/4.42M [00:01<00:00, 3.14MB/s]
100%|##########| 4.42M/4.42M [00:01<00:00, 2.42MB/s]
Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz

  0%|          | 0.00/5.15k [00:00<?, ?B/s]
100%|##########| 5.15k/5.15k [00:00<00:00, 31.5MB/s]
Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw

Above, we used TorchVision and Matplotlib to create a visual grid of a minibatch of our input data. Below, we use the add_image() call on SummaryWriter to log the image for consumption by TensorBoard, and we also call flush() to make sure it's written to disk right away.

# Default log_dir argument is "runs" - but it's good to be specific
# torch.utils.tensorboard.SummaryWriter is imported above
writer = SummaryWriter('runs/fashion_mnist_experiment_1')

# Write image data to TensorBoard log dir
writer.add_image('Four Fashion-MNIST Images', img_grid)
writer.flush()

# To view, start TensorBoard on the command line with:
#   tensorboard --logdir=runs
# ...and open a browser tab to http://localhost:6006/

If you start TensorBoard at the command line and open it in a new browser tab (usually at localhost:6006), you should see the image grid under the IMAGES tab.
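If you're working inside a Jupyter notebook or Google Colab, you can also render TensorBoard inline rather than in a separate browser tab. Here's a minimal sketch using the notebook extension that ships with the tensorboard package, assuming your logs live under ./runs as above:

# Load the TensorBoard notebook extension and render TensorBoard inline
# (for Jupyter/Colab; assumes the ./runs log directory used by the SummaryWriter above)
%load_ext tensorboard
%tensorboard --logdir runs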

Graphing Scalars to Visualize Training

TensorBoard is useful for tracking the progress and efficacy of your training. Below, we'll run a training loop, track some metrics, and save the data for TensorBoard's consumption.

Let's define a model to classify our image tiles, along with an optimizer and loss function for training:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Now let's train a single epoch, and evaluate the training vs. validation set losses every 1000 batches:

print(len(validation_loader))
for epoch in range(1):  # loop over the dataset multiple times
    running_loss = 0.0

    for i, data in enumerate(training_loader, 0):
        # basic training loop
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 1000 == 999:    # Every 1000 mini-batches...
            print('Batch {}'.format(i + 1))
            # Check against the validation set
            running_vloss = 0.0

            # In evaluation mode some model-specific operations can be omitted, e.g. dropout layers
            net.train(False) # Switch to evaluation mode, e.g. turning off regularisation
            for j, vdata in enumerate(validation_loader, 0):
                vinputs, vlabels = vdata
                voutputs = net(vinputs)
                vloss = criterion(voutputs, vlabels)
                running_vloss += vloss.item()
            net.train(True) # Switch back to training mode, e.g. turning regularisation back on

            avg_loss = running_loss / 1000
            avg_vloss = running_vloss / len(validation_loader)

            # Log the running loss averaged per batch
            writer.add_scalars('Training vs. Validation Loss',
                            { 'Training' : avg_loss, 'Validation' : avg_vloss },
                            epoch * len(training_loader) + i)

            running_loss = 0.0
print('Finished Training')

writer.flush()
2500
Batch 1000
Batch 2000
Batch 3000
Batch 4000
Batch 5000
Batch 6000
Batch 7000
Batch 8000
Batch 9000
Batch 10000
Batch 11000
Batch 12000
Batch 13000
Batch 14000
Batch 15000
Finished Training

Switch to your open TensorBoard and have a look at the SCALARS tab.
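As a side note, add_scalars() groups several curves under one tag; if you only need a single curve, the writer also provides add_scalar(). Here's a minimal, hypothetical sketch using the same writer (the accuracy values are placeholders, not something computed in this tutorial):

# Hypothetical example: log a single metric per step with add_scalar()
# The accuracy values below are placeholders, not real measurements
for epoch_idx in range(5):
    placeholder_accuracy = 0.5 + 0.1 * epoch_idx  # stand-in value for illustration
    writer.add_scalar('Accuracy/validation', placeholder_accuracy, epoch_idx)
writer.flush()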

Visualizing Your Model

TensorBoard can also be used to examine the data flow within your model. To do this, call the add_graph() method with a model and sample input:

# Again, grab a single mini-batch of images
dataiter = iter(training_loader)
images, labels = next(dataiter)

# add_graph() will trace the sample input through your model,
# and render it as a graph.
writer.add_graph(net, images)
writer.flush()

When you switch over to TensorBoard, you should see a GRAPHS tab. Double-click the "NET" node to see the layers and data flow within your model.

Visualizing Your Dataset with Embeddings

The 28-by-28 image tiles we're using can be modeled as 784-dimensional vectors (28 * 28 = 784). It can be instructive to project this to a lower-dimensional representation. The add_embedding() method does this automatically, projecting a set of data onto the three dimensions with the highest variance and displaying them as an interactive 3D chart.

Below, we'll take a sample of our data and generate such an embedding:

# Select a random subset of data and corresponding labels
def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)

    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]

# Extract a random subset of data
images, labels = select_n_random(training_set.data, training_set.targets)

# get the class labels for each image
class_labels = [classes[label] for label in labels]

# log embeddings
features = images.view(-1, 28 * 28)
writer.add_embedding(features,
                    metadata=class_labels,
                    label_img=images.unsqueeze(1))
writer.flush()
writer.close()

Now if you switch to TensorBoard and select the PROJECTOR tab, you should see a 3D representation of the projection. You can rotate and zoom the model. Examine it at large and small scales, and see whether you can spot patterns in the projected data and in the clustering of labels.

For better visibility, it's recommended to:

  • Select "label" from the "Color by" dropdown on the left.

  • Toggle the Night Mode icon along the top to place the light-colored images on a dark background.

Other Resources

For more information, have a look at the PyTorch documentation for torch.utils.tensorboard.SummaryWriter and the TensorBoard documentation.

Total running time of the script: (2 minutes 40.879 seconds)
