
(Optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime

Created On: Jul 17, 2019 | Last Updated: Jul 17, 2024 | Last Verified: Nov 5, 2024

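Note

As of PyTorch 2.1, there are two versions of the ONNX exporter.

torch.onnx.dynamo_export is the newer (still in beta) exporter based on TorchDynamo technology, released with PyTorch 2.0.
torch.onnx.export is based on the TorchScript backend and has been available since PyTorch 1.2.0.

In this tutorial, we describe how to convert a model defined in PyTorch into the ONNX format using the TorchScript-based torch.onnx.export ONNX exporter.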

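The exported model will be executed with ONNX Runtime. ONNX Runtime is a performance-focused inference engine for ONNX models that runs efficiently across multiple platforms and hardware (Windows, Linux, and Mac, on both CPUs and GPUs). ONNX Runtime has been shown to considerably increase performance over multiple models, as explained here.

For this tutorial, you will need to install ONNX and ONNX Runtime. You can get binary builds of ONNX and ONNX Runtime with: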

%%bash
pip install onnx onnxruntime

ONNX Runtime recommends using the latest stable runtime for PyTorch.

# Some standard imports
import numpy as np

from torch import nn
import torch.utils.model_zoo as model_zoo
import torch.onnx

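Super-resolution is a way of increasing the resolution of images and videos, and is widely used in image processing and video editing. For this tutorial, we will use a small super-resolution model.

First, let's create a SuperResolution model in PyTorch. This model uses the efficient sub-pixel convolution layer described in "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" by Shi et al. to increase the resolution of an image by an upscale factor. The model expects the Y component of the image's YCbCr representation as input, and outputs the upscaled Y component at super resolution.

The model comes directly from PyTorch's examples without modification.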

# Super Resolution model definition in PyTorch
import torch.nn as nn
import torch.nn.init as init


class SuperResolutionNet(nn.Module):
    def __init__(self, upscale_factor, inplace=False):
        super(SuperResolutionNet, self).__init__()

        self.relu = nn.ReLU(inplace=inplace)
        self.conv1 = nn.Conv2d(1, 64, (5, 5), (1, 1), (2, 2))
        self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1))
        self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1))
        self.conv4 = nn.Conv2d(32, upscale_factor ** 2, (3, 3), (1, 1), (1, 1))
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

        self._initialize_weights()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.pixel_shuffle(self.conv4(x))
        return x

    def _initialize_weights(self):
        init.orthogonal_(self.conv1.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv2.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv3.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv4.weight)

# Create the super-resolution model by using the above model definition.
torch_model = SuperResolutionNet(upscale_factor=3)
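To see how the final self.pixel_shuffle step upscales the image, here is a minimal NumPy sketch of the sub-pixel rearrangement that nn.PixelShuffle performs. conv4 outputs upscale_factor ** 2 channels, and pixel shuffle redistributes them into one channel at upscale_factor times the height and width. The pixel_shuffle function is our own illustration and is not part of PyTorch.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """NumPy sketch of sub-pixel rearrangement: (N, C*r*r, H, W) -> (N, C, H*r, W*r)."""
    n, c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c = c_r2 // (r * r)
    # Split the channel axis into (C, r, r): one channel per sub-pixel offset (i, j).
    x = x.reshape(n, c, r, r, h, w)
    # Move each offset next to its spatial axis: (N, C, H, r, W, r).
    x = x.transpose(0, 1, 4, 2, 5, 3)
    # Merge (H, r) and (W, r) so that out[..., h*r + i, w*r + j] = in[..., c*r*r + i*r + j, h, w].
    return x.reshape(n, c, h * r, w * r)


# conv4 of SuperResolutionNet(upscale_factor=3) yields 9 channels for a 224x224 input.
features = np.random.rand(1, 9, 224, 224).astype(np.float32)
print(pixel_shuffle(features, 3).shape)  # (1, 1, 672, 672)
```

This explains why the model's output is three times larger in each spatial dimension than its input, even though every convolution preserves height and width.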

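Ordinarily, you would now train this model. For this tutorial, however, we will instead download some pretrained weights. Note that this model was not fully trained for good accuracy and is used here for demonstration purposes only.

It is important to call torch_model.eval() or torch_model.train(False) before exporting the model, to switch the model to inference mode. This is required because operators like dropout or batchnorm behave differently in inference and training mode.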

# Load pretrained model weights
model_url = 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth'
batch_size = 64    # just a random number

# Initialize model with the pretrained weights
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
torch_model.load_state_dict(model_zoo.load_url(model_url, map_location=map_location))

# set the model to inference mode
torch_model.eval()
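SuperResolutionNet itself contains only convolutions, ReLU, and pixel shuffle, so eval() does not change its outputs. Models with dropout do change, though. The following minimal NumPy sketch of inverted dropout shows why. It follows the behavior documented for nn.Dropout: during training, elements are zeroed with probability p and the survivors are scaled by 1/(1-p), and in inference mode dropout is the identity. The dropout function is our own illustration, not PyTorch code.

```python
import numpy as np


def dropout(x: np.ndarray, p: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """NumPy sketch of inverted dropout, the scheme documented for nn.Dropout."""
    if not training or p == 0.0:
        # Inference mode: dropout passes the input through unchanged.
        return x
    if p >= 1.0:
        # Every element is dropped; avoid dividing by zero below.
        return np.zeros_like(x)
    # Training mode: keep each element with probability 1 - p ...
    mask = rng.random(x.shape) >= p
    # ... and rescale the survivors so the expected value matches inference mode.
    return x * mask / (1.0 - p)


rng = np.random.default_rng(0)
x = np.ones(8)
print(dropout(x, 0.5, training=False, rng=rng))  # unchanged: all ones
print(dropout(x, 0.5, training=True, rng=rng))   # each element is 0.0 or 2.0
```

Tracing records whichever branch runs during export. A model left in training mode would therefore bake random masking into the ONNX graph.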

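Exporting a model in PyTorch works via tracing or scripting. This tutorial uses a model exported by tracing as an example. To export a model, we call the torch.onnx.export() function. This executes the model and records a trace of the operators used to compute the outputs. Because export runs the model, we need to provide an input tensor x. Its values can be random as long as it has the right type and size. Note that every input dimension is fixed in the exported ONNX graph unless it is specified as a dynamic axis. In this example, we export the model with an input whose batch dimension is batch_size (64, as set above). We then specify the first dimension as dynamic in the dynamic_axes parameter of torch.onnx.export(). The exported model therefore accepts inputs of size [batch_size, 1, 224, 224], where batch_size can vary.

To learn more about PyTorch's export interface, check out the torch.onnx documentation.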

# Input to the model
x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
torch_out = torch_model(x)

# Export the model
torch.onnx.export(torch_model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "super_resolution.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})
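Here is one way to picture what dynamic_axes changes. Assuming the export above succeeds, an ONNX Runtime session (such as the one created later in this tutorial) should report ort_session.get_inputs()[0].shape as ['batch_size', 1, 224, 224]. The named axis stays symbolic and the other axes are fixed. The helper shape_matches below is hypothetical and is not an ONNX or ONNX Runtime API. It only mimics the rule that a symbolic dimension accepts any size, while a fixed dimension must match exactly.

```python
from typing import Sequence, Union

# A dimension is either a fixed size (int) or a symbolic name (str) such as "batch_size".
Dim = Union[int, str]


def shape_matches(spec: Sequence[Dim], shape: Sequence[int]) -> bool:
    """Return True if a concrete shape fits a spec that may contain symbolic (dynamic) dims."""
    if len(spec) != len(shape):
        return False
    # Symbolic dims accept any size; integer dims must be equal.
    return all(isinstance(s, str) or s == d for s, d in zip(spec, shape))


# Shape we expect for 'input' given dynamic_axes={'input': {0: 'batch_size'}}.
input_spec = ["batch_size", 1, 224, 224]
print(shape_matches(input_spec, (64, 1, 224, 224)))  # True
print(shape_matches(input_spec, (1, 1, 224, 224)))   # True
print(shape_matches(input_spec, (1, 1, 112, 112)))   # False
```

Without the dynamic_axes argument, the first entry would be the fixed value 64. Only inputs with exactly that batch size would then fit.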

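We also computed torch_out, the output of the PyTorch model. We will use it to verify that the exported model computes the same values when run in ONNX Runtime.

Before verifying the model's output with ONNX Runtime, we check the ONNX model with the ONNX API. First, onnx.load("super_resolution.onnx") loads the saved model and returns an onnx.ModelProto structure, a top-level file/container format for bundling a machine learning model (see the onnx.proto documentation for more information). Then, onnx.checker.check_model(onnx_model) verifies the model's structure and confirms that the model has a valid schema. The validity of the ONNX graph is verified by checking the model's version, the graph's structure, and the nodes with their inputs and outputs.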

import onnx

onnx_model = onnx.load("super_resolution.onnx")
onnx.checker.check_model(onnx_model)

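Now let's compute the output using ONNX Runtime's Python API. This part would normally run in a separate process or on another machine. Here we stay in the same process so that we can verify that ONNX Runtime and PyTorch compute the same values for the network.

To run the model with ONNX Runtime, we need to create an inference session for the model with the chosen configuration parameters. Here we use the default configuration and explicitly select the CPUExecutionProvider. Once the session is created, we evaluate the model using the run() API. This call returns a list containing the model outputs computed by ONNX Runtime.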

import onnxruntime

ort_session = onnxruntime.InferenceSession("super_resolution.onnx", providers=["CPUExecutionProvider"])

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

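We should see that the PyTorch and ONNX Runtime outputs match numerically within the given precision (rtol=1e-03 and atol=1e-05). As a side note, if they do not match, there is an issue in the ONNX exporter, so please contact us in that case.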
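The tolerance check performed by np.testing.assert_allclose can be written out explicitly. Per the NumPy documentation, it compares |actual - desired| element-wise against atol + rtol * |desired|. In the call above, desired is ort_outs[0], so the allowed error scales with the ONNX Runtime output. within_tolerance is our own sketch of that rule and ignores NumPy's NaN handling.

```python
import numpy as np


def within_tolerance(actual, desired, rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """Sketch of the assert_allclose criterion: |actual - desired| <= atol + rtol * |desired|."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    # Note the asymmetry: only |desired| scales the relative term.
    return bool(np.all(np.abs(actual - desired) <= atol + rtol * np.abs(desired)))


print(within_tolerance([1.0005], [1.0]))  # True: 5e-4 <= 1e-5 + 1e-3 * 1.0
print(within_tolerance([1.002], [1.0]))   # False: 2e-3 > 1.01e-3
```

Because the relative term dominates for values near 1, small floating-point differences between PyTorch and ONNX Runtime kernels still pass with these tolerances.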

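Performance Comparison Between Models

Because ONNX models are optimized for inference speed, running the same data on the ONNX model instead of the native PyTorch model can yield a speedup of up to 2x. The improvement is typically more pronounced with larger batch sizes.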

import time

x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)

# time.perf_counter() is a monotonic, high-resolution clock suited to measuring durations
start = time.perf_counter()
torch_out = torch_model(x)
end = time.perf_counter()
print(f"Inference of PyTorch model used {end - start} seconds")

ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
start = time.perf_counter()
ort_outs = ort_session.run(None, ort_inputs)
end = time.perf_counter()
print(f"Inference of ONNX model used {end - start} seconds")
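A single call timed once can be noisy, and the first call to either runtime may include one-time setup costs. The following stdlib-only sketch takes a more robust measurement. It makes a few warm-up calls, then reports the median over several runs timed with time.perf_counter. The benchmark function and its warmup and repeat defaults are our own choices, not part of PyTorch or ONNX Runtime.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, repeat: int = 10) -> float:
    """Return the median wall-clock time of fn() in seconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up: let one-time costs (allocations, caches) happen before timing
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to occasional slow outliers.
    return statistics.median(timings)
```

With the objects defined above, you could compare benchmark(lambda: torch_model(x)) against benchmark(lambda: ort_session.run(None, ort_inputs)). For a like-for-like comparison, it may also help to time the PyTorch call inside a torch.no_grad() block. Because x was created with requires_grad=True, autograd otherwise records a graph during the forward pass.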

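Running the Model on an Image Using ONNX Runtime

So far, we have exported a model from PyTorch and shown how to load it and run it in ONNX Runtime with a dummy tensor as input.

For this tutorial, we will use a famous, widely used cat image, loaded from ./_static/img/cat.jpg.

First, let's load the image and preprocess it using the standard PIL Python library. Note that this preprocessing is standard practice when processing data for training or testing neural networks.

We first resize the image to match the model's input size (224x224). Then we split the image into its Y, Cb, and Cr components. Y represents the grayscale image, and Cb and Cr are the blue-difference and red-difference chroma components. The human eye is more sensitive to the Y component, so this is the component we feed to the model and transform. After extracting the Y component, we convert it to a tensor. ToTensor() produces a float tensor of shape [1, 224, 224] with values in [0, 1], and unsqueeze_(0) adds the batch dimension. The result has the [1, 1, 224, 224] shape the model expects.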

from PIL import Image
import torchvision.transforms as transforms

img = Image.open("./_static/img/cat.jpg")

resize = transforms.Resize([224, 224])
img = resize(img)

img_ycbcr = img.convert('YCbCr')
img_y, img_cb, img_cr = img_ycbcr.split()

to_tensor = transforms.ToTensor()
img_y = to_tensor(img_y)
img_y.unsqueeze_(0)
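To make the preprocessing concrete, here is a NumPy sketch of the two steps applied to the Y channel. It assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) used by JPEG/JFIF YCbCr. PIL's convert('YCbCr') uses an integer approximation of these weights, so its values may differ from this sketch by rounding. rgb_to_luma and to_model_input are hypothetical helpers, not part of PIL or torchvision.

```python
import numpy as np


def rgb_to_luma(rgb: np.ndarray) -> np.ndarray:
    """Y (luma) of an H x W x 3 RGB image, using the BT.601 / JFIF weights (assumed)."""
    rgb = rgb.astype(np.float64)
    # Green carries the most weight because the eye is most sensitive to it.
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def to_model_input(y: np.ndarray) -> np.ndarray:
    """Mimic ToTensor() + unsqueeze_(0) for one channel: scale to [0, 1], add channel and batch dims."""
    return (y / 255.0).astype(np.float32)[np.newaxis, np.newaxis, :, :]


image = np.zeros((224, 224, 3), dtype=np.uint8)  # black 224x224 RGB image
image[..., 1] = 255                              # make it pure green
y = rgb_to_luma(image)
print(y[0, 0])                   # roughly 149.7 (0.587 * 255)
print(to_model_input(y).shape)   # (1, 1, 224, 224)
```

The Cb and Cr channels are not sent through the model. They are only upsampled with bicubic interpolation during post-processing.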

Next, let's take the tensor representing the resized grayscale cat image and run the super-resolution model in ONNX Runtime, as explained previously.

ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(img_y)}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs[0]

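At this point, the model's output is a NumPy array of shape [1, 1, 672, 672], which is the 224x224 input upscaled by a factor of 3. Next, we process this output to reconstruct the final output image and save it. The post-processing steps have been adopted from the PyTorch implementation of the super-resolution model here.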

img_out_y = Image.fromarray(np.uint8((img_out_y[0] * 255.0).clip(0, 255)[0]), mode='L')

# get the output image follow post-processing step from PyTorch implementation
final_img = Image.merge(
    "YCbCr", [
        img_out_y,
        img_cb.resize(img_out_y.size, Image.BICUBIC),
        img_cr.resize(img_out_y.size, Image.BICUBIC),
    ]).convert("RGB")

# Save the image, we will compare this with the output image from mobile device
final_img.save("./_static/img/cat_superres_with_ort.jpg")

# Save resized original image (without super-resolution)
img = transforms.Resize([img_out_y.size[0], img_out_y.size[1]])(img)
img.save("cat_resized.jpg")
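The one-line conversion of img_out_y into an 8-bit grayscale image packs several steps together. Below is the same expression wrapped in a hypothetical helper, output_to_uint8, with each step commented. It assumes the model output lies roughly in [0, 1], which is why the values are scaled by 255 and clipped.

```python
import numpy as np


def output_to_uint8(out: np.ndarray) -> np.ndarray:
    """Turn a (1, 1, H, W) float output in roughly [0, 1] into an (H, W) uint8 array."""
    # out[0] drops the batch axis; * 255.0 maps [0, 1] to [0, 255];
    # clip removes out-of-range values; the trailing [0] drops the channel axis.
    # np.uint8 then truncates the fractional part (it does not round).
    return np.uint8((out[0] * 255.0).clip(0, 255)[0])


fake_output = np.array([[[[-0.1, 0.5], [1.0, 1.2]]]], dtype=np.float32)
print(output_to_uint8(fake_output).tolist())  # [[0, 127], [255, 255]]
```

Without the clip, negative or greater-than-one predictions would wrap around when cast to uint8 and show up as bright or dark speckles in the saved image.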

Here is the comparison between the two images:

Low-resolution image

Image after super-resolution

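ONNX Runtime is a cross-platform engine, so you can run it on multiple platforms and on both CPUs and GPUs.

ONNX Runtime can also be deployed to the cloud for model inferencing using Azure Machine Learning Services. More information here.

More information about ONNX Runtime's performance here.

For more information about ONNX Runtime, see here.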

