(Beta) Building a Convolution/Batch Norm Fuser in FX

Created On: Mar 04, 2021 | Last Updated: Jan 16, 2024 | Last Verified: Nov 05, 2024

Author: Horace He

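In this tutorial, we are going to use FX, a toolkit for composable function transformations of PyTorch, to do the following:

1) Find patterns of conv/batch norm in the data dependencies.
2) For the patterns found in 1), fold the batch norm statistics into the convolution weights.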

Note that this optimization only works for models in inference mode (i.e. after calling model.eval()).

We will be building the fuser that exists here: https://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/fx/experimental/fuser.py

First, let's get some imports out of the way (we will be using all of these later in the code).

from typing import Type, Dict, Any, Tuple, Iterable
import copy
import torch.fx as fx
import torch
import torch.nn as nn

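For this tutorial, we are going to create a model consisting of convolutions and batch norms. Note that this model has some tricky components: some of the conv/batch norm patterns are hidden within Sequentials, and one of the BatchNorms is wrapped in another Module.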

class WrappedBatchNorm(nn.Module):
    def __init__(self):
        super().__init__()
        self.mod = nn.BatchNorm2d(1)
    def forward(self, x):
        return self.mod(x)

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 1, 1)
        self.bn1 = nn.BatchNorm2d(1)
        self.conv2 = nn.Conv2d(1, 1, 1)
        self.nested = nn.Sequential(
            nn.BatchNorm2d(1),
            nn.Conv2d(1, 1, 1),
        )
        self.wrapped = WrappedBatchNorm()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.nested(x)
        x = self.wrapped(x)
        return x

model = M()

model.eval()

Fusing Convolution with Batch Norm

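One of the primary challenges with trying to automatically fuse convolution and batch norm in PyTorch is that PyTorch does not provide an easy way of accessing the computational graph. FX resolves this problem by symbolically tracing the actual operations called, so that we can track the computations through the forward call, whether they are nested within Sequential modules or wrapped in a user-defined module.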

traced_model = torch.fx.symbolic_trace(model)
print(traced_model.graph)

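This gives us a graph representation of our model. Note that both the modules hidden within the Sequential and the wrapped Module have been inlined into the graph. This is the default level of abstraction, but it can be configured by the pass writer. More information can be found in the FX overview: https://pytorch.ac.cn/docs/master/fx.html#module-torch.fx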
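
To make the inlining concrete, the nodes of a traced graph can be listed one by one. The following is a minimal sketch on a separate, hypothetical TinyNet module (not part of the tutorial's model); it only assumes the default tracer behaviour described above.

```python
import torch
import torch.fx as fx
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical module used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)
        # The batch norm is hidden inside a Sequential on purpose.
        self.nested = nn.Sequential(nn.BatchNorm2d(1), nn.ReLU())

    def forward(self, x):
        return self.nested(self.conv(x))


traced = fx.symbolic_trace(TinyNet())

# Each node records what kind of operation it is (`op`), what it calls
# (`target`), and which nodes feed into it (`args`).
for node in traced.graph.nodes:
    print(node.op, node.target, node.args)

# Qualified names of every module call site, in execution order.
call_targets = [n.target for n in traced.graph.nodes if n.op == "call_module"]
print(call_targets)
```

The Sequential itself does not appear as a call site; its children show up under their qualified names, which is what lets a pass see a conv followed directly by a batch norm.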

Fusing Convolution with Batch Norm

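Unlike some other fusions, fusion of convolution with batch norm does not require any new operators. Instead, as batch norm during inference consists of a pointwise add and multiply, these operations can be "baked" into the preceding convolution's weights. This allows us to remove the batch norm entirely from our model! Read https://nenadmarkus.com/p/fusing-batchnorm-and-conv/ for further details. The code here is copied from https://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/nn/utils/fusion.py for clarity.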
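
The folding can be written out per output channel. The symbols below are introduced here for illustration and are not in the original text: W and b are the convolution weight and bias, μ and σ² the batch norm running mean and variance, γ and β the batch norm weight and bias, and ε the batch norm eps.

```latex
\begin{aligned}
\mathrm{BN}(y) &= \gamma \, \frac{y - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta,
  \qquad y = W * x + b \\[4pt]
\mathrm{BN}(W * x + b)
  &= \underbrace{\frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \, W}_{W'} * x
   \;+\; \underbrace{(b - \mu)\,\frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} + \beta}_{b'}
\end{aligned}
```

W' and b' correspond to the conv_w and conv_b values computed by fuse_conv_bn_weights.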

def fuse_conv_bn_eval(conv, bn):
    """
    Given a conv Module `A` and a batch_norm module `B`, returns a conv
    module `C` such that C(x) == B(A(x)) in inference mode.
    """
    assert(not (conv.training or bn.training)), "Fusion only for eval!"
    fused_conv = copy.deepcopy(conv)

    fused_conv.weight, fused_conv.bias = \
        fuse_conv_bn_weights(fused_conv.weight, fused_conv.bias,
                             bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias)

    return fused_conv

def fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):
    if conv_b is None:
        conv_b = torch.zeros_like(bn_rm)
    if bn_w is None:
        bn_w = torch.ones_like(bn_rm)
    if bn_b is None:
        bn_b = torch.zeros_like(bn_rm)
    bn_var_rsqrt = torch.rsqrt(bn_rv + bn_eps)

    conv_w = conv_w * (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1))
    conv_b = (conv_b - bn_rm) * bn_var_rsqrt * bn_w + bn_b

    return torch.nn.Parameter(conv_w), torch.nn.Parameter(conv_b)
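
As a sanity check, the same arithmetic as in fuse_conv_bn_weights can be applied by hand to a single conv/batch norm pair and compared against running the two layers in sequence. This is a self-contained sketch; the layer sizes and the randomized batch norm statistics are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(3, 4, kernel_size=3, bias=True).eval()
bn = nn.BatchNorm2d(4).eval()

# Give the batch norm non-trivial statistics so the check is meaningful
# (a freshly constructed BatchNorm2d is close to an identity).
with torch.no_grad():
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-1.0, 1.0)

# Per-channel scale: bn_w * rsqrt(running_var + eps).
scale = bn.weight * torch.rsqrt(bn.running_var + bn.eps)

# Fold the scale and shift into a fresh convolution of the same shape.
folded = nn.Conv2d(3, 4, kernel_size=3, bias=True).eval()
with torch.no_grad():
    folded.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    folded.bias.copy_((conv.bias - bn.running_mean) * scale + bn.bias)

x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    max_err = (folded(x) - bn(conv(x))).abs().max().item()
print("max abs difference:", max_err)
```

The difference should be on the order of floating-point rounding error, since the folded convolution computes the same affine function with the operations reordered.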

FX Fusion Pass

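Now that we have our computational graph as well as a method for fusing convolution and batch norm, all that remains is to iterate over the FX graph and apply the desired fusions.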

def _parent_name(target : str) -> Tuple[str, str]:
    """
    Splits a ``qualname`` into parent path and last atom.
    For example, `foo.bar.baz` -> (`foo.bar`, `baz`)
    """
    *parent, name = target.rsplit('.', 1)
    return parent[0] if parent else '', name

def replace_node_module(node: fx.Node, modules: Dict[str, Any], new_module: torch.nn.Module):
    assert(isinstance(node.target, str))
    parent_name, name = _parent_name(node.target)
    setattr(modules[parent_name], name, new_module)


def fuse(model: torch.nn.Module) -> torch.nn.Module:
    model = copy.deepcopy(model)
    # The first step of most FX passes is to symbolically trace our model to
    # obtain a `GraphModule`. This is a representation of our original model
    # that is functionally identical to our original model, except that we now
    # also have a graph representation of our forward pass.
    fx_model: fx.GraphModule = fx.symbolic_trace(model)
    modules = dict(fx_model.named_modules())

    # The primary representation for working with FX are the `Graph` and the
    # `Node`. Each `GraphModule` has a `Graph` associated with it - this
    # `Graph` is also what generates `GraphModule.code`.
    # The `Graph` itself is represented as a list of `Node` objects. Thus, to
    # iterate through all of the operations in our graph, we iterate over each
    # `Node` in our `Graph`.
    for node in fx_model.graph.nodes:
        # The FX IR contains several types of nodes, which generally represent
        # call sites to modules, functions, or methods. The type of node is
        # determined by `Node.op`.
        if node.op != 'call_module': # If our current node isn't calling a Module then we can ignore it.
            continue
        # For call sites, `Node.target` represents the module/function/method
        # that's being called. Here, we check `Node.target` to see if it's a
        # batch norm module, and then check `Node.args[0].target` to see if the
        # input `Node` is a convolution.
        if type(modules[node.target]) is nn.BatchNorm2d and type(modules[node.args[0].target]) is nn.Conv2d:
            if len(node.args[0].users) > 1:  # Output of conv is used by other nodes
                continue
            conv = modules[node.args[0].target]
            bn = modules[node.target]
            fused_conv = fuse_conv_bn_eval(conv, bn)
            replace_node_module(node.args[0], modules, fused_conv)
            # As we've folded the batch norm into the conv, we need to replace all uses
            # of the batch norm with the conv.
            node.replace_all_uses_with(node.args[0])
            # Now that all uses of the batch norm have been replaced, we can
            # safely remove the batch norm.
            fx_model.graph.erase_node(node)
    fx_model.graph.lint()
    # After we've modified our graph, we need to recompile our graph in order
    # to keep the generated code in sync.
    fx_model.recompile()
    return fx_model
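
The three graph-editing steps used in the pass (redirect all uses, erase the node, recompile) can be seen in isolation on a much smaller graph. The sketch below uses a hypothetical function, scale_then_add_zero, and removes a redundant `+ 0.0`, which plays the role the folded batch norm plays above: a node whose output can be replaced by its input.

```python
import operator

import torch
import torch.fx as fx


def scale_then_add_zero(x):
    """Hypothetical function: the `+ 0.0` is a no-op we want to remove."""
    y = x * 2.0
    return y + 0.0


gm = fx.symbolic_trace(scale_then_add_zero)

# Iterate over a snapshot of the nodes because we erase nodes as we go.
for node in list(gm.graph.nodes):
    is_add_zero = (
        node.op == "call_function"
        and node.target is operator.add
        and node.args[1] == 0.0
    )
    if is_add_zero:
        # Point every consumer of the add at the add's input instead...
        node.replace_all_uses_with(node.args[0])
        # ...after which the add has no users and can be erased.
        gm.graph.erase_node(node)

gm.graph.lint()   # check the graph is still well-formed
gm.recompile()    # regenerate `gm.code` / `gm.forward` from the edited graph

remaining_ops = [n.op for n in gm.graph.nodes]
x = torch.tensor([-1.0, 2.0])
print(gm.code)
```

Erasing a node that still has users raises an error, which is why replace_all_uses_with comes first in both this sketch and the fusion pass.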

Note

We make some simplifications here for demonstration purposes, such as only matching 2D convolutions. View https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py for a more usable pass.
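
One way the 2D-only restriction could be lifted is to drive the type check from a table of (conv, batch norm) pairs. The names CONV_BN_PATTERNS and is_fusable_pair below are hypothetical and introduced only for this sketch; it is not a copy of the linked pass.

```python
import torch.nn as nn

# Hypothetical pattern table: each entry pairs a convolution type with the
# batch norm type of matching dimensionality.
CONV_BN_PATTERNS = [
    (nn.Conv1d, nn.BatchNorm1d),
    (nn.Conv2d, nn.BatchNorm2d),
    (nn.Conv3d, nn.BatchNorm3d),
]


def is_fusable_pair(conv_module: nn.Module, bn_module: nn.Module) -> bool:
    """Return True if the two modules form one of the known conv/BN pairs.

    Exact type comparison (rather than isinstance) mirrors the check in the
    pass above, so subclasses with custom behaviour are not fused by accident.
    """
    return any(
        type(conv_module) is conv_type and type(bn_module) is bn_type
        for conv_type, bn_type in CONV_BN_PATTERNS
    )


print(is_fusable_pair(nn.Conv1d(1, 1, 1), nn.BatchNorm1d(1)))
print(is_fusable_pair(nn.Conv1d(1, 1, 1), nn.BatchNorm2d(1)))
```

In the fuse function, such a helper would stand in for the single `nn.BatchNorm2d` / `nn.Conv2d` comparison. The weight-folding code needs no change, because its reshape already adapts to the number of weight dimensions.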

Testing out our Fusion Pass

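We can now run this fusion pass on our initial toy model and verify that our results are identical. In addition, we can print out the code for our fused model and verify that there are no more batch norms.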

fused_model = fuse(model)
print(fused_model.code)
inp = torch.randn(5, 1, 1, 1)
# `assert_close` supersedes the deprecated `torch.testing.assert_allclose`.
torch.testing.assert_close(fused_model(inp), model(inp))
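
Besides reading the printed code, the absence of batch norms can be checked programmatically by counting batch norm call sites in the graph. To stay self-contained, the sketch below uses the fusion pass shipped with PyTorch, torch.fx.experimental.optimization.fuse, assuming the installed version provides it; count_batchnorm_calls and small_model are hypothetical names.

```python
import torch
import torch.fx as fx
import torch.nn as nn
# Assumption: the installed PyTorch provides this library fusion pass.
from torch.fx.experimental.optimization import fuse as library_fuse

BATCHNORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def count_batchnorm_calls(gm: fx.GraphModule) -> int:
    """Count `call_module` nodes whose target is a batch norm module."""
    modules = dict(gm.named_modules())
    return sum(
        1
        for node in gm.graph.nodes
        if node.op == "call_module"
        and isinstance(modules[node.target], BATCHNORM_TYPES)
    )


small_model = nn.Sequential(
    nn.Conv2d(1, 2, 1), nn.BatchNorm2d(2), nn.ReLU()
).eval()  # fusion is only valid in inference mode

bn_before = count_batchnorm_calls(fx.symbolic_trace(small_model))
fused_small = library_fuse(small_model)
bn_after = count_batchnorm_calls(fused_small)
print(bn_before, bn_after)

# The structural check complements, rather than replaces, the numerical one.
x = torch.randn(5, 1, 4, 4)
with torch.no_grad():
    torch.testing.assert_close(fused_small(x), small_model(x))
```

The same counting helper can be applied to the GraphModule returned by the fuse function defined in this tutorial.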

Benchmarking our Fusion on ResNet18

We can test our fusion pass on a larger model like ResNet18 and see how much this pass improves inference performance.

import torchvision.models as models
import time

rn18 = models.resnet18()
rn18.eval()

inp = torch.randn(10, 3, 224, 224)
output = rn18(inp)

def benchmark(model, iters=20):
    for _ in range(10):
        model(inp)
    begin = time.time()
    for _ in range(iters):
        model(inp)
    return str(time.time()-begin)

fused_rn18 = fuse(rn18)
print("Unfused time: ", benchmark(rn18))
print("Fused time: ", benchmark(fused_rn18))

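As we previously saw, the output of our FX transformation is ("torchscriptable") PyTorch code, so we can easily jit.script the output to try and increase our performance even more. In this way, our FX model transformation composes with TorchScript with no issues.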

jit_rn18 = torch.jit.script(fused_rn18)
print("jit time: ", benchmark(jit_rn18))
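
The composition with TorchScript can be checked on a small scale as well. The sketch below traces a hypothetical SmallNet, scripts the resulting GraphModule, and confirms that both produce the same output.

```python
import torch
import torch.fx as fx
import torch.nn as nn


class SmallNet(nn.Module):
    """Hypothetical module used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 2, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(x))


# FX produces a GraphModule whose generated forward is plain Python code...
graph_module = fx.symbolic_trace(SmallNet().eval())

# ...which TorchScript can then compile like any other nn.Module.
scripted = torch.jit.script(graph_module)
is_script_module = isinstance(scripted, torch.jit.ScriptModule)

x = torch.randn(2, 1, 4, 4)
with torch.no_grad():
    torch.testing.assert_close(scripted(x), graph_module(x))
print(is_script_module)
```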


Conclusion

As we can see, using FX we can easily write static graph transformations on PyTorch code.

Since FX is still in beta, we would be happy to hear any feedback you have about using it. Please feel free to use the PyTorch Forums (https://discuss.pytorch.org/) and the issue tracker (https://github.com/pytorch/pytorch/issues) to provide any feedback you might have.
