torch.export¶

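Warning

This feature is a prototype under active development, and there *will* be breaking changes in the future.

Overview¶

torch.export.export() takes a torch.nn.Module and produces a traced graph representing only the Tensor computation of the function in an Ahead-of-Time (AOT) fashion. The graph can subsequently be executed with different inputs or serialized.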

import torch
from torch.export import export

class Mod(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        a = torch.sin(x)
        b = torch.cos(y)
        return a + b

example_args = (torch.randn(10, 10), torch.randn(10, 10))

exported_program: torch.export.ExportedProgram = export(
    Mod(), args=example_args
)
print(exported_program)

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, x: "f32[10, 10]", y: "f32[10, 10]"):
            # code: a = torch.sin(x)
            sin: "f32[10, 10]" = torch.ops.aten.sin.default(x)

            # code: b = torch.cos(y)
            cos: "f32[10, 10]" = torch.ops.aten.cos.default(y)

            # code: return a + b
            add: "f32[10, 10]" = torch.ops.aten.add.Tensor(sin, cos)
            return (add,)

    Graph signature:
        ExportGraphSignature(
            input_specs=[
                InputSpec(
                    kind=<InputKind.USER_INPUT: 1>,
                    arg=TensorArgument(name='x'),
                    target=None,
                    persistent=None
                ),
                InputSpec(
                    kind=<InputKind.USER_INPUT: 1>,
                    arg=TensorArgument(name='y'),
                    target=None,
                    persistent=None
                )
            ],
            output_specs=[
                OutputSpec(
                    kind=<OutputKind.USER_OUTPUT: 1>,
                    arg=TensorArgument(name='add'),
                    target=None
                )
            ]
        )
    Range constraints: {}

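torch.export produces a clean intermediate representation (IR) with the following invariants. Further details are given in the torch.export IR specification.

Soundness: It is guaranteed to be a sound representation of the original program, and maintains the same calling conventions as the original program.
Normalized: There are no Python semantics within the graph. Submodules from the original program are inlined to form one fully flattened computational graph.
Graph properties: The graph is purely functional, meaning it does not contain operations with side effects such as mutations or aliasing. It does not mutate any intermediate values, parameters, or buffers.
Metadata: The graph contains metadata captured during tracing, such as a stack trace from the user's code.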

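Under the hood, torch.export leverages the following technologies:

TorchDynamo (torch._dynamo) is an internal API that uses a CPython feature called the Frame Evaluation API to safely trace PyTorch graphs. This greatly improves the graph capturing experience, with far fewer rewrites needed to fully trace PyTorch code.
AOT Autograd provides a functionalized PyTorch graph and ensures the graph is decomposed/lowered to the ATen operator set.
Torch FX (torch.fx) is the underlying representation of the graph, allowing flexible Python-based transformations.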

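Existing Frameworks¶

torch.compile() also uses the same PT2 stack as torch.export, but differs in several ways:

JIT vs. AOT: torch.compile() is a JIT (Just-In-Time) compiler, which is not intended to be used to produce compiled artifacts outside of deployment.
Partial vs. full graph capture: When torch.compile() runs into an untraceable part of a model, it will "graph break" and fall back to running the program in the eager Python runtime. In comparison, torch.export aims to get a full graph representation of a PyTorch model, so it errors out when it reaches something untraceable. Since torch.export produces a full graph disjoint from any Python features or runtime, this graph can be saved, loaded, and run in different environments and languages.
Usability tradeoff: Since torch.compile() can fall back to the Python runtime whenever it reaches something untraceable, it is more flexible. torch.export instead requires users to provide more information or rewrite their code to make it traceable.

Compared to torch.fx.symbolic_trace(), torch.export traces using TorchDynamo, which operates at the Python bytecode level, giving it the ability to trace arbitrary Python constructs not limited by what Python operator overloading supports. Additionally, torch.export keeps fine-grained track of tensor metadata, so conditionals on things like tensor shapes do not fail tracing. In general, torch.export is expected to work on more user programs and produce lower-level graphs (at the torch.ops.aten operator level). Note that users can still use torch.fx.symbolic_trace() as a preprocessing step before torch.export.

Compared to torch.jit.script(), torch.export does not capture Python control flow or data structures, but it supports more Python language features than TorchScript (as it is easier to have comprehensive coverage over Python bytecode). The resulting graphs are simpler and only have straight-line control flow (except for explicit control flow operators).

Compared to torch.jit.trace(), torch.export is sound: it is able to trace code that performs integer computation on sizes, and it records all of the side conditions necessary to show that a particular trace is valid for other inputs.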

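Exporting a PyTorch Model¶

An Example¶

The main entrypoint is torch.export.export(), which takes a callable (torch.nn.Module, function, or method) and sample inputs, and captures the computation graph into a torch.export.ExportedProgram. For example: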

import torch
from torch.export import export

# Simple module for demonstration
class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv = torch.nn.Conv2d(
            in_channels=3, out_channels=16, kernel_size=3, padding=1
        )
        self.relu = torch.nn.ReLU()
        self.maxpool = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x: torch.Tensor, *, constant=None) -> torch.Tensor:
        a = self.conv(x)
        a.add_(constant)
        return self.maxpool(self.relu(a))

example_args = (torch.randn(1, 3, 256, 256),)
example_kwargs = {"constant": torch.ones(1, 16, 256, 256)}

exported_program: torch.export.ExportedProgram = export(
    M(), args=example_args, kwargs=example_kwargs
)
print(exported_program)

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_conv_weight: "f32[16, 3, 3, 3]", p_conv_bias: "f32[16]", x: "f32[1, 3, 256, 256]", constant: "f32[1, 16, 256, 256]"):
            # code: a = self.conv(x)
            conv2d: "f32[1, 16, 256, 256]" = torch.ops.aten.conv2d.default(x, p_conv_weight, p_conv_bias, [1, 1], [1, 1])

            # code: a.add_(constant)
            add_: "f32[1, 16, 256, 256]" = torch.ops.aten.add_.Tensor(conv2d, constant)

            # code: return self.maxpool(self.relu(a))
            relu: "f32[1, 16, 256, 256]" = torch.ops.aten.relu.default(add_)
            max_pool2d: "f32[1, 16, 85, 85]" = torch.ops.aten.max_pool2d.default(relu, [3, 3], [3, 3])
            return (max_pool2d,)

Graph signature:
    ExportGraphSignature(
        input_specs=[
            InputSpec(
                kind=<InputKind.PARAMETER: 2>,
                arg=TensorArgument(name='p_conv_weight'),
                target='conv.weight',
                persistent=None
            ),
            InputSpec(
                kind=<InputKind.PARAMETER: 2>,
                arg=TensorArgument(name='p_conv_bias'),
                target='conv.bias',
                persistent=None
            ),
            InputSpec(
                kind=<InputKind.USER_INPUT: 1>,
                arg=TensorArgument(name='x'),
                target=None,
                persistent=None
            ),
            InputSpec(
                kind=<InputKind.USER_INPUT: 1>,
                arg=TensorArgument(name='constant'),
                target=None,
                persistent=None
            )
        ],
        output_specs=[
            OutputSpec(
                kind=<OutputKind.USER_OUTPUT: 1>,
                arg=TensorArgument(name='max_pool2d'),
                target=None
            )
        ]
    )
Range constraints: {}

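Inspecting the ExportedProgram, we can note the following:

The torch.fx.Graph contains the computation graph of the original program, along with records of the original code for easy debugging.
The graph contains only torch.ops.aten operators and custom operators, and is fully functional, without any in-place operators such as torch.add_.
The parameters (weight and bias of conv) are lifted as inputs to the graph, resulting in no get_attr nodes in the graph, which previously existed in the result of torch.fx.symbolic_trace().
The torch.export.ExportGraphSignature models the input and output signature, and specifies which inputs are parameters.
The resulting shape and dtype of the tensor produced by each node in the graph is noted. For example, the conv2d node will result in a tensor of dtype torch.float32 and shape (1, 16, 256, 256).

Non-Strict Export¶

In PyTorch 2.3, we introduced a new mode of tracing called non-strict mode. It is still being hardened, so if you run into any issues, please file them on GitHub with the "oncall: export" tag.

In *non-strict mode*, we trace through the program using the Python interpreter. Your code will execute exactly as it would in eager mode; the only difference is that all Tensor objects will be replaced by ProxyTensors, which record all their operations into a graph.

In *strict mode*, which is currently the default, we first trace through the program using TorchDynamo, a bytecode analysis engine. TorchDynamo does not actually execute your Python code. Instead, it symbolically analyzes it and builds a graph based on the results. This analysis allows torch.export to provide stronger guarantees about safety, but not all Python code is supported.

One case where you might want to use non-strict mode is when you run into an unsupported TorchDynamo feature that is not easy to work around, and you know the Python code in question is not needed for the computation. For example: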

import torch
from torch.export import export

class ContextManager():
    def __init__(self):
        self.count = 0
    def __enter__(self):
        self.count += 1
    def __exit__(self, exc_type, exc_value, traceback):
        self.count -= 1

class M(torch.nn.Module):
    def forward(self, x):
        with ContextManager():
            return x.sin() + x.cos()

export(M(), (torch.ones(3, 3),), strict=False)  # Non-strict traces successfully
export(M(), (torch.ones(3, 3),))  # Strict mode fails with torch._dynamo.exc.Unsupported: ContextManager

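In this example, the first call using non-strict mode (through the strict=False flag) traces successfully, whereas the second call using strict mode (the default) fails because TorchDynamo does not support this context manager. One option is to rewrite the code (see Limitations of torch.export), but since the context manager does not affect the tensor computations in the model, we can go with the non-strict mode's result.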

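Export for Training and Inference¶

In PyTorch 2.5, we introduced a new API called export_for_training(). It is still being hardened, so if you run into any issues, please file them on GitHub with the "oncall: export" tag.

In this API, we produce the most generic IR, which contains all ATen operators (both functional and non-functional) and can be used to train in eager PyTorch Autograd. This API is intended for eager training use cases such as PT2 Quantization and will soon be the default IR of torch.export.export. To read further about the motivation behind this change, please refer to https://dev-discuss.pytorch.org/t/why-pytorch-does-not-need-a-new-standardized-operator-set/2206

When this API is combined with run_decompositions(), you should be able to get inference IR with any desired decomposition behavior.

Here are some examples: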

import torch

class ConvBatchnorm(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 3, 1, 1)
        self.bn = torch.nn.BatchNorm2d(3)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return (x,)

mod = ConvBatchnorm()
inp = torch.randn(1, 1, 3, 3)

ep_for_training = torch.export.export_for_training(mod, (inp,))
print(ep_for_training)

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_conv_weight: "f32[3, 1, 1, 1]", p_conv_bias: "f32[3]", p_bn_weight: "f32[3]", p_bn_bias: "f32[3]", b_bn_running_mean: "f32[3]", b_bn_running_var: "f32[3]", b_bn_num_batches_tracked: "i64[]", x: "f32[1, 1, 3, 3]"):
            conv2d: "f32[1, 3, 3, 3]" = torch.ops.aten.conv2d.default(x, p_conv_weight, p_conv_bias)
            add_: "i64[]" = torch.ops.aten.add_.Tensor(b_bn_num_batches_tracked, 1)
            batch_norm: "f32[1, 3, 3, 3]" = torch.ops.aten.batch_norm.default(conv2d, p_bn_weight, p_bn_bias, b_bn_running_mean, b_bn_running_var, True, 0.1, 1e-05, True)
            return (batch_norm,)

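From the above output, you can see that export_for_training() produces pretty much the same ExportedProgram as export(), except for the operators in the graph. You can see that we captured batch_norm in its most general form. This op is non-functional and will be lowered to different ops when running inference.

You can also go from this IR to an inference IR via run_decompositions() with arbitrary customizations.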

# Lower to core aten inference IR, but keep conv2d
decomp_table = torch.export.default_decompositions()
del decomp_table[torch.ops.aten.conv2d.default]
ep_for_inference = ep_for_training.run_decompositions(decomp_table)

print(ep_for_inference)

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_conv_weight: "f32[3, 1, 1, 1]", p_conv_bias: "f32[3]", p_bn_weight: "f32[3]", p_bn_bias: "f32[3]", b_bn_running_mean: "f32[3]", b_bn_running_var: "f32[3]", b_bn_num_batches_tracked: "i64[]", x: "f32[1, 1, 3, 3]"):
            conv2d: "f32[1, 3, 3, 3]" = torch.ops.aten.conv2d.default(x, p_conv_weight, p_conv_bias)
            add: "i64[]" = torch.ops.aten.add.Tensor(b_bn_num_batches_tracked, 1)
            _native_batch_norm_legit_functional = torch.ops.aten._native_batch_norm_legit_functional.default(conv2d, p_bn_weight, p_bn_bias, b_bn_running_mean, b_bn_running_var, True, 0.1, 1e-05)
            getitem: "f32[1, 3, 3, 3]" = _native_batch_norm_legit_functional[0]
            getitem_3: "f32[3]" = _native_batch_norm_legit_functional[3]
            getitem_4: "f32[3]" = _native_batch_norm_legit_functional[4]
            return (getitem_3, getitem_4, add, getitem)

Here you can see that we kept the conv2d op in the IR while decomposing the rest. Now the IR is a functional IR containing core ATen operators, except for conv2d.

You can do even more customization by directly registering your chosen decomposition behaviors.

# Lower to core aten inference IR, but customize conv2d
decomp_table = torch.export.default_decompositions()

def my_awesome_custom_conv2d_function(x, weight, bias, stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1):
    return 2 * torch.ops.aten.convolution(x, weight, bias, stride, padding, dilation, False, [0, 0], groups)

decomp_table[torch.ops.aten.conv2d.default] = my_awesome_custom_conv2d_function
ep_for_inference = ep_for_training.run_decompositions(decomp_table)

print(ep_for_inference)

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_conv_weight: "f32[3, 1, 1, 1]", p_conv_bias: "f32[3]", p_bn_weight: "f32[3]", p_bn_bias: "f32[3]", b_bn_running_mean: "f32[3]", b_bn_running_var: "f32[3]", b_bn_num_batches_tracked: "i64[]", x: "f32[1, 1, 3, 3]"):
            convolution: "f32[1, 3, 3, 3]" = torch.ops.aten.convolution.default(x, p_conv_weight, p_conv_bias, [1, 1], [0, 0], [1, 1], False, [0, 0], 1)
            mul: "f32[1, 3, 3, 3]" = torch.ops.aten.mul.Tensor(convolution, 2)
            add: "i64[]" = torch.ops.aten.add.Tensor(b_bn_num_batches_tracked, 1)
            _native_batch_norm_legit_functional = torch.ops.aten._native_batch_norm_legit_functional.default(mul, p_bn_weight, p_bn_bias, b_bn_running_mean, b_bn_running_var, True, 0.1, 1e-05)
            getitem: "f32[1, 3, 3, 3]" = _native_batch_norm_legit_functional[0]
            getitem_3: "f32[3]" = _native_batch_norm_legit_functional[3]
            getitem_4: "f32[3]" = _native_batch_norm_legit_functional[4]
            return (getitem_3, getitem_4, add, getitem)

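Expressing Dynamism¶

By default, torch.export traces the program assuming all input shapes are static, and specializes the exported program to those dimensions. However, some dimensions, such as a batch dimension, can be dynamic and vary from run to run. Such dimensions must be specified by creating them with the torch.export.Dim() API and passing them into torch.export.export() through the dynamic_shapes argument. For example: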

import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.branch1 = torch.nn.Sequential(
            torch.nn.Linear(64, 32), torch.nn.ReLU()
        )
        self.branch2 = torch.nn.Sequential(
            torch.nn.Linear(128, 64), torch.nn.ReLU()
        )
        self.buffer = torch.ones(32)

    def forward(self, x1, x2):
        out1 = self.branch1(x1)
        out2 = self.branch2(x2)
        return (out1 + self.buffer, out2)

example_args = (torch.randn(32, 64), torch.randn(32, 128))

# Create a dynamic batch size
batch = Dim("batch")
# Specify that the first dimension of each input is that batch size
dynamic_shapes = {"x1": {0: batch}, "x2": {0: batch}}

exported_program: torch.export.ExportedProgram = export(
    M(), args=example_args, dynamic_shapes=dynamic_shapes
)
print(exported_program)

ExportedProgram:
class GraphModule(torch.nn.Module):
    def forward(self, p_branch1_0_weight: "f32[32, 64]", p_branch1_0_bias: "f32[32]", p_branch2_0_weight: "f32[64, 128]", p_branch2_0_bias: "f32[64]", c_buffer: "f32[32]", x1: "f32[s0, 64]", x2: "f32[s0, 128]"):

         # code: out1 = self.branch1(x1)
        linear: "f32[s0, 32]" = torch.ops.aten.linear.default(x1, p_branch1_0_weight, p_branch1_0_bias)
        relu: "f32[s0, 32]" = torch.ops.aten.relu.default(linear)

         # code: out2 = self.branch2(x2)
        linear_1: "f32[s0, 64]" = torch.ops.aten.linear.default(x2, p_branch2_0_weight, p_branch2_0_bias)
        relu_1: "f32[s0, 64]" = torch.ops.aten.relu.default(linear_1)

         # code: return (out1 + self.buffer, out2)
        add: "f32[s0, 32]" = torch.ops.aten.add.Tensor(relu, c_buffer)
        return (add, relu_1)

Range constraints: {s0: VR[0, int_oo]}

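Some additional things to note:

Through the torch.export.Dim() API and the dynamic_shapes argument, we specified the first dimension of each input to be dynamic. Looking at the inputs x1 and x2, they have a symbolic shape of (s0, 64) and (s0, 128), instead of the (32, 64) and (32, 128) shaped tensors that we passed in as example inputs. s0 is a symbol representing that this dimension can take a range of values.
exported_program.range_constraints describes the range of each symbol appearing in the graph. In this case, we see that s0 has the range [0, int_oo]. For technical reasons that are difficult to explain here, such dimensions are assumed to be neither 0 nor 1. This is not a bug, and does not necessarily mean that the exported program will not work for dimensions 0 or 1. See The 0/1 Specialization Problem for an in-depth discussion of this topic.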

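We can also specify more expressive relationships between input shapes, such as a pair of shapes that differ by one, a shape that is double another, or a shape that is even. For example: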

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y[1:]

x, y = torch.randn(5), torch.randn(6)
dimx = torch.export.Dim("dimx", min=3, max=6)
dimy = dimx + 1

exported_program = torch.export.export(
    M(), (x, y), dynamic_shapes=({0: dimx}, {0: dimy}),
)
print(exported_program)

ExportedProgram:
class GraphModule(torch.nn.Module):
    def forward(self, x: "f32[s0]", y: "f32[s0 + 1]"):
        # code: return x + y[1:]
        slice_1: "f32[s0]" = torch.ops.aten.slice.Tensor(y, 0, 1, 9223372036854775807)
        add: "f32[s0]" = torch.ops.aten.add.Tensor(x, slice_1)
        return (add,)

Range constraints: {s0: VR[3, 6], s0 + 1: VR[4, 7]}

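Some things to note:

By specifying {0: dimx} for the first input, we see that the resulting shape of the first input is now dynamic, being [s0]. By specifying {0: dimy} for the second input, we see that the resulting shape of the second input is also dynamic. However, because we expressed dimy = dimx + 1, y's shape does not contain a new symbol; it is represented with the same symbol used in x, s0. The relationship dimy = dimx + 1 shows up as s0 + 1.
Looking at the range constraints, we see that s0 has the range [3, 6], which is what was specified initially, and that s0 + 1 has the solved range [4, 7].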

Serialization¶

To save and reload an ExportedProgram, users can use the torch.export.save() and torch.export.load() APIs. By convention, an ExportedProgram is saved with the .pt2 file extension.

An example:

import torch
import io

class MyModule(torch.nn.Module):
    def forward(self, x):
        return x + 10

exported_program = torch.export.export(MyModule(), (torch.randn(5),))

torch.export.save(exported_program, 'exported_program.pt2')
saved_exported_program = torch.export.load('exported_program.pt2')

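Specializations¶

A key concept in understanding the behavior of torch.export is the difference between static and dynamic values.

A dynamic value is one that can change from run to run. These behave like normal arguments to a Python function: you can pass different values for an argument and expect your function to do the right thing. Tensor *data* is treated as dynamic.

A static value is a value that is fixed at export time and cannot change between executions of the exported program. When the value is encountered during tracing, the exporter treats it as a constant and hard-codes it into the graph.

When an operation is performed (for example, x + y) and all inputs are static, the output of the operation is hard-coded directly into the graph, and the operation itself does not show up in the graph (that is, it is constant-folded).

When a value has been hard-coded into the graph, we say that the graph has been *specialized* to that value.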

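The following values are static:

Input Tensor Shapes¶

By default, torch.export traces the program specializing on the input tensors' shapes, unless a dimension is specified as dynamic via the dynamic_shapes argument to torch.export. This means that if there is shape-dependent control flow, torch.export specializes on the branch that is taken with the given sample inputs. For example: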

import torch
from torch.export import export

class Mod(torch.nn.Module):
    def forward(self, x):
        if x.shape[0] > 5:
            return x + 1
        else:
            return x - 1

example_inputs = (torch.rand(10, 2),)
exported_program = export(Mod(), example_inputs)
print(exported_program)

ExportedProgram:
class GraphModule(torch.nn.Module):
    def forward(self, x: "f32[10, 2]"):
        # code: return x + 1
        add: "f32[10, 2]" = torch.ops.aten.add.Tensor(x, 1)
        return (add,)

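The conditional (x.shape[0] > 5) does not appear in the ExportedProgram because the example inputs have the static shape (10, 2). Since torch.export specializes on the inputs' static shapes, the else branch (x - 1) will never be reached. To preserve the dynamic branching behavior based on the shape of a tensor in the traced graph, torch.export.Dim() needs to be used to specify the dimension of the input tensor (x.shape[0]) as dynamic, and the source code needs to be rewritten.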

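Note that tensors that are part of the module state (for example, parameters and buffers) always have static shapes.

Python Primitives¶

torch.export also specializes on Python primitives, such as int, float, bool, and str. However, they do have dynamic variants such as SymInt, SymFloat, and SymBool.

For example: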

import torch
from torch.export import export

class Mod(torch.nn.Module):
    def forward(self, x: torch.Tensor, const: int, times: int):
        for i in range(times):
            x = x + const
        return x

example_inputs = (torch.rand(2, 2), 1, 3)
exported_program = export(Mod(), example_inputs)
print(exported_program)

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, x: "f32[2, 2]", const, times):
            # code: x = x + const
            add: "f32[2, 2]" = torch.ops.aten.add.Tensor(x, 1)
            add_1: "f32[2, 2]" = torch.ops.aten.add.Tensor(add, 1)
            add_2: "f32[2, 2]" = torch.ops.aten.add.Tensor(add_1, 1)
            return (add_2,)

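Because integers are specialized, the torch.ops.aten.add.Tensor operations are all computed with the hard-coded constant 1, rather than const. If a user passes a different value for const at runtime (for example, 2) than the one used at export time (1), this results in an error. Additionally, the times iterator used in the for loop is also "inlined" in the graph through the 3 repeated torch.ops.aten.add.Tensor calls, and the input times is never used.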

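Python Containers¶

Python containers (List, Dict, NamedTuple, etc.) are considered to have static structure.

Limitations of torch.export¶

Graph Breaks¶

As torch.export is a one-shot process for capturing a computation graph from a PyTorch program, it might ultimately run into untraceable parts of programs, because it is nearly impossible to support tracing all PyTorch and Python features. In the case of torch.compile, an unsupported operation causes a "graph break" and the unsupported operation is run with default Python evaluation. In contrast, torch.export requires users to provide additional information or rewrite parts of their code to make it traceable. As the tracing is based on TorchDynamo, which evaluates at the Python bytecode level, significantly fewer rewrites are required compared to previous tracing frameworks.

When a graph break is encountered, ExportDB is a great resource for learning about the kinds of programs that are supported and unsupported, along with ways to rewrite programs to make them traceable.

One option for getting past these graph breaks is to use non-strict export.

Data/Shape-Dependent Control Flow¶

Graph breaks can also be encountered on data-dependent control flow (such as if x.shape[0] > 2) when shapes are not being specialized, as a tracing compiler cannot handle this case without generating code for a combinatorially exploding number of paths. In such cases, users need to rewrite their code using special control flow operators. Currently, we support torch.cond to express if-else like control flow (more coming soon!).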

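Missing Fake/Meta/Abstract Kernels for Operators¶

When tracing, a FakeTensor kernel (also known as a meta kernel or abstract impl) is required for all operators. It is used to reason about the input/output shapes of the operator.

See torch.library.register_fake() for more details.

In the unfortunate case where your model uses an ATen operator that does not yet have a FakeTensor kernel implementation, please file an issue.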

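API Reference¶

torch.export.export(mod, args, kwargs=None, *, dynamic_shapes=None, strict=True, preserve_module_call_signature=())[source][source]¶

export() takes any nn.Module along with example inputs, and produces a traced graph representing only the Tensor computation of the function in an Ahead-of-Time (AOT) fashion, which can subsequently be executed with different inputs or serialized. The traced graph (1) produces normalized operators in the functional ATen operator set (as well as any user-specified custom operators), (2) has eliminated all Python control flow and data structures (with certain exceptions), and (3) records the set of shape constraints needed to show that this normalization and control-flow elimination is sound for future inputs.

Soundness Guarantee

While tracing, export() takes note of shape-related assumptions made by the user program and the underlying PyTorch operator kernels. The output ExportedProgram is considered valid only when these assumptions hold true.

Tracing makes assumptions on the shapes (not values) of input tensors. Such assumptions must be validated at graph capture time for export() to succeed. Specifically:

Assumptions on static shapes of input tensors are automatically validated without additional effort.
Assumptions on dynamic shapes of input tensors require explicit specification by using the Dim() API to construct dynamic dimensions and by associating them with example inputs through the dynamic_shapes argument.

If any assumption cannot be validated, a fatal error is raised. When that happens, the error message includes suggested fixes to the specification that are needed to validate the assumptions. For example, export() might suggest the following fix to the definition of a dynamic dimension dim0_x, say appearing in the shape associated with input x, that was previously defined as Dim("dim0_x"):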

dim = Dim("dim0_x", max=5)

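This example means the generated code requires dimension 0 of input x to be less than or equal to 5 to be valid. You can inspect the suggested fixes to dynamic dimension definitions and then copy them verbatim into your code without needing to change the dynamic_shapes argument to your export() call.

Parameters

mod (Module) – We will trace the forward method of this module.
args (tuple[Any, ...]) – Example positional inputs.
kwargs (Optional[dict[str, Any]]) – Optional example keyword inputs.
dynamic_shapes (Optional[Union[dict[str, Any], tuple[Any], list[Any]]]) –
An optional argument whose type should be either: 1) a dict from argument names of mod's forward method to their dynamic shape specifications, or 2) a tuple that specifies a dynamic shape specification for each input in the original order. If you are specifying dynamism on keyword args, you need to pass them in the order defined in the original function signature.

The dynamic shape of a tensor argument can be specified as either (1) a dict from dynamic dimension indices to Dim() types, where static dimension indices need not be included, but if they are, they should be mapped to None; or (2) a tuple/list of Dim() types or None, where the Dim() types correspond to dynamic dimensions and static dimensions are denoted by None. Arguments that are dicts or tuples/lists of tensors are recursively specified by using mappings or sequences of the contained specifications.
strict (bool) – When enabled (the default), the export function traces the program through TorchDynamo, which ensures the soundness of the resulting graph. Otherwise, the exported program does not validate the implicit assumptions baked into the graph, which may cause behavior divergence between the original model and the exported one. This is useful when users need to work around bugs in the tracer, or simply want to incrementally enable safety in their models. Note that this does not change the resulting IR spec, and the model is serialized in the same way regardless of the value passed here. WARNING: This option is experimental; use it at your own risk.

Returns

An ExportedProgram containing the traced callable.

Return type

ExportedProgram

Acceptable input/output types

Acceptable types of inputs (for args and kwargs) and outputs include:

Primitive types, i.e. torch.Tensor, int, float, bool, and str.
Dataclasses, but they must be registered by calling register_dataclass() first.
(Nested) data structures comprising dict, list, tuple, namedtuple, and OrderedDict containing all of the above types.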

torch.export.save(ep, f, *, extra_files=None, opset_version=None, pickle_protocol=2)[source][source]¶

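Warning

Under active development; saved files may not be usable in newer versions of PyTorch.

Saves an ExportedProgram to a file-like object. It can then be loaded using the Python API torch.export.load.

Parameters

ep (ExportedProgram) – The exported program to save.
f (str | os.PathLike[str] | IO[bytes]) – A file-like object (must implement write and flush) or a string containing a file name.
extra_files (Optional[Dict[str, Any]]) – Map from filename to contents, which will be stored as part of f.
opset_version (Optional[Dict[str, int]]) – A map of opset names to the version of that opset.
pickle_protocol (int) – Can be specified to override the default protocol.

Example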

import torch
import io

class MyModule(torch.nn.Module):
    def forward(self, x):
        return x + 10

ep = torch.export.export(MyModule(), (torch.randn(5),))

# Save to file
torch.export.save(ep, 'exported_program.pt2')

# Save to io.BytesIO buffer
buffer = io.BytesIO()
torch.export.save(ep, buffer)

# Save with extra files
extra_files = {'foo.txt': b'bar'.decode('utf-8')}
torch.export.save(ep, 'exported_program.pt2', extra_files=extra_files)

torch.export.load(f, *, extra_files=None, expected_opset_version=None)[source][source]¶

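Warning

Under active development; saved files may not be usable in newer versions of PyTorch.

Loads an ExportedProgram previously saved with torch.export.save.

Parameters

f (str | os.PathLike[str] | IO[bytes]) – A readable binary file-like object (such as an open file or an io.BytesIO buffer) or a string containing a file name.
extra_files (Optional[Dict[str, Any]]) – The extra filenames given in this map are loaded and their content is stored in the provided map.
expected_opset_version (Optional[Dict[str, int]]) – A map of opset names to expected opset versions.

Returns

An ExportedProgram object

Return type

ExportedProgram

Example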

import torch
import io

# Load ExportedProgram from file
ep = torch.export.load('exported_program.pt2')

# Load ExportedProgram from io.BytesIO object
with open('exported_program.pt2', 'rb') as f:
    buffer = io.BytesIO(f.read())
buffer.seek(0)
ep = torch.export.load(buffer)

# Load with extra files.
extra_files = {'foo.txt': ''}  # values will be replaced with data
ep = torch.export.load('exported_program.pt2', extra_files=extra_files)
print(extra_files['foo.txt'])
print(ep.module()(torch.randn(5)))

torch.export.register_dataclass(cls, *, serialized_type_name=None)[source][source]¶

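Registers a dataclass as a valid input/output type for torch.export.export().

Parameters

cls (type[Any]) – The dataclass type to register.
serialized_type_name (Optional[str]) – The serialized name for the dataclass. This is required if you want to serialize the pytree TreeSpec containing this dataclass.

Example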

import torch
from dataclasses import dataclass

@dataclass
class InputDataClass:
    feature: torch.Tensor
    bias: int

@dataclass
class OutputDataClass:
    res: torch.Tensor

torch.export.register_dataclass(InputDataClass)
torch.export.register_dataclass(OutputDataClass)

class Mod(torch.nn.Module):
    def forward(self, x: InputDataClass) -> OutputDataClass:
        res = x.feature + x.bias
        return OutputDataClass(res=res)

ep = torch.export.export(Mod(), (InputDataClass(torch.ones(2, 2), 1), ))
print(ep)

torch.export.dynamic_shapes.Dim(name, *, min=None, max=None)[source][source]¶

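Dim() constructs a type analogous to a named symbolic integer with a range. It can be used to describe multiple possible values of a dynamic tensor dimension. Note that different dynamic dimensions of the same tensor, or of different tensors, can be described by the same type.

Parameters

name (str) – Human-readable name for debugging.
min (Optional[int]) – Minimum possible value of the given symbol (inclusive).
max (Optional[int]) – Maximum possible value of the given symbol (inclusive).

Returns

A type that can be used in dynamic shape specifications for tensors.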

torch.export.exported_program.default_decompositions()[source][source]¶

This is the default decomposition table, which contains the decompositions of all ATen operators to the core ATen opset. Use this API together with run_decompositions().

Return type: CustomDecompTable

torch.export.dims(*names, min=None, max=None)[source][source]¶

Utility for creating multiple Dim() types.

Returns: A tuple of Dim() types.
Return type: tuple[torch.export.dynamic_shapes._Dim, …]

class torch.export.dynamic_shapes.ShapesCollection[source][source]¶

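Builder for dynamic_shapes. Used to assign dynamic shape specifications to tensors that appear in inputs.

This is particularly useful when args is a nested input structure, where it is easier to index input tensors than to replicate the structure of args in the dynamic_shapes specification.

Example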

args = ({"x": tensor_x, "others": [tensor_y, tensor_z]},)

dim = torch.export.Dim(...)
dynamic_shapes = torch.export.ShapesCollection()
dynamic_shapes[tensor_x] = (dim, dim + 1, 8)
dynamic_shapes[tensor_y] = {0: dim * 2}
# This is equivalent to the following (now auto-generated):
# dynamic_shapes = {"x": (dim, dim + 1, 8), "others": [{0: dim * 2}, None]}

torch.export.export(..., args, dynamic_shapes=dynamic_shapes)

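dynamic_shapes(m, args, kwargs=None)[source][source]¶: Generates the dynamic_shapes pytree structure according to args and kwargs.

torch.export.dynamic_shapes.refine_dynamic_shapes_from_suggested_fixes(msg, dynamic_shapes)[source][source]¶

When exporting with dynamic_shapes, export may fail with a ConstraintViolation error if the specification does not match the constraints inferred from tracing the model. The error message may provide suggested fixes: changes that can be made to dynamic_shapes so that export succeeds.

Example ConstraintViolation error message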

Suggested fixes:

    dim = Dim('dim', min=3, max=6)  # this just refines the dim's range
    dim = 4  # this specializes to a constant
    dy = dx + 1  # dy was specified as an independent dim, but is actually tied to dx with this relation

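This is a helper function that takes the ConstraintViolation error message and the original dynamic_shapes spec, and returns a new dynamic_shapes spec that incorporates the suggested fixes.

Example usage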

try:
    ep = export(mod, args, dynamic_shapes=dynamic_shapes)
except torch._dynamo.exc.UserError as exc:
    new_shapes = refine_dynamic_shapes_from_suggested_fixes(
        exc.msg, dynamic_shapes
    )
    ep = export(mod, args, dynamic_shapes=new_shapes)

Return type: Union[dict[str, Any], tuple[Any], list[Any]]

torch.export.Constraint¶: alias of Union[_Constraint, _DerivedConstraint, _RelaxedConstraint].

class torch.export.ExportedProgram(root, graph, graph_signature, state_dict, range_constraints, module_call_graph, example_inputs=None, constants=None, *, verifiers=None)[source][source]¶

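Package of a program from export(). It contains a torch.fx.Graph that represents the Tensor computation, a state_dict containing the tensor values of all lifted parameters and buffers, and various metadata.

To run an ExportedProgram like the original callable traced by export(), with the same calling convention, call the module returned by module().

To perform transformations on the graph, call module() to obtain a torch.fx.GraphModule. You can then use FX transformations to rewrite the graph. Afterwards, you can simply use export() again to construct a correct ExportedProgram.

module()[source][source]¶

Returns a self-contained GraphModule with all the parameters/buffers inlined.

Return type

Module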

buffers()[source][source]¶

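Returns an iterator over the original module's buffers.

Warning

This API is experimental and is *NOT* backward-compatible.

Return type: Iterator[Tensor]

named_buffers()[source][source]¶

Returns an iterator over the original module's buffers, yielding both the name of the buffer and the buffer itself.

Warning

This API is experimental and is *NOT* backward-compatible.

Return type: Iterator[tuple[str, torch.Tensor]]

parameters()[source][source]¶

Returns an iterator over the original module's parameters.

Warning

This API is experimental and is *NOT* backward-compatible.

Return type: Iterator[Parameter]

named_parameters()[source][source]¶

Returns an iterator over the original module's parameters, yielding both the name of the parameter and the parameter itself.

Warning

This API is experimental and is *NOT* backward-compatible.

Return type: Iterator[tuple[str, torch.nn.parameter.Parameter]]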

run_decompositions(decomp_table=None, decompose_custom_triton_ops=False)[source][source]¶

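Runs a set of decompositions on the exported program and returns a new exported program. By default, the core ATen decompositions are run to obtain operators in the Core ATen Operator Set.

For now, joint graphs are not decomposed.

Parameters: decomp_table (Optional[dict[torch._ops.OperatorBase, Callable]]) – An optional argument that specifies the decomposition behavior for ATen ops. (1) If None, we decompose using the core ATen decompositions. (2) If empty, we do not decompose any operator.
Return type: ExportedProgram

Some examples

If you do not want to decompose anything: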

ep = torch.export.export(model, ...)
ep = ep.run_decompositions(decomp_table={})

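If you want the core ATen operator set but with different handling for certain operators (here, a custom decomposition for your_op), you can do the following: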

ep = torch.export.export(model, ...)
decomp_table = torch.export.default_decompositions()
decomp_table[your_op] = your_custom_decomp
ep = ep.run_decompositions(decomp_table=decomp_table)

class torch.export.ExportBackwardSignature(gradients_to_parameters: dict[str, str], gradients_to_user_inputs: dict[str, str], loss_output: str)[source][source]¶

class torch.export.ExportGraphSignature(input_specs, output_specs)[source][source]¶

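ExportGraphSignature models the input/output signature of the Export Graph, which is an fx.Graph with stronger invariant guarantees.

The Export Graph is functional and does not access "state" such as parameters or buffers within the graph via get_attr nodes. Instead, export() guarantees that parameters, buffers, and constant tensors are lifted out of the graph as inputs. Similarly, any mutations to buffers are not included in the graph either; instead, the updated values of mutated buffers are modeled as additional outputs of the Export Graph.

The ordering of all inputs and outputs is:

Inputs = [*parameters_buffers_constant_tensors, *flattened_user_inputs]
Outputs = [*mutated_inputs, *flattened_user_outputs]

For example, if the following module is exported: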

class CustomModule(nn.Module):
    def __init__(self) -> None:
        super(CustomModule, self).__init__()

        # Define a parameter
        self.my_parameter = nn.Parameter(torch.tensor(2.0))

        # Define two buffers
        self.register_buffer('my_buffer1', torch.tensor(3.0))
        self.register_buffer('my_buffer2', torch.tensor(4.0))

    def forward(self, x1, x2):
        # Use the parameter, buffers, and both inputs in the forward method
        output = (x1 + self.my_parameter) * self.my_buffer1 + x2 * self.my_buffer2

        # Mutate one of the buffers (e.g., increment it by 1)
        self.my_buffer2.add_(1.0) # In-place addition

        return output

The resulting graph would be:

graph():
    %arg0_1 := placeholder[target=arg0_1]
    %arg1_1 := placeholder[target=arg1_1]
    %arg2_1 := placeholder[target=arg2_1]
    %arg3_1 := placeholder[target=arg3_1]
    %arg4_1 := placeholder[target=arg4_1]
    %add_tensor := call_function[target=torch.ops.aten.add.Tensor](args = (%arg3_1, %arg0_1), kwargs = {})
    %mul_tensor := call_function[target=torch.ops.aten.mul.Tensor](args = (%add_tensor, %arg1_1), kwargs = {})
    %mul_tensor_1 := call_function[target=torch.ops.aten.mul.Tensor](args = (%arg4_1, %arg2_1), kwargs = {})
    %add_tensor_1 := call_function[target=torch.ops.aten.add.Tensor](args = (%mul_tensor, %mul_tensor_1), kwargs = {})
    %add_tensor_2 := call_function[target=torch.ops.aten.add.Tensor](args = (%arg2_1, 1.0), kwargs = {})
    return (add_tensor_2, add_tensor_1)

The resulting ExportGraphSignature would be:

ExportGraphSignature(
    input_specs=[
        InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='arg0_1'), target='my_parameter'),
        InputSpec(kind=<InputKind.BUFFER: 3>, arg=TensorArgument(name='arg1_1'), target='my_buffer1'),
        InputSpec(kind=<InputKind.BUFFER: 3>, arg=TensorArgument(name='arg2_1'), target='my_buffer2'),
        InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='arg3_1'), target=None),
        InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='arg4_1'), target=None)
    ],
    output_specs=[
        OutputSpec(kind=<OutputKind.BUFFER_MUTATION: 3>, arg=TensorArgument(name='add_tensor_2'), target='my_buffer2'),
        OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='add_tensor_1'), target=None)
    ]
)

class torch.export.ModuleCallSignature(inputs: list[Union[torch.export.graph_signature.TensorArgument, torch.export.graph_signature.SymIntArgument, torch.export.graph_signature.SymFloatArgument, torch.export.graph_signature.SymBoolArgument, torch.export.graph_signature.ConstantArgument, torch.export.graph_signature.CustomObjArgument, torch.export.graph_signature.TokenArgument]], outputs: list[Union[torch.export.graph_signature.TensorArgument, torch.export.graph_signature.SymIntArgument, torch.export.graph_signature.SymFloatArgument, torch.export.graph_signature.SymBoolArgument, torch.export.graph_signature.ConstantArgument, torch.export.graph_signature.CustomObjArgument, torch.export.graph_signature.TokenArgument]], in_spec: torch.utils._pytree.TreeSpec, out_spec: torch.utils._pytree.TreeSpec, forward_arg_names: Optional[list[str]] = None)[source][source]¶

class torch.export.ModuleCallEntry(fqn: str, signature: Optional[torch.export.exported_program.ModuleCallSignature] = None)[source][source]¶

class torch.export.decomp_utils.CustomDecompTable[source][source]¶

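This is a custom dictionary used specifically for handling decomp_table in export. We need it because, in the new design, the only way to preserve an op is to *delete* it from the decomp table. This is problematic for custom ops because we do not know when a custom op will actually be loaded into the dispatcher. As a result, we need to record operations on custom ops until we really need to materialize the table (which is when we run the decomposition pass).

The invariants we maintain are:

All aten decompositions are loaded at init time.
We materialize ALL ops whenever the user reads from the table, to make it more likely that the dispatcher picks up the custom ops.
For a write operation, we do not necessarily materialize.
We load one final time during export, right before calling run_decompositions().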

copy()

Return type: CustomDecompTable

items()

keys()

materialize()

Return type: dict[torch._ops.OperatorBase, Callable]

pop(*args)

update(other_dict)

class torch.export.graph_signature.InputKind(value)

An enumeration.

class torch.export.graph_signature.InputSpec(kind: torch.export.graph_signature.InputKind, arg: Union[torch.export.graph_signature.TensorArgument, torch.export.graph_signature.SymIntArgument, torch.export.graph_signature.SymFloatArgument, torch.export.graph_signature.SymBoolArgument, torch.export.graph_signature.ConstantArgument, torch.export.graph_signature.CustomObjArgument, torch.export.graph_signature.TokenArgument], target: Optional[str], persistent: Optional[bool] = None)

class torch.export.graph_signature.OutputKind(value)

An enumeration.

class torch.export.graph_signature.OutputSpec(kind: torch.export.graph_signature.OutputKind, arg: Union[torch.export.graph_signature.TensorArgument, torch.export.graph_signature.SymIntArgument, torch.export.graph_signature.SymFloatArgument, torch.export.graph_signature.SymBoolArgument, torch.export.graph_signature.ConstantArgument, torch.export.graph_signature.CustomObjArgument, torch.export.graph_signature.TokenArgument], target: Optional[str])

class torch.export.graph_signature.SymIntArgument(name: str)

class torch.export.graph_signature.SymBoolArgument(name: str)

class torch.export.graph_signature.SymFloatArgument(name: str)

class torch.export.graph_signature.ExportGraphSignature(input_specs, output_specs)

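ExportGraphSignature models the input/output signature of an Export Graph, which is an fx.Graph with stronger invariant guarantees.

An Export Graph is functional and does not access "state" such as parameters or buffers through getattr nodes in the graph. Instead, export() guarantees that parameters, buffers, and constant tensors are lifted out of the graph as inputs. Similarly, mutations to buffers are not included in the graph either; instead, the updated values of mutated buffers are modeled as additional outputs of the Export Graph.

The ordering of all inputs and outputs is:

Inputs = [*parameters_buffers_constant_tensors, *flattened_user_inputs]
Outputs = [*mutated_inputs, *flattened_user_outputs]

For example, if the following module is exported: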

import torch
from torch import nn

class CustomModule(nn.Module):
    def __init__(self) -> None:
        super().__init__()

        # Define a parameter
        self.my_parameter = nn.Parameter(torch.tensor(2.0))

        # Define two buffers
        self.register_buffer('my_buffer1', torch.tensor(3.0))
        self.register_buffer('my_buffer2', torch.tensor(4.0))

    def forward(self, x1, x2):
        # Use the parameter, buffers, and both inputs in the forward method
        output = (x1 + self.my_parameter) * self.my_buffer1 + x2 * self.my_buffer2

        # Mutate one of the buffers (e.g., increment it by 1)
        self.my_buffer2.add_(1.0) # In-place addition

        return output

The resulting graph is:

graph():
    %arg0_1 := placeholder[target=arg0_1]
    %arg1_1 := placeholder[target=arg1_1]
    %arg2_1 := placeholder[target=arg2_1]
    %arg3_1 := placeholder[target=arg3_1]
    %arg4_1 := placeholder[target=arg4_1]
    %add_tensor := call_function[target=torch.ops.aten.add.Tensor](args = (%arg3_1, %arg0_1), kwargs = {})
    %mul_tensor := call_function[target=torch.ops.aten.mul.Tensor](args = (%add_tensor, %arg1_1), kwargs = {})
    %mul_tensor_1 := call_function[target=torch.ops.aten.mul.Tensor](args = (%arg4_1, %arg2_1), kwargs = {})
    %add_tensor_1 := call_function[target=torch.ops.aten.add.Tensor](args = (%mul_tensor, %mul_tensor_1), kwargs = {})
    %add_tensor_2 := call_function[target=torch.ops.aten.add.Tensor](args = (%arg2_1, 1.0), kwargs = {})
    return (add_tensor_2, add_tensor_1)

The resulting ExportGraphSignature is:

ExportGraphSignature(
    input_specs=[
        InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='arg0_1'), target='my_parameter'),
        InputSpec(kind=<InputKind.BUFFER: 3>, arg=TensorArgument(name='arg1_1'), target='my_buffer1'),
        InputSpec(kind=<InputKind.BUFFER: 3>, arg=TensorArgument(name='arg2_1'), target='my_buffer2'),
        InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='arg3_1'), target=None),
        InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='arg4_1'), target=None)
    ],
    output_specs=[
        OutputSpec(kind=<OutputKind.BUFFER_MUTATION: 3>, arg=TensorArgument(name='add_2'), target='my_buffer2'),
        OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='add_1'), target=None)
    ]
)

replace_all_uses(old, new)

Replace all uses of the old name with the new name in the signature.

get_replace_hook(replace_inputs=False)

class torch.export.graph_signature.CustomObjArgument(name: str, class_fqn: str, fake_val: Optional[torch._library.fake_class_registry.FakeScriptObject] = None)

class torch.export.unflatten.FlatArgsAdapter

Adapts input arguments with input_spec to align with target_spec.

abstract adapt(target_spec, input_spec, input_args, metadata=None)

Note: this adapter may mutate the given input_args_with_path.

Return type: list[Any]

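class torch.export.unflatten.InterpreterModule(graph, ty=None)

A module that uses torch.fx.Interpreter to execute instead of the usual code generation that GraphModule uses. This provides better stack trace information and makes it easier to debug execution.

class torch.export.unflatten.InterpreterModuleDispatcher(attrs, call_modules)

A module that carries a sequence of InterpreterModules corresponding to a sequence of calls of that module. Each call to the module dispatches to the next InterpreterModule, and wraps back around to the first after the last.

torch.export.unflatten.unflatten(module, flat_args_adapter=None)

Unflatten an ExportedProgram, producing a module with the same module hierarchy as the original eager module. This can be useful if you are trying to use torch.export with another system that expects a module hierarchy instead of the flat graph that torch.export usually produces.

Note

The args/kwargs of unflattened modules will not necessarily match the eager module, so doing a module swap (e.g. self.submod = new_mod) will not necessarily work. If you need to swap a module out, you need to set the preserve_module_call_signature parameter of torch.export.export().

Parameters

module (ExportedProgram) – The ExportedProgram to unflatten.
flat_args_adapter (Optional[FlatArgsAdapter]) – Adapts flat args if the input TreeSpec does not match that of the exported module.

Returns

An instance of UnflattenedModule, which has the same module hierarchy as the original eager module before export.

Return type

UnflattenedModule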

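torch.export.passes.move_to_device_pass(ep, location)

Move the exported program to the given device.

Parameters

ep (ExportedProgram) – The exported program to move.
location (Union[torch.device, str, Dict[str, str]]) – The device to move the exported program to. If a string, it is interpreted as a device name. If a dict, it is interpreted as a mapping from existing devices to the intended device.

Returns

The moved exported program.

Return type

ExportedProgram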
