Using the ExecuTorch Developer Tools to Profile a Model¶

Author: Jack Khuu

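The ExecuTorch Developer Tools are a set of tools that let users profile, debug, and visualize ExecuTorch models.

This tutorial walks through a full end-to-end flow for profiling a model with the Developer Tools. Specifically, it will:

1. Generate the artifacts consumed by the Developer Tools (ETRecord, ETDump).
2. Create an Inspector class that consumes these artifacts.
3. Use the Inspector class to analyze the model profiling results.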

Prerequisites¶

To run this tutorial, you first need to set up your ExecuTorch environment.

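Generate ETRecord (Optional)¶

The first step is to generate an ETRecord. An ETRecord contains the model graphs and metadata used to link runtime results (such as profiling data) back to the eager model. It is generated with executorch.devtools.generate_etrecord.

executorch.devtools.generate_etrecord takes an output file path (str), the edge dialect model (EdgeProgramManager), the ExecuTorch dialect model (ExecutorchProgramManager), and an optional dictionary of additional models.

This tutorial uses the example model shown below.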

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
from executorch.devtools import generate_etrecord

from executorch.exir import (
    EdgeCompileConfig,
    EdgeProgramManager,
    ExecutorchProgramManager,
    to_edge,
)
from torch.export import export, ExportedProgram


# Generate Model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = Net()

aten_model: ExportedProgram = export(model, (torch.randn(1, 1, 32, 32),), strict=True)

edge_program_manager: EdgeProgramManager = to_edge(
    aten_model, compile_config=EdgeCompileConfig(_check_ir_validity=True)
)
edge_program_manager_copy = copy.deepcopy(edge_program_manager)
et_program_manager: ExecutorchProgramManager = edge_program_manager.to_executorch()


# Generate ETRecord
etrecord_path = "etrecord.bin"
generate_etrecord(etrecord_path, edge_program_manager_copy, et_program_manager)

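Warning

Users should deep copy the output of to_edge() and pass the deep copy to the generate_etrecord API. This is necessary because the subsequent call to to_executorch() mutates its input in place and loses debug data in the process.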
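The hazard described in the warning can be reproduced in plain Python. The following is a minimal sketch, not ExecuTorch code: ToyProgramManager and lower_in_place are hypothetical stand-ins for an object whose lowering step mutates it in place and discards debug data.

```python
import copy


class ToyProgramManager:
    """Hypothetical stand-in for an edge program manager (not an ExecuTorch class)."""

    def __init__(self):
        # Pretend these map graph nodes to debug identifiers.
        self.debug_handles = {"conv1": 1, "fc1": 2}

    def lower_in_place(self):
        # Mimics a lowering step that mutates its receiver and drops debug data.
        self.debug_handles.clear()
        return self


edge = ToyProgramManager()
edge_copy = copy.deepcopy(edge)  # snapshot taken BEFORE lowering
lowered = edge.lower_in_place()

print(edge.debug_handles)       # the original has lost its debug data
print(edge_copy.debug_handles)  # the deep copy still carries it
```

A plain assignment (edge_copy = edge) would not help here, because both names would refer to the same mutated object.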

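Generate ETDump¶

The next step is to generate an ETDump. An ETDump contains the runtime results from executing a Bundled Program model.

In this tutorial, a Bundled Program is created from the example model above.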

import torch
from executorch.devtools import BundledProgram

from executorch.devtools.bundled_program.config import MethodTestCase, MethodTestSuite
from executorch.devtools.bundled_program.serialize import (
    serialize_from_bundled_program_to_flatbuffer,
)

from executorch.exir import to_edge
from torch.export import export

# Step 1: ExecuTorch Program Export
m_name = "forward"
method_graphs = {m_name: export(model, (torch.randn(1, 1, 32, 32),), strict=True)}

# Step 2: Construct Method Test Suites
inputs = [[torch.randn(1, 1, 32, 32)] for _ in range(2)]

method_test_suites = [
    MethodTestSuite(
        method_name=m_name,
        test_cases=[
            MethodTestCase(inputs=inp, expected_outputs=getattr(model, m_name)(*inp))
            for inp in inputs
        ],
    )
]

# Step 3: Generate BundledProgram
executorch_program = to_edge(method_graphs).to_executorch()
bundled_program = BundledProgram(executorch_program, method_test_suites)

# Step 4: Serialize BundledProgram to flatbuffer.
serialized_bundled_program = serialize_from_bundled_program_to_flatbuffer(
    bundled_program
)
save_path = "bundled_program.bp"
with open(save_path, "wb") as f:
    f.write(serialized_bundled_program)

Use CMake (follow these instructions to set up cmake) to execute the Bundled Program and generate the ETDump.

cd executorch
./examples/devtools/build_example_runner.sh
cmake-out/examples/devtools/example_runner --bundled_program_path="bundled_program.bp"

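Creating an Inspector¶

The final step is to create the Inspector by passing in the artifact paths. The Inspector takes the runtime results from the ETDump and correlates them with the operators of the Edge dialect graph.

Recall: an ETRecord is not required. If no ETRecord is provided, the Inspector still shows the runtime results, but without operator correlation.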
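Since the ETRecord is optional, one way to handle both cases is to build the Inspector arguments conditionally. The sketch below uses only the standard library; inspector_kwargs is a hypothetical helper, and the only Inspector parameters it assumes are etdump_path and etrecord, as used in this tutorial.

```python
import os


def inspector_kwargs(etdump_path, etrecord_path=None):
    """Build keyword arguments for Inspector, adding the ETRecord only if it exists.

    Hypothetical helper for illustration; not part of executorch.devtools.
    """
    # The ETDump is mandatory, so fail early when it is missing.
    if not os.path.isfile(etdump_path):
        raise FileNotFoundError(f"ETDump not found: {etdump_path}")
    kwargs = {"etdump_path": etdump_path}
    # The ETRecord is optional: pass it along only when the file is present.
    if etrecord_path is not None and os.path.isfile(etrecord_path):
        kwargs["etrecord"] = etrecord_path
    return kwargs
```

With the artifacts from this tutorial, the result could be passed along as Inspector(**inspector_kwargs("etdump.etdp", "etrecord.bin")). When etrecord.bin is absent, the Inspector would be built from the ETDump alone.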

To visualize all runtime events, call the Inspector's print_data_tabular.

from executorch.devtools import Inspector

etrecord_path = "etrecord.bin"
etdump_path = "etdump.etdp"
inspector = Inspector(etdump_path=etdump_path, etrecord=etrecord_path)
inspector.print_data_tabular()

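Analyzing with an Inspector¶

The Inspector provides two ways of accessing the ingested information: EventBlocks and DataFrames. Both let users run custom analyses of their model's performance.

Below are example usages of the EventBlock and DataFrame approaches.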

# Set Up
import pprint as pp

import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)

If a user wants the raw profiling results, they can do something like the following, which finds the raw runtime data of an addmm.out event.

for event_block in inspector.event_blocks:
    # Via EventBlocks
    for event in event_block.events:
        if event.name == "native_call_addmm.out":
            print(event.name, event.perf_data.raw if event.perf_data else "")

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_addmm.out"]
    print(df[["event_name", "raw"]])
    print()
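The raw field holds the individual timing samples recorded for an event, which makes it possible to compute custom statistics beyond the summary columns. Here is a minimal sketch with the standard library, assuming raw is a list of floats. The sample values are made up, and nothing is assumed about the percentile method the Inspector itself uses.

```python
import statistics

# Hypothetical samples standing in for event.perf_data.raw.
raw = [1.2, 0.9, 1.1, 3.4, 1.0]

summary = {
    "runs": len(raw),
    "median": statistics.median(raw),
    "mean": statistics.fmean(raw),
    "min": min(raw),
    "max": max(raw),
}
print(summary)
```

Comparing the median against the mean is a quick way to spot outlier runs such as the 3.4 sample above, which pulls the mean up while leaving the median unchanged.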

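If a user wants to trace an operator back to their model code, they can do something like the following, which finds the module hierarchy and stack trace of the slowest convolution.out call.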

for event_block in inspector.event_blocks:
    # Via EventBlocks
    slowest = None
    for event in event_block.events:
        # Skip events that carry no timing data.
        if event.name == "native_call_convolution.out" and event.perf_data:
            if slowest is None or event.perf_data.p50 > slowest.perf_data.p50:
                slowest = event
    if slowest is not None:
        print(slowest.name)
        print()
        pp.pprint(slowest.stack_traces)
        print()
        pp.pprint(slowest.module_hierarchy)

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_convolution.out"]
    if len(df) > 0:
        # df.loc[...] returns a single row as a pandas Series. A Series has no
        # unambiguous truth value, so it must not be used in a bare assert.
        slowest = df.loc[df["p50"].idxmax()]
        # Series.name is the row's index label, so read the column instead.
        print(slowest["event_name"])
        print()
        pp.pprint(slowest["stack_traces"] if slowest["stack_traces"] else "")
        print()
        pp.pprint(slowest["module_hierarchy"] if slowest["module_hierarchy"] else "")
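The same idea extends from the single slowest call to a ranking of the slowest events. The sketch below works on plain dictionaries, such as those df.to_dict("records") would return from a pandas DataFrame. The rows and the helper slowest_events are made up for illustration; only the event_name and p50 column names are taken from the code above.

```python
import heapq

# Hypothetical rows; in practice they could come from df.to_dict("records").
rows = [
    {"event_name": "native_call_convolution.out", "p50": 0.42},
    {"event_name": "native_call_addmm.out", "p50": 0.10},
    {"event_name": "native_call_convolution.out", "p50": 0.57},
    {"event_name": "native_call_relu.out", "p50": None},  # no timing data
]


def slowest_events(rows, n=2):
    """Return the n rows with the largest p50, ignoring rows without timing."""
    timed = [r for r in rows if r["p50"] is not None]
    return heapq.nlargest(n, timed, key=lambda r: r["p50"])


for row in slowest_events(rows):
    print(f'{row["event_name"]}: p50={row["p50"]}')
```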

If a user wants the total runtime of a module, they can use find_total_for_module.

print(inspector.find_total_for_module("L__self__"))
print(inspector.find_total_for_module("L__self___conv2"))

Note: find_total_for_module is a special first-class method of Inspector.
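Conceptually, a per-module total adds up the time of every event attributed to that module. The following sketch illustrates that idea on made-up records. It is not the Inspector's implementation: the field names, the substring match, and the choice of timing statistic are all assumptions.

```python
# Hypothetical event records; "time" stands in for whichever statistic is summed.
events = [
    {"name": "native_call_convolution.out", "time": 0.42,
     "modules": ["L__self__", "L__self___conv1"]},
    {"name": "native_call_convolution.out", "time": 0.57,
     "modules": ["L__self__", "L__self___conv2"]},
    {"name": "native_call_addmm.out", "time": 0.10,
     "modules": ["L__self__", "L__self___fc1"]},
]


def total_for_module(events, module_name):
    """Sum the time of events whose module list mentions module_name."""
    return sum(
        e["time"]
        for e in events
        # Each event is counted at most once, even if several levels match.
        if any(module_name in m for m in e["modules"])
    )


print(total_for_module(events, "L__self___conv2"))
print(total_for_module(events, "L__self__"))
```

In this toy data the root module L__self__ matches every event, so its total covers the whole run, while L__self___conv2 only picks up the second convolution.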

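Conclusion¶

In this tutorial, we learned the steps required to use the ExecuTorch Developer Tools with an ExecuTorch model. We also saw how to use the Inspector APIs to analyze the results of a model run.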
