Using the ExecuTorch SDK to Profile a Model¶

Author: Jack Khuu

The ExecuTorch SDK is a set of tools that lets users profile, debug, and visualize ExecuTorch models.

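This tutorial walks through a full end-to-end flow of using the SDK. Specifically, it will:

1. Generate the artifacts consumed by the SDK (ETRecord, ETDump).
2. Create an Inspector class that consumes these artifacts.
3. Use the Inspector class to analyze the model.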

Prerequisites¶

To run this tutorial, you first need to set up your ExecuTorch environment.

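Generate ETRecord (Optional)¶

The first step is to generate an ETRecord. An ETRecord contains model graphs and metadata that link runtime results (such as profiling) back to the eager model. It is generated via executorch.sdk.generate_etrecord.

executorch.sdk.generate_etrecord takes an output file path (str), the Edge Dialect model (EdgeProgramManager), the ExecuTorch Dialect model (ExecutorchProgramManager), and an optional dictionary of additional models.

In this tutorial, an example model (shown below) is used for demonstration.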

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

from executorch.exir import (
    EdgeCompileConfig,
    EdgeProgramManager,
    ExecutorchProgramManager,
    to_edge,
)
from executorch.sdk import generate_etrecord
from torch.export import export, ExportedProgram


# Generate Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = Net()

aten_model: ExportedProgram = export(
    model,
    (torch.randn(1, 1, 32, 32),),
)

edge_program_manager: EdgeProgramManager = to_edge(
    aten_model, compile_config=EdgeCompileConfig(_check_ir_validity=True)
)
edge_program_manager_copy = copy.deepcopy(edge_program_manager)
et_program_manager: ExecutorchProgramManager = edge_program_manager.to_executorch()


# Generate ETRecord
etrecord_path = "etrecord.bin"
generate_etrecord(etrecord_path, edge_program_manager_copy, et_program_manager)
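
The `16 * 5 * 5` input size of `fc1` follows from the 32x32 example input passed to `export`. A minimal sketch of that arithmetic in plain Python, assuming the settings the model above relies on (stride 1 and no padding for `nn.Conv2d`, stride equal to the window for `F.max_pool2d`). The helper names are illustrative and not part of any library:

```python
def conv_out(size: int, kernel: int) -> int:
    """Spatial size after a convolution with stride 1 and no padding."""
    return size - kernel + 1


def pool_out(size: int, window: int) -> int:
    """Spatial size after max pooling whose stride equals its window."""
    return size // window


def fc1_in_features(image_size: int = 32) -> int:
    size = pool_out(conv_out(image_size, 5), 2)  # conv1 then pool: 32 -> 28 -> 14
    size = pool_out(conv_out(size, 5), 2)  # conv2 then pool: 14 -> 10 -> 5
    return 16 * size * size  # conv2 produces 16 channels


print(fc1_in_features())  # 400
```

Changing the example input size would change this number, so `fc1` would need to be adjusted to match.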

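Warning

Users should make a deepcopy of the output of to_edge() and pass the deepcopy to the generate_etrecord API. This is needed because the subsequent call to to_executorch() mutates its input in place and loses debug data in the process.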
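
The effect behind this warning can be illustrated without ExecuTorch. The sketch below uses a hypothetical `ToyProgram` class (a stand-in, not an ExecuTorch type) whose `lower()` method mutates the object in place, loosely mimicking the behavior attributed to `to_executorch()` above:

```python
import copy


class ToyProgram:
    """Hypothetical stand-in for a program manager; not an ExecuTorch class."""

    def __init__(self):
        self.debug_info = {"conv1": "model.py:51"}

    def lower(self):
        # In-place mutation: the debug information is discarded.
        self.debug_info.clear()
        return self


program = ToyProgram()
snapshot = copy.deepcopy(program)  # taken before the mutating call
program.lower()

print(program.debug_info)   # {}
print(snapshot.debug_info)  # {'conv1': 'model.py:51'}
```

Only the copy taken before the mutating call still carries the debug information, which is why the tutorial passes the deepcopy to `generate_etrecord`.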

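Generate ETDump¶

The next step is to generate an ETDump. An ETDump contains runtime results from executing a Bundled Program model.

In this tutorial, a Bundled Program is created from the example model above.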

import torch

from executorch.exir import to_edge
from executorch.sdk import BundledProgram

from executorch.sdk.bundled_program.config import MethodTestCase, MethodTestSuite
from executorch.sdk.bundled_program.serialize import (
    serialize_from_bundled_program_to_flatbuffer,
)
from torch.export import export

# Step 1: ExecuTorch Program Export
m_name = "forward"
method_graphs = {m_name: export(model, (torch.randn(1, 1, 32, 32),))}

# Step 2: Construct Method Test Suites
inputs = [[torch.randn(1, 1, 32, 32)] for _ in range(2)]

method_test_suites = [
    MethodTestSuite(
        method_name=m_name,
        test_cases=[
            MethodTestCase(inputs=inp, expected_outputs=getattr(model, m_name)(*inp))
            for inp in inputs
        ],
    )
]

# Step 3: Generate BundledProgram
executorch_program = to_edge(method_graphs).to_executorch()
bundled_program = BundledProgram(executorch_program, method_test_suites)

# Step 4: Serialize BundledProgram to flatbuffer.
serialized_bundled_program = serialize_from_bundled_program_to_flatbuffer(
    bundled_program
)
save_path = "bundled_program.bp"
with open(save_path, "wb") as f:
    f.write(serialized_bundled_program)

Use CMake (follow the CMake setup instructions first) to execute the Bundled Program and generate the ETDump:

cd executorch
rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake -DEXECUTORCH_BUILD_SDK=1 -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=1 ..
cd ..
cmake --build cmake-out -j8 -t sdk_example_runner
./cmake-out/examples/sdk/sdk_example_runner --bundled_program_path <bundled_program>

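Creating an Inspector¶

The final step is to create the Inspector by passing in the artifact paths. The Inspector takes the runtime results from the ETDump and correlates them with the operators of the Edge Dialect graph.

Recall: an ETRecord is not required. If no ETRecord is provided, the Inspector still shows runtime results, but without operator correlation.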
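
Because the ETRecord is optional, a script may want to pass it only when the file actually exists. A minimal sketch, assuming the keyword names `etdump_path` and `etrecord` used by the `Inspector(...)` call in this tutorial; `build_inspector_kwargs` is a hypothetical helper, not part of the SDK:

```python
import os


def build_inspector_kwargs(etdump_path, etrecord_path=None):
    """Return keyword arguments for Inspector, adding the ETRecord only if present."""
    if not os.path.exists(etdump_path):
        raise FileNotFoundError(f"ETDump not found: {etdump_path}")
    kwargs = {"etdump_path": etdump_path}
    # The ETRecord is optional: include it only when the file is really there.
    if etrecord_path is not None and os.path.exists(etrecord_path):
        kwargs["etrecord"] = etrecord_path
    return kwargs
```

The result could then be unpacked into the constructor, for example `Inspector(**build_inspector_kwargs("etdump.etdp", "etrecord.bin"))`.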

To visualize all runtime events, call the Inspector's print_data_tabular.

from executorch.sdk import Inspector

etdump_path = "etdump.etdp"
inspector = Inspector(etdump_path=etdump_path, etrecord=etrecord_path)
inspector.print_data_tabular()

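Analyzing with an Inspector¶

The Inspector provides two ways of accessing the ingested information: EventBlocks and DataFrames. Both let users perform custom analysis of their model's performance.

Below are example usages with both the EventBlock and DataFrame approaches.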

# Set Up
import pprint as pp

import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)

If a user wants the raw profiling results, they can do something like the following, which finds the raw runtime data of an addmm.out event.

for event_block in inspector.event_blocks:
    # Via EventBlocks
    for event in event_block.events:
        if event.name == "native_call_addmm.out":
            print(event.name, event.perf_data.raw)

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_addmm.out"]
    print(df[["event_name", "raw"]])
    print()
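
Once the raw samples are in hand, summary statistics can be recomputed with the standard library. A minimal sketch, assuming the raw data is a plain list of per-run durations. The `summarize` helper is hypothetical, and the sample values are made up for illustration rather than taken from a real run:

```python
import statistics


def summarize(raw):
    """Return min, median, mean, and max of a list of raw durations."""
    return {
        "min": min(raw),
        "p50": statistics.median(raw),
        "avg": statistics.fmean(raw),
        "max": max(raw),
    }


# Made-up samples standing in for event.perf_data.raw
samples = [0.8, 1.0, 1.2, 3.0]
print(summarize(samples))
```

Comparing the median with the mean is a quick way to spot outlier runs, such as the last sample here.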

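If a user wants to trace an operator back to their model code, they can do something like the following, which finds the module hierarchy and stack trace of the slowest convolution.out call.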

for event_block in inspector.event_blocks:
    # Via EventBlocks
    slowest = None
    for event in event_block.events:
        if event.name == "native_call_convolution.out":
            if slowest is None or event.perf_data.p50 > slowest.perf_data.p50:
                slowest = event
    if slowest is not None:
        print(slowest.name)
        print()
        pp.pprint(slowest.stack_traces)
        print()
        pp.pprint(slowest.module_hierarchy)

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_convolution.out"]
    if len(df) > 0:
        slowest = df.loc[df["p50"].idxmax()]
        print(slowest.event_name)
        print()
        pp.pprint(slowest.stack_traces)
        print()
        pp.pprint(slowest.module_hierarchy)

If a user wants the total runtime of a module, they can use find_total_for_module.

print(inspector.find_total_for_module("L__self__"))
print(inspector.find_total_for_module("L__self___conv2"))

0.0
0.0

Note: find_total_for_module is a special first-class method of Inspector.
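
To build intuition for what a per-module total means, the toy sketch below sums the durations of events whose module hierarchy mentions a given module name. `ToyEvent`, its fields, and the aggregation rule are assumptions made for illustration. This is not the implementation of `find_total_for_module`, and the numbers and the `conv1`/`fc1` module names are invented:

```python
from dataclasses import dataclass, field


@dataclass
class ToyEvent:
    """Hypothetical event record; not the SDK's Event class."""

    name: str
    duration_ms: float
    modules: list = field(default_factory=list)


def total_for_module(events, module_name):
    """Sum durations of events attributed to module_name."""
    return sum(e.duration_ms for e in events if module_name in e.modules)


events = [
    ToyEvent("convolution.out", 2.0, ["L__self__", "L__self___conv1"]),
    ToyEvent("convolution.out", 3.0, ["L__self__", "L__self___conv2"]),
    ToyEvent("addmm.out", 1.0, ["L__self__", "L__self___fc1"]),
]

print(total_for_module(events, "L__self__"))        # 6.0
print(total_for_module(events, "L__self___conv2"))  # 3.0
```

In this toy model the root module accumulates every event, while a submodule only accumulates the events attributed to it.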

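Conclusion¶

In this tutorial, we covered the steps required to consume an ExecuTorch model with the ExecuTorch SDK. We also saw how to use the Inspector APIs to analyze the results of a model run.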
