Lowering a Model as a Delegate

Audience: ML engineers who are interested in applying delegates to accelerate their programs at runtime.

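Backend delegation is the entry point through which a backend processes and executes PyTorch programs. It lets a program benefit from the performance and efficiency of specialized backends and hardware while still giving PyTorch users an experience close to that of the PyTorch runtime. Backend delegates are usually provided either by ExecuTorch or by vendors. A program uses a delegate through the standard entry point, to_backend.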

Frontend Interfaces

There are three flows for delegating a program to a backend:

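1. Lower the whole module to a backend. This is useful for testing backends and the preprocessing stage.
2. Lower the whole module to a backend and compose it with another module. This is useful for reusing lowered modules exported from other flows.
3. Lower parts of a module according to a partitioner. This is useful for models that contain both lowerable and non-lowerable nodes, and it is the most streamlined flow.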

Flow 1: Lowering the whole module

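This flow starts from a traced graph module in Edge Dialect representation. To lower it, we call the following function, which returns a LoweredBackendModule (more documentation on this function can be found in the Export API reference):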

# defined in backend_api.py
def to_backend(
    backend_id: str,
    edge_program: ExportedProgram,
    compile_spec: List[CompileSpec],
) -> LoweredBackendModule:

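Inside this function, the backend's preprocess() function is called. It produces a compiled blob that will be emitted into the flatbuffer binary. The lowered module can be captured directly, or placed back into a parent module and captured from there. Eventually, the captured module is serialized into the flatbuffer model, which can be loaded by the runtime.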
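To make the steps above concrete, here is a minimal conceptual sketch in plain Python. It is not the ExecuTorch implementation: every name in it (ToyCompileSpec, ToyLoweredModule, register_backend, toy_to_backend, the "ToyBackend" id) is a hypothetical stand-in, and the "program" is reduced to a list of operator names. It only illustrates the shape of the flow: look up a backend by id, call its preprocess step, and wrap the resulting blob in a lowered-module object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# NOTE: all names here are hypothetical and for illustration only;
# none of them are ExecuTorch APIs.
@dataclass(frozen=True)
class ToyCompileSpec:
    key: str
    value: bytes


@dataclass
class ToyLoweredModule:
    backend_id: str
    processed_bytes: bytes  # compiled blob returned by the backend's preprocess step
    compile_specs: List[ToyCompileSpec]


# Registry mapping a backend id to its preprocess function.
PreprocessFn = Callable[[List[str], List[ToyCompileSpec]], bytes]
_BACKENDS: Dict[str, PreprocessFn] = {}


def register_backend(backend_id: str) -> Callable[[PreprocessFn], PreprocessFn]:
    def decorator(fn: PreprocessFn) -> PreprocessFn:
        _BACKENDS[backend_id] = fn
        return fn
    return decorator


@register_backend("ToyBackend")
def toy_preprocess(ops: List[str], compile_specs: List[ToyCompileSpec]) -> bytes:
    # "Compile" the program by joining its operator names into one byte string.
    return "#".join(ops).encode("utf-8")


def toy_to_backend(
    backend_id: str, ops: List[str], compile_specs: List[ToyCompileSpec]
) -> ToyLoweredModule:
    if backend_id not in _BACKENDS:
        raise NotImplementedError(f"Backend {backend_id!r} is not registered")
    blob = _BACKENDS[backend_id](ops, compile_specs)
    return ToyLoweredModule(backend_id, blob, list(compile_specs))


lowered = toy_to_backend("ToyBackend", ["aten.sin.default"], [])
print(lowered.processed_bytes)  # b'aten.sin.default'
```

In this sketch the blob is opaque to the caller, which mirrors the idea that only the backend needs to understand the compiled payload.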

The following is an example of this flow:

from executorch.exir.backend.backend_api import to_backend
import executorch.exir as exir
import torch
from torch.export import export
from executorch.exir import to_edge

# The submodule runs in a specific backend; in this example, the `BackendWithCompilerDemo` backend
class LowerableSubModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sin(x)

# Convert the lowerable module to Edge IR Representation
to_be_lowered = LowerableSubModel()
example_input = (torch.ones(1), )
to_be_lowered_exir_submodule = to_edge(export(to_be_lowered, example_input))

# Import the backend implementation
from executorch.exir.backend.test.backend_with_compiler_demo import (
    BackendWithCompilerDemo,
)
lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule.exported_program(), [])

We can serialize the program to flatbuffer format by running the following directly:

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.buffer())

Flow 2: Lowering the whole module and composing it

Alternatively, after Flow 1, we can compose the lowered module with another module:

# This submodule runs in executor runtime
class NonLowerableSubModel(torch.nn.Module):
    def __init__(self, bias):
        super().__init__()
        self.bias = bias

    def forward(self, a, b):
        return torch.add(torch.add(a, b), self.bias)


# The composite module, including the lowered part and the non-lowered part
class CompositeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.non_lowerable = NonLowerableSubModel(torch.ones(1) * 0.3)
        self.lowerable = lowered_module

    def forward(self, x):
        a = self.lowerable(x)
        b = self.lowerable(a)
        ret = self.non_lowerable(a, b)
        return a, b, ret

composite_model = CompositeModel()
model_inputs = (torch.ones(1), )
exec_prog = to_edge(export(composite_model, model_inputs)).to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)

Flow 3: Partitioning

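The third flow also starts from a traced graph module in Edge Dialect representation. To lower only certain nodes in this graph module, we can use the overloaded to_backend function.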

def to_backend(
    edge_program: ExportedProgram,
    partitioner: Partitioner,
) -> ExportedProgram:

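This function takes a Partitioner, which tags every node that should be lowered. The partitioner returns a partition_tags dictionary that maps each tag to a backend name and the compile specs for that module. The tagged nodes are then partitioned and lowered to their mapped backend using the Flow 1 process. Helper partitioners are available and documented separately. The resulting lowered modules are inserted into the top-level module and serialized.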
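The tagging step described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not ExecuTorch code: ToyNode, ToyDelegationSpec, ToyAddMulPartitioner, and group_by_tag are hypothetical names, the graph is flattened to an ordered list of nodes, and "supported" simply means the operator name is add or mul. Consecutive supported nodes share one tag, and each tag maps to the backend that should receive that group.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# NOTE: hypothetical illustration types; these are not ExecuTorch classes.
@dataclass
class ToyNode:
    name: str
    op: str
    tag: Optional[str] = None  # set by the partitioner if the node should be lowered


@dataclass(frozen=True)
class ToyDelegationSpec:
    backend_id: str
    compile_specs: Tuple[bytes, ...] = ()


class ToyAddMulPartitioner:
    """Tags runs of consecutive add/mul nodes for a single backend."""

    SUPPORTED_OPS = {"add", "mul"}

    def __init__(self, backend_id: str) -> None:
        self.spec = ToyDelegationSpec(backend_id)

    def partition(self, nodes: List[ToyNode]) -> Dict[str, ToyDelegationSpec]:
        partition_tags: Dict[str, ToyDelegationSpec] = {}
        current_tag: Optional[str] = None
        for node in nodes:
            if node.op in self.SUPPORTED_OPS:
                if current_tag is None:
                    # Start a new partition and record which backend handles it.
                    current_tag = f"tag{len(partition_tags)}"
                    partition_tags[current_tag] = self.spec
                node.tag = current_tag
            else:
                # An unsupported node ends the current partition.
                current_tag = None
        return partition_tags


def group_by_tag(nodes: List[ToyNode]) -> Dict[str, List[str]]:
    """Collects node names per tag; untagged nodes stay in the top-level module."""
    groups: Dict[str, List[str]] = {}
    for node in nodes:
        if node.tag is not None:
            groups.setdefault(node.tag, []).append(node.name)
    return groups


graph = [
    ToyNode("n0", "add"),
    ToyNode("n1", "mul"),
    ToyNode("n2", "sin"),
    ToyNode("n3", "add"),
]
partition_tags = ToyAddMulPartitioner("ToyBackend").partition(graph)
groups = group_by_tag(graph)
print(groups)  # {'tag0': ['n0', 'n1'], 'tag1': ['n3']}
```

Each group in this sketch would then go through the whole-module lowering of Flow 1, while the untagged sin node would remain in the top-level program.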