Torch-TensorRT Explained

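Torch-TensorRT is a compiler for PyTorch models that targets NVIDIA GPUs via the TensorRT Model Optimization SDK. It aims to deliver better inference performance for PyTorch models while preserving PyTorch's ease of use.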

Dynamo Frontend

The Dynamo frontend is the default frontend for Torch-TensorRT. It builds on PyTorch's dynamo compiler stack.

`torch.compile` (Just-In-Time, JIT)

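torch.compile is a JIT compiler stack, so compilation is deferred until first use. As a consequence, when conditions in the graph change, the graph is recompiled automatically. This gives users the most runtime flexibility, but it limits the options for serialization.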

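Under the hood, torch.compile delegates to Torch-TensorRT the subgraphs it believes can be lowered to it. Torch-TensorRT lowers these graphs further into operations consisting solely of Core ATen Operators or select "High-level Ops" amenable to TensorRT acceleration. Each subgraph is then partitioned, based on operator support, into components that will run in PyTorch and components that will be compiled to TensorRT. TensorRT engines replace the supported blocks, and the resulting hybrid subgraph is returned to torch.compile to be run on call.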
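The partitioning step described above can be pictured with a small toy model. This is a minimal sketch, not the Torch-TensorRT partitioner: it treats the graph as a flat list of operator names, and the names `Block`, `partition`, and the `supported` set are hypothetical helpers introduced only for illustration. The `min_block_size` parameter here is a stand-in for the idea that very small TensorRT blocks may not be worth building an engine for.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Block:
    backend: str      # "tensorrt" or "pytorch"
    ops: List[str]    # operator names assigned to this block


def partition(ops: List[str], supported: Set[str], min_block_size: int = 1) -> List[Block]:
    """Group a linear op sequence into contiguous per-backend blocks (toy model)."""
    blocks: List[Block] = []
    for op in ops:
        backend = "tensorrt" if op in supported else "pytorch"
        if blocks and blocks[-1].backend == backend:
            blocks[-1].ops.append(op)
        else:
            blocks.append(Block(backend, [op]))

    # Demote TensorRT blocks that are smaller than the requested minimum.
    for block in blocks:
        if block.backend == "tensorrt" and len(block.ops) < min_block_size:
            block.backend = "pytorch"

    # Merge neighbours that ended up on the same backend after demotion.
    merged: List[Block] = []
    for block in blocks:
        if merged and merged[-1].backend == block.backend:
            merged[-1].ops.extend(block.ops)
        else:
            merged.append(block)
    return merged


# Hypothetical op names; "custom.nms" stands for an op without TensorRT support.
example_ops = ["aten.convolution", "aten.relu", "custom.nms", "aten.add"]
example_supported = {"aten.convolution", "aten.relu", "aten.add"}
hybrid = partition(example_ops, example_supported)
```

In this toy run the result is three blocks: a TensorRT block with the convolution and ReLU, a PyTorch block with the unsupported op, and a final TensorRT block with the add. Raising `min_block_size` to 2 folds that last single-op block back into PyTorch.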

Accepted Formats

torch.fx GraphModule (torch.fx.GraphModule)
PyTorch Module (torch.nn.Module)

Returns

Boxed function that triggers compilation on first call
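A minimal sketch of the JIT path, assuming `torch` and `torch_tensorrt` are installed and a CUDA-capable GPU is available. The function name `compile_jit` and its parameters are hypothetical. Importing `torch_tensorrt` is what makes the `"tensorrt"` backend name available to `torch.compile`.

```python
def compile_jit(model, example_input):
    """Wrap `model` so that TensorRT compilation happens on the first call."""
    # Imports are local so that defining this helper does not require a GPU setup.
    import torch
    import torch_tensorrt  # noqa: F401  (import registers the "tensorrt" backend)

    model = model.eval().cuda()

    # Returns immediately; nothing is compiled at this point.
    compiled = torch.compile(model, backend="tensorrt")

    # The first call triggers tracing, partitioning, and engine building.
    with torch.no_grad():
        compiled(example_input.cuda())
    return compiled
```

Later calls with compatible inputs reuse the compiled graph. If conditions in the graph change, for example a new input shape, recompilation is expected to happen on the next call.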

`torch_tensorrt.dynamo.compile` (Ahead-of-Time, AOT)

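torch_tensorrt.dynamo.compile is an AOT compiler: models are compiled in an explicit compilation phase, and the resulting artifacts can be serialized and reloaded later. Graphs go through the torch.export.trace system to be lowered into a graph consisting solely of Core ATen Operators or select "High-level Ops" amenable to TensorRT acceleration. Each subgraph is then partitioned, based on operator support, into components that will run in PyTorch and components that will be compiled to TensorRT. TensorRT engines replace the supported blocks, and the hybrid subgraph is packaged into an ExportedProgram, which can be serialized and reloaded.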

Accepted Formats

torch.export.ExportedProgram (torch.export.ExportedProgram)
torch.fx GraphModule (torch.fx.GraphModule) (via torch.export.export)
PyTorch Module (torch.nn.Module) (via torch.export.export)

Returns

torch.fx.GraphModule (serializable with torch.export.ExportedProgram)
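The AOT flow can be sketched as export, compile, then save. This is a minimal sketch, assuming `torch` and `torch_tensorrt` are installed and a CUDA-capable GPU is available. The function name `compile_aot`, its parameters, and the file name `trt_model.ep` are hypothetical.

```python
def compile_aot(model, example_inputs, path="trt_model.ep"):
    """Export, compile with TensorRT ahead of time, and save the result."""
    # Imports are local so that defining this helper does not require a GPU setup.
    import torch
    import torch_tensorrt

    model = model.eval().cuda()
    inputs = tuple(t.cuda() for t in example_inputs)

    # Explicit tracing step: produces a torch.export.ExportedProgram.
    exported = torch.export.export(model, inputs)

    # Explicit compilation step: returns a torch.fx.GraphModule in which
    # supported blocks have been replaced by TensorRT engines.
    trt_gm = torch_tensorrt.dynamo.compile(exported, inputs=list(inputs))

    # Serialize the compiled module so it can be reloaded later.
    torch_tensorrt.save(trt_gm, path, inputs=list(inputs))
    return trt_gm
```

A file saved this way in the default format can typically be reloaded with `torch.export.load(path).module()`.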

Legacy Frontends

Because a number of compiler technologies have appeared in the PyTorch ecosystem over the years, Torch-TensorRT retains some legacy features that target them.

TorchScript (torch_tensorrt.ts.compile)

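The TorchScript frontend was the original default frontend for Torch-TensorRT and targets models in the TorchScript format. The provided graph is partitioned into supported and unsupported blocks. Supported blocks are lowered to TensorRT, and unsupported blocks remain to be run with LibTorch. The resulting graph is returned to the user as a ScriptModule that can be loaded and saved with the Torch-TensorRT PyTorch runtime extension.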

Accepted Formats

TorchScript Module (torch.jit.ScriptModule)
PyTorch Module (torch.nn.Module) (via torch.jit.script or torch.jit.trace)

Returns

TorchScript Module (torch.jit.ScriptModule)
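A minimal sketch of the TorchScript path, assuming `torch` and `torch_tensorrt` are installed, a CUDA-capable GPU is available, and the model is scriptable. The function name `compile_torchscript`, the default input shape, and the file name `trt_model.ts` are hypothetical. Saving with `torch.jit.save` is a choice made for this sketch, since the result is a ScriptModule.

```python
def compile_torchscript(model, input_shape=(1, 3, 224, 224), path="trt_model.ts"):
    """Script a module, lower supported blocks to TensorRT, and save it."""
    # Imports are local so that defining this helper does not require a GPU setup.
    import torch
    import torch_tensorrt

    # Convert the eager module to TorchScript first.
    scripted = torch.jit.script(model.eval().cuda())

    # Supported blocks are lowered to TensorRT; the rest stays in TorchScript.
    trt_ts = torch_tensorrt.ts.compile(
        scripted,
        inputs=[torch_tensorrt.Input(input_shape)],
        enabled_precisions={torch.float},
    )

    # The result is a torch.jit.ScriptModule, so it can be saved as one.
    torch.jit.save(trt_ts, path)
    return trt_ts
```

Loading the saved module later requires `torch_tensorrt` to be imported first, so that the runtime extension for the embedded TensorRT engines is available.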

FX Graph Modules (torch_tensorrt.fx.compile)

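This frontend has been almost entirely replaced by the Dynamo frontend, which offers a superset of the features available through the FX frontend. The original FX frontend remains in the codebase for backwards compatibility.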

Accepted Formats

torch.fx GraphModule (torch.fx.GraphModule)
PyTorch Module (torch.nn.Module) (via torch.fx.trace)

Returns

torch.fx GraphModule (torch.fx.GraphModule)

`torch_tensorrt.compile`

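Because there are many different frontends and supported formats, we provide a convenience layer called torch_tensorrt.compile that gives users access to all of the compiler options. You select the compiler path by setting the ir option, which tells Torch-TensorRT to try to lower the provided model through a specific intermediate representation.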

`ir` Options

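torch_compile: Use the torch.compile system. Immediately returns a boxed function that compiles on first call.
dynamo: Run the graph through the torch.export/torchdynamo stack. If the input module is a torch.nn.Module, it must be "export-traceable", because the module will be traced with torch.export.export. Returns a torch.fx.GraphModule that can be run immediately or saved via torch.export.export or torch_tensorrt.save.
torchscript or ts: Run the graph through the TorchScript stack. If the input module is a torch.nn.Module, it must be "scriptable", because the module will be compiled with torch.jit.script. Returns a torch.jit.ScriptModule that can be run immediately or saved via torch.save or torch_tensorrt.save.
fx: Run the graph through the torch.fx stack. If the input module is a torch.nn.Module, it will be traced with torch.fx.trace and is subject to its limitations.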
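The option list above can be summarized as a small lookup table plus a thin wrapper around `torch_tensorrt.compile`. This is a minimal sketch: `IR_ALIASES`, `IR_RETURN_TYPES`, `normalize_ir`, and `compile_with_ir` are hypothetical helpers written for illustration, not part of the library. The wrapper assumes `torch_tensorrt` is installed and a CUDA-capable GPU is available.

```python
# Alias accepted for the TorchScript path, as listed above.
IR_ALIASES = {"ts": "torchscript"}

# What each compiler path returns, per the descriptions above.
IR_RETURN_TYPES = {
    "torch_compile": "boxed function (compiles on first call)",
    "dynamo": "torch.fx.GraphModule",
    "torchscript": "torch.jit.ScriptModule",
    "fx": "torch.fx.GraphModule",
}


def normalize_ir(ir: str) -> str:
    """Map an `ir` string to its canonical name, rejecting unknown values."""
    name = IR_ALIASES.get(ir, ir)
    if name not in IR_RETURN_TYPES:
        raise ValueError(f"unknown ir option: {ir!r}")
    return name


def compile_with_ir(model, inputs, ir="dynamo"):
    """Compile `model` through the path selected by `ir`."""
    # Import is local so the lookup helpers above work without the library.
    import torch_tensorrt

    return torch_tensorrt.compile(model, ir=normalize_ir(ir), inputs=inputs)
```

Validating the `ir` string before calling into the library keeps a typo from turning into a less obvious failure further down the compilation path.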
