Quick Start
Option 1: torch.compile
You can use Torch-TensorRT anywhere you use torch.compile:
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
x = torch.randn((1, 3, 224, 224)).cuda() # define what the inputs to the model will look like
optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x) # compiled on first run
optimized_model(x) # this will be fast!
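Beyond selecting the backend, torch.compile accepts an options dict that is forwarded to Torch-TensorRT. The sketch below assumes the enabled_precisions and min_block_size compilation settings; verify the exact option names against your installed version.

import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
x = torch.randn((1, 3, 224, 224)).cuda()
# the keys below are assumed Torch-TensorRT compilation settings
optimized_model = torch.compile(
    model,
    backend="tensorrt",
    options={
        "enabled_precisions": {torch.float16}, # allow FP16 TensorRT kernels
        "min_block_size": 2, # minimum ops per TensorRT-accelerated subgraph
    },
)
optimized_model(x) # compiled on first run, as above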
Option 2: Export
If you want to optimize your model ahead of time and/or deploy it in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).
Step 1: Optimize + serialize
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # define a list of representative inputs here
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs) # PyTorch only supports Python runtime for an ExportedProgram. For C++ deployment, use a TorchScript file
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
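The example above compiles for a single fixed input shape. If the deployed module must accept a range of shapes (e.g. variable batch sizes), they can be described with torch_tensorrt.Input instead of example tensors; a minimal sketch, assuming the min/opt/max shape keywords of that API:

import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
# describe a range of valid shapes rather than one fixed shape
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),  # smallest batch you will run
        opt_shape=(8, 3, 224, 224),  # shape TensorRT tunes for
        max_shape=(16, 3, 224, 224), # largest batch you will run
        dtype=torch.float32,
    )
]
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)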
Step 2: Deploy
Deployment in Python:
import torch
import torch_tensorrt
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # your inputs go here
# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module() # this also works
model(*inputs)
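If you saved the TorchScript form ("trt.ts") instead, it can also be run from Python via the standard TorchScript loader. A minimal sketch; importing torch_tensorrt is assumed here to be what registers the TensorRT runtime ops the serialized module needs:

import torch
import torch_tensorrt # assumed to load the runtime extension that executes the embedded TensorRT engines
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # your inputs go here
ts_model = torch.jit.load("trt.ts").cuda()
ts_model(*inputs)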
Deployment in C++:
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...]; // fill this with your inputs
auto results = trt_mod.forward({input_tensor});
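Note that building and running this program requires linking against libtorch and the Torch-TensorRT runtime; Torch-TensorRT ships a slim runtime library (libtorchtrt_runtime.so on Linux) intended for deployment-only use, though the exact library name may vary by version and platform.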