torchtrtc
torchtrtc is a CLI application for using the Torch-TensorRT compiler. It serves as an easy way to compile a TorchScript module with Torch-TensorRT from the command line to quickly check support or to use as part of a deployment pipeline. All basic features of the compiler are supported, including post-training quantization (though you must already have a calibration cache file to use the PTQ feature). The compiler can output two formats: either a TorchScript program with the TensorRT engine embedded, or the TensorRT engine itself as a PLAN file.
All that is required to run a compiled program is, in C++, linking against libtorchtrt.so,
or, in Python, importing the torch_tensorrt package. All other aspects of using a compiled module are identical to standard TorchScript: load it with torch.jit.load()
and run it like any other module.
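As a minimal sketch, running a torchtrtc-compiled module from Python might look like the following. The path `"ssd_trt.ts"` is a placeholder for whatever output path you passed to torchtrtc, and the input shape is hypothetical:

```python
def run_compiled(path, example):
    """Load a torchtrtc-compiled TorchScript file and run it on one input.

    Illustrative sketch: assumes a CUDA GPU and a module previously
    compiled by torchtrtc at `path`.
    """
    # Importing torch_tensorrt first registers the TensorRT runtime
    # classes needed to deserialize the embedded engine.
    import torch
    import torch_tensorrt  # noqa: F401

    module = torch.jit.load(path)  # loads like any other TorchScript module
    return module(example)         # runs like any other module

# Example usage (requires a compiled module on disk):
# import torch
# out = run_compiled("ssd_trt.ts", torch.randn(1, 3, 300, 300).half().cuda())
```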
torchtrtc [input_file_path] [output_file_path]
[input_specs...] {OPTIONS}
torchtrtc is a compiler for TorchScript; it will compile and optimize
TorchScript programs to run on NVIDIA GPUs using TensorRT
OPTIONS:
-h, --help Display this help menu
Verbosity of the compiler
-v, --verbose Dumps debugging information about the
compilation process onto the console
-w, --warnings Disables warnings generated during
compilation from being printed to the
console (warnings are on by default)
--i, --info Dumps info messages generated during
compilation onto the console
--build-debuggable-engine Creates a debuggable engine
--allow-gpu-fallback (Only used when targeting DLA
(device-type)) Lets engine run layers on
GPU if they are not supported on DLA
--require-full-compilation Require that the model should be fully
compiled to TensorRT or throw an error
--check-method-support=[method_name]
Check the support for end-to-end
compilation of a specified method in the
TorchScript module
--disable-tf32 Prevent Float32 layers from using the
TF32 data format
--sparse-weights Enable sparsity for weights of conv and
FC layers
-p[precision...],
--enable-precision=[precision...] (Repeatable) Enabling an operating
precision for kernels to use when
building the engine (Int8 requires a
calibration-cache argument) [ float |
float32 | f32 | fp32 | half | float16 |
f16 | fp16 | int8 | i8 | char ]
(default: float)
-d[type], --device-type=[type] The type of device the engine should be
built for [ gpu | dla ] (default: gpu)
--gpu-id=[gpu_id] GPU id if running on multi-GPU platform
(defaults to 0)
--dla-core=[dla_core] DLACore id if running on available DLA
(defaults to 0)
--engine-capability=[capability] The capability class the engine should
be built for [ standard | safety |
dla_standalone ]
--calibration-cache-file=[file_path]
Path to calibration cache file to use
for post training quantization
--teo=[op_name...],
--torch-executed-op=[op_name...] (Repeatable) Operator in the graph that
should always be run in PyTorch for
execution (partial compilation must be
enabled)
--tem=[module_name...],
--torch-executed-mod=[module_name...]
(Repeatable) Module that should always
be run in PyTorch for execution (partial
compilation must be enabled)
--mbs=[num_ops],
--min-block-size=[num_ops] Minimum number of contiguous TensorRT
supported ops to compile a subgraph to
TensorRT
--embed-engine Whether to treat input file as a
serialized TensorRT engine and embed it
into a TorchScript module (device spec
must be provided)
--num-avg-timing-iters=[num_iters]
Number of averaging timing iterations
used to select kernels
--workspace-size=[workspace_size] Maximum size of workspace given to
TensorRT
--dla-sram-size=[dla_sram_size] Fast software-managed RAM used by DLA
to communicate within a layer.
--dla-local-dram-size=[dla_local_dram_size] Host RAM used by DLA to share
intermediate tensor data across operations.
--dla-global-dram-size=[dla_global_dram_size] Host RAM used by DLA to store
weights and metadata for execution
--atol=[atol] Absolute tolerance threshold for acceptable
numerical deviation from standard TorchScript
output (default 1e-8)
--rtol=[rtol] Relative tolerance threshold for acceptable
numerical deviation from standard TorchScript
output (default 1e-5)
--no-threshold-check Skip checking threshold compliance
--truncate-long-double,
--truncate, --truncate-64bit Truncate weights that are provided in
64bit to 32bit (Long, Double to Int,
Float)
--save-engine Instead of compiling a full
TorchScript program, save the created
engine to the path specified as the
output path
--custom-torch-ops (repeatable) Shared object/DLL containing custom torch operators
--custom-converters (repeatable) Shared object/DLL containing custom converters
input_file_path Path to input TorchScript file
output_file_path Path for compiled TorchScript (or
TensorRT engine) file
input_specs... Specs for inputs to engine, can either
be a single size or a range defined by
Min, Optimal, Max sizes, e.g.
"(N,..,C,H,W)"
"[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
Data Type and format can be specified by
adding an "@" followed by dtype and "%"
followed by format to the end of the
shape spec. e.g. "(3, 3, 32,
32)@f16%NHWC"
"--" can be used to terminate flag options and force all following
arguments to be treated as positional options
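To make the spec grammar concrete, here is an illustrative parser (not part of torchtrtc itself) that splits a spec string into its shape, optional dtype, and optional memory-format components, following the "@" and "%" suffix rules described above:

```python
import re

def parse_input_spec(spec):
    """Split a torchtrtc input spec into (shape_part, dtype, format).

    Illustrative only: "@" introduces an optional dtype and "%" an
    optional memory format, both appended to the shape spec.
    """
    m = re.fullmatch(r"(?P<shape>.+?)(?:@(?P<dtype>\w+))?(?:%(?P<fmt>\w+))?", spec)
    if m is None:
        raise ValueError(f"unrecognized input spec: {spec!r}")
    return m.group("shape"), m.group("dtype"), m.group("fmt")

# parse_input_spec("(3, 3, 32, 32)@f16%NHWC")
#   -> ("(3, 3, 32, 32)", "f16", "NHWC")
# parse_input_spec("(1,3,300,300)")
#   -> ("(1,3,300,300)", None, None)
```

A dtype or format may be omitted independently, so all four combinations of suffixes are valid spec strings.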
For example:
torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@f16%contiguous" -p f16
To include a set of custom operators:
torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-torch-ops=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16
To include a set of custom converters:
torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-converters=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16