quantize_

torchao.quantization.quantize_(model: Module, config: Union[AOBaseConfig, Callable[[Module], Module]], filter_fn: Optional[Callable[[Module, str], bool]] = None, set_inductor_config: Optional[bool] = None, device: Optional[Union[device, str, int]] = None)

Convert the weights of the linear modules in the model using config. The model is modified in place.

Parameters:

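model (torch.nn.Module) – the input model
config (Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]]) – either (1) a workflow configuration object, or (2) a function that applies a tensor subclass conversion to a module's weight and returns the module (for example, converting the weight tensor of a linear module to an affine quantized tensor). Note: (2) will be removed in a future release.
filter_fn (Optional[Callable[[torch.nn.Module, str], bool]]) – a function that takes an nn.Module instance and the module's fully qualified name, and returns True if config should be applied to the weight of that module
set_inductor_config (bool, optional) – whether to automatically apply the recommended inductor configuration settings (defaults to None)
device (device, optional) – the device to move each module to before applying filter_fn. Can be set to "cuda" to speed up quantization. The final model will be on the specified device. Defaults to None (the device is left unchanged).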
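
As a minimal sketch of the filter_fn contract described above, the predicate below receives a module and its fully qualified name and returns True only for the modules that should be quantized. The function name `skip_output_layer` and the submodule names `proj`, `act`, and `head` are hypothetical, chosen for illustration. The `named_modules()` loop only shows which modules the predicate would select; it is not a description of how torchao traverses the model internally.

```python
import torch.nn as nn


def skip_output_layer(module: nn.Module, fqn: str) -> bool:
    """Hypothetical filter: select every nn.Linear except the one named 'head'."""
    return isinstance(module, nn.Linear) and fqn != "head"


# A toy model with named submodules (names are illustrative assumptions).
model = nn.Sequential()
model.add_module("proj", nn.Linear(32, 64))
model.add_module("act", nn.ReLU())
model.add_module("head", nn.Linear(64, 8))

# Illustration only: list the fully qualified names the predicate accepts.
selected = [fqn for fqn, mod in model.named_modules() if skip_output_layer(mod, fqn)]
print(selected)  # ['proj']
```

Such a predicate would then be passed as `quantize_(model, config, filter_fn=skip_output_layer)`, leaving the `head` layer's weight unquantized.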

Example

import torch
import torch.nn as nn
from torchao import quantize_

# Quantize with a predefined `config` that corresponds to an optimized
# execution path or kernel (e.g. the int4 tinygemm kernel); each config
# can also be customized with arguments.
# Current options are:
# int8_dynamic_activation_int4_weight (for executorch)
# int8_dynamic_activation_int8_weight (optimized with int8 mm op and torch.compile)
# int4_weight_only (optimized with int4 tinygemm kernel and torch.compile)
# int8_weight_only (optimized with int8 mm op and torch.compile)
from torchao.quantization.quant_api import int4_weight_only

m = nn.Sequential(nn.Linear(32, 1024), nn.Linear(1024, 32))
quantize_(m, int4_weight_only(group_size=32))
