AffineQuantizedTensor

class torchao.dtypes.AffineQuantizedTensor(tensor_impl: AQTTensorImpl, block_size: Tuple[int, ...], shape: Size, quant_min: Optional[Union[int, float]] = None, quant_max: Optional[Union[int, float]] = None, zero_point_domain: ZeroPointDomain = ZeroPointDomain.INT, dtype=None, strides=None)

Affine quantized tensor subclass. Affine quantization means that a floating point tensor is quantized with an affine transformation: quantized_tensor = float_tensor / scale + zero_point

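To see what happens during choose_qparams, quantization, and dequantization for affine quantization, take a look at https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_primitives.py and check the three quant primitive ops: choose_qparams_affine, quantize_affine, and dequantize_affine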
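The three steps named above can be illustrated with a minimal plain-Python sketch. It is not the torchao implementation: the helper names choose_qparams, quantize, and dequantize are hypothetical, and the sketch assumes per-tensor asymmetric int8 quantization with an integer zero point, following the formula quantized_tensor = float_tensor / scale + zero_point.

```python
def choose_qparams(values, quant_min=-128, quant_max=127):
    """Pick scale and integer zero_point so that [min, max] of the data
    (widened to include 0.0) maps onto [quant_min, quant_max]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = quant_min - round(lo / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, quant_min=-128, quant_max=127):
    """q = clamp(round(x / scale) + zero_point, quant_min, quant_max)."""
    return [
        max(quant_min, min(quant_max, round(v / scale) + zero_point))
        for v in values
    ]


def dequantize(quantized, scale, zero_point):
    """x ~= (q - zero_point) * scale, the inverse of the affine map."""
    return [(q - zero_point) * scale for q in quantized]


values = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = choose_qparams(values)
q = quantize(values, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Because the range is widened to include 0.0 and the zero point is an integer, the value 0.0 round-trips exactly; every other in-range value is recovered to within half a quantization step (scale / 2).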

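The shape and dtype of the tensor subclass represent how the tensor subclass looks externally, regardless of the type or orientation of its internal representation.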

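Fields

tensor_impl (AQTTensorImpl): the tensor that serves as general tensor implementation storage for the quantized data,
e.g. it stores plain tensors (int_data, scale, zero_point) or a packed format, depending on the device and the operator/kernel
block_size (Tuple[int, …]): the quantization granularity, i.e. the size of the block of tensor elements that shares the same qparam
e.g. when block_size is the same as the shape of the input tensor, per-tensor quantization is used
shape (torch.Size): the shape of the original high precision tensor
quant_min (Optional[int]): the minimum quantized value for the tensor; if not specified, it is derived from the dtype of int_data
quant_max (Optional[int]): the maximum quantized value for the tensor; if not specified, it is derived from the dtype of int_data
zero_point_domain (ZeroPointDomain): the domain that zero_point is in, either integer or float
if zero_point is in the integer domain, the zero point is added to the quantized integer value during quantization; if zero_point is in the floating point domain, the zero point is subtracted from the floating point (unquantized) value during quantization; the default is ZeroPointDomain.INT
dtype: the dtype of the original high precision tensor, e.g. torch.float32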
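To make the block_size field concrete, the following sketch computes how many qparam groups a given block_size yields along each dimension. The helper qparam_grid is hypothetical and is not part of torchao; it assumes that block_size has one entry per tensor dimension and that each entry divides the corresponding dimension evenly.

```python
from math import prod


def qparam_grid(shape, block_size):
    """Return the number of qparam groups along each dimension.

    Each block of `block_size` elements shares one set of qparams, so a
    dimension of size `dim` split into blocks of size `block` has
    dim // block groups.
    """
    if len(shape) != len(block_size):
        raise ValueError("block_size needs one entry per tensor dimension")
    for dim, block in zip(shape, block_size):
        if dim % block != 0:
            raise ValueError("each block size must divide its dimension")
    return tuple(dim // block for dim, block in zip(shape, block_size))


shape = (4, 8)
per_tensor = qparam_grid(shape, (4, 8))  # one group for the whole tensor
per_row = qparam_grid(shape, (1, 8))     # one group per row
per_group = qparam_grid(shape, (1, 4))   # groups of 4 elements within a row
num_groups = prod(per_group)             # total number of qparam groups
```

With a (4, 8) tensor, a block_size of (4, 8) gives a single group (per-tensor quantization), (1, 8) gives one group per row, and (1, 4) gives two groups in each of the four rows.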

dequantize() → Tensor: Given a quantized tensor, dequantize it and return the dequantized float tensor.

classmethod from_hp_to_floatx(input_float: Tensor, block_size: Tuple[int, ...], target_dtype: dtype, _layout: Layout, scale_dtype: Optional[dtype] = None): Convert a high precision tensor to a float8 quantized tensor.

classmethod from_hp_to_floatx_static(input_float: Tensor, scale: Tensor, block_size: Tuple[int, ...], target_dtype: dtype, _layout: Layout): Create a float8 AffineQuantizedTensor from a high precision tensor using static parameters.

classmethod from_hp_to_fpx(input_float: Tensor, _layout: Layout): Create a floatx AffineQuantizedTensor from a high precision tensor. Floatx is represented as ebits and mbits, and supports the representation of float1-float7.
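As an illustration of what describing a float format by ebits and mbits means, the sketch below decodes a bit pattern laid out as sign | exponent | mantissa. It is not taken from torchao: the helper decode_floatx is hypothetical, and the conventions it uses (an IEEE-style exponent bias of 2^(ebits-1) - 1, subnormals, no encodings reserved for inf or NaN, ebits >= 1) are assumptions made for this example.

```python
def decode_floatx(bits: int, ebits: int, mbits: int) -> float:
    """Decode a (1 + ebits + mbits)-bit pattern: sign | exponent | mantissa.

    Assumes ebits >= 1, an IEEE-style bias, subnormal support, and no
    special encodings for inf/NaN (assumptions of this sketch).
    """
    sign = -1.0 if (bits >> (ebits + mbits)) & 1 else 1.0
    exponent = (bits >> mbits) & ((1 << ebits) - 1)
    mantissa = bits & ((1 << mbits) - 1)
    bias = (1 << (ebits - 1)) - 1
    fraction = mantissa / (1 << mbits)
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * fraction * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)


# A 6-bit format with ebits=3 and mbits=2 (1 sign bit + 3 + 2 = 6 bits).
one = decode_floatx(0b001100, ebits=3, mbits=2)
minus_one = decode_floatx(0b101100, ebits=3, mbits=2)
largest = decode_floatx(0b011111, ebits=3, mbits=2)
smallest_subnormal = decode_floatx(0b000001, ebits=3, mbits=2)
```

Under these assumptions, the total width is 1 + ebits + mbits bits, so a 6-bit format with ebits=3 and mbits=2 covers magnitudes from 0.0625 up to 28.0.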

classmethod from_hp_to_intx(input_float: Tensor, mapping_type: MappingType, block_size: Tuple[int, ...], target_dtype: dtype, quant_min: Optional[int] = None, quant_max: Optional[int] = None, eps: Optional[float] = None, scale_dtype: Optional[dtype] = None, zero_point_dtype: Optional[dtype] = None, preserve_zero: bool = True, zero_point_domain: ZeroPointDomain = ZeroPointDomain.INT, _layout: Layout = PlainLayout(), use_hqq: bool = False): Convert a high precision tensor to an integer affine quantized tensor.

classmethod from_hp_to_intx_static(input_float: Tensor, scale: Tensor, zero_point: Optional[Tensor], block_size: Tuple[int, ...], target_dtype: dtype, quant_min: Optional[int] = None, quant_max: Optional[int] = None, zero_point_domain: ZeroPointDomain = ZeroPointDomain.INT, _layout: Layout = PlainLayout()): Create an integer AffineQuantizedTensor from a high precision tensor using static parameters.

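to(*args, **kwargs) → Tensor

Performs tensor dtype and/or device conversion. A torch.dtype and torch.device are inferred from the arguments of self.to(*args, **kwargs).

Note

If the self tensor already has the correct torch.dtype and torch.device, then self is returned. Otherwise, the returned tensor is a copy of self with the desired torch.dtype and torch.device.

Here are the ways to call to: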

to(dtype, non_blocking=False, copy=False, memory_format=torch.preserve_format) → Tensor

Returns a tensor with the specified dtype.

Parameters
memory_format (torch.memory_format, optional): the desired memory format of the returned tensor. Default: torch.preserve_format.

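to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format) → Tensor

Returns a tensor with the specified device and (optional) dtype. If dtype is None, it is inferred to be self.dtype. When non_blocking is set, the conversion is attempted asynchronously if possible, e.g. when converting a CPU tensor with pinned memory to a CUDA tensor. When copy is set, a new tensor is created even when the tensor already matches the desired conversion.

Parameters
memory_format (torch.memory_format, optional): the desired memory format of the returned tensor. Default: torch.preserve_format.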

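to(other, non_blocking=False, copy=False) → Tensor: Returns a tensor with the same torch.dtype and torch.device as the tensor other. When non_blocking is set, the conversion is attempted asynchronously if possible, e.g. when converting a CPU tensor with pinned memory to a CUDA tensor. When copy is set, a new tensor is created even when the tensor already matches the desired conversion.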

Example

>>> tensor = torch.randn(2, 2)  # Initially dtype=float32, device=cpu
>>> tensor.to(torch.float64)
tensor([[-0.5044,  0.0005],
        [ 0.3310, -0.0584]], dtype=torch.float64)

>>> cuda0 = torch.device('cuda:0')
>>> tensor.to(cuda0)
tensor([[-0.5044,  0.0005],
        [ 0.3310, -0.0584]], device='cuda:0')

>>> tensor.to(cuda0, dtype=torch.float64)
tensor([[-0.5044,  0.0005],
        [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0')

>>> other = torch.randn((), dtype=torch.float64, device=cuda0)
>>> tensor.to(other, non_blocking=True)
tensor([[-0.5044,  0.0005],
        [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0')
