TBE CPU Autovectorization

FP8/16/32 Autovectorized Implementation Methods

template<typename InType, typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDM_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)

Autovectorized version of the method EmbeddingSpMDM_ref, for the FP32 weight type.

Template Parameters:

InType – input data type (uint8_t is used)
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)

Parameters:

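block_size – number of elements in a block (int64_t)
output_size – number of elements in the output (int64_t)
index_size – number of elements in the indices (int64_t)
data_size – number of elements in the data (int64_t)
input – address of the input (InType*)
indices – address of the indices (IndexType*)
offsets_or_lengths – address of the offsets, or of the lengths when use_offsets is false (OffsetType*)
weights – weights for the sum; optional, may be null for a non-weighted sum (float*)
normalize_by_lengths – whether to normalize by lengths (bool)
out – address of the output (OutType*)
is_weight_positional – if true, the weights are positional; set to false for the FP32 autovectorized implementation (bool)
use_offsets – if true, offsets are used instead of lengths; set to true for the FP32 autovectorized implementation (bool)
output_stride – if -1, output_stride is the same as block_size; set to -1 for the FP32 autovectorized implementation (int64_t)
input_stride – if -1, input_stride is the same as block_size; set to -1 for the FP32 autovectorized implementation (int64_t)
scale_bias_last – if true, the scale and bias appear at the end of each row; set to true for the FP32 autovectorized implementation (bool)
no_bag – if true, there is no embedding bag; set to false for the FP32 autovectorized implementation (bool)
is_bf16_out – if true, the output is of type BFLOAT16; set to false for the FP32 autovectorized implementation (bool)
is_bf16_in – if true, the input is of type BFLOAT16; set to false for the FP32 autovectorized implementation (bool)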
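The parameters above describe a pooled (summed) embedding lookup. The following is a minimal scalar sketch of those semantics under the FP32 settings listed (use_offsets = true, is_weight_positional = false, both strides equal to block_size, no_bag = false, FP32 input and output). The function name embedding_bag_sum_sketch is hypothetical and is not the FBGEMM API. It assumes the offsets array holds output_size + 1 entries starting at 0, and, as an assumption about the bool return value, it returns false on an out-of-range index. The inner loop over block_size is the kind of contiguous, branch-free loop a compiler can autovectorize.

```cpp
#include <cstdint>

// Hypothetical sketch, not the FBGEMM API. Assumes: use_offsets = true with
// output_size + 1 offsets starting at 0, non-positional weights indexed by
// position in `indices`, and output_stride = input_stride = block_size.
bool embedding_bag_sum_sketch(int64_t block_size, int64_t output_size,
                              int64_t index_size, int64_t data_size,
                              const float* input, const int64_t* indices,
                              const int32_t* offsets, const float* weights,
                              bool normalize_by_lengths, float* out) {
  int64_t current = 0;  // running position in `indices` (and `weights`)
  for (int64_t m = 0; m < output_size; ++m) {
    float* out_row = out + m * block_size;
    for (int64_t j = 0; j < block_size; ++j) out_row[j] = 0.0f;

    const int64_t len = offsets[m + 1] - offsets[m];  // bag length
    if (current + len > index_size) return false;

    for (int64_t i = 0; i < len; ++i, ++current) {
      const int64_t idx = indices[current];
      if (idx < 0 || idx >= data_size) return false;  // assumed: reject bad row
      const float w = weights ? weights[current] : 1.0f;
      const float* in_row = input + idx * block_size;
      // Contiguous, branch-free accumulation: a loop shape compilers can vectorize.
      for (int64_t j = 0; j < block_size; ++j) out_row[j] += w * in_row[j];
    }

    if (normalize_by_lengths && len > 0) {
      const float scale = 1.0f / static_cast<float>(len);
      for (int64_t j = 0; j < block_size; ++j) out_row[j] *= scale;
    }
  }
  return current == index_size;  // every index must belong to some bag
}
```

For example, with a 3 × 2 table {1, 2, 3, 4, 5, 6}, indices {0, 2, 1} and offsets {0, 2, 3}, the two bags sum to {6, 8} and {3, 4}; with normalize_by_lengths = true the first bag becomes {3, 4}.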

template<typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDMFP8_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)

Autovectorized version of the method EmbeddingSpMDM_ref, for the FP8 weight type.

Template Parameters:

InType – input data type (uint8_t is used)
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)

Parameters:

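block_size – number of elements in a block (int64_t)
output_size – number of elements in the output (int64_t)
index_size – number of elements in the indices (int64_t)
data_size – number of elements in the data (int64_t)
input – address of the input (InType*)
indices – address of the indices (IndexType*)
offsets_or_lengths – address of the offsets, or of the lengths when use_offsets is false (OffsetType*)
weights – weights for the sum; optional, may be null for a non-weighted sum (float*)
normalize_by_lengths – whether to normalize by lengths (bool)
out – address of the output (OutType*)
is_weight_positional – if true, the weights are positional; set to false for the FP8 autovectorized implementation (bool)
use_offsets – if true, offsets are used instead of lengths; set to true for the FP8 autovectorized implementation (bool)
output_stride – if -1, output_stride is the same as block_size; set to -1 for the FP8 autovectorized implementation (int64_t)
exponent_bits – number of bits used for the exponent (int)
exponent_bias – bias applied to the exponent (int)
is_bf16_out – if true, the output is of type BFLOAT16; set to false for the FP8 autovectorized implementation (bool)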
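The exponent_bits and exponent_bias parameters determine how each 8-bit input element is interpreted as a floating-point value before it is accumulated. The following is a minimal sketch of such a decode. It assumes a layout of 1 sign bit, exponent_bits exponent bits and 7 - exponent_bits mantissa bits, IEEE-style subnormals when the exponent field is 0, and no special handling of Inf/NaN encodings. The helper name fp8_to_float_sketch is hypothetical and does not claim to match FBGEMM's internal conversion.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sketch, not the FBGEMM API. Assumed layout:
// [sign (1 bit) | exponent (exponent_bits) | mantissa (7 - exponent_bits)].
// Inf/NaN encodings are not treated specially.
float fp8_to_float_sketch(uint8_t bits, int exponent_bits, int exponent_bias) {
  const int mantissa_bits = 7 - exponent_bits;
  const int sign = (bits >> 7) & 1;
  const int exp_field = (bits >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int mantissa = bits & ((1 << mantissa_bits) - 1);

  float magnitude;
  if (exp_field == 0) {
    // Subnormal: 2^(1 - bias) * (mantissa / 2^mantissa_bits)
    magnitude = std::ldexp(static_cast<float>(mantissa),
                           1 - exponent_bias - mantissa_bits);
  } else {
    // Normal: 2^(exp_field - bias) * (1 + mantissa / 2^mantissa_bits)
    magnitude = std::ldexp(static_cast<float>((1 << mantissa_bits) + mantissa),
                           exp_field - exponent_bias - mantissa_bits);
  }
  return sign ? -magnitude : magnitude;
}
```

For example, with exponent_bits = 4 and exponent_bias = 7, the byte 0x38 decodes to 1.0 and 0xC0 decodes to -2.0. Under the assumptions above, the decoded value would then be scaled by its weight and summed per bag, just like an FP32 row.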
