torchrec.inference

torchrec.inference.model_packager

class torchrec.inference.model_packager.PredictFactoryPackager

Bases: object

classmethod save_predict_factory(predict_factory: Type[PredictFactory], configs: Dict[str, Any], output: Union[str, Path, BinaryIO], extra_files: Dict[str, Union[str, bytes]], loader_code: str = '\nimport %PACKAGE%\n\nMODULE_FACTORY=%PACKAGE%.%CLASS%\n', package_importer: Union[Importer, List[Importer]] = <torch.package.importer._SysImporter object>) → None
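The default loader_code is a small Python template containing %PACKAGE% and %CLASS% placeholders. A minimal sketch of how such a template can be filled in, assuming the placeholders are replaced with a module path and a class name (the helper render_loader_code is hypothetical, not a torchrec function, and collections.OrderedDict merely stands in for a real PredictFactory subclass):

```python
# Default template, copied from the signature above.
DEFAULT_LOADER_CODE = "\nimport %PACKAGE%\n\nMODULE_FACTORY=%PACKAGE%.%CLASS%\n"


def render_loader_code(template: str, package: str, class_name: str) -> str:
    """Hypothetical helper: substitute the placeholders in a loader_code template."""
    return template.replace("%PACKAGE%", package).replace("%CLASS%", class_name)


# Render the template for a stand-in class and execute the generated loader.
code = render_loader_code(DEFAULT_LOADER_CODE, "collections", "OrderedDict")
namespace: dict = {}
exec(code, namespace)  # runs "import collections" and binds MODULE_FACTORY
print(namespace["MODULE_FACTORY"])
```

The generated loader imports the package and binds MODULE_FACTORY to the named class, so whoever loads the archive can look up the factory under a fixed name.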

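abstract classmethod set_extern_modules()

Abstract class method that subclasses must implement.

Declared via abc.abstractclassmethod, which is deprecated; use classmethod together with abstractmethod instead.

abstract classmethod set_mocked_modules()

Abstract class method that subclasses must implement.

Declared via abc.abstractclassmethod, which is deprecated; use classmethod together with abstractmethod instead.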
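The non-deprecated way to declare an abstract class method is to stack classmethod over abstractmethod. A minimal sketch of that pattern; PackagerBase and MyPackager are hypothetical stand-ins, and having the methods return lists of module names is an illustrative choice, not the torchrec contract:

```python
from abc import ABC, abstractmethod
from typing import List


class PackagerBase(ABC):
    """Hypothetical stand-in for a packager base class."""

    # classmethod must be the outermost decorator; it propagates
    # __isabstractmethod__ so ABC still tracks the method as abstract.
    @classmethod
    @abstractmethod
    def set_extern_modules(cls) -> List[str]:
        ...

    @classmethod
    @abstractmethod
    def set_mocked_modules(cls) -> List[str]:
        ...


class MyPackager(PackagerBase):
    @classmethod
    def set_extern_modules(cls) -> List[str]:
        return ["numpy"]  # illustrative module name

    @classmethod
    def set_mocked_modules(cls) -> List[str]:
        return ["matplotlib"]  # illustrative module name


# The base class cannot be instantiated; the concrete subclass can.
packager = MyPackager()
```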

torchrec.inference.model_packager.load_config_text(name: str) → str

torchrec.inference.model_packager.load_pickle_config(name: str, clazz: Type[T]) → T
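Because load_pickle_config takes the expected class clazz and returns a T, the unpickled object is presumably checked against that class. A stdlib sketch of the pattern, assuming the config is stored as pickled bytes; load_pickle_config_from_bytes is hypothetical, and an in-memory byte string stands in for the packaged config file:

```python
import pickle
from typing import Type, TypeVar

T = TypeVar("T")


def load_pickle_config_from_bytes(data: bytes, clazz: Type[T]) -> T:
    """Hypothetical helper: unpickle data and verify it is an instance of clazz."""
    obj = pickle.loads(data)
    if not isinstance(obj, clazz):
        raise TypeError(f"expected {clazz.__name__}, got {type(obj).__name__}")
    return obj


# Round trip: pickle a config, then load it back with a type check.
payload = pickle.dumps({"embedding_dim": 64, "num_tables": 8})
config = load_pickle_config_from_bytes(payload, dict)
```

Only unpickle data from trusted sources: pickle.loads can execute arbitrary code embedded in the payload.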

torchrec.inference.modules

class torchrec.inference.modules.BatchingMetadata(type: str, device: str, pinned: List[str])

Bases: object

Metadata class for batching. It must be kept in sync with the C++ definition.

device: str

pinned: List[str]

type: str

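class torchrec.inference.modules.PredictFactory

Bases: ABC

Creates a model (with learned weights) to be used at inference time.

abstract batching_metadata() → Dict[str, BatchingMetadata]

Returns a dict from input name to BatchingMetadata. This information is used for batching input requests.

batching_metadata_json() → str

Serializes the batching metadata to JSON for easier parsing in torch::deploy environments.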
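batching_metadata_json flattens the dict returned by batching_metadata into a JSON string. A stdlib sketch of one way to do this, assuming each BatchingMetadata is a plain dataclass with the three fields listed above; the local mirror class, the helper batching_metadata_to_json, and the input names and type strings are illustrative, not taken from torchrec:

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class BatchingMetadata:
    """Local mirror of the fields of torchrec.inference.modules.BatchingMetadata."""
    type: str
    device: str
    pinned: List[str]


def batching_metadata_to_json(metadata: Dict[str, BatchingMetadata]) -> str:
    """Map each input name to a plain dict of its fields, then dump to JSON."""
    return json.dumps({name: asdict(meta) for name, meta in metadata.items()})


# Input names and type strings below are illustrative.
metadata = {
    "float_features": BatchingMetadata(type="dense", device="cuda", pinned=[]),
    "id_list_features": BatchingMetadata(type="sparse", device="cuda", pinned=[]),
}
encoded = batching_metadata_to_json(metadata)
print(encoded)
```

Emitting plain JSON keeps the consumer side (for example a C++ runtime) independent of Python pickling.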

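abstract create_predict_module() → Module

Returns the sharded model with its weights allocated. The model's state_dict() must match TransformModule.transform_state_dict(). It assumes torch.distributed.init_process_group has already been called and shards the model according to torch.distributed.get_world_size().

model_inputs_data() → Dict[str, Any]

Returns a dict of various data used to generate benchmark inputs.

qualname_metadata() → Dict[str, QualNameMetadata]

Returns a dict from qualname (method name) to QualNameMetadata. This is additional information for executing specific methods of the model.

qualname_metadata_json() → str

Serializes the qualname metadata to JSON for easier parsing in torch::deploy environments.

abstract result_metadata() → str

Returns a string representing the result type. This information is used for result splitting.

abstract run_weights_dependent_transformations(predict_module: Module) → Module

Runs transformations that depend on the weights of the predict module, for example lowering to a backend.

abstract run_weights_independent_tranformations(predict_module: Module) → Module

Runs transformations that do not depend on the weights of the predict module, for example FX tracing or model splitting.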

class torchrec.inference.modules.PredictModule(module: Module)

Bases: Module

Interface for modules to work in a torch.deploy based backend. Users should override predict_forward to convert the batch input format to the module input format.

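Call args:

batch: a dict of input tensors

Returns:

output: a dict of output tensors

Parameters:

module – the actual predict module
device – the primary device for this module, which will be used in forward calls.

Example

module = MyPredictModule(model)  # MyPredictModule: hypothetical subclass implementing predict_forward; model: an nn.Module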
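The override pattern described above (wrap the actual predict module, convert the batch in predict_forward) can be sketched without torch. In this pure-Python stand-in, PredictModuleSketch, ScorePredictModule, and the float_features input name are hypothetical, the built-in sum stands in for a real model, and calling the wrapper is assumed to simply delegate to predict_forward:

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

Batch = Dict[str, List[List[float]]]


class PredictModuleSketch(ABC):
    """Hypothetical, torch-free stand-in for PredictModule."""

    def __init__(self, module: Callable[[List[float]], float]) -> None:
        self._module = module  # the wrapped "actual predict module"

    @property
    def predict_module(self) -> Callable[[List[float]], float]:
        return self._module

    def __call__(self, batch: Batch) -> Any:
        # Assumption: the entry point delegates straight to predict_forward.
        return self.predict_forward(batch)

    @abstractmethod
    def predict_forward(self, batch: Batch) -> Any:
        """Convert the batch input format to the module input format."""


class ScorePredictModule(PredictModuleSketch):
    def predict_forward(self, batch: Batch) -> Any:
        # Unpack the named input, run each row through the wrapped module,
        # and return a dict of outputs.
        rows = batch["float_features"]
        return {"scores": [self.predict_module(row) for row in rows]}


module = ScorePredictModule(sum)  # sum stands in for a real model
result = module({"float_features": [[1.0, 2.0], [3.0, 4.0]]})
print(result)
```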

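forward(batch: Dict[str, Tensor]) → Any

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, you should call the Module instance afterwards instead of this function, because the former takes care of running the registered hooks while the latter silently ignores them.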

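abstract predict_forward(batch: Dict[str, Tensor]) → Any

property predict_module: Module

state_dict(destination: Optional[Dict[str, Any]] = None, prefix: str = '', keep_vars: bool = False) → Dict[str, Any]

Returns a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module's parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars. However, this is being deprecated, and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination, as it is not designed for end users.

Parameters:

destination (dict, optional) – If provided, the state of the module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing the whole state of the module

Return type:

dict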

示例

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']

training: bool

class torchrec.inference.modules.QualNameMetadata(need_preproc: bool)

Bases: object

need_preproc: bool

torchrec.inference.modules.quantize_dense(predict_module: PredictModule, dtype: dtype, additional_embedding_module_type: List[Type[Module]] = []) → Module

torchrec.inference.modules.quantize_embeddings(module: Module, dtype: dtype, inplace: bool, additional_qconfig_spec_keys: Optional[List[Type[Module]]] = None, additional_mapping: Optional[Dict[Type[Module], Type[Module]]] = None, output_dtype: dtype = torch.float32, per_table_weight_dtype: Optional[Dict[str, dtype]] = None) → Module

torchrec.inference.modules.quantize_feature(module: Module, inputs: Tuple[Tensor, ...]) → Tuple[Tensor, ...]

torchrec.inference.modules.quantize_inference_model(model: Module, quantization_mapping: Optional[Dict[str, Type[Module]]] = None, per_table_weight_dtype: Optional[Dict[str, dtype]] = None, fp_weight_dtype: dtype = torch.int8) → Module

Quantizes a model for inference.
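The quantization functions above convert weights to lower-precision dtypes such as torch.int8. As a generic illustration of the arithmetic involved (not torchrec's or FBGEMM's actual kernel), the sketch below applies row-wise affine 8-bit quantization, where each row keeps its own scale and bias; quantize_row and dequantize_row are hypothetical helpers:

```python
from typing import List, Tuple


def quantize_row(row: List[float]) -> Tuple[List[int], float, float]:
    """Map a row of floats to 8-bit codes in [0, 255] plus a per-row scale and bias."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # avoid dividing by zero on constant rows
    codes = [round((x - lo) / scale) for x in row]
    return codes, scale, lo


def dequantize_row(codes: List[int], scale: float, bias: float) -> List[float]:
    """Approximate reconstruction: x is roughly code * scale + bias."""
    return [c * scale + bias for c in codes]


row = [-1.0, 0.0, 0.5, 1.0]
codes, scale, bias = quantize_row(row)
restored = dequantize_row(codes, scale, bias)
```

Storing a scale and bias per row bounds the reconstruction error of each element by half that row's scale, so tables whose rows have very different value ranges still quantize well.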

torchrec.inference.modules.shard_quant_model(model: Module, world_size: int = 1, compute_device: str = 'cuda', sharders: Optional[List[ModuleSharder[Module]]] = None, fused_params: Optional[Dict[str, Any]] = None, device_memory_size: Optional[int] = None, constraints: Optional[Dict[str, ParameterConstraints]] = None) → Tuple[Module, ShardingPlan]

Shards a quantized model and returns the sharded model together with its ShardingPlan.

torchrec.inference.modules.trim_torch_package_prefix_from_typename(typename: str) → str
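torch.package gives modules imported from a package mangled names with a `<torch_package_N>` prefix, for example `<torch_package_0>.my_model.Net`. A minimal sketch of what a function with this name plausibly does, assuming it drops that leading component when present; trim_package_prefix is a hypothetical reimplementation for illustration, not the library's source:

```python
def trim_package_prefix(typename: str) -> str:
    """Strip a leading "<torch_package_N>." component from a type name, if present."""
    if typename.startswith("<torch_package_") and "." in typename:
        # Keep everything after the first dot, i.e. the original import path.
        return typename.split(".", 1)[1]
    return typename


print(trim_package_prefix("<torch_package_0>.my_model.Net"))
```

Names without the prefix pass through unchanged, so the function is safe to call on both packaged and regular type names.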
