get_distributed_backend

torchtune.training.get_distributed_backend(device_type: str, offload_ops_to_cpu: bool = False) → str

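Get the PyTorch distributed backend based on the device type.

Parameters:

device_type (str) – Device type to get the backend for.
offload_ops_to_cpu (bool, optional) – Flag indicating whether any operations should be offloaded to CPU. Examples of such operations are FSDP CPU offload and asynchronous saving of distributed checkpoints. Defaults to False.

Example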

>>> from torchtune.training import get_distributed_backend
>>> get_distributed_backend("cuda")
'nccl'
>>> get_distributed_backend("cpu")
'gloo'
>>> get_distributed_backend("cuda", offload_ops_to_cpu=True)
'cuda:nccl,cpu:gloo'

Returns: The distributed backend to use in torch.distributed.init_process_group.
Return type: str
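
The mapping shown in the examples can be sketched in plain Python. This is only an illustration of the documented behavior, not torchtune's implementation: the function name `sketch_distributed_backend` and the table `_ASSUMED_BACKENDS` are hypothetical, and the table covers only the two device types that appear in the examples.

```python
# Hypothetical device-to-backend table, limited to the device types shown in
# the documented examples. The real library may look this up differently.
_ASSUMED_BACKENDS = {"cuda": "nccl", "cpu": "gloo"}


def sketch_distributed_backend(device_type: str, offload_ops_to_cpu: bool = False) -> str:
    """Illustrative stand-in that reproduces the documented examples."""
    # Raises KeyError for device types outside the assumed table.
    backend = _ASSUMED_BACKENDS[device_type]
    if offload_ops_to_cpu:
        # Build a "device:backend,device:backend" string so that operations
        # offloaded to CPU are served by gloo next to the device backend.
        return f"{device_type}:{backend},cpu:gloo"
    return backend


print(sketch_distributed_backend("cuda"))
print(sketch_distributed_backend("cuda", offload_ops_to_cpu=True))
```

In real use, the string returned by get_distributed_backend would be passed as the `backend` argument of torch.distributed.init_process_group, which accepts both a single backend name and the comma-separated `device:backend` form.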
