Distributed and Parallel Training Tutorials

Created On: Oct 04, 2022 | Last Updated: Oct 31, 2024 | Last Verified: Nov 05, 2024

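Distributed training is a model training paradigm that spreads the training workload across multiple worker nodes, which can significantly speed up training and improve model accuracy. Although distributed training can be used for any type of machine learning model training, it is most beneficial for large models and compute-intensive tasks such as deep learning.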

There are a few ways to perform distributed training in PyTorch, each with its advantages in certain use cases:

DistributedDataParallel (DDP)
Fully Sharded Data Parallel (FSDP)
Tensor Parallel (TP)
Device Mesh
Remote Procedure Call (RPC) distributed training
Custom Extensions

Read more about these options in the Distributed Overview.

Learn DDP

DDP Intro Video Tutorials

A step-by-step video series on how to get started with DistributedDataParallel and advance to more complex topics.

Code, Video

https://pytorch.ac.cn/tutorials/beginner/ddp_series_intro.html?utm_source=distr_landing&utm_medium=ddp_series_intro

Getting Started with Distributed Data Parallel

This tutorial provides a short and gentle introduction to PyTorch DistributedDataParallel.

Code

https://pytorch.ac.cn/tutorials/intermediate/ddp_tutorial.html?utm_source=distr_landing&utm_medium=intermediate_ddp_tutorial
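
DistributedDataParallel (`torch.nn.parallel.DistributedDataParallel`) keeps a full model replica on every process, feeds each process a different slice of the data, and averages gradients across processes (an all-reduce) during the backward pass, so every replica applies the same update. The plain-Python sketch below simulates only that averaging step for a toy one-parameter linear model. It does not use the PyTorch API, and the helpers `local_gradient`, `allreduce_mean`, and `shard` are hypothetical. It checks that, with equal-sized shards, the averaged local gradients match the gradient over the full batch.

```python
# Conceptual sketch of DDP-style gradient averaging (plain Python, not the PyTorch API).
# Toy model: y_hat = w * x, loss = mean((w * x - y) ** 2) over a batch.

def local_gradient(w, xs, ys):
    """d(loss)/dw of the mean squared error over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def allreduce_mean(values):
    """Stand-in for an all-reduce (sum) followed by division by the world size."""
    return sum(values) / len(values)

def shard(data, world_size):
    """Split data into equal contiguous shards, one per rank."""
    size = len(data) // world_size
    return [data[r * size:(r + 1) * size] for r in range(world_size)]

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5
world_size = 4

x_shards = shard(xs, world_size)
y_shards = shard(ys, world_size)

# Each "rank" computes a gradient on its own shard only...
local_grads = [local_gradient(w, xr, yr) for xr, yr in zip(x_shards, y_shards)]
# ...and the all-reduce gives every rank the same averaged gradient.
ddp_grad = allreduce_mean(local_grads)
full_grad = local_gradient(w, xs, ys)

print(abs(ddp_grad - full_grad) < 1e-9)  # True
```

The equality relies on every rank seeing the same number of samples. This is one reason DDP jobs commonly use `torch.utils.data.distributed.DistributedSampler`, which pads or drops samples so that each rank gets an equal count.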

Distributed Training with Uneven Inputs Using the Join Context Manager

This tutorial describes the Join context manager and demonstrates its use with DistributedDataParallel.

Code

https://pytorch.ac.cn/tutorials/advanced/generic_join.html?utm_source=distr_landing&utm_medium=generic_join
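
With uneven inputs, a rank that runs out of data stops issuing the per-iteration all-reduce, while the other ranks keep waiting for it. This can hang training. The Join context manager (used roughly as `with Join([ddp_model]): ...`, with `Join` from `torch.distributed.algorithms.join`) lets finished ranks "shadow" those collectives instead. The plain-Python sketch below simulates that idea and does not call the PyTorch API. A joined rank contributes a zero gradient, and the sum is divided by the initial world size. That divisor is our reading of DDP's default `divide_by_initial_world_size=True` and should be treated as an assumption. The helper `simulate_join` and the gradient values are hypothetical.

```python
# Conceptual simulation of uneven inputs with Join-style shadowing (not the PyTorch API).

def simulate_join(per_rank_grads):
    """per_rank_grads[r] lists the gradient rank r produces for each of its batches.

    Returns the averaged gradient for every step. Ranks that have run out of
    batches ("joined") contribute 0.0, so each collective can still complete.
    """
    world_size = len(per_rank_grads)
    num_steps = max(len(g) for g in per_rank_grads)
    averaged = []
    for step in range(num_steps):
        contributions = []
        for grads in per_rank_grads:
            if step < len(grads):
                contributions.append(grads[step])  # rank still has data
            else:
                contributions.append(0.0)  # joined rank shadows the all-reduce
        # Assumed default: divide by the initial world size, not the active-rank count.
        averaged.append(sum(contributions) / world_size)
    return averaged

# Rank 0 has 3 batches, rank 1 has only 2.
steps = simulate_join([[1.0, 2.0, 3.0], [3.0, 4.0]])
print(steps)  # [2.0, 3.0, 1.5]
```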

Learn FSDP

Getting Started with FSDP

This tutorial demonstrates how to perform distributed training with FSDP on the MNIST dataset.

Code

https://pytorch.ac.cn/tutorials/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started
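
FSDP shards parameters across ranks instead of replicating them. Before a computation it all-gathers the full parameters, and after the backward pass it reduce-scatters the gradients so that each rank keeps only the averaged gradient for its own shard. The plain-Python simulation below does not use the FSDP API. The helpers `shard_flat`, `all_gather`, and `reduce_scatter_mean` are hypothetical. The simulation flattens a parameter list, pads it to a multiple of the world size, and shows that all-gather restores it and reduce-scatter leaves each rank with only its shard.

```python
# Conceptual simulation of FSDP-style sharding (plain Python, not the FSDP API).

def shard_flat(flat, world_size):
    """Pad a flat list to a multiple of world_size and split it into equal shards."""
    pad = (-len(flat)) % world_size
    padded = flat + [0.0] * pad
    size = len(padded) // world_size
    return [padded[r * size:(r + 1) * size] for r in range(world_size)]

def all_gather(shards, numel):
    """Concatenate every rank's shard, then drop the padding."""
    full = [v for s in shards for v in s]
    return full[:numel]

def reduce_scatter_mean(per_rank_full_grads, world_size):
    """Average full-size gradients across ranks; each rank keeps only its shard."""
    averaged = [sum(vals) / world_size for vals in zip(*per_rank_full_grads)]
    return shard_flat(averaged, world_size)

world_size = 2
params = [0.1, 0.2, 0.3, 0.4, 0.5]  # 5 elements -> 1 padding element
shards = shard_flat(params, world_size)
print(shards)  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.0]]

# Before forward/backward, each rank temporarily rebuilds the full parameters.
full_params = all_gather(shards, len(params))
print(full_params == params)  # True

# After backward, each rank has a full-size local gradient; reduce-scatter
# leaves rank r with the averaged gradient for its own shard only.
local_grads = [[1.0] * 5, [2.0] * 5]
grad_shards = reduce_scatter_mean(local_grads, world_size)
print(grad_shards)  # [[1.5, 1.5, 1.5], [1.5, 1.5, 0.0]]
```

Each rank permanently stores only about 1/world_size of the parameters and gradients. The price is the extra all-gather traffic that FSDP pays to trade memory for communication.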

Advanced FSDP

In this tutorial, you will learn how to fine-tune a HuggingFace (HF) T5 model with FSDP for text summarization.

Code

https://pytorch.ac.cn/tutorials/intermediate/FSDP_advanced_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_advanced

Learn Tensor Parallel (TP)

Large Scale Transformer Model Training with Tensor Parallel (TP)

This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.

Code

https://pytorch.ac.cn/tutorials/intermediate/TP_tutorial.html
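
Tensor Parallel splits individual weight matrices across devices. A common pattern for a Transformer MLP pairs a column-wise split of the first linear layer with a row-wise split of the second. PyTorch exposes such styles as `ColwiseParallel` and `RowwiseParallel` in `torch.distributed.tensor.parallel`. With this pairing, each device runs its slice end to end, including the elementwise activation, and a single sum (all-reduce) of the partial outputs recovers the full result. The plain-Python sketch below checks that identity on small matrices. It does not use the PyTorch API, and all helpers and values are hypothetical.

```python
# Conceptual sketch of column-wise + row-wise tensor parallelism (plain Python).

def matmul(a, b):
    """Matrix multiply for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def split_columns(m, parts):
    """Split a matrix into `parts` equal column blocks."""
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m] for p in range(parts)]

def split_rows(m, parts):
    """Split a matrix into `parts` equal row blocks."""
    height = len(m) // parts
    return [m[p * height:(p + 1) * height] for p in range(parts)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

x = [[1.0, -2.0], [0.5, 3.0]]                          # batch of 2, hidden size 2
w1 = [[1.0, 0.0, -1.0, 2.0], [0.5, 1.0, 1.0, -1.0]]    # 2 -> 4
w2 = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [-1.0, 1.0]]  # 4 -> 2
tp = 2  # number of simulated devices

# Reference: the unsplit MLP, relu(x @ w1) @ w2.
reference = matmul(relu(matmul(x, w1)), w2)

# Device p holds column block p of w1 and the matching row block p of w2.
w1_parts = split_columns(w1, tp)
w2_parts = split_rows(w2, tp)
partials = [matmul(relu(matmul(x, a)), b) for a, b in zip(w1_parts, w2_parts)]

# The all-reduce (sum) of the partial outputs recovers the full result.
output = partials[0]
for p in partials[1:]:
    output = add(output, p)

print(output == reference)  # True
```

The column-then-row order matters. The column split keeps each device's hidden units independent, so the activation can be applied locally without communication, and only one reduction is needed at the end of the block.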

Learn DeviceMesh

Getting Started with DeviceMesh

In this tutorial, you will learn about DeviceMesh and how it can help with distributed training.

Code

https://pytorch.ac.cn/tutorials/recipes/distributed_device_mesh.html?highlight=devicemesh
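
A DeviceMesh arranges ranks into an n-dimensional grid so that each dimension (for example, data parallel and tensor parallel) gets its own process groups. In PyTorch, a 2-D mesh can be created with `init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))` from `torch.distributed.device_mesh`. The plain-Python sketch below does not call that API. It only computes which of 8 ranks would share a group along each dimension, assuming the ranks are laid out row-major in a 2 x 4 grid. The helpers `build_mesh` and `groups_along` are hypothetical.

```python
# Conceptual sketch of a 2-D device mesh (plain Python, not the DeviceMesh API).

def build_mesh(world_size, rows, cols):
    """Lay ranks 0..world_size-1 out row-major in a rows x cols grid (assumed layout)."""
    assert rows * cols == world_size, "mesh shape must cover every rank"
    return [[r * cols + c for c in range(cols)] for r in range(rows)]

def groups_along(mesh, dim):
    """Ranks that communicate together along mesh dimension `dim` (0 or 1)."""
    if dim == 1:
        return [list(row) for row in mesh]    # vary the column index: one group per row
    return [list(col) for col in zip(*mesh)]  # vary the row index: one group per column

mesh = build_mesh(8, 2, 4)
dp_groups = groups_along(mesh, 0)  # dimension 0 ("dp", size 2)
tp_groups = groups_along(mesh, 1)  # dimension 1 ("tp", size 4)

print(mesh)       # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(dp_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(tp_groups)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

If each row maps to one host with 4 GPUs, this layout keeps the bandwidth-heavy tensor-parallel traffic inside a host and sends only data-parallel traffic across hosts.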

Learn RPC

Getting Started with Distributed RPC Framework

This tutorial demonstrates how to get started with RPC-based distributed training.

Code

https://pytorch.ac.cn/tutorials/intermediate/rpc_tutorial.html?utm_source=distr_landing&utm_medium=rpc_getting_started

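Implementing a Parameter Server Using Distributed RPC Framework

This tutorial walks you through a simple example of implementing a parameter server using PyTorch's Distributed RPC framework.

Code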

https://pytorch.ac.cn/tutorials/intermediate/rpc_param_server_tutorial.html?utm_source=distr_landing&utm_medium=rpc_param_server_tutorial
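
In the parameter-server pattern, a central server holds the model parameters. Trainers fetch the current parameters, compute gradients on their own data, and send the gradients back for the server to apply. The tutorial does this with RPC calls between processes. The sketch below replaces RPC with direct method calls inside one process and runs the trainers sequentially, so it shows only the data flow. `ParameterServer`, `get_params`, `apply_gradient`, and the toy objective are hypothetical and chosen for illustration.

```python
# In-process sketch of a synchronous parameter server (no RPC; not the tutorial's code).

class ParameterServer:
    """Holds the parameters and applies gradients sent by trainers."""

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def get_params(self):
        return self.w

    def apply_gradient(self, grad):
        self.w -= self.lr * grad  # plain SGD step on the server

def trainer_gradient(w, xs, ys):
    """Gradient of the mean squared error for y_hat = w * x on one trainer's data."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

server = ParameterServer(w=0.0, lr=0.02)
# Two trainers, each with its own data; the true relationship is y = 3x.
trainer_data = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]

for step in range(200):
    for xs, ys in trainer_data:
        w = server.get_params()             # stands in for a remote fetch
        grad = trainer_gradient(w, xs, ys)  # local computation on the trainer
        server.apply_gradient(grad)         # stands in for a remote update

print(round(server.get_params(), 4))  # 3.0
```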

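Implementing Batch RPC Processing Using Asynchronous Executions

In this tutorial, you will build batch-processing RPC applications with the @rpc.functions.async_execution decorator.

Code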

https://pytorch.ac.cn/tutorials/intermediate/rpc_async_execution.html?utm_source=distr_landing&utm_medium=rpc_async_execution
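
The `@rpc.functions.async_execution` decorator lets an RPC target return a Future instead of a finished value. The server can therefore hold several requests and answer all of them at once when a batch is complete, instead of tying up a thread per waiting request. The sketch below imitates that idea with the standard library's `concurrent.futures.Future` rather than RPC. `BatchingServer` and `batch_size` are hypothetical, and summing the submitted values is just an example batch operation.

```python
# Stdlib sketch of request batching with futures (no RPC; not the tutorial's code).
import threading
from concurrent.futures import Future, ThreadPoolExecutor

class BatchingServer:
    """Collects `batch_size` requests, processes them together, resolves all futures."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.pending = []  # (value, future) pairs waiting for the batch to fill
        self.version = 0   # incremented once per processed batch

    def submit(self, value):
        fut = Future()
        with self.lock:
            self.pending.append((value, fut))
            if len(self.pending) < self.batch_size:
                return fut  # caller holds a future; nothing is computed yet
            batch, self.pending = self.pending, []
            self.version += 1
            version = self.version
            total = sum(v for v, _ in batch)  # one combined update per batch
        for _, f in batch:
            f.set_result((version, total))  # answer every request in the batch
        return fut

server = BatchingServer(batch_size=3)
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each worker submits one value and blocks until its batch has been processed.
    results = list(pool.map(lambda v: server.submit(v).result(timeout=5), [1, 2, 3]))

print(results)  # [(1, 6), (1, 6), (1, 6)]
```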

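Combining Distributed DataParallel with Distributed RPC Framework

In this tutorial, you will learn how to combine distributed data parallelism with distributed model parallelism.

Code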

https://pytorch.ac.cn/tutorials/advanced/rpc_ddp_tutorial.html?utm_source=distr_landing&utm_medium=rpc_plus_ddp

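Custom Extensions

Customize Process Group Backends Using Cpp Extensions

In this tutorial, you will learn how to implement a custom ProcessGroup backend and plug it into the PyTorch distributed package using cpp extensions.

Code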

https://pytorch.ac.cn/tutorials/intermediate/process_group_cpp_extension_tutorial.html?utm_source=distr_landing&utm_medium=custom_extensions_cpp