CUDA Stream Sanitizer

Note

This is a prototype feature: it is at an early stage, intended for gathering feedback and testing, and its components are subject to change.

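Overview

This module introduces CUDA Sanitizer (CSAN), a tool for detecting synchronization errors between kernels that run on different streams.

It stores information about accesses to tensors to determine whether those accesses are synchronized. When it is enabled in a Python program and a possible data race is detected, it prints a detailed warning and the program exits.

It can be enabled either by importing this module and calling enable_cuda_sanitizer(), or by exporting the TORCH_CUDA_SANITIZER environment variable.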
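
To make the idea of recording accesses concrete, here is a deliberately simplified toy model in pure Python. It is not CSAN's implementation: the class ToyRaceDetector, its methods, and the string stream names are all hypothetical, and real kernels, events, and memory allocation are ignored. It only shows how per-tensor access records combined with per-stream ordering knowledge can flag an unsynchronized access.

```python
from collections import defaultdict


class ToyRaceDetector:
    """Toy model only (hypothetical, not CSAN's real data structures)."""

    def __init__(self):
        self.seq = 0  # global counter, one tick per simulated kernel launch
        # synced[s][t]: highest launch number on stream t that stream s runs after
        self.synced = defaultdict(lambda: defaultdict(int))
        self.last_write = {}            # tensor -> (stream, launch number)
        self.reads = defaultdict(list)  # tensor -> reads since the last write

    def _ordered(self, stream, access):
        other_stream, seq = access
        return self.synced[stream][other_stream] >= seq

    def launch(self, stream, reads=(), writes=()):
        """Record one simulated kernel; return the tensors with a possible race."""
        self.seq += 1
        self.synced[stream][stream] = self.seq  # a stream is ordered with itself
        racy = set()
        for t in reads:
            # A read must come after the previous write.
            w = self.last_write.get(t)
            if w is not None and not self._ordered(stream, w):
                racy.add(t)
            self.reads[t].append((stream, self.seq))
        for t in writes:
            # A write must come after the previous write and all reads since it.
            prior = list(self.reads[t])
            if t in self.last_write:
                prior.append(self.last_write[t])
            if any(not self._ordered(stream, p) for p in prior):
                racy.add(t)
            self.last_write[t] = (stream, self.seq)
            self.reads[t] = []
        return sorted(racy)

    def wait_stream(self, waiting, awaited):
        """Simulate `waiting` waiting for all work queued so far on `awaited`."""
        for s, seq in list(self.synced[awaited].items()):
            self.synced[waiting][s] = max(self.synced[waiting][s], seq)


# Unsynchronized: a write on "default", then a read and write on "side".
unsynced = ToyRaceDetector()
unsynced.launch("default", writes=["a"])
race_without_wait = unsynced.launch("side", reads=["a"], writes=["a"])

# Synchronized: "side" waits for "default" before touching the tensor.
synced = ToyRaceDetector()
synced.launch("default", writes=["a"])
synced.wait_stream("side", "default")
race_with_wait = synced.launch("side", reads=["a"], writes=["a"])
```

In this model the first scenario reports tensor "a" and the second reports nothing, because the wait tells the detector that the side stream is ordered after the earlier write.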
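
As a minimal sketch of the programmatic route, the snippet below imports the module and calls enable_cuda_sanitizer(). The helper name enable_csan_if_possible and the availability guard are assumptions added here so that the snippet also runs on machines without PyTorch or without a CUDA device; they are not part of the module.

```python
import importlib.util


def enable_csan_if_possible() -> bool:
    """Enable CSAN when torch with CUDA is present; return whether it was enabled.

    The function name and the availability guard are illustrative assumptions.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    if not torch.cuda.is_available():
        return False
    import torch.cuda._sanitizer as csan

    csan.enable_cuda_sanitizer()
    return True


# Call this before any other CUDA work so that later accesses are analyzed.
csan_enabled = enable_csan_if_possible()
```

The environment-variable route needs no code change, which may be more convenient when checking an existing script.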

Usage

Here is an example of a simple synchronization error in PyTorch:

import torch

a = torch.rand(10000, device="cuda")

with torch.cuda.stream(torch.cuda.Stream()):
    torch.mul(a, 5, out=a)

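The tensor a is initialized on the default stream and then, without any synchronization, modified on a new stream. The two kernels run concurrently on the same tensor, so the second kernel might read uninitialized data before the first kernel has written it, or the first kernel might overwrite part of the second kernel's result. When this script is run from the command line with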

TORCH_CUDA_SANITIZER=1 python example_error.py

CSAN prints the following output:

============================
CSAN detected a possible data race on tensor with data pointer 139719969079296
Access by stream 94646435460352 during kernel:
aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
writing to argument(s) self, out, and to the output
With stack trace:
  File "example_error.py", line 6, in <module>
    torch.mul(a, 5, out=a)
  ...
  File "pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
    stack_trace = traceback.StackSummary.extract(

Previous access by stream 0 during kernel:
aten::rand(int[] size, *, int? dtype=None, Device? device=None) -> Tensor
writing to the output
With stack trace:
  File "example_error.py", line 3, in <module>
    a = torch.rand(10000, device="cuda")
  ...
  File "pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
    stack_trace = traceback.StackSummary.extract(

Tensor was allocated with stack trace:
  File "example_error.py", line 3, in <module>
    a = torch.rand(10000, device="cuda")
  ...
  File "pytorch/torch/cuda/_sanitizer.py", line 420, in _handle_memory_allocation
    traceback.StackSummary.extract(

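This gives detailed insight into the origin of the error:

A tensor was incorrectly accessed from streams with ids 0 (the default stream) and 94646435460352 (the new stream).
The tensor was allocated by invoking a = torch.rand(10000, device="cuda").
The faulty accesses were caused by the operators
- a = torch.rand(10000, device="cuda") on stream 0
- torch.mul(a, 5, out=a) on stream 94646435460352
The error message also displays the schemas of the invoked operators, along with a note showing which operator arguments correspond to the affected tensor.
- In this example, tensor a corresponds to the arguments self and out, and to the output value, of the invoked operator torch.mul.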

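See also

The list of supported torch operators and their schemas can be viewed here.

The bug can be fixed by forcing the new stream to wait for the default stream: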

with torch.cuda.stream(torch.cuda.Stream()):
    torch.cuda.current_stream().wait_stream(torch.cuda.default_stream())
    torch.mul(a, 5, out=a)

When the script is run again, no errors are reported.
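
The same ordering can also be expressed with an explicit event instead of wait_stream. The sketch below is an alternative under stated assumptions, not the fix shown above: the function name scaled_on_side_stream, the use of torch.ones, and the guard that returns None when PyTorch or a CUDA device is missing are all illustrative. It records a torch.cuda.Event on the default stream after the tensor is written and makes the side stream wait on that event before modifying the tensor.

```python
import importlib.util


def scaled_on_side_stream():
    """Return a tensor of ones scaled by 5 on a side stream, or None without CUDA.

    Hypothetical helper; the early returns only keep the sketch runnable anywhere.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    a = torch.ones(4, 2, device="cuda")  # written on the default stream

    # Mark the point on the default stream after which `a` is initialized.
    ready = torch.cuda.Event()
    ready.record(torch.cuda.default_stream())

    side = torch.cuda.Stream()
    with torch.cuda.stream(side):
        side.wait_event(ready)  # side-stream work is queued behind the event
        torch.mul(a, 5, out=a)

    # Make the default stream wait for the side stream before reading `a` on it.
    torch.cuda.default_stream().wait_stream(side)
    return a.cpu()


result = scaled_on_side_stream()
```

An event marks a single point in a stream, so this form can be preferable when the side stream should wait only for work queued up to that point rather than for everything submitted later.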

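API Reference

torch.cuda._sanitizer.enable_cuda_sanitizer()

Enable CUDA Sanitizer.

The sanitizer begins analyzing the low-level CUDA calls invoked by torch functions for synchronization errors. Every data race it finds is printed to standard error, along with stack traces of the suspected causes. For best results, enable the sanitizer at the very beginning of the program.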
