
Tensor CUDA Stream API

A CUDA Stream is a linear sequence of execution that belongs to a specific CUDA device. The PyTorch C++ API supports CUDA streams with the CUDAStream class and useful helper functions that make streaming operations easy. You can find them in CUDAStream.h. This note provides more details on how to use the PyTorch C++ CUDA Stream API.

Acquiring CUDA streams

PyTorch's C++ API provides the following ways to acquire CUDA streams:

  1. Acquire a new stream from the CUDA stream pool. Streams are preallocated in the pool and returned in a round-robin fashion.

CUDAStream getStreamFromPool(const bool isHighPriority = false, DeviceIndex device = -1);

Tip

You can request a stream from the high-priority pool by setting isHighPriority to true, or request a stream for a specific device by setting its device index (which defaults to the device index of the current CUDA stream).

  2. Acquire the default CUDA stream for the passed CUDA device, or for the current device if no device index is passed.

CUDAStream getDefaultCUDAStream(DeviceIndex device_index = -1);

Tip

The default stream is where most computation occurs when you aren't explicitly using streams.

  3. Acquire the current CUDA stream for the CUDA device with index device_index, or for the current device if no device index is passed.

CUDAStream getCurrentCUDAStream(DeviceIndex device_index = -1);

Tip

The current CUDA stream will usually be the default CUDA stream for the device, but it may differ if someone has called setCurrentCUDAStream or used StreamGuard or CUDAStreamGuard.
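
That relationship can be checked directly. Below is a minimal sketch, assuming a CUDA-enabled LibTorch build on which nothing has changed the stream yet; the exact header location may vary across versions:

```cpp
#include <cassert>
#include <c10/cuda/CUDAStream.h>

void checkCurrentStream() {
  // Until something changes it, the current stream is the default stream.
  at::cuda::CUDAStream current = at::cuda::getCurrentCUDAStream();
  at::cuda::CUDAStream def = at::cuda::getDefaultCUDAStream();
  assert(current == def);
}
```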

Setting CUDA streams

PyTorch's C++ API provides the following ways to set CUDA streams:

  1. Set the current stream on the device of the passed-in stream to the passed-in stream.

void setCurrentCUDAStream(CUDAStream stream);

Note

This function may have nothing to do with the current device. It only changes the current stream on the device of the passed-in stream. We recommend using CUDAStreamGuard instead, since it switches to the stream's device and makes that stream current on it. CUDAStreamGuard will also restore the previous device and stream when it is destroyed.

  2. Use CUDAStreamGuard to switch to a CUDA stream within a scope. It is defined in CUDAStreamGuard.h.

Tip

Use CUDAMultiStreamGuard if you need to set streams on multiple CUDA devices.
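
A minimal sketch of both guards (assuming a CUDA-enabled LibTorch build with at least two devices; header locations may vary across versions):

```cpp
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>

void guardSketch() {
  at::cuda::CUDAStream s0 = at::cuda::getStreamFromPool(false, 0);
  at::cuda::CUDAStream s1 = at::cuda::getStreamFromPool(false, 1);

  {
    // Switches the current device to s0's device and makes s0 current there.
    at::cuda::CUDAStreamGuard guard(s0);
    // ... launch work on s0 ...
  } // previous device and stream are restored here

  {
    // Makes s0 and s1 current on their respective devices without
    // changing the current device index.
    at::cuda::CUDAMultiStreamGuard multi_guard({s0, s1});
    // ... launch work on both devices ...
  } // previous streams on both devices are restored here
}
```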

CUDA stream usage examples

  1. Acquiring and setting a CUDA stream on the same device

// This example shows how to acquire and set CUDA stream on the same device.
// `at::cuda::setCurrentCUDAStream` is used to set current CUDA stream

// create a tensor on device 0
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device(torch::kCUDA));
// get a new CUDA stream from CUDA stream pool on device 0
at::cuda::CUDAStream myStream = at::cuda::getStreamFromPool();
// set current CUDA stream from default stream to `myStream` on device 0
at::cuda::setCurrentCUDAStream(myStream);
// sum() on tensor0 uses `myStream` as current CUDA stream
tensor0.sum();

// get the default CUDA stream on device 0
at::cuda::CUDAStream defaultStream = at::cuda::getDefaultCUDAStream();
// set current CUDA stream back to default CUDA stream on device 0
at::cuda::setCurrentCUDAStream(defaultStream);
// sum() on tensor0 uses `defaultStream` as current CUDA stream
tensor0.sum();

// This example is the same as the previous example, but explicitly specifies
// the device index and uses a CUDA stream guard to set the current CUDA stream

// create a tensor on device 0
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device(torch::kCUDA));
// get a new stream from CUDA stream pool on device 0
at::cuda::CUDAStream myStream = at::cuda::getStreamFromPool(false, 0);
// set the current CUDA stream to `myStream` within the scope using CUDA stream guard
{
  at::cuda::CUDAStreamGuard guard(myStream);
  // current CUDA stream is `myStream` from here till the end of bracket.
  // sum() on tensor0 uses `myStream` as current CUDA stream
  tensor0.sum();
}
// current CUDA stream is reset to default CUDA stream after CUDA stream guard is destroyed
// sum() on tensor0 uses default CUDA stream on device 0 as current CUDA stream
tensor0.sum();

Note

The code above runs on a single CUDA device. setCurrentCUDAStream appears here to set the current CUDA stream on the current device, but note that setCurrentCUDAStream actually sets the current stream on the device of the passed-in CUDA stream.

  2. Acquiring and setting CUDA streams on multiple devices.

// This example shows how to acquire and set CUDA stream on two devices.

// acquire new CUDA streams from CUDA stream pool on device 0 and device 1
at::cuda::CUDAStream myStream0 = at::cuda::getStreamFromPool(false, 0);
at::cuda::CUDAStream myStream1 = at::cuda::getStreamFromPool(false, 1);

// set current CUDA stream to `myStream0` on device 0
at::cuda::setCurrentCUDAStream(myStream0);
// set current CUDA stream to `myStream1` on device 1
at::cuda::setCurrentCUDAStream(myStream1);

// create a tensor on device 0, no need to specify device index since
// current device index is 0
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device(at::kCUDA));
// sum() on tensor0 uses `myStream0` as current CUDA stream on device 0
tensor0.sum();

// change the current device index to 1 by using CUDA device guard within a bracket scope
{
  at::cuda::CUDAGuard device_guard{1};
  // create a tensor on device 1
  torch::Tensor tensor1 = torch::ones({2, 2}, torch::device(at::kCUDA));
  // sum() on tensor 1 uses `myStream1` as current CUDA stream on device 1
  tensor1.sum();
}

// current device is reset to device 0 after device_guard is destroyed

// acquire a new CUDA stream on device 1
at::cuda::CUDAStream myStream1_1 = at::cuda::getStreamFromPool(false, 1);
// create a new tensor on device 1
torch::Tensor tensor1 = torch::ones({2, 2}, torch::device({torch::kCUDA, 1}));

// change the current device index to 1 and current CUDA stream on device 1
// to `myStream1_1` using CUDA stream guard within a scope
{
  at::cuda::CUDAStreamGuard stream_guard(myStream1_1);
  // sum() on tensor1 uses `myStream1_1` as current CUDA stream on device 1
  tensor1.sum();
}

// current device is reset to device 0 and current CUDA stream on device 1 is
// reset to `myStream1`

// sum() on tensor1 uses `myStream1` as current CUDA stream on device 1
tensor1.sum();

  3. Using the CUDA multistream guard

// This example shows how to use CUDA multistream guard to set
// two streams on two devices at the same time.

// create two tensors, one on device 0 and one on device 1
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device({torch::kCUDA, 0}));
torch::Tensor tensor1 = torch::ones({2, 2}, torch::device({torch::kCUDA, 1}));

// acquire new CUDA streams from CUDA stream pool on device 0 and device 1
at::cuda::CUDAStream myStream0 = at::cuda::getStreamFromPool(false, 0);
at::cuda::CUDAStream myStream1 = at::cuda::getStreamFromPool(false, 1);

// set current CUDA stream on device 0 to `myStream0` and current CUDA stream
// on device 1 to `myStream1` using the CUDA multistream guard
{
  at::cuda::CUDAMultiStreamGuard multi_guard({myStream0, myStream1});

  // sum() on tensor0 uses `myStream0` as current CUDA stream on device 0
  tensor0.sum();
  // sum() on tensor1 uses `myStream1` as current CUDA stream on device 1
  tensor1.sum();
}

// current CUDA stream on device 0 is reset to default CUDA stream on device 0
// current CUDA stream on device 1 is reset to default CUDA stream on device 1

// sum() on tensor0 uses default CUDA stream as current CUDA stream on device 0
tensor0.sum();
// sum() on tensor1 uses default CUDA stream as current CUDA stream on device 1
tensor1.sum();

Note

CUDAMultiStreamGuard does not change the current device index; it only changes the stream on the device of each passed-in stream. Aside from the scoping, this guard is equivalent to calling setCurrentCUDAStream on each of the passed-in streams.

  4. A skeleton example of handling CUDA streams on multiple devices

// This is a skeleton example that shows how to handle CUDA streams on multiple devices
// Suppose you want to do work on non-default streams on two devices simultaneously,
// and you already have streams on both devices in two vectors. The following code
// shows several ways of acquiring and setting the streams.

// Usage 0: acquire CUDA stream and set current CUDA stream with `setCurrentCUDAStream`
// Create a CUDA stream vector `streams0` on device 0
std::vector<at::cuda::CUDAStream> streams0 =
  {at::cuda::getDefaultCUDAStream(), at::cuda::getStreamFromPool()};
// set current stream as `streams0[0]` on device 0
at::cuda::setCurrentCUDAStream(streams0[0]);

// create a CUDA stream vector `streams1` on device 1 using a CUDA device guard
std::vector<at::cuda::CUDAStream> streams1;
{
  // device index is set to 1 within this scope
  at::cuda::CUDAGuard device_guard(1);
  streams1.push_back(at::cuda::getDefaultCUDAStream());
  streams1.push_back(at::cuda::getStreamFromPool());
}
// device index is reset to 0 after device_guard is destroyed

// set current stream as `streams1[0]` on device 1
at::cuda::setCurrentCUDAStream(streams1[0]);


// Usage 1: use CUDA device guard to change the current device index only
{
  at::cuda::CUDAGuard device_guard(1);

  // current device index is changed to 1 within scope
  // current CUDA stream is still `streams1[0]` on device 1, no change
}
// current device index is reset to 0 after `device_guard` is destroyed


// Usage 2: use CUDA stream guard to change both current device index and current CUDA stream.
{
  at::cuda::CUDAStreamGuard stream_guard(streams1[1]);

  // current device index and current CUDA stream are set to 1 and `streams1[1]` within scope
}
// after `stream_guard` is destroyed, the current device index is reset to 0 and
// the current CUDA stream on device 1 is reset to `streams1[0]`


// Usage 3: use CUDA multi-stream guard to change multiple streams on multiple devices
{
  // This is the same as calling `at::cuda::setCurrentCUDAStream` on both streams
  at::cuda::CUDAMultiStreamGuard multi_guard({streams0[1], streams1[1]});

  // current device index is not changed, still 0
  // current CUDA stream on device 0 and device 1 are set to `streams0[1]` and `streams1[1]`
}
// current CUDA stream on device 0 and device 1 are reset to `streams0[0]` and `streams1[0]`
// after `multi_guard` is destroyed.
