Tensor Basics

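The ATen tensor library backing PyTorch is a simple tensor library that exposes the Tensor operations in Torch directly in C++17. ATen's API is auto-generated from the same declarations PyTorch uses, so the two APIs will track each other over time.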

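Tensor types are resolved dynamically, so the API is generic and does not include templates. That is, there is one Tensor type. It can hold a CPU or CUDA tensor, and the tensor may contain Doubles, Floats, Ints, and so on. This design makes it easy to write generic code without templating everything.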

See https://pytorch.ac.cn/cppdocs/api/namespace_at.html#functions for the provided API. An excerpt:

Tensor atan2(const Tensor & other) const;
Tensor & atan2_(const Tensor & other);
Tensor pow(Scalar exponent) const;
Tensor pow(const Tensor & exponent) const;
Tensor & pow_(Scalar exponent);
Tensor & pow_(const Tensor & exponent);
Tensor lerp(const Tensor & end, Scalar weight) const;
Tensor & lerp_(const Tensor & end, Scalar weight);
Tensor histc() const;
Tensor histc(int64_t bins) const;
Tensor histc(int64_t bins, Scalar min) const;
Tensor histc(int64_t bins, Scalar min, Scalar max) const;

In-place operations are also provided. They are always suffixed by _ to indicate that they modify the Tensor.

Efficient Access to Tensor Elements

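When using Tensor-wide operations, the relative cost of dynamic dispatch is very small. However, there are cases, especially in your own kernels, where efficient element-wise access is needed, and the cost of dynamic dispatch inside the element-wise loop is very high. ATen provides accessors that are created with a single dynamic check that the Tensor has the expected type and number of dimensions. Accessors then expose an API for accessing the Tensor elements efficiently.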

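Accessors are temporary views of a Tensor. They are only valid for the lifetime of the tensor that they view, and hence should only be used locally in a function, like iterators.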

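Note that accessors are not compatible with CUDA tensors inside kernel functions. Instead, you have to use a packed accessor, which behaves the same way but copies the tensor metadata instead of pointing to it.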

It is therefore recommended to use accessors for CPU tensors and packed accessors for CUDA tensors.

CPU accessors

torch::Tensor foo = torch::rand({12, 12});

// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.accessor<float,2>();
float trace = 0;

for(int i = 0; i < foo_a.size(0); i++) {
  // use the accessor foo_a to get tensor data.
  trace += foo_a[i][i];
}

CUDA accessors

__global__ void packed_accessor_kernel(
    torch::PackedTensorAccessor64<float, 2> foo,
    float* trace) {
  int i = threadIdx.x;
  gpuAtomicAdd(trace, foo[i][i]);
}

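// The kernel dereferences both arguments on the device, so the input
// and the result must live in CUDA memory.
torch::Tensor foo = torch::rand({12, 12}, torch::device(torch::kCUDA));
torch::Tensor trace = torch::zeros({1}, torch::device(torch::kCUDA));

// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.packed_accessor64<float,2>();

packed_accessor_kernel<<<1, 12>>>(foo_a, trace.data_ptr<float>());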

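In addition to PackedTensorAccessor64 and packed_accessor64, there are also the corresponding PackedTensorAccessor32 and packed_accessor32, which use 32-bit integers for indexing. This can be considerably faster on CUDA but may lead to overflows in the indexing calculations.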
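One way to decide whether 32-bit indexing is safe is to compute the largest element offset a tensor can produce and compare it with the 32-bit limit. The sketch below is plain C++ and is not an ATen function: the name `offsets_fit_in_int32` is hypothetical, and it assumes non-negative strides.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns true when the largest element offset reachable with the given
// sizes and strides fits in a signed 32-bit integer.
// Assumption: strides are non-negative. Hypothetical helper, not part of ATen.
bool offsets_fit_in_int32(const std::vector<int64_t>& sizes,
                          const std::vector<int64_t>& strides) {
  int64_t max_offset = 0;
  for (std::size_t d = 0; d < sizes.size(); ++d) {
    if (sizes[d] == 0) {
      return true;  // Empty tensor: there is nothing to index.
    }
    // The last valid index along dimension d is sizes[d] - 1.
    max_offset += (sizes[d] - 1) * strides[d];
  }
  return max_offset <= std::numeric_limits<int32_t>::max();
}
```

For a tensor `foo`, the arguments could be obtained with `foo.sizes().vec()` and `foo.strides().vec()`. A contiguous 12 x 12 tensor passes the check, while a contiguous 65536 x 65536 tensor does not.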

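Note that the template can take other parameters, such as the pointer restriction and the integer type used for indexing. See the documentation for a thorough template description of accessors and packed accessors.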

Using Externally Created Data

If you already have your tensor data allocated in memory (CPU or CUDA), you can view that memory as a Tensor in ATen:

float data[] = { 1, 2, 3,
                 4, 5, 6 };
torch::Tensor f = torch::from_blob(data, {2, 3});

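These tensors cannot be resized because ATen does not own the memory, but they otherwise behave like normal tensors. The memory must remain valid for as long as the tensor is in use.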

Scalars and zero-dimensional tensors

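In addition to Tensor objects, ATen also includes Scalars, which represent a single number. Like a Tensor, a Scalar is dynamically typed and can hold any of ATen's number types. Scalars can be implicitly constructed from C++ number types. Scalars are needed because some functions, such as addmm, take numbers along with Tensors and expect these numbers to have the same dynamic type as the tensor. They are also used in the API to indicate places where a function always returns a single value, such as the Tensor member function item. Reductions such as sum return a zero-dimensional Tensor rather than a Scalar.

namespace torch {
// Computes beta * self + alpha * (mat1 matmul mat2).
Tensor addmm(const Tensor & self, const Tensor & mat1,
             const Tensor & mat2,
             const Scalar & beta = 1, const Scalar & alpha = 1);
} // namespace torch

// Member of Tensor.
Scalar item() const;

// Usage.
torch::Tensor a = ...
torch::Tensor b = ...
torch::Tensor c = ...
// beta = 1.0, alpha = .5
torch::Tensor r = torch::addmm(a, b, c, 1.0, .5);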

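In addition to Scalars, ATen also allows Tensor objects to be zero-dimensional. These Tensors hold a single value, and they can be references to a single element in a larger Tensor. They can be used anywhere a Tensor is expected. They are normally created by operators such as select, which reduce the dimensions of a Tensor.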


torch::Tensor two = torch::rand({10, 20});
two[1][2] = 4;
// ^^^^^^ <- zero-dimensional Tensor