functorch.vmap

functorch.vmap(func, in_dims=0, out_dims=0, randomness='error', *, chunk_size=None)

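vmap is the vectorizing map; vmap(func) returns a new function that maps func over some dimension of the inputs. Semantically, vmap pushes the map into the PyTorch operations called by func, effectively vectorizing those operations.

vmap is useful for handling batch dimensions: you can write a function func that runs on a single example and then lift it to a function that takes a batch of examples with vmap(func). vmap can also be used to compute batched gradients when composed with autograd.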
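As an illustration of composing vmap with autograd, the following is a minimal sketch of per-example gradients. It assumes PyTorch 2.0 or later, where torch.func.grad and torch.func.vmap are available; the function name squared_error and the tensor shapes are hypothetical and chosen only for this example.

```python
import torch
from torch.func import grad, vmap

def squared_error(weights, features, target):
    # Loss for a single example: features has shape [D], target is a scalar.
    prediction = features.dot(weights)
    return (prediction - target) ** 2

torch.manual_seed(0)
weights = torch.randn(5)
features = torch.randn(3, 5)  # a batch of 3 examples
targets = torch.randn(3)

# grad() differentiates with respect to the first argument (weights).
# in_dims=(None, 0, 0) shares weights and maps over features and targets.
per_example_grads = vmap(grad(squared_error), in_dims=(None, 0, 0))(
    weights, features, targets
)
print(per_example_grads.shape)  # one gradient row per example
```

Each row of per_example_grads is the gradient of one example's loss with respect to weights, which a single backward pass over the summed loss would not give you directly.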

Note

torch.vmap() is aliased to torch.func.vmap() for convenience. Use whichever one you'd like.

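Parameters

func (function) – A Python function that takes one or more arguments. Must return one or more Tensors.
in_dims (int or nested structure) – Specifies which dimension of the inputs should be mapped over. in_dims should have a structure like the inputs. If the in_dim for a particular input is None, then that input has no mapped dimension. Default: 0.
out_dims (int or Tuple[int]) – Specifies where the mapped dimension should appear in the outputs. If out_dims is a tuple, it should have one element per output. Default: 0.
randomness (str) – Specifies whether the randomness in this vmap should be the same or different across batches. If 'different', the randomness for each batch will be different. If 'same', the randomness will be the same across batches. If 'error', any call to a random function will raise an error. Default: 'error'. WARNING: this flag only applies to random PyTorch operations and does not apply to Python's random module or numpy randomness.
chunk_size (None or int) – If None (default), apply a single vmap over the inputs. If not None, compute the vmap chunk_size samples at a time. Note that chunk_size=1 is equivalent to computing the vmap with a for-loop. If you run into memory issues computing the vmap, try a non-None chunk_size.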
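The randomness and chunk_size parameters can be sketched as follows. This is an illustrative example rather than a complete treatment; it assumes PyTorch 2.0 or later, and the helper names add_noise and square are hypothetical.

```python
import torch

def add_noise(x):
    # torch.randn is a random PyTorch operation, so the randomness flag applies.
    return x + torch.randn(())

x = torch.ones(4)

# 'same': one random draw is shared by every element of the batch.
same = torch.vmap(add_noise, randomness="same")(x)

# 'different': each element of the batch gets its own random draw.
different = torch.vmap(add_noise, randomness="different")(x)

def square(v):
    return v ** 2

batch = torch.arange(6.0).reshape(6, 1)

# A single vmap over all 6 samples versus 3 chunks of 2 samples each.
full = torch.vmap(square)(batch)
chunked = torch.vmap(square, chunk_size=2)(batch)
```

Chunking changes only how much is computed at once, not the result, so full and chunked should hold the same values.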

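Returns

Returns a new "batched" function. It takes the same inputs as func, except each input has an extra dimension at the index specified by in_dims. It returns the same outputs as func, except each output has an extra dimension at the index specified by out_dims.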

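One example of using vmap() is to compute batched dot products. PyTorch doesn't provide a batched torch.dot API; instead of searching the docs for one, use vmap() to construct a new function.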

>>> import torch
>>> torch.dot                            # [D], [D] -> []
>>> batched_dot = torch.func.vmap(torch.dot)  # [N, D], [N, D] -> [N]
>>> x, y = torch.randn(2, 5), torch.randn(2, 5)
>>> batched_dot(x, y)
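One way to read the batched function is as a vectorized version of an explicit Python loop over the batch dimension. The sketch below compares the two; the variable name looped is hypothetical.

```python
import torch

batched_dot = torch.vmap(torch.dot)  # [N, D], [N, D] -> [N]
x, y = torch.randn(2, 5), torch.randn(2, 5)

result = batched_dot(x, y)

# Reference: call torch.dot once per example and stack the scalar results.
looped = torch.stack([torch.dot(xi, yi) for xi, yi in zip(x, y)])
```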

vmap() can help hide batch dimensions, leading to a simpler model authoring experience.

>>> batch_size, feature_size = 3, 5
>>> weights = torch.randn(feature_size, requires_grad=True)
>>>
>>> def model(feature_vec):
>>>     # Very simple linear model with activation
>>>     return feature_vec.dot(weights).relu()
>>>
>>> examples = torch.randn(batch_size, feature_size)
>>> result = torch.vmap(model)(examples)
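For comparison, here is a sketch of the same model next to a hand-batched version, in which the author has to reason about the batch dimension directly. The name manual is hypothetical.

```python
import torch

batch_size, feature_size = 3, 5
weights = torch.randn(feature_size, requires_grad=True)

def model(feature_vec):
    # Written for one example of shape [feature_size].
    return feature_vec.dot(weights).relu()

examples = torch.randn(batch_size, feature_size)
result = torch.vmap(model)(examples)

# Hand-batched equivalent: a matrix-vector product replaces the dot product.
manual = (examples @ weights).relu()
```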

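vmap() can also help vectorize computations that were previously difficult or impossible to batch. One example is higher-order gradient computation. The PyTorch autograd engine computes vjps (vector-Jacobian products). Computing a full Jacobian matrix for some function f: R^N -> R^N usually requires N calls to autograd.grad, one per Jacobian row. Using vmap(), we can vectorize the whole computation, computing the Jacobian in a single call to autograd.grad.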

>>> # Setup
>>> N = 5
>>> f = lambda x: x ** 2
>>> x = torch.randn(N, requires_grad=True)
>>> y = f(x)
>>> I_N = torch.eye(N)
>>>
>>> # Sequential approach
>>> jacobian_rows = [torch.autograd.grad(y, x, v, retain_graph=True)[0]
>>>                  for v in I_N.unbind()]
>>> jacobian = torch.stack(jacobian_rows)
>>>
>>> # vectorized gradient computation
>>> def get_vjp(v):
>>>     # autograd.grad returns a tuple; take the gradient with respect to x
>>>     return torch.autograd.grad(y, x, v)[0]
>>> jacobian = torch.vmap(get_vjp)(I_N)
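As a sanity check on the vectorized approach, the Jacobian of x ** 2 is known in closed form: it is the diagonal matrix with entries 2 * x. The sketch below compares the vmap result against that and against torch.func.jacrev, assuming PyTorch 2.0 or later; the names expected and jacobian_jacrev are hypothetical.

```python
import torch
from torch.func import jacrev

N = 5
f = lambda x: x ** 2
x = torch.randn(N, requires_grad=True)
y = f(x)

def get_vjp(v):
    # One vector-Jacobian product; vmap maps this over the rows of the identity.
    return torch.autograd.grad(y, x, v)[0]

jacobian = torch.vmap(get_vjp)(torch.eye(N))

# Closed form for f(x) = x ** 2: diag(2 * x).
expected = torch.diag(2 * x.detach())

# jacrev builds the same reverse-mode Jacobian without manual vjp plumbing.
jacobian_jacrev = jacrev(f)(x.detach())
```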

vmap() can also be nested, producing an output with multiple batched dimensions:

>>> torch.dot                            # [D], [D] -> []
>>> batched_dot = torch.vmap(torch.vmap(torch.dot))  # [N1, N0, D], [N1, N0, D] -> [N1, N0]
>>> x, y = torch.randn(2, 3, 5), torch.randn(2, 3, 5)
>>> batched_dot(x, y) # tensor of size [2, 3]
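In the nested case, the outer vmap maps over the first dimension and the inner vmap over the second, leaving only the last dimension for torch.dot. A small sketch comparing this with an equivalent torch.einsum contraction (the name reference is hypothetical):

```python
import torch

x, y = torch.randn(2, 3, 5), torch.randn(2, 3, 5)

# Outer vmap maps over dim 0 (size 2), inner vmap over the next dim (size 3).
nested = torch.vmap(torch.vmap(torch.dot))(x, y)

# Same contraction written explicitly: sum over the last dimension only.
reference = torch.einsum("abd,abd->ab", x, y)
```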

If the inputs are not batched along the first dimension, in_dims specifies the dimension that each input is batched along:

>>> torch.dot                            # [N], [N] -> []
>>> batched_dot = torch.vmap(torch.dot, in_dims=1)  # [N, D], [N, D] -> [D]
>>> x, y = torch.randn(2, 5), torch.randn(2, 5)
>>> batched_dot(x, y)   # output is [5] instead of [2] if batched along the 0th dimension
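With in_dims=1, each call to torch.dot sees one column of x and one column of y, so the reduction runs over dimension 0. A sketch that checks this reading (the name reference is hypothetical):

```python
import torch

x, y = torch.randn(2, 5), torch.randn(2, 5)

# Map over dim 1: five dot products, each between vectors of length 2.
result = torch.vmap(torch.dot, in_dims=1)(x, y)

# Equivalent explicit reduction over dim 0.
reference = (x * y).sum(dim=0)
```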

If there are multiple inputs, each batched along a different dimension, in_dims must be a tuple with the batch dimension for each input:

>>> torch.dot                            # [D], [D] -> []
>>> batched_dot = torch.vmap(torch.dot, in_dims=(0, None))  # [N, D], [D] -> [N]
>>> x, y = torch.randn(2, 5), torch.randn(5)
>>> batched_dot(x, y) # second arg doesn't have a batch dim because in_dim[1] was None
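The tuple form also covers the case where both inputs are batched, but along different dimensions. The following sketch assumes a hypothetical layout in which x stores examples as rows and y stores them as columns:

```python
import torch

x = torch.randn(2, 5)  # batch along dim 0
y = torch.randn(5, 2)  # batch along dim 1

# in_dims=(0, 1): take row i of x and column i of y for the i-th dot product.
result = torch.vmap(torch.dot, in_dims=(0, 1))(x, y)

# Equivalent explicit computation after aligning the layouts.
reference = (x * y.T).sum(dim=1)
```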

If the input is a Python struct, in_dims must be a tuple containing a struct matching the shape of the input:

>>> f = lambda d: torch.dot(d['x'], d['y'])
>>> x, y = torch.randn(2, 5), torch.randn(5)
>>> inputs = {'x': x, 'y': y}
>>> batched_dot = torch.vmap(f, in_dims=({'x': 0, 'y': None},))
>>> batched_dot(inputs)
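The same matching rule applies when a structured argument is mixed with a plain tensor argument: in_dims gets one entry per positional argument, and the entry for the structured argument mirrors its layout. A sketch with a hypothetical function weighted_dot that takes a tuple and a tensor:

```python
import torch

def weighted_dot(pair, scale):
    # pair is a tuple (a, b) of vectors; scale is a scalar per example.
    a, b = pair
    return torch.dot(a, b) * scale

x, y = torch.randn(2, 5), torch.randn(5)
scale = torch.randn(2)

# (0, None) mirrors the tuple (x, y); the trailing 0 is for scale.
result = torch.vmap(weighted_dot, in_dims=((0, None), 0))((x, y), scale)

reference = (x @ y) * scale
```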

By default, the output is batched along the first dimension. However, it can be batched along any dimension by using out_dims:

>>> f = lambda x: x ** 2
>>> x = torch.randn(2, 5)
>>> batched_pow = torch.vmap(f, out_dims=1)
>>> batched_pow(x) # [5, 2]
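When func returns several tensors, out_dims can be a tuple with one entry per output, as the parameter description states. A sketch with a hypothetical two-output function square_and_sum:

```python
import torch

def square_and_sum(v):
    # Two outputs per example: a vector of shape [5] and a scalar.
    return v ** 2, v.sum()

x = torch.randn(2, 5)

# Put the batch dimension at index 1 for the first output, index 0 for the second.
squares, sums = torch.vmap(square_and_sum, out_dims=(1, 0))(x)

print(squares.shape)  # batch dimension is last
print(sums.shape)     # batch dimension is first (and only)
```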

For any function that uses kwargs, the returned function will not batch the kwargs but will accept kwargs:

>>> x = torch.randn([2, 5])
>>> def fn(x, scale=4.):
>>>   return x * scale
>>>
>>> batched_scale = torch.vmap(fn)
>>> assert torch.allclose(batched_scale(x), x * 4)
>>> batched_scale(x, scale=x) # scale is not batched, output has shape [2, 2, 5]
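If the intent is to map over scale as well, one option is to pass it positionally so that in_dims applies to it. A sketch, with a hypothetical per-example scales tensor:

```python
import torch

def fn(x, scale=4.0):
    return x * scale

x = torch.randn(2, 5)
scales = torch.tensor([1.0, 10.0])  # one scale per example

# Passed positionally, scales is batched along dim 0 just like x.
per_example = torch.vmap(fn)(x, scales)

reference = x * scales.unsqueeze(1)
```

Here the output keeps the shape [2, 5], because each example is multiplied by its own scalar instead of by the whole unbatched tensor.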

Note

vmap does not provide general autobatching or handle variable-length sequences out of the box.
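One common workaround for variable-length sequences, shown here only as a sketch and not as something vmap does for you, is to pad the sequences to a common length and pass an explicit mask. The names sequences, masked_mean, padded, and mask are hypothetical.

```python
import torch
import torch.nn.functional as F

sequences = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0])]
lengths = torch.tensor([len(s) for s in sequences])
max_len = int(lengths.max())

# Right-pad every sequence with zeros so they stack into one [N, max_len] tensor.
padded = torch.stack([F.pad(s, (0, max_len - len(s))) for s in sequences])

# mask[i, j] is 1.0 where position j is real data for sequence i, else 0.0.
mask = (torch.arange(max_len)[None, :] < lengths[:, None]).float()

def masked_mean(seq, m):
    # Mean over the valid positions of a single padded sequence.
    return (seq * m).sum() / m.sum()

means = torch.vmap(masked_mean)(padded, mask)
```

The per-example function still has to be written so that padded positions do not affect the result; vmap only removes the need to handle the batch dimension.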

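Warning

We've integrated functorch into PyTorch. As the final step of the integration, functorch.vmap is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use torch.vmap instead; for more details, see the PyTorch 2.0 release notes and/or the torch.func migration guide: https://pytorch.ac.cn/docs/master/func.migrating.html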
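For simple call sites, migrating is typically a matter of changing the import. A minimal sketch, assuming PyTorch 2.0 or later and code that previously used from functorch import vmap:

```python
import torch
from torch.func import vmap  # previously: from functorch import vmap

batched_dot = vmap(torch.dot)
x, y = torch.randn(2, 5), torch.randn(2, 5)
result = batched_dot(x, y)

# torch.vmap is the convenience alias mentioned in the note above.
same_result = torch.vmap(torch.dot)(x, y)
```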
