Unfold

class torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)

Extracts sliding local blocks from a batched input tensor.

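Consider a batched input tensor of shape $(N, C, *)$, where $N$ is the batch dimension, $C$ is the channel dimension, and $*$ represents arbitrary spatial dimensions. This operation flattens each sliding kernel_size-sized block within the spatial dimensions of input into a column (i.e., the last dimension) of a 3-D output tensor of shape $(N, C \times \prod(\text{kernel\_size}), L)$, where $C \times \prod(\text{kernel\_size})$ is the total number of values within each block (a block has $\prod(\text{kernel\_size})$ spatial locations, each containing a $C$-channel vector), and $L$ is the total number of such blocks: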

$$L = \prod_d \left\lfloor\frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor,$$

where $\text{spatial\_size}$ is formed by the spatial dimensions of input ($*$ above), and $d$ ranges over all spatial dimensions.
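
As a quick sanity check of the formula for $L$, the following minimal sketch evaluates it in plain Python. The helper name `num_blocks` is hypothetical and not part of PyTorch; it only mirrors the expression above, assuming integer arguments are replicated across all spatial dimensions.

```python
import math


def num_blocks(spatial_size, kernel_size, dilation=1, padding=0, stride=1):
    """Hypothetical helper: number of sliding blocks L for the given settings."""
    n = len(spatial_size)

    def expand(v):
        # Replicate an int (or a length-1 tuple) across all spatial dimensions.
        if isinstance(v, int):
            return (v,) * n
        v = tuple(v)
        return v * n if len(v) == 1 else v

    k, d, p, s = (expand(v) for v in (kernel_size, dilation, padding, stride))
    # One floor-division term per spatial dimension, multiplied together.
    return math.prod(
        (spatial_size[i] + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i in range(n)
    )


print(num_blocks((3, 4), (2, 3)))    # 4
print(num_blocks((10, 12), (4, 5)))  # 56
```

The two calls use a 2x3 kernel on a 3x4 input and a 4x5 kernel on a 10x12 input, giving 2 x 2 = 4 and 7 x 8 = 56 blocks respectively.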

Therefore, indexing output at the last dimension (the column dimension) gives all values within a certain block.
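
To illustrate the column indexing, here is a small sketch that pulls one column out of the output and compares it with the corresponding input patch. The tensor sizes and the chosen column index are arbitrary assumptions for the illustration; the sketch relies on blocks being enumerated in row-major order over the block positions and on each column being laid out as (channel, kernel row, kernel column).

```python
import torch
from torch import nn

# Input of shape (N, C, H, W) = (2, 3, 4, 5) with distinct values.
x = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)
unfold = nn.Unfold(kernel_size=(2, 3))
out = unfold(x)  # shape (2, 3 * 2 * 3, L) with L = 3 * 3 = 9

# Column 4 is the block at grid position (row 1, col 1) of the 3 x 3 grid
# of block positions, i.e. its top-left corner is at input[..., 1, 1].
l = 4
block = out[:, :, l].reshape(2, 3, 2, 3)  # (N, C, kH, kW)
same = torch.equal(block, x[:, :, 1:3, 1:4])
print(out.shape, same)
```

If the assumptions about ordering hold, `same` is `True`, meaning the column is an exact copy of the input patch.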

The padding, stride and dilation arguments specify how the sliding blocks are retrieved.

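stride controls the stride for the sliding blocks.
padding controls the amount of implicit zero padding added on both sides before reshaping. For each dimension, padding points are added on each side.
dilation controls the spacing between the kernel points; this is also known as the à trous algorithm. A kernel of size $k$ with dilation $d$ spans $d \times (k - 1) + 1$ input elements along that dimension. It is harder to describe, but this link has a nice visualization of what dilation does.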
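
The effect of each argument on the number of blocks can be seen in a short sketch. The 8x8 input and the specific argument values are assumptions picked for the illustration; only `torch.nn.functional.unfold` from the page above is used.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 8, 8)  # (N, C, H, W)

base = F.unfold(x, kernel_size=3)                 # 6 x 6 = 36 blocks
strided = F.unfold(x, kernel_size=3, stride=2)    # 3 x 3 = 9 blocks
padded = F.unfold(x, kernel_size=3, padding=1)    # 8 x 8 = 64 blocks
dilated = F.unfold(x, kernel_size=3, dilation=2)  # 4 x 4 = 16 blocks

# Each block always holds C * 3 * 3 = 18 values; only L changes.
for name, t in [("base", base), ("strided", strided),
                ("padded", padded), ("dilated", dilated)]:
    print(name, tuple(t.shape))
```

A larger stride skips positions, padding adds positions at the border, and dilation widens the span of the kernel so that fewer positions fit.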

Parameters

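kernel_size (int or tuple) – the size of the sliding blocks
dilation (int or tuple, optional) – a parameter that controls the stride of elements within the neighborhood. Default: 1
padding (int or tuple, optional) – implicit zero padding to be added on both sides of input. Default: 0
stride (int or tuple, optional) – the stride of the sliding blocks in the input spatial dimensions. Default: 1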

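If kernel_size, dilation, padding or stride is an int or a tuple of length 1, its value will be replicated across all spatial dimensions.
For the case of two input spatial dimensions, this operation is sometimes called im2col.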
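
The replication of integer arguments can be checked with a brief sketch. The input size and argument values are assumptions chosen for the illustration; the sketch only compares an integer argument with the equivalent explicit tuple.

```python
import torch
from torch import nn

x = torch.randn(1, 2, 6, 6)

# An int is applied to both spatial dimensions ...
a = nn.Unfold(kernel_size=3, stride=2)(x)
# ... so it should match the explicit per-dimension tuples.
b = nn.Unfold(kernel_size=(3, 3), stride=(2, 2))(x)

print(tuple(a.shape), torch.equal(a, b))
```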

Note

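Fold calculates each combined value in the resulting large tensor by summing all values from all containing blocks. Unfold extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.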

In general, folding and unfolding operations are related as follows. Consider Fold and Unfold instances created with the same parameters:

>>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...)
>>> fold = nn.Fold(output_size=..., **fold_params)
>>> unfold = nn.Unfold(**fold_params)

Then for any (supported) input tensor the following equality holds:

fold(unfold(input)) == divisor * input

where divisor is a tensor that depends only on the shape and dtype of the input:

>>> input_ones = torch.ones(input.shape, dtype=input.dtype)
>>> divisor = fold(unfold(input_ones))

When the divisor tensor contains no zero elements, the fold and unfold operations are inverses of each other (up to a constant divisor).
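
The relation between Fold and Unfold can be checked numerically. The following is a minimal sketch under assumed settings (a 2x2 kernel with stride 1 on a 4x5 input, so blocks overlap and every position is covered at least once); the concrete sizes are not taken from the text above.

```python
import torch
from torch import nn

params = dict(kernel_size=(2, 2), dilation=1, padding=0, stride=1)
fold = nn.Fold(output_size=(4, 5), **params)
unfold = nn.Unfold(**params)

x = torch.randn(1, 3, 4, 5)

# divisor counts how many blocks cover each input position.
divisor = fold(unfold(torch.ones_like(x)))
roundtrip = fold(unfold(x))

# Overlapping blocks are summed, so the round trip scales x by divisor.
print(torch.allclose(roundtrip, divisor * x))

# With no zeros in divisor, dividing recovers the original input.
recovered = roundtrip / divisor
print(torch.allclose(recovered, x))
```

In this configuration the corner positions are covered by one block and interior positions by four, which is why the plain round trip alone does not return the input.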

Warning

Currently, only 4-D input tensors (batched image-like tensors) are supported.

Shape

Input: $(N, C, *)$
Output: $(N, C \times \prod(\text{kernel\_size}), L)$, as described above

Examples

>>> import torch
>>> from torch import nn
>>> unfold = nn.Unfold(kernel_size=(2, 3))
>>> input = torch.randn(2, 5, 3, 4)
>>> output = unfold(input)
>>> # each patch contains 30 values (2x3=6 vectors, each of 5 channels)
>>> # 4 blocks (2x3 kernels) in total in the 3x4 input
>>> output.size()
torch.Size([2, 30, 4])

>>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
>>> inp = torch.randn(1, 3, 10, 12)
>>> w = torch.randn(2, 3, 4, 5)
>>> inp_unf = torch.nn.functional.unfold(inp, (4, 5))
>>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
>>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
>>> # or equivalently (and avoiding a copy),
>>> # out = out_unf.view(1, 2, 7, 8)
>>> (torch.nn.functional.conv2d(inp, w) - out).abs().max()
tensor(1.9073e-06)
