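PyTorch 0.4.0 Migration Guide

Author: PyTorch Team

Welcome to the migration guide for PyTorch 0.4.0. This release introduces many exciting new features and critical bug fixes, with the goal of giving users better and cleaner interfaces. This guide covers the most important changes to make when migrating existing code from previous versions: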

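Tensors and Variables have merged
Support for 0-dimensional (scalar) Tensors
Deprecation of the volatile flag
dtypes, devices, and NumPy-style Tensor creation functions
Writing device-agnostic code
New edge-case constraints on names of submodules, parameters, and buffers in nn.Module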

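Merging the `Tensor` and `Variable` classes

torch.Tensor and torch.autograd.Variable are now the same class. More precisely, torch.Tensor can track history and behaves like the old Variable; Variable wrapping continues to work as before, but returns an object of type torch.Tensor. This means you no longer need the Variable wrapper everywhere in your code.

The `type()` of a Tensor has changed

Note also that the type() of a Tensor no longer reflects the data type. Use isinstance() or x.type() instead: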

>>> x = torch.DoubleTensor([1, 1, 1])
>>> print(type(x))  # was torch.DoubleTensor
"<class 'torch.Tensor'>"
>>> print(x.type())  # OK: 'torch.DoubleTensor'
'torch.DoubleTensor'
>>> print(isinstance(x, torch.DoubleTensor))  # OK: True
True

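When does `autograd` start tracking history now?

requires_grad, the central flag for autograd, is now an attribute on Tensors. The rules previously used for Variables now apply to Tensors: autograd starts tracking history when any input Tensor of an operation has requires_grad=True. For example,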

>>> x = torch.ones(1)  # create a tensor with requires_grad=False (default)
>>> x.requires_grad
False
>>> y = torch.ones(1)  # another tensor with requires_grad=False
>>> z = x + y
>>> # both inputs have requires_grad=False. so does the output
>>> z.requires_grad
False
>>> # then autograd won't track this computation. let's verify!
>>> z.backward()
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>>
>>> # now create a tensor with requires_grad=True
>>> w = torch.ones(1, requires_grad=True)
>>> w.requires_grad
True
>>> # add to the previous result that has requires_grad=False
>>> total = w + z
>>> # the total sum now requires grad!
>>> total.requires_grad
True
>>> # autograd can compute the gradients as well
>>> total.backward()
>>> w.grad
tensor([ 1.])
>>> # and no computation is wasted to compute gradients for x, y and z, which don't require grad
>>> z.grad == x.grad == y.grad == None
True

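Manipulating the `requires_grad` flag

Besides setting the attribute directly, you can change this flag in place with my_tensor.requires_grad_(), or, as in the example above, pass it as an argument at creation time (the default is False). For example,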

>>> existing_tensor.requires_grad_()
>>> existing_tensor.requires_grad
True
>>> my_tensor = torch.zeros(3, 4, requires_grad=True)
>>> my_tensor.requires_grad
True
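The snippet above uses existing_tensor without defining it. As a minimal, self-contained sketch of the same idea (the tensor names and shapes here are made up for illustration), the following shows the in-place method, the direct attribute assignment, and switching the flag off again:

```python
import torch

# A tensor that already exists and does not require grad yet (illustrative data).
existing_tensor = torch.ones(3)
before = existing_tensor.requires_grad       # False by default

# Option 1: flip the flag in place; the call returns the same tensor object.
returned = existing_tensor.requires_grad_()
same_object = returned is existing_tensor

# Option 2: assign the attribute directly on a leaf tensor.
other = torch.zeros(2)
other.requires_grad = True

# Passing False to requires_grad_() switches tracking off again.
existing_tensor.requires_grad_(False)

print(before, same_object, other.requires_grad, existing_tensor.requires_grad)
```

Because requires_grad_() returns the tensor itself, it can be chained with other calls when that reads more naturally.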

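What about `.data`?

.data was the primary way to get the underlying Tensor from a Variable. After the merge, calling y = x.data still has similar semantics: y is a Tensor that shares the same data with x, is unrelated to the computation history of x, and has requires_grad=False.

However, .data can be unsafe in some cases. Changes made through x.data are not tracked by autograd, so the computed gradients will be incorrect if x is needed in the backward pass. A safer alternative is x.detach(), which also returns a Tensor that shares data with x and has requires_grad=False, but whose in-place changes are reported by autograd if x is needed in the backward pass.

Here is an example of the difference between .data and x.detach() (and why we generally recommend detach).

If you use Tensor.detach(), the gradient computation is guaranteed to be correct: an in-place change that would corrupt it raises an error instead.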

>>> a = torch.tensor([1,2,3.], requires_grad = True)
>>> out = a.sigmoid()
>>> c = out.detach()
>>> c.zero_()
tensor([ 0.,  0.,  0.])

>>> out  # modified by c.zero_() !!
tensor([ 0.,  0.,  0.])

>>> out.sum().backward()  # Requires the original value of out, but that was overwritten by c.zero_()
RuntimeError: one of the variables needed for gradient computation has been modified by an

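However, using Tensor.data can be unsafe and can easily produce incorrect gradients when a tensor that is required for the gradient computation is modified in place.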

>>> a = torch.tensor([1,2,3.], requires_grad = True)
>>> out = a.sigmoid()
>>> c = out.data
>>> c.zero_()
tensor([ 0.,  0.,  0.])

>>> out  # out  was modified by c.zero_()
tensor([ 0.,  0.,  0.])

>>> out.sum().backward()
>>> a.grad  # The result is very, very wrong because `out` changed!
tensor([ 0.,  0.,  0.])

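Support for 0-dimensional (scalar) Tensors

Previously, indexing into a Tensor vector (a 1-dimensional tensor) gave a Python number, but indexing into a Variable vector gave (inconsistently!) a vector of size (1,). Reduction functions behaved similarly: tensor.sum() returned a Python number, but variable.sum() returned a vector of size (1,).

Fortunately, this release introduces proper support for scalars (0-dimensional tensors) in PyTorch. Scalars can be created with the new torch.tensor function (explained in more detail later; for now, think of it as the PyTorch equivalent of numpy.array). You can now do things like: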

>>> torch.tensor(3.1416)         # create a scalar directly
tensor(3.1416)
>>> torch.tensor(3.1416).size()  # scalar is 0-dimensional
torch.Size([])
>>> torch.tensor([3]).size()     # compare to a vector of size 1
torch.Size([1])
>>>
>>> vector = torch.arange(2, 6)  # this is a vector
>>> vector
tensor([ 2.,  3.,  4.,  5.])
>>> vector.size()
torch.Size([4])
>>> vector[3]                    # indexing into a vector gives a scalar
tensor(5.)
>>> vector[3].item()             # .item() gives the value as a Python number
5.0
>>> mysum = torch.tensor([2, 3]).sum()
>>> mysum
tensor(5)
>>> mysum.size()
torch.Size([])

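Accumulating losses

Consider the widely used pattern total_loss += loss.data[0]. Before 0.4.0, loss was a Variable wrapping a tensor of size (1,), but in 0.4.0 loss is a scalar with 0 dimensions. Indexing into a scalar is meaningless (it gives a warning now, and will be a hard error in 0.5.0). Use loss.item() to get the Python number from a scalar.

Note that if you don't convert to a Python number when accumulating losses, you may see increased memory usage in your program. This is because the right-hand side of the expression above used to be a Python float, whereas it is now a zero-dimensional Tensor. The total loss therefore accumulates Tensors along with their gradient history, which can keep large autograd graphs alive much longer than necessary.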
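To make the difference concrete, here is a small sketch that accumulates a loss both ways. The weight w, the two batches, and the name leaky_total are invented for illustration; a real training loop would also call backward() and an optimizer step.

```python
import torch

# Illustrative parameter and data; not taken from any real model.
w = torch.ones(3, requires_grad=True)
batches = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([0.5, 0.5, 0.5])]

total_loss = 0.0   # stays a plain Python float
leaky_total = 0    # turns into a Tensor that carries autograd history

for batch in batches:
    loss = (w * batch).sum()           # 0-dimensional Tensor
    total_loss += loss.item()          # Python number, no graph attached
    leaky_total = leaky_total + loss   # keeps each batch's graph reachable

print(type(total_loss).__name__, total_loss)
print(type(leaky_total).__name__, leaky_total.requires_grad)
```

The second accumulator is itself part of the autograd graph, which is why holding on to it keeps the per-batch graphs from being freed.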

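Deprecation of the `volatile` flag

The volatile flag is now deprecated and has no effect. Previously, any computation involving a Variable with volatile=True was not tracked by autograd. It has been replaced by a set of more flexible context managers, including torch.no_grad(), torch.set_grad_enabled(grad_mode), and others.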

>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>>
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)  # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False

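`dtypes`, `devices`, and NumPy-style creation functions

In previous versions of PyTorch, the data type (e.g. float vs. double), device type (cpu vs. cuda), and layout (dense vs. sparse) were specified together as a "tensor type". For example, torch.cuda.sparse.DoubleTensor was the Tensor type representing the double data type, living on CUDA devices, and using the COO sparse tensor layout.

This release introduces the torch.dtype, torch.device, and torch.layout classes to allow better management of these properties through NumPy-style creation functions.

`torch.dtype`

Below is the complete list of available torch.dtypes (data types) and their corresponding tensor types.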

Data type	`torch.dtype`	Tensor types
32-bit floating point	`torch.float32` or `torch.float`	`torch.*.FloatTensor`
64-bit floating point	`torch.float64` or `torch.double`	`torch.*.DoubleTensor`
16-bit floating point	`torch.float16` or `torch.half`	`torch.*.HalfTensor`
8-bit integer (unsigned)	`torch.uint8`	`torch.*.ByteTensor`
8-bit integer (signed)	`torch.int8`	`torch.*.CharTensor`
16-bit integer (signed)	`torch.int16` or `torch.short`	`torch.*.ShortTensor`
32-bit integer (signed)	`torch.int32` or `torch.int`	`torch.*.IntTensor`
64-bit integer (signed)	`torch.int64` or `torch.long`	`torch.*.LongTensor`

A tensor's dtype can be accessed via its dtype attribute.
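As a brief illustration of reading and changing a dtype, the sketch below assumes the default floating-point dtype has not been changed from torch.float32; the variable names are arbitrary.

```python
import torch

x = torch.zeros(2, 3)                     # uses the default floating-point dtype
y = torch.zeros(2, 3, dtype=torch.int64)  # dtype chosen explicitly
z = x.to(torch.float64)                   # convert to another dtype

print(x.dtype)
print(y.dtype)
print(z.dtype)

# The short names in the table are aliases for the same dtypes.
aliases_match = (torch.float == torch.float32) and (torch.long == torch.int64)
print(aliases_match)
```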

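`torch.device`

A torch.device contains a device type ('cpu' or 'cuda') and an optional device ordinal (id) for that device type. It can be initialized with torch.device('{device_type}') or torch.device('{device_type}:{device_ordinal}').

If the device ordinal is not present, the object represents the current device for the device type; for example, torch.device('cuda') is equivalent to torch.device('cuda:X'), where X is the result of torch.cuda.current_device().

A tensor's device can be accessed via its device attribute.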
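The following sketch inspects a few torch.device objects. It only constructs the device descriptors and a CPU tensor, so it should not need a GPU; the variable names are illustrative.

```python
import torch

cpu = torch.device("cpu")
cuda1 = torch.device("cuda:1")   # only a descriptor; no tensor is placed on it here

# Each device exposes its type and its (optional) ordinal.
print(cpu.type, cpu.index)
print(cuda1.type, cuda1.index)

# Tensors are created on the CPU unless told otherwise.
x = torch.zeros(2)
print(x.device)
```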

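`torch.layout`

torch.layout represents the data layout of a Tensor. Currently torch.strided (dense tensors, the default) and torch.sparse_coo (sparse tensors in COO format) are supported.

A tensor's layout can be accessed via its layout attribute.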
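For illustration, the sketch below builds one dense and one sparse tensor and reads their layouts. It uses torch.sparse_coo_tensor to construct the sparse example; the indices and values are made up.

```python
import torch

dense = torch.zeros(2, 2)

# COO format: one column per non-zero entry; first row holds row indices,
# second row holds column indices.
indices = torch.tensor([[0, 1],
                        [1, 0]])
values = torch.tensor([3.0, 4.0])
sparse = torch.sparse_coo_tensor(indices, values, (2, 2))

print(dense.layout)
print(sparse.layout)
print(sparse.to_dense())
```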

Creating Tensors

Methods that create a Tensor now also take dtype, device, layout, and requires_grad options to specify the desired attributes of the returned Tensor. For example,

>>> device = torch.device("cuda:1")
>>> x = torch.randn(3, 3, dtype=torch.float64, device=device)
>>> x
tensor([[-0.6344,  0.8562, -1.2758],
        [ 0.8414,  1.7962,  1.0589],
        [-0.1369, -1.0462, -0.4373]], dtype=torch.float64, device='cuda:1')
>>> x.requires_grad  # default is False
False
>>> x = torch.zeros(3, requires_grad=True)
>>> x.requires_grad
True

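`torch.tensor(data, ...)`

torch.tensor is one of the newly added tensor creation methods. It takes array-like data of all kinds and copies the contained values into a new Tensor. As mentioned earlier, torch.tensor is the PyTorch equivalent of NumPy's numpy.array constructor. Unlike the torch.*Tensor methods, it can also create zero-dimensional Tensors (i.e. scalars); the torch.*Tensor methods would instead treat a single Python number as a size. Moreover, if a dtype argument isn't given, it infers a suitable dtype from the data. It is the recommended way to create a tensor from existing data such as a Python list. For example,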

>>> cuda = torch.device("cuda")
>>> torch.tensor([[1], [2], [3]], dtype=torch.half, device=cuda)
tensor([[ 1],
        [ 2],
        [ 3]], device='cuda:0')
>>> torch.tensor(1)               # scalar
tensor(1)
>>> torch.tensor([1, 2.3]).dtype  # type inference
torch.float32
>>> torch.tensor([1, 2]).dtype    # type inference
torch.int64

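We've also added more tensor creation methods. Some of them have torch.*_like and/or tensor.new_* variants.

torch.*_like takes an input Tensor instead of a shape. Unless otherwise specified, it returns a Tensor with the same attributes as the input Tensor.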

>>> x = torch.randn(3, dtype=torch.float64)
>>> torch.zeros_like(x)
tensor([ 0.,  0.,  0.], dtype=torch.float64)
>>> torch.zeros_like(x, dtype=torch.int)
tensor([ 0,  0,  0], dtype=torch.int32)

tensor.new_* can also create Tensors with the same attributes as tensor, but it always takes a shape argument.

>>> x = torch.randn(3, dtype=torch.float64)
>>> x.new_ones(2)
tensor([ 1.,  1.], dtype=torch.float64)
>>> x.new_ones(4, dtype=torch.int)
tensor([ 1,  1,  1,  1], dtype=torch.int32)

To specify the desired shape, in most cases you can use either a tuple (e.g. torch.zeros((2, 3))) or variadic arguments (e.g. torch.zeros(2, 3)).

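Name	Returned Tensor	torch.*_like variant	tensor.new_* variant
`torch.empty`	uninitialized memory	✔	✔
`torch.zeros`	all zeros	✔	✔
`torch.ones`	all ones	✔	✔
`torch.full`	filled with a given value	✔	✔
`torch.rand`	i.i.d. continuous uniform on [0, 1)	✔
`torch.randn`	i.i.d. `Normal(0, 1)`	✔
`torch.randint`	i.i.d. discrete uniform in the given range	✔
`torch.randperm`	random permutation of `{0, 1, ..., n - 1}`
`torch.tensor`	copied from existing data (list, NumPy ndarray, etc.)		✔
`torch.from_numpy`*	from a NumPy `ndarray` (shares storage without copying)
`torch.arange`, `torch.range`, and `torch.linspace`	uniformly spaced values in the given range
`torch.logspace`	logarithmically spaced values in the given range
`torch.eye`	identity matrix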

*: torch.from_numpy only takes a NumPy ndarray as its input argument.
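The copy-versus-share distinction in the table can be checked with a short sketch. The array contents are arbitrary, and NumPy is assumed to be installed alongside PyTorch.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

copied = torch.tensor(arr)      # copies the values into new storage
shared = torch.from_numpy(arr)  # reuses the ndarray's memory

# Modify the NumPy array after both tensors have been created.
arr[0] = 100.0

print(copied[0].item())  # unaffected by the change to arr
print(shared[0].item())  # reflects the change to arr
```

If a tensor must stay independent of the source array, the copying constructor is the safer choice; sharing avoids a copy but ties the two objects together.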

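Writing device-agnostic code

Previous versions of PyTorch made it difficult to write code that was device-agnostic (i.e. that could run on both CUDA-enabled and CPU-only machines without modification).

PyTorch 0.4.0 makes this easier in two ways:

The device attribute of a Tensor gives the torch.device for all Tensors (get_device only works for CUDA tensors)
The to method of Tensors and Modules can be used to easily move objects to different devices (instead of having to call cpu() or cuda() based on the context)

We recommend the following pattern: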

# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
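As a runnable variant of the pattern above, the sketch below substitutes a small nn.Linear layer for MyModule and random data for the real inputs; both are stand-ins chosen only so that the example is self-contained.

```python
import torch
import torch.nn as nn

# Pick the device once, at the beginning of the script.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model and data (assumptions for illustration only).
model = nn.Linear(4, 2).to(device)
data = torch.randn(8, 4)

# Move each new batch to the chosen device before using it.
inputs = data.to(device)
output = model(inputs)

print(output.device, tuple(output.shape))
```

The same script runs unchanged on a CPU-only machine and on a machine with a CUDA device, because no call names a specific device after the first line.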

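New edge-case constraints on names of submodules, parameters, and buffers in `nn.Module`

A name that is an empty string or contains "." is no longer permitted in module.add_module(name, value), module.register_parameter(name, value), or module.register_buffer(name, value), because such names may cause data loss in the state_dict. If you are loading a checkpoint for modules containing such names, please update the module definition and patch the state_dict before loading it.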
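One possible way to patch an old checkpoint is sketched below. The submodule name "block.0", its replacement "block_0", the Patched class, and the rename_prefix helper are all hypothetical; the right renaming depends on how the module definition was actually updated.

```python
import torch
import torch.nn as nn

# A dotted submodule name is rejected.
module = nn.Module()
try:
    module.add_module("block.0", nn.Linear(2, 2))
    rejected = False
except KeyError:
    rejected = True

# Hypothetical updated definition: the submodule is now called "block_0".
class Patched(nn.Module):
    def __init__(self):
        super().__init__()
        self.block_0 = nn.Linear(2, 2)

def rename_prefix(state, old_prefix, new_prefix):
    """Return a copy of state with old_prefix replaced by new_prefix in the keys."""
    return {
        (new_prefix + key[len(old_prefix):]) if key.startswith(old_prefix) else key: value
        for key, value in state.items()
    }

# Stand-in for a state_dict loaded from an old checkpoint.
old_state = {"block.0.weight": torch.zeros(2, 2), "block.0.bias": torch.zeros(2)}
new_state = rename_prefix(old_state, "block.0.", "block_0.")

model = Patched()
model.load_state_dict(new_state)
print(rejected, sorted(new_state))
```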

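Code samples (putting it all together)

To get a flavor of the overall recommended changes in 0.4.0, let's look at a quick example of a common code pattern in both 0.3.1 and 0.4.0.

0.3.1 (old)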

model = MyRNN()
if use_cuda:
    model = model.cuda()

# train
total_loss = 0
for input, target in train_loader:
    input, target = Variable(input), Variable(target)
    hidden = Variable(torch.zeros(*h_shape))  # init hidden
    if use_cuda:
        input, target, hidden = input.cuda(), target.cuda(), hidden.cuda()
    ...  # get loss and optimize
    total_loss += loss.data[0]

# evaluate
for input, target in test_loader:
    input = Variable(input, volatile=True)
    if use_cuda:
        ...
    ...

0.4.0 (new)

# torch.device object used throughout this script
device = torch.device("cuda" if use_cuda else "cpu")

model = MyRNN().to(device)

# train
total_loss = 0
for input, target in train_loader:
    input, target = input.to(device), target.to(device)
    hidden = input.new_zeros(*h_shape)  # has the same device & dtype as `input`
    ...  # get loss and optimize
    total_loss += loss.item()           # get Python number from 1-element Tensor

# evaluate
with torch.no_grad():                   # operations inside don't track history
    for input, target in test_loader:
        ...

Thank you for reading! Please refer to our documentation and release notes for more details.

Happy PyTorch-ing!
