CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

This criterion computes the cross entropy loss between input logits and target.

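It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.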

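The input is expected to contain the unnormalized logits for each class (which, in general, do not need to be positive or sum to 1). input has to be a Tensor of size $(C)$ for unbatched input, $(minibatch, C)$ for batched input, or $(minibatch, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ for the K-dimensional case. The last form is useful for higher-dimensional inputs, such as computing the cross entropy loss per pixel for 2D images.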
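Because the input holds raw logits, the criterion normalizes them internally with a softmax. The following is a minimal pure-Python sketch of a numerically stable log-softmax over a single row of logits, using only the standard library. The name `log_softmax` here is illustrative; it is not PyTorch's kernel (the PyTorch counterpart is `torch.nn.functional.log_softmax`).

```python
import math


def log_softmax(logits):
    """Log-softmax over one row of raw logits (illustrative sketch).

    Subtracting the row maximum first keeps exp() from overflowing; it does
    not change the result because log-softmax is invariant to adding the
    same constant to every logit.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]


# Raw logits may be negative and need not sum to 1.
row = [2.0, -1.0, 0.5]
probs = [math.exp(v) for v in log_softmax(row)]
print(sum(probs))  # close to 1.0 up to floating-point rounding

# Very large logits are handled without overflow thanks to the max shift.
print(log_softmax([1000.0, 1000.0]))  # both entries are about -log(2)
```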

The target that this criterion expects should contain either:

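Class indices in the range $[0, C)$, where $C$ is the number of classes; if ignore_index is specified, this loss also accepts this class index (this index does not necessarily lie in the class range). The unreduced (i.e. with reduction set to 'none') loss for this case can be described as: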

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}$
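where $x$ is the input, $y$ is the target, $w$ is the weight, $C$ is the number of classes, and $N$ spans the minibatch dimension as well as $d_1, ..., d_K$ for the K-dimensional case. If reduction is not 'none' (default 'mean'), then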

$\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}$
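Note that this case is equivalent to applying LogSoftmax on the input, followed by NLLLoss.
Probabilities for each class; useful when labels beyond a single class per minibatch item are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with reduction set to 'none') loss for this case can be described as: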

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}$
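where $x$ is the input, $y$ is the target, $w$ is the weight, $C$ is the number of classes, and $N$ spans the minibatch dimension as well as $d_1, ..., d_K$ for the K-dimensional case. If reduction is not 'none' (default 'mean'), then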

$\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{N}, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}$
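Both target forms and both reductions can be checked against a small reference sketch. The pure-Python functions below (`ce_class_indices` and `ce_class_probs` are hypothetical names) follow the formulas above term by term, but only for 2D input of shape $(N, C)$; they are meant for reading and checking, not as a replacement for `nn.CrossEntropyLoss`. The class-index function computes LogSoftmax followed by a weighted negative log-likelihood, mirroring the LogSoftmax + NLLLoss equivalence noted above.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def ce_class_indices(x, y, weight=None, ignore_index=-100, reduction="mean"):
    """Class-index form: x is N rows of C logits, y holds N integer targets."""
    C = len(x[0])
    w = weight if weight is not None else [1.0] * C
    losses, denom = [], 0.0
    for row, yn in zip(x, y):
        if yn == ignore_index:
            losses.append(0.0)  # the indicator term zeroes ignored targets
            continue
        # LogSoftmax followed by a weighted negative log-likelihood
        losses.append(-w[yn] * log_softmax(row)[yn])
        denom += w[yn]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    # 'mean' divides by the summed weights of the non-ignored targets;
    # this sketch returns nan when every target is ignored (0 / 0)
    return sum(losses) / denom if denom > 0 else float("nan")


def ce_class_probs(x, y, weight=None, reduction="mean"):
    """Class-probability form: y has the same N x C layout as x."""
    C = len(x[0])
    w = weight if weight is not None else [1.0] * C
    losses = []
    for row, yn in zip(x, y):
        lp = log_softmax(row)
        losses.append(-sum(w[c] * lp[c] * yn[c] for c in range(C)))
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    return sum(losses) / len(losses)  # plain mean over N, not over weights


# Uniform logits over 2 classes: each unweighted per-sample loss is log(2).
x = [[0.0, 0.0], [0.0, 0.0]]
w = [1.0, 3.0]
print(ce_class_indices(x, [0, 1], weight=w))                  # (1 + 3) log 2 / (1 + 3)
print(ce_class_probs(x, [[1.0, 0.0], [0.0, 1.0]], weight=w))  # (1 + 3) log 2 / 2
print(ce_class_indices(x, [0, -100], reduction="none"))       # second entry is 0.0
```

With weights, the two 'mean' reductions differ even for one-hot probability targets: the class-index form divides by the sum of the weights of the non-ignored targets, while the class-probability form divides by $N$. Without weights and with one-hot targets, the two forms give the same loss.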

Note

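The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.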

Parameters

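weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C with a floating point dtype.
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices.
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two arguments will override reduction. Default: 'mean'
label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution, as described in Rethinking the Inception Architecture for Computer Vision. Default: $0.0$.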
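As a rough illustration of label_smoothing, the sketch below turns class indices into smoothed probability targets by mixing the one-hot ground truth with a uniform distribution over the $C$ classes, i.e. $(1 - \varepsilon)\,\text{one\_hot} + \varepsilon / C$. This assumes the standard formulation from the cited paper; `smooth_targets` is a hypothetical helper, and PyTorch's internal handling (for example its interaction with weight and ignore_index) may differ in detail.

```python
def smooth_targets(y, num_classes, label_smoothing):
    """Hypothetical helper: class indices -> smoothed class probabilities.

    Each row is (1 - eps) * one_hot(y_n) + eps / C, so it still sums to 1.
    """
    if not 0.0 <= label_smoothing <= 1.0:
        raise ValueError("label_smoothing must be in [0.0, 1.0]")
    eps = label_smoothing
    return [
        [(1.0 - eps) * (1.0 if c == yn else 0.0) + eps / num_classes
         for c in range(num_classes)]
        for yn in y
    ]


# True class 2 out of 4 with eps = 0.1: the true class keeps about 0.925
# of the mass and each other class receives about 0.025.
print(smooth_targets([2], num_classes=4, label_smoothing=0.1))
```

With label_smoothing=1.0 every row becomes the uniform distribution $1/C$, and with 0.0 the rows are plain one-hot vectors.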

Shape

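Input: Shape $(C)$, $(N, C)$ or $(N, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Target: If containing class indices, shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss, where each value should be in $[0, C)$. The target data type is required to be long when using class indices. If containing class probabilities, the target must have the same shape as the input, and each value should be in $[0, 1]$. This means the target data type is required to be float when using class probabilities.
Output: If reduction is 'none', shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss, depending on the shape of the input. Otherwise, scalar.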

where:

$\begin{aligned} C ={} & \text{number of classes} \\ N ={} & \text{batch size} \\ \end{aligned}$
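The shape rules above amount to dropping the class dimension $C$ from the input when reduction is 'none', and returning a scalar otherwise. The hypothetical helper below encodes that reading in pure Python; it is a sketch for checking expectations, not part of PyTorch.

```python
def ce_output_shape(input_shape, reduction="mean"):
    """Hypothetical helper: expected output shape per the Shape rules."""
    if reduction not in ("none", "mean", "sum"):
        raise ValueError(f"unknown reduction: {reduction!r}")
    if reduction != "none":
        return ()  # 'mean' and 'sum' produce a scalar
    if len(input_shape) == 1:
        return ()  # unbatched (C) -> ()
    # (N, C) -> (N);  (N, C, d_1, ..., d_K) -> (N, d_1, ..., d_K)
    return (input_shape[0],) + tuple(input_shape[2:])


print(ce_output_shape((2, 5, 4, 4), reduction="none"))  # drops C = 5
```

For class-index targets, the expected target shape coincides with this unreduced output shape.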

Examples

>>> import torch
>>> from torch import nn
>>>
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()
