KLDivLoss

class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)

The Kullback-Leibler divergence loss.

For tensors of the same shape $y_{\text{pred}},\ y_{\text{true}}$, where $y_{\text{pred}}$ is the input and $y_{\text{true}}$ is the target, we define the **pointwise KL divergence** as

$$L(y_{\text{pred}},\ y_{\text{true}}) = y_{\text{true}} \cdot \log \frac{y_{\text{true}}}{y_{\text{pred}}} = y_{\text{true}} \cdot (\log y_{\text{true}} - \log y_{\text{pred}})$$

To avoid underflow when computing this quantity, this loss expects the input argument in log-space. The target argument may also be provided in log-space if log_target=True.
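
The underflow concern can be illustrated with a minimal sketch. The logits below are made-up values chosen so that one probability is too small for float32; the comparison between `softmax(...).log()` and `log_softmax` is an illustration, not a description of how the loss is implemented internally.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits with a very large spread (illustrative values only).
logits = torch.tensor([[0.0, -200.0]])

# Naive route: compute probabilities first, then take the log.
# exp(-200) underflows to 0 in float32, so its log is -inf.
naive_log_probs = F.softmax(logits, dim=1).log()

# Stable route: stay in log-space the whole time.
stable_log_probs = F.log_softmax(logits, dim=1)

print(naive_log_probs)
print(stable_log_probs)
```

Passing log-probabilities produced by `F.log_softmax` as input avoids the infinite value that appears in the naive route.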

To summarise, this function is roughly equivalent to computing

if not log_target: # default
    loss_pointwise = target * (target.log() - input)
else:
    loss_pointwise = target.exp() * (target - input)

and then reducing this result depending on the reduction argument as

if reduction == "mean":  # default
    loss = loss_pointwise.mean()
elif reduction == "batchmean":  # mathematically correct
    loss = loss_pointwise.sum() / input.size(0)
elif reduction == "sum":
    loss = loss_pointwise.sum()
else:  # reduction == "none"
    loss = loss_pointwise
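
As a sanity check, the pseudocode above can be compared against the module itself. This is a sketch under stated assumptions: the tensor shapes, the seed, and the tolerance are arbitrary choices, and the targets are strictly positive so that `target.log()` is finite.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
input = F.log_softmax(torch.randn(3, 5), dim=1)  # log-probabilities
target = F.softmax(torch.rand(3, 5), dim=1)      # probabilities, all > 0

# Manual computation following the pseudocode (log_target=False branch).
loss_pointwise = target * (target.log() - input)
manual = {
    "none": loss_pointwise,
    "mean": loss_pointwise.mean(),
    "batchmean": loss_pointwise.sum() / input.size(0),
    "sum": loss_pointwise.sum(),
}

# Largest absolute difference between the module and the manual value.
max_abs_diff = {}
for reduction, expected in manual.items():
    actual = nn.KLDivLoss(reduction=reduction)(input, target)
    max_abs_diff[reduction] = (actual - expected).abs().max().item()

# log_target=True branch: the target is passed as log-probabilities.
log_target = target.log()
manual_log = log_target.exp() * (log_target - input)
actual_log = nn.KLDivLoss(reduction="none", log_target=True)(input, log_target)
max_abs_diff["none, log_target"] = (actual_log - manual_log).abs().max().item()

print(max_abs_diff)
```

The differences should be at the level of float32 rounding error. Depending on the PyTorch version, reduction="mean" may emit a warning here.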

Note

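As with all the other losses in PyTorch, this function expects the first argument, input, to be the output of the model (e.g. the neural network) and the second, target, to be the observations in the dataset. This differs from the standard mathematical notation $KL(P\ ||\ Q)$, where $P$ denotes the distribution of the observations and $Q$ denotes the model.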
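
A small sketch of the argument order, assuming two hand-picked two-class distributions `p` and `q` (illustrative values, not from the original page). To obtain $KL(P\ ||\ Q)$, the model distribution $Q$ goes first as log-probabilities and the observed distribution $P$ goes second.

```python
import torch
import torch.nn as nn

p = torch.tensor([[0.5, 0.5]])  # observed distribution P (target)
q = torch.tensor([[0.9, 0.1]])  # model distribution Q (input, in log-space)

kl_loss = nn.KLDivLoss(reduction="sum")

kl_p_q = kl_loss(q.log(), p)  # KL(P || Q): model first, observations second
kl_q_p = kl_loss(p.log(), q)  # KL(Q || P): roles swapped

# Reference value from the textbook definition sum(P * log(P / Q)).
manual_kl_p_q = (p * (p / q).log()).sum()

print(kl_p_q.item(), manual_kl_p_q.item(), kl_q_p.item())
```

Because KL divergence is not symmetric, swapping the two arguments gives a different value, so the order matters.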

Warning

reduction="mean" doesn't return the true KL divergence value; please use reduction="batchmean", which aligns with the mathematical definition.
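
The difference between the two reductions can be seen in a short sketch. The batch size and number of classes below are arbitrary assumptions; for a 2-D input of shape (batch_size, num_classes), "mean" divides by every element while "batchmean" divides by the batch size only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 4, 10  # arbitrary illustrative sizes
input = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
target = F.softmax(torch.randn(batch_size, num_classes), dim=1)

loss_mean = nn.KLDivLoss(reduction="mean")(input, target)
loss_batchmean = nn.KLDivLoss(reduction="batchmean")(input, target)

# KL divergence of each sample, summed over classes, then averaged over the batch.
per_sample_kl = (target * (target.log() - input)).sum(dim=1)
average_kl = per_sample_kl.mean()

print(loss_mean.item(), loss_batchmean.item(), average_kl.item())
```

In this 2-D setting the "mean" result is smaller than the average per-sample KL divergence by a factor of num_classes, while "batchmean" matches it.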

Parameters

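size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output. Default: "mean"
log_target (bool, optional) – Specifies whether target is in log-space. Default: False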

Shape

Input: $(*)$, where $*$ means any number of dimensions.
Target: $(*)$, same shape as the input.
Output: scalar by default. If reduction is 'none', then $(*)$, same shape as the input.
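
The shape rules can be checked with a brief sketch. The 3-D shape used here is an arbitrary assumption; any matching input and target shapes should behave the same way.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary 3-D example; the last dimension holds each distribution.
input = F.log_softmax(torch.randn(2, 3, 4), dim=-1)
target = F.softmax(torch.randn(2, 3, 4), dim=-1)

# Output shape for each reduction mode.
shapes = {
    reduction: nn.KLDivLoss(reduction=reduction)(input, target).shape
    for reduction in ("none", "sum", "batchmean", "mean")
}

for reduction, shape in shapes.items():
    print(reduction, tuple(shape))
```

Only 'none' keeps the input shape; the other modes return a zero-dimensional tensor.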

Examples:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> kl_loss = nn.KLDivLoss(reduction="batchmean")
>>> # input should be a distribution in the log space
>>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> target = F.softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, target)

>>> kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_target = F.log_softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, log_target)
