Preface
Most of you are probably here mainly to learn how to squeeze a few extra points of accuracy out of your models, so this article focuses on hands-on practice, with a brief walk-through of the ideas behind Label Smoothing along the way.
Label Smoothing was originally introduced in the Inception-v3 paper and was later popularized as a training trick in works such as YOLOv4. Overfitting is a problem we run into all the time when training neural networks, and label smoothing is one effective way to fight it: it can be viewed as a regularization technique that helps prevent overfitting.
Principle
The core idea of Label Smoothing is to inject noise into the one-hot labels, reducing the weight that the ground-truth class carries in the loss during training. This helps prevent overfitting and improves the model's generalization.
Suppose we have a 5-class classification task. Without label smoothing, the target looks like this:
```
out = tensor([[ 0, 0, 0, 0, 1]], device='cuda:0', grad_fn=<AddmmBackward>)
```
We want to turn the target out from a one-hot label into a soft label: the position that was 1 becomes 1 - a, and every position that was 0 becomes a / (K - 1), where a is the smoothing factor (typically 0.1) and K is the number of classes.
The target after Label Smoothing is then:
```
LabelSmoothOut = tensor([[ 0.025, 0.025, 0.025, 0.025, 0.9]], device='cuda:0', grad_fn=<AddmmBackward>)
```
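To make the transformation concrete, here is a minimal sketch of the formula above; the helper name smooth_one_hot and the example values are just for illustration, not from the original post:

```python
import torch

def smooth_one_hot(target, n_classes, a=0.1):
    # Every "0" position receives a / (K - 1); the true class receives 1 - a.
    soft = torch.full((target.size(0), n_classes), a / (n_classes - 1))
    soft.scatter_(1, target.unsqueeze(1), 1.0 - a)
    return soft

print(smooth_one_hot(torch.tensor([4]), n_classes=5))
# tensor([[0.0250, 0.0250, 0.0250, 0.0250, 0.9000]])
```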
This trick has also been analyzed theoretically and shown to improve results; if you are interested, you can look up the paper yourself (arxiv.org/pdf/1906.02…).
Label Smoothing in Practice
The rest of the article is hands-on: we walk through four implementations of Label Smoothing.
Cross-Entropy Loss and Probabilities
Reference: Devin Yang's example
```python
import torch
import torch.nn as nn
from torch.autograd import Variable


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)
        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)
        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))


if __name__ == "__main__":
    crit = LabelSmoothingLoss(classes=5, smoothing=0.1)
    predict = torch.FloatTensor([[1, 0, 0, 0, 0],
                                 [0, 1, 0, 0, 0],
                                 [0, 0, 1, 0, 0]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
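```

As a quick sanity check (my addition, not part of the original example): with smoothing=0 the class reduces to ordinary cross-entropy, so its output can be compared against nn.CrossEntropyLoss:

```python
import torch
import torch.nn as nn

logits = torch.randn(3, 5)
labels = torch.LongTensor([2, 1, 0])

crit_smooth = LabelSmoothingLoss(classes=5, smoothing=0.0)  # class defined above
crit_ce = nn.CrossEntropyLoss()
print(crit_smooth(logits, labels), crit_ce(logits, labels))  # the two values should match
```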
Shital Shah's example
```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                .fill_(smoothing / (n_classes - 1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)
        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)
        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))


if __name__ == "__main__":
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    tensorData = [[0, 0.2, 0.7, 0.1, 0, 0.15],
                  [0, 0.9, 0.2, 0.2, 1, 0.15],
                  [1, 0.2, 0.7, 0.9, 1, 0.15]]
    predict = torch.FloatTensor(tensorData)
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
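```

To see what k_one_hot actually produces, here is a standalone sketch that mirrors the method above (again my addition, using the same values as the example):

```python
import torch

targets = torch.LongTensor([2, 1, 0])
n_classes, smoothing = 6, 0.5

# Build the smoothed targets exactly as k_one_hot does:
soft = torch.empty(targets.size(0), n_classes) \
    .fill_(smoothing / (n_classes - 1)) \
    .scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
print(soft)
# Each row sums to 1: 0.5 on the true class, 0.1 on each of the other five classes.
```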
Label-Smoothing Cross-Entropy Loss
Compared with the two implementations above, the following versions trim the code a little to make it more concise:
Datasaurus's example
```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable


class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1
        if self.weight is not None:
            self.weight = self.weight.to(preds.device)
        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)


if __name__ == "__main__":
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
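```

A note on this formulation (my reading, not from the original snippet): loss / n is the cross-entropy against a uniform distribution over all n classes, so the result is smoothing * uniform_CE + (1 - smoothing) * NLL. This spreads the smoothing mass over every class, including the target, which differs slightly from the a / (K - 1) scheme described earlier that distributes mass only over the non-target classes. A quick numeric check of the uniform-mixture view:

```python
import torch
import torch.nn.functional as F

preds = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0]])
target = torch.LongTensor([2])
smoothing, n = 0.3, preds.size(-1)

log_p = F.log_softmax(preds, dim=-1)
# soft target = (1 - smoothing) * one_hot + smoothing * uniform over all n classes
soft = torch.full_like(log_p, smoothing / n)
soft[torch.arange(preds.size(0)), target] += 1.0 - smoothing
manual = -(soft * log_p).sum(dim=-1).mean()

crit = LabelSmoothingLoss(smoothing=smoothing)  # the class defined just above
print(manual, crit(preds, target))              # the two values should agree
```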
NVIDIA/DeepLearning example
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing."""

    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.

        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()


if __name__ == "__main__":
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
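```

For completeness (not part of the original four examples): PyTorch 1.10 and later expose label smoothing directly through the label_smoothing argument of nn.CrossEntropyLoss, which uses the same uniform-mixture formulation as the two snippets above, so in many projects no custom loss class is needed at all:

```python
import torch
import torch.nn as nn

crit = nn.CrossEntropyLoss(label_smoothing=0.3)
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                             [0, 0.9, 0.2, 0.2, 1],
                             [1, 0.2, 0.7, 0.9, 1]])
print(crit(predict, torch.LongTensor([2, 1, 0])))
```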
References:
- github.com/pytorch/pyt…
- stackoverflow.com/a/59264908/…
- www.kaggle.com/c/siim-isic…
- github.com/NVIDIA/Deep…
- stackoverflow.com/questions/5…