Preface
Most of you are probably here mainly to learn how to squeeze a few extra points of accuracy out of your models, so this article focuses on hands-on practice, with a brief walk-through of the ideas behind Label Smoothing along the way.
Label Smoothing was originally introduced in the Inception-v3 paper and was later popularized as a training trick in works such as YOLOv4. Overfitting is a problem we run into all the time when training neural networks, and label smoothing is one effective way to fight it: it can be viewed as a regularization technique that helps prevent overfitting.
Principle
The core idea of Label Smoothing is to inject noise into the one-hot labels, reducing the weight that the ground-truth class carries in the loss during training. This helps prevent overfitting and improves the model's generalization.
Suppose we have a 5-class classification task. Without label smoothing, the target looks like this:
```
out = tensor([[ 0, 0, 0, 0, 1]], device='cuda:0', grad_fn=<AddmmBackward>)
```
We want to turn the target out from a one-hot label into a soft label: the position that was 1 becomes 1 - a, and every position that was 0 becomes a / (K - 1), where a is the smoothing factor (typically 0.1) and K is the number of classes.
The target after Label Smoothing is then:
```
LabelSmoothOut = tensor([[ 0.025, 0.025, 0.025, 0.025, 0.9]], device='cuda:0', grad_fn=<AddmmBackward>)
```
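To make the transformation concrete, here is a minimal sketch of the formula above; the helper name smooth_one_hot and the example values are just for illustration, not from the original post:

```python
import torch

def smooth_one_hot(target, n_classes, a=0.1):
    # Every "0" position receives a / (K - 1); the true class receives 1 - a.
    soft = torch.full((target.size(0), n_classes), a / (n_classes - 1))
    soft.scatter_(1, target.unsqueeze(1), 1.0 - a)
    return soft

print(smooth_one_hot(torch.tensor([4]), n_classes=5))
# tensor([[0.0250, 0.0250, 0.0250, 0.0250, 0.9000]])
```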
This trick has also been analyzed theoretically and shown to improve results; if you are interested, you can look up the paper yourself (arxiv.org/pdf/1906.02…).
Label Smoothing in Practice
The rest of the article is hands-on: we walk through four implementations of Label Smoothing.
Cross-Entropy Loss and Probabilities
Reference: Devin Yang's example
```python
import torch
import torch.nn as nn
from torch.autograd import Variable


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)
        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)
        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))


if __name__ == "__main__":
    crit = LabelSmoothingLoss(classes=5, smoothing=0.1)
    predict = torch.FloatTensor([[1, 0, 0, 0, 0],
                                 [0, 1, 0, 0, 0],
                                 [0, 0, 1, 0, 0]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
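```

As a quick sanity check (my addition, not part of the original example): with smoothing=0 the class reduces to ordinary cross-entropy, so its output can be compared against nn.CrossEntropyLoss:

```python
import torch
import torch.nn as nn

logits = torch.randn(3, 5)
labels = torch.LongTensor([2, 1, 0])

crit_smooth = LabelSmoothingLoss(classes=5, smoothing=0.0)  # class defined above
crit_ce = nn.CrossEntropyLoss()
print(crit_smooth(logits, labels), crit_ce(logits, labels))  # the two values should match
```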
Shital Shah's example
```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                .fill_(smoothing / (n_classes - 1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)
        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)
        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))


if __name__ == "__main__":
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    tensorData = [[0, 0.2, 0.7, 0.1, 0, 0.15],
                  [0, 0.9, 0.2, 0.2, 1, 0.15],
                  [1, 0.2, 0.7, 0.9, 1, 0.15]]
    predict = torch.FloatTensor(tensorData)
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
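```

To see what k_one_hot actually produces, here is a standalone sketch that mirrors the method above (again my addition, using the same values as the example):

```python
import torch

targets = torch.LongTensor([2, 1, 0])
n_classes, smoothing = 6, 0.5

# Build the smoothed targets exactly as k_one_hot does:
soft = torch.empty(targets.size(0), n_classes) \
    .fill_(smoothing / (n_classes - 1)) \
    .scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
print(soft)
# Each row sums to 1: 0.5 on the true class, 0.1 on each of the other five classes.
```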
Label-Smoothing Cross-Entropy Loss
Compared with the two implementations above, the following versions trim the code a little to make it more concise:
Datasaurus's example
```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable


class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1
        if self.weight is not None:
            self.weight = self.weight.to(preds.device)
        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)


if __name__ == "__main__":
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
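```

A note on this formulation (my reading, not from the original snippet): loss / n is the cross-entropy against a uniform distribution over all n classes, so the result is smoothing * uniform_CE + (1 - smoothing) * NLL. This spreads the smoothing mass over every class, including the target, which differs slightly from the a / (K - 1) scheme described earlier that distributes mass only over the non-target classes. A quick numeric check of the uniform-mixture view:

```python
import torch
import torch.nn.functional as F

preds = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0]])
target = torch.LongTensor([2])
smoothing, n = 0.3, preds.size(-1)

log_p = F.log_softmax(preds, dim=-1)
# soft target = (1 - smoothing) * one_hot + smoothing * uniform over all n classes
soft = torch.full_like(log_p, smoothing / n)
soft[torch.arange(preds.size(0)), target] += 1.0 - smoothing
manual = -(soft * log_p).sum(dim=-1).mean()

crit = LabelSmoothingLoss(smoothing=smoothing)  # the class defined just above
print(manual, crit(preds, target))              # the two values should agree
```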
NVIDIA/DeepLearning example
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing."""

    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.

        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()


if __name__ == "__main__":
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)
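```

For completeness (not part of the original four examples): PyTorch 1.10 and later expose label smoothing directly through the label_smoothing argument of nn.CrossEntropyLoss, which uses the same uniform-mixture formulation as the two snippets above, so in many projects no custom loss class is needed at all:

```python
import torch
import torch.nn as nn

crit = nn.CrossEntropyLoss(label_smoothing=0.3)
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                             [0, 0.9, 0.2, 0.2, 1],
                             [1, 0.2, 0.7, 0.9, 1]])
print(crit(predict, torch.LongTensor([2, 1, 0])))
```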
References:
- github.com/pytorch/pyt…
- stackoverflow.com/a/59264908/…
- www.kaggle.com/c/siim-isic…
- github.com/NVIDIA/Deep…
- stackoverflow.com/questions/5…