1. Generalized Focal Loss
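One-stage detectors basically formulate object detection as dense classification plus localization (i.e., bounding box regression). Classification is usually optimized with Focal Loss, and box locations are usually learned under a Dirac delta distribution. A recent trend in one-stage detectors is to add a separate prediction branch that estimates localization quality; the predicted quality then assists classification to improve detection performance. This paper studies the representations of these three fundamental elements: quality estimation, classification, and localization.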
1.1 Problem
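- Localization quality estimation and the classification score are used inconsistently between training and inference.
1) The two are usually trained independently but combined (e.g., multiplied) at inference.
2) Supervision for localization quality estimation is currently assigned only to positive samples. This is unreliable, because negatives may receive uncontrollably high quality predictions.
ps: The scatter plot in Figure (b) shows randomly sampled instances with their predicted scores. The blue points clearly illustrate the weak correlation between the predicted classification score and the predicted IoU score when the two are represented separately. The region in the red circle contains many likely negatives with high predicted localization quality; these negatives may be ranked ahead of positives, which hurts detector performance.
- Inflexible representation of bounding boxes.
The widely used bounding box representation can be viewed as a Dirac delta distribution over the target box coordinates, which ignores the ambiguity and uncertainty in the dataset. The authors instead model the box representation with a general distribution.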
1.2 Solve
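1) For localization quality representation
To keep training and testing consistent while letting both the classification score and the quality score be trained on all positive and negative samples, the two representations are merged. The merge is neat: the classification vector is kept, but the confidence at the ground-truth class position no longer means a classification score; it means the quality prediction score. This unifies the classification score and the IoU score into a single joint variable (denoted the "classification-IoU joint representation"). In addition, negatives are supervised with a quality score of 0, which makes the overall quality prediction more reliable.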
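The joint representation described above can be sketched as a soft one-hot target. This is a minimal illustration in plain Python, not the authors' code: the function name `joint_targets` is hypothetical, and the convention that `label == num_classes` marks background is an assumption taken from the comments in the code in Section 3.

```python
from typing import List


def joint_targets(labels: List[int], ious: List[float], num_classes: int) -> List[List[float]]:
    """Build classification-IoU targets: IoU at the ground-truth class, 0 elsewhere.

    Assumption: labels in [0, num_classes) are positives and
    label == num_classes is background (a negative sample).
    """
    targets = []
    for label, iou in zip(labels, ious):
        row = [0.0] * num_classes  # negatives keep an all-zero row
        if 0 <= label < num_classes:
            row[label] = iou  # the class slot carries the quality score
        targets.append(row)
    return targets


if __name__ == "__main__":
    # Two positives (classes 1 and 0) and one background sample (label 3).
    for row in joint_targets([1, 3, 0], [0.8, 0.0, 0.55], num_classes=3):
        print(row)
```

Every sample, positive or negative, now receives supervision through the same vector, which is the point of the merge.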
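2) For bounding box representation
The authors propose to represent an arbitrary distribution of box locations by directly learning a discretized probability distribution over the continuous space, without introducing any stronger prior (such as a Gaussian). However, for the classification-IoU joint representation, besides the remaining class imbalance risk, there is a new problem: the supervision is now a continuous IoU label (0~1), while the original FL supports only discrete {1, 0} category labels. The authors solve this by extending FL from its discrete {1, 0} version to a continuous variant, termed Generalized Focal Loss (GFL).
Quality Focal Loss (QFL) focuses on a sparse set of hard examples while producing continuous 0~1 quality estimates for the corresponding category; Distribution Focal Loss (DFL) makes the network rapidly focus on learning the probabilities of values around the continuous location of the target bounding box, under an arbitrary and flexible distribution.
Generalized Focal Loss (GFL) brings three advantages: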
Quality Focal Loss (QFL)
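To resolve the training/test inconsistency described above, the authors propose a joint representation of localization quality (i.e., IoU score) and classification score ("classification-IoU" for short). Its supervision softens the standard one-hot category label, giving a possibly floating-point target y ∈ [0, 1] at the corresponding category. Specifically, y = 0 denotes a negative sample with quality score 0, and 0 < y ≤ 1 denotes a positive sample with target IoU score y. The quality label y follows the conventional definition: the IoU between the predicted bounding box and its corresponding ground-truth box during training, a dynamic value in 0~1. Multi-class prediction is implemented as multiple binary classifications with sigmoid operators, and the sigmoid output is denoted σ.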
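For reference, the QFL objective as given in the GFL paper can be written as follows. Here β is the exponent of the modulating factor (the code in Section 3 defaults to β = 2.0), y is the soft quality label, and σ is the sigmoid output:

```latex
\mathrm{QFL}(\sigma) = -\,\lvert y - \sigma \rvert^{\beta}\,\bigl((1 - y)\log(1 - \sigma) + y\log\sigma\bigr)
```

The bracketed term is the binary cross-entropy extended to a continuous label, and the factor |y − σ|^β replaces the (1 − p_t)^γ factor of the original Focal Loss. The loss reaches its minimum at σ = y, and for a negative sample (y = 0) it reduces to −σ^β log(1 − σ).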
Distribution Focal Loss (DFL)
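The relative offsets from a location to the four sides of the bounding box are used as regression targets (see the regression branch in Figure 4). Conventional bounding box regression models the regression label y as a Dirac delta distribution δ(x − y). GFL instead learns a general distribution P(x) over a discretized range and takes its expectation as the predicted offset. Because many different distributions share the same expectation, the authors propose to optimize the shape of P(x) by explicitly encouraging high probabilities for values close to the target y; DFL does this by raising the probabilities of the two discrete values nearest to y on either side. The underlying motivation is that the most appropriate location, if it exists, is usually not far from the coarse label.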
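The two ideas in this paragraph, decoding an offset as an expectation and pulling probability mass toward the two bins around the label, can be sketched in plain Python. This is a dependency-free illustration rather than the library implementation; the function names are hypothetical, and it assumes integer bins {0, 1, ..., n} with a label y < n.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one side's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits: List[float]) -> float:
    """Decode one side's offset as the expectation over bins {0, ..., n}."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def dfl(logits: List[float], y: float) -> float:
    """DFL for one side: weighted cross-entropy on the two bins bracketing y."""
    probs = softmax(logits)
    left = int(math.floor(y))
    right = left + 1  # assumption: y < n, so the right bin exists
    w_left = right - y  # the closer bin receives the larger weight
    w_right = y - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


if __name__ == "__main__":
    uniform = [0.0] * 5
    # Mass concentrated on bins 1 and 2 in the ratio 0.7 : 0.3.
    peaked = [-20.0, math.log(0.7), math.log(0.3), -20.0, -20.0]
    print(expected_offset(uniform), dfl(uniform, 1.3))
    print(expected_offset(peaked), dfl(peaked, 1.3))
```

Both distributions can be compared on the same label y = 1.3: the peaked one yields the lower loss, and its expectation lands close to the label, whereas the uniform one decodes to the middle of the range.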
Generalized Focal Loss (GFL)
Notably, QFL and DFL can be unified into one general form, which the paper calls Generalized Focal Loss (GFL).
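The unified form, as given in the GFL paper, is reproduced here for reference. It assumes a label y bracketed by two values y_l ≤ y ≤ y_r, with predicted probabilities p_{y_l} and p_{y_r} that sum to 1 in the paper's general formulation:

```latex
\mathrm{GFL}(p_{y_l}, p_{y_r}) =
  -\,\bigl\lvert y - (y_l\,p_{y_l} + y_r\,p_{y_r}) \bigr\rvert^{\beta}
  \bigl((y_r - y)\log p_{y_l} + (y - y_l)\log p_{y_r}\bigr)
```

Two substitutions recover the special cases. Setting y_l = 0, y_r = 1, p_{y_r} = σ and p_{y_l} = 1 − σ gives QFL. Setting β = 0, y_l = y_i, y_r = y_{i+1}, p_{y_l} = S_i and p_{y_r} = S_{i+1} gives DFL, where S_i denotes the softmax probability of bin y_i (notation assumed here, following the paper).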
1.3 Append
about the Distributions
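The loss objective under the Gaussian assumption is in fact a dynamically weighted L2 loss whose training weight depends on the predicted variance σ. When optimized at the edge level, it is somewhat similar to the Dirac delta case (standard L2 loss). Moreover, it is unclear how to integrate the Gaussian assumption into IoU-based loss formulations, because it tightly couples the expression of the target representation with its optimization objective, so it cannot enjoy the benefits of IoU-based optimization. In contrast, the general distribution proposed by the authors decouples the representation from the loss objective, making it suitable for any type of optimization, at both the edge level and the box level.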
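The "dynamically weighted L2" reading follows directly from the Gaussian negative log-likelihood. The derivation below is a standard identity rather than something stated in the original notes; μ is the predicted edge position, σ² the predicted variance, and y the label:

```latex
-\log \mathcal{N}(y;\,\mu,\,\sigma^{2})
  = \frac{(y - \mu)^{2}}{2\sigma^{2}} + \frac{1}{2}\log\sigma^{2} + \frac{1}{2}\log 2\pi
```

The first term is an L2 loss scaled by 1/(2σ²), so the predicted variance acts as a per-sample weight, and the second term keeps the network from inflating σ to shrink the loss. With σ held constant, the expression reduces to a plain L2 loss, which matches the Dirac delta case.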
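IoU-branch superior to centerness-branch
1) IoU is itself the yardstick behind the final metric, so using it for quality estimation and ranking is natural.
2) Centerness has some unavoidable defects. For example, on the FPN feature level with stride = 8 (i.e., P3), some small objects get centerness labels that are extremely small or even close to 0, whereas IoU labels behave much better, as shown in the figure below.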
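The centerness defect can be checked numerically. The sketch below uses the FCOS-style centerness definition; the function name, the example box, and the assumption that P3 locations sit at stride/2 + k·stride (i.e., 4, 12, 20, ...) are illustrative choices, not taken from the original notes.

```python
import math
from typing import Tuple


def centerness(px: float, py: float, box: Tuple[float, float, float, float]) -> float:
    """FCOS-style centerness of location (px, py) for box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    l, r = px - x1, x2 - px  # distances to the left and right sides
    t, b = py - y1, y2 - py  # distances to the top and bottom sides
    if min(l, r, t, b) <= 0:
        return 0.0  # location lies outside (or on the border of) the box
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


if __name__ == "__main__":
    small_box = (3.5, 3.5, 10.0, 10.0)  # a 6.5 x 6.5 px object
    # With stride 8, (4, 4) is the only assumed grid location inside the box.
    print(centerness(4.0, 4.0, small_box))
    # A location at the exact center of a box gets the maximum label.
    print(centerness(5.0, 5.0, (0.0, 0.0, 10.0, 10.0)))
```

The only location that falls inside the small box sits near its corner, so its centerness label is tiny no matter how well the box is regressed from that location. An IoU label depends on the predicted box rather than on where the grid happens to fall.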
More Examples of Distributed Bounding Boxes
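We find that some learned distributions have multiple peaks. Take an umbrella whose handle is heavily occluded by a chair. Ignoring the handle, the umbrella can be localized by the white box (gt); counting the handle, it can be localized by the green box (prediction). The distribution for the bottom side indeed shows a bimodal pattern, with probability concentrated at the positions of the green and white bottom edges. This is quite an interesting observation. It also suggests a possible use: the shape of the distribution could flag images whose boundaries are ambiguous, which could then go through annotation refinement or consistency checks. It has the flavor of learning from data and then feeding back into the data.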
2. Generalized Focal Loss V2
2.1 Problem
2.2 Solve
The Quality Predictor is the core of this paper.
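As a rough sketch of what the Quality Predictor consumes, the GFLv2 paper describes a statistic built from the learned box distributions: for each of the four sides, the Top-k probabilities (k = 4) plus their mean, concatenated into one feature vector that a small fully connected sub-network maps to a quality scalar. The code below illustrates only the statistic. The function name, the side names, and their order are hypothetical, and the sub-network is omitted.

```python
from typing import Dict, List


def topk_mean_features(dists: Dict[str, List[float]], k: int = 4) -> List[float]:
    """Concatenate, for each side, the top-k probabilities and their mean."""
    features: List[float] = []
    for side in ("left", "top", "right", "bottom"):  # order is an arbitrary choice
        topk = sorted(dists[side], reverse=True)[:k]
        features.extend(topk)
        features.append(sum(topk) / len(topk))
    return features


if __name__ == "__main__":
    sharp = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # confident edge
    flat = [0.125] * 8                                # ambiguous edge
    sides = ("left", "top", "right", "bottom")
    print(topk_mean_features({s: sharp for s in sides}))
    print(topk_mean_features({s: flat for s in sides}))
```

Sorting discards where the peak sits and keeps only how sharp it is, which is the cue that relates the shape of the distribution to localization quality.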
2.3 Append
3. Code
import torch
import torch.nn as nn
import torch.nn.functional as F

from .utils import weighted_loss


@weighted_loss
def quality_focal_loss(pred, target, beta=2.0):
    r"""Quality Focal Loss (QFL) is from `Generalized Focal Loss: Learning
    Qualified and Distributed Bounding Boxes for Dense Object Detection
    <https://arxiv.org/abs/2006.04388>`_.

    Args:
        pred (torch.Tensor): Predicted joint representation of classification
            and quality (IoU) estimation with shape (N, C), C is the number of
            classes.
        target (tuple([torch.Tensor])): Target category label with shape (N,)
            and target quality label with shape (N,).
        beta (float): The beta parameter for calculating the modulating factor.
            Defaults to 2.0.

    Returns:
        torch.Tensor: Loss tensor with shape (N,).
    """
    assert (
        len(target) == 2
    ), """target for QFL must be a tuple of two elements,
        including category label and quality label, respectively"""
    # label denotes the category id, score denotes the quality score
    label, score = target

    # negatives are supervised by 0 quality score
    pred_sigmoid = pred.sigmoid()
    scale_factor = pred_sigmoid
    zerolabel = scale_factor.new_zeros(pred.shape)
    loss = F.binary_cross_entropy_with_logits(
        pred, zerolabel, reduction="none"
    ) * scale_factor.pow(beta)

    # FG cat_id: [0, num_classes -1], BG cat_id: num_classes
    bg_class_ind = pred.size(1)
    pos = torch.nonzero(
        (label >= 0) & (label < bg_class_ind), as_tuple=False
    ).squeeze(1)
    pos_label = label[pos].long()
    # positives are supervised by bbox quality (IoU) score
    scale_factor = score[pos] - pred_sigmoid[pos, pos_label]
    loss[pos, pos_label] = F.binary_cross_entropy_with_logits(
        pred[pos, pos_label], score[pos], reduction="none"
    ) * scale_factor.abs().pow(beta)

    loss = loss.sum(dim=1, keepdim=False)
    return loss


@weighted_loss
def distribution_focal_loss(pred, label):
    r"""Distribution Focal Loss (DFL) is from `Generalized Focal Loss: Learning
    Qualified and Distributed Bounding Boxes for Dense Object Detection
    <https://arxiv.org/abs/2006.04388>`_.

    Args:
        pred (torch.Tensor): Predicted general distribution of bounding boxes
            (before softmax) with shape (N, n+1), n is the max value of the
            integral set `{0, ..., n}` in paper.
        label (torch.Tensor): Target distance label for bounding boxes with
            shape (N,).

    Returns:
        torch.Tensor: Loss tensor with shape (N,).
    """
    dis_left = label.long()
    dis_right = dis_left + 1
    weight_left = dis_right.float() - label
    weight_right = label - dis_left.float()
    loss = (
        F.cross_entropy(pred, dis_left, reduction="none") * weight_left
        + F.cross_entropy(pred, dis_right, reduction="none") * weight_right
    )
    return loss


class QualityFocalLoss(nn.Module):
    r"""Quality Focal Loss (QFL) is a variant of `Generalized Focal Loss:
    Learning Qualified and Distributed Bounding Boxes for Dense Object
    Detection <https://arxiv.org/abs/2006.04388>`_.

    Args:
        use_sigmoid (bool): Whether sigmoid operation is conducted in QFL.
            Defaults to True.
        beta (float): The beta parameter for calculating the modulating factor.
            Defaults to 2.0.
        reduction (str): Options are "none", "mean" and "sum".
        loss_weight (float): Loss weight of current loss.
    """

    def __init__(self, use_sigmoid=True, beta=2.0, reduction="mean", loss_weight=1.0):
        super(QualityFocalLoss, self).__init__()
        assert use_sigmoid is True, "Only sigmoid in QFL supported now."
        self.use_sigmoid = use_sigmoid
        self.beta = beta
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(
        self, pred, target, weight=None, avg_factor=None, reduction_override=None
    ):
        """Forward function.

        Args:
            pred (torch.Tensor): Predicted joint representation of
                classification and quality (IoU) estimation with shape (N, C),
                C is the number of classes.
            target (tuple([torch.Tensor])): Target category label with shape
                (N,) and target quality label with shape (N,).
            weight (torch.Tensor, optional): The weight of loss for each
                prediction. Defaults to None.
            avg_factor (int, optional): Average factor that is used to average
                the loss. Defaults to None.
            reduction_override (str, optional): The reduction method used to
                override the original reduction method of the loss.
                Defaults to None.
        """
        assert reduction_override in (None, "none", "mean", "sum")
        reduction = reduction_override if reduction_override else self.reduction
        if self.use_sigmoid:
            loss_cls = self.loss_weight * quality_focal_loss(
                pred,
                target,
                weight,
                beta=self.beta,
                reduction=reduction,
                avg_factor=avg_factor,
            )
        else:
            raise NotImplementedError
        return loss_cls


class DistributionFocalLoss(nn.Module):
    r"""Distribution Focal Loss (DFL) is a variant of `Generalized Focal Loss:
    Learning Qualified and Distributed Bounding Boxes for Dense Object
    Detection <https://arxiv.org/abs/2006.04388>`_.

    Args:
        reduction (str): Options are `'none'`, `'mean'` and `'sum'`.
        loss_weight (float): Loss weight of current loss.
    """

    def __init__(self, reduction="mean", loss_weight=1.0):
        super(DistributionFocalLoss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(
        self, pred, target, weight=None, avg_factor=None, reduction_override=None
    ):
        """Forward function.

        Args:
            pred (torch.Tensor): Predicted general distribution of bounding
                boxes (before softmax) with shape (N, n+1), n is the max value
                of the integral set `{0, ..., n}` in paper.
            target (torch.Tensor): Target distance label for bounding boxes
                with shape (N,).
            weight (torch.Tensor, optional): The weight of loss for each
                prediction. Defaults to None.
            avg_factor (int, optional): Average factor that is used to average
                the loss. Defaults to None.
            reduction_override (str, optional): The reduction method used to
                override the original reduction method of the loss.
                Defaults to None.
        """
        assert reduction_override in (None, "none", "mean", "sum")
        reduction = reduction_override if reduction_override else self.reduction
        loss_cls = self.loss_weight * distribution_focal_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor
        )
        return loss_cls
1. 大白话 Generalized Focal Loss (Generalized Focal Loss in plain language)
2. 大白话 Generalized Focal Loss V2 (Generalized Focal Loss V2 in plain language)
3. Nanodet