3损失函数
3.1、分类损失varifocal loss
Focal loss定义:
其中a是前景背景的损失权重,p的y次是不同样本的权重,难分样本的损失权重会增大。当训练一个密集的物体检测器使连续的IACS回归时,本文从focal loss中借鉴了样本加权思想来解决类不平衡问题。但是,与focal loss同等对待正负样本的损失不同,而varifocal loss选择不对称地对待它们。varifocal loss定义如下:
其中p是预测的IACS得分,q是目标IoU分数。对于训练中的正样本,将q设置为生成的bbox和gt box之间的IoU(gt IoU),而对于训练中的负样本,所有类别的训练目标q均为0。
备注:Varifocal Loss会预测Iou-aware Cls_score(IACS)与分类两个得分,通过p的y次来有效降低负样本损失的权重,正样本选择不降低权重。此外,通过q(Iou感知得分)来对Iou高的正样本损失加大权重,相当于将训练重点放在高质量的样本上面。
@staticmethod def _varifocal_loss(pred_score, gt_score, label, alpha=0.75, gamma=2.0): weight = alpha * pred_score.pow(gamma) * (1 - label) + gt_score * label loss = F.binary_cross_entropy(pred_score, gt_score, weight=weight, reduction='sum') return loss
3.2、回归损失
1、GIoULoss
GIOU的计算很简单,对于两个bounding box A,B。我们可以算出其最小凸集(包围A、B的最小包围框)C。有了最小凸集,就可以计算GIOU:
计算方法很简单,从公式可以看出,GIOU有几个特点:
- GIOU是IOU的下界,且取值范围为(-1, 1]。当两个框不重合时,IOU始终为0,不论A、B相隔多远,但是对于GIOU来说,A,B不重合度越高(离的越远),GIOU越趋近于-1。关于这点,下面再详细解释一下。
- GIOU其实就是在IOU的基础上减掉了一个东西,这个减掉的东西,让我门避免了两个bbox不重合时Loss为0的情况。至于减掉的东西怎么去直观的理解,似乎不好找到一个很好的解释?
- 可导:这一点需要强调下,由于max,min,分段函数(比如ReLU)这些都是可导的,所以用1-GIOU作为Loss是可导的。
当IOU=0时:
显然, A∪B值不变,最大化GIOU就是要最小化C,最小化C就会促成2个框不断靠近,而不是像最小化IOU那样loss为0。
YOLO V3涨了2个点,Faster RCNN,MaskRCNN这种涨点少了些。主要原因在于Faster RCNN,MaskRCNN本身的Anchor很多,出现完全无重合的情况比较少,这样GIOU和IOU Loss就无明显差别。所以提升不是太明显。
在TOOD中,bbox(Bouding box)通过对齐的anchor(具有更大的分类得分、更精确的定位)预测得到,这样的bbox通常经过NMS后仍可以得以保留。此外,t可以在训练阶段通过对损失加权选择高质量的bbox。因此,采用t度量bbox的质量,同时结合GIoU Loss定义了TOOD的Reg Loss如下:
@register @serializable class GIoULoss(object): """ Generalized Intersection over Union, see https://arxiv.org/abs/1902.09630 Args: loss_weight (float): giou loss weight, default as 1 eps (float): epsilon to avoid divide by zero, default as 1e-10 reduction (string): Options are "none", "mean" and "sum". default as none """ def __init__(self, loss_weight=1., eps=1e-10, reduction='none'): self.loss_weight = loss_weight self.eps = eps assert reduction in ('none', 'mean', 'sum') self.reduction = reduction def bbox_overlap(self, box1, box2, eps=1e-10): """calculate the iou of box1 and box2 Args: box1 (Tensor): box1 with the shape (..., 4) box2 (Tensor): box1 with the shape (..., 4) eps (float): epsilon to avoid divide by zero Return: iou (Tensor): iou of box1 and box2 overlap (Tensor): overlap of box1 and box2 union (Tensor): union of box1 and box2 """ x1, y1, x2, y2 = box1 x1g, y1g, x2g, y2g = box2 xkis1 = paddle.maximum(x1, x1g) ykis1 = paddle.maximum(y1, y1g) xkis2 = paddle.minimum(x2, x2g) ykis2 = paddle.minimum(y2, y2g) w_inter = (xkis2 - xkis1).clip(0) h_inter = (ykis2 - ykis1).clip(0) overlap = w_inter * h_inter area1 = (x2 - x1) * (y2 - y1) area2 = (x2g - x1g) * (y2g - y1g) union = area1 + area2 - overlap + eps iou = overlap / union return iou, overlap, union def __call__(self, pbox, gbox, iou_weight=1., loc_reweight=None): x1, y1, x2, y2 = paddle.split(pbox, num_or_sections=4, axis=-1) x1g, y1g, x2g, y2g = paddle.split(gbox, num_or_sections=4, axis=-1) box1 = [x1, y1, x2, y2] box2 = [x1g, y1g, x2g, y2g] iou, overlap, union = self.bbox_overlap(box1, box2, self.eps) xc1 = paddle.minimum(x1, x1g) yc1 = paddle.minimum(y1, y1g) xc2 = paddle.maximum(x2, x2g) yc2 = paddle.maximum(y2, y2g) area_c = (xc2 - xc1) * (yc2 - yc1) + self.eps miou = iou - ((area_c - union) / area_c) if loc_reweight is not None: loc_reweight = paddle.reshape(loc_reweight, shape=(-1, 1)) loc_thresh = 0.9 giou = 1 - (1 - loc_thresh) * miou - loc_thresh * miou * loc_reweight else: giou = 1 - miou if self.reduction == 'none': loss = giou elif self.reduction == 'sum': loss = paddle.sum(giou * iou_weight) else: loss = paddle.mean(giou * iou_weight) return loss * self.loss_weight
2、L1 loss
均绝对误差(Mean Absolute Error,MAE) 是指模型预测值f(x)和真实值y之间距离的均值,其公式如下:
忽略下标i ,设n=1,以f(x)−y为横轴,MAE的值为纵轴,得到函数的图形如下:
MAE曲线连续,但是在y−f(x)=0处不可导。而且 MAE 大部分情况下梯度都是相等的,这意味着即使对于小的损失值,其梯度也是大的。这不利于函数的收敛和模型的学习。但是,无论对于什么样的输入值,都有着稳定的梯度,不会导致梯度爆炸问题,具有较为稳健性的解。
相比于MSE,MAE有个优点就是,对于离群点不那么敏感。因为MAE计算的是误差y−f(x)的绝对值,对于任意大小的差值,其惩罚都是固定的。
loss_l1 = F.l1_loss(pred_bboxes_pos, assigned_bboxes_pos)
3、DF Loss
对于任意分布来建模框的表示,它可以用积分形式嵌入到任意已有的和框回归相关的损失函数上,例如最近比较流行的GIoU Loss。这个实际上也就够了,不过涨点不是很明显,我们又仔细分析了一下,发现如果分布过于任意,网络学习的效率可能会不高,原因是一个积分目标可能对应了无穷多种分布模式。如下图所示:
考虑到真实的分布通常不会距离标注的位置太远,所以我们又额外加了个loss,希望网络能够快速地聚焦到标注位置附近的数值,使得他们概率尽可能大。基于此,我们取了个名字叫Distribution Focal Loss (DFL):
其形式上与QFL的右半部分很类似,含义是以类似交叉熵的形式去优化与标签y最接近的一左一右两个位置的概率,从而让网络快速地聚焦到目标位置的邻近区域的分布中去。
QFL和DFL的作用是正交的,他们的增益互不影响
def _df_loss(self, pred_dist, target): target_left = paddle.cast(target, 'int64') target_right = target_left + 1 weight_left = target_right.astype('float32') - target weight_right = 1 - weight_left loss_left = F.cross_entropy(pred_dist, target_left, reduction='none') * weight_left loss_right = F.cross_entropy(pred_dist, target_right, reduction='none') * weight_right return (loss_left + loss_right).mean(-1, keepdim=True)
3.3、总损失
其中,a表示分类损失的权重系数,b表示回归损失的权重系数,c表示DFL损失的权重系数。
def get_loss(self, head_outs, gt_meta): pred_scores, pred_distri, anchors, anchor_points, num_anchors_list, stride_tensor = head_outs anchor_points_s = anchor_points / stride_tensor pred_bboxes = self._bbox_decode(anchor_points_s, pred_distri) gt_labels = gt_meta['gt_class'] gt_bboxes = gt_meta['gt_bbox'] pad_gt_mask = gt_meta['pad_gt_mask'] # Epoch小于100使用ATSS匹配 if gt_meta['epoch_id'] < self.static_assigner_epoch: assigned_labels, assigned_bboxes, assigned_scores = \ self.static_assigner( anchors, num_anchors_list, gt_labels, gt_bboxes, pad_gt_mask, bg_index=self.num_classes, pred_bboxes=pred_bboxes.detach() * stride_tensor) alpha_l = 0.25 else: # Epoch大于100使用TAL匹配 assigned_labels, assigned_bboxes, assigned_scores = \ self.assigner( pred_scores.detach(), pred_bboxes.detach() * stride_tensor, anchor_points, num_anchors_list, gt_labels, gt_bboxes, pad_gt_mask, bg_index=self.num_classes) alpha_l = -1 # rescale bbox assigned_bboxes /= stride_tensor # cls loss if self.use_varifocal_loss: one_hot_label = F.one_hot(assigned_labels, self.num_classes) loss_cls = self._varifocal_loss(pred_scores, assigned_scores, one_hot_label) else: loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha_l) assigned_scores_sum = assigned_scores.sum() if paddle_distributed_is_initialized(): paddle.distributed.all_reduce(assigned_scores_sum) assigned_scores_sum = paddle.clip(assigned_scores_sum / paddle.distributed.get_world_size(), min=1) loss_cls /= assigned_scores_sum loss_l1, loss_iou, loss_dfl = self._bbox_loss(pred_distri, pred_bboxes, anchor_points_s, assigned_labels, assigned_bboxes, assigned_scores, assigned_scores_sum) loss = self.loss_weight['class'] * loss_cls + self.loss_weight['iou'] * loss_iou + self.loss_weight['dfl'] * loss_dfl out_dict = { 'loss': loss, 'loss_cls': loss_cls, 'loss_iou': loss_iou, 'loss_dfl': loss_dfl, 'loss_l1': loss_l1, } return out_dict
4模型推理与部署
4.1、模型推理
# inference single image CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg # inference all images in the directory CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams --infer_dir=demo
4.2、导出ONNX
# export inference model python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams # install paddle2onnx pip install paddle2onnx # convert to onnx paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx
4。3、导出TensorRT Engine
python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams -o trt=True
5参考
[1].https://github.com/PaddlePaddle/PaddleDetection
6推荐阅读
CVPR2022 | Anchor-Free之Label Assignment全新范式,目标检测经典之作!!!