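2 Sample Matching
2.1 The Idea Behind the ATSS Assigner
The ATSS paper points out that the performance gap between one-stage anchor-based detectors and center-based anchor-free detectors comes mainly from how positive and negative samples are selected. Based on this observation it proposes ATSS (Adaptive Training Sample Selection), which automatically picks suitable anchor boxes as positive samples according to statistics of each GT. It improves model performance substantially without introducing extra computation or parameters.
ATSS selects positive samples as follows: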
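- Compute the IoU between each gt bbox and all anchors on the multi-scale output levels
- Compute the L2 distance between the center of each gt bbox and the centers of all anchors on the multi-scale output levels
- For each output level and each gt bbox, find the topk anchors (a hyperparameter, 9 by default) with the smallest L2 distance on that level. With l output levels in total, every gt bbox therefore gets topk×l candidate positions
- For each gt bbox, compute the mean and standard deviation of the IoUs of all its candidate positions; their sum is the adaptive threshold for that gt bbox
- For each gt bbox, select the candidate positions whose IoU is greater than the threshold; these positions are treated as positive samples and are responsible for predicting that gt bbox
- If topk is set too large, some positive positions may fall outside the gt bbox, so these positives are filtered out and treated as background samples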
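The steps above can be sketched for a single gt bbox in plain Python. This is only an illustrative sketch, not the PaddleDetection implementation: the helper names (`iou`, `center`, `atss_select`) and the `(x1, y1, x2, y2)` box format are assumptions made for this example, and the sample standard deviation is used as an assumed choice of estimator.

```python
import math
import statistics


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def center(box):
    """Center point (cx, cy) of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def atss_select(gt, anchors_per_level, topk=9):
    """Return (positives, threshold) for one gt box.

    anchors_per_level is a list (one entry per pyramid level) of anchor lists.
    positives is a list of (level, anchor_index) pairs.
    """
    gx, gy = center(gt)

    # Per level, keep the topk anchors whose centers are closest to the gt center.
    candidates = []
    for lvl, anchors in enumerate(anchors_per_level):
        dists = []
        for idx, anchor in enumerate(anchors):
            ax, ay = center(anchor)
            dists.append((math.hypot(ax - gx, ay - gy), idx))
        dists.sort()
        candidates.extend((lvl, idx) for _, idx in dists[:topk])

    # Adaptive threshold: mean + standard deviation of the candidate IoUs.
    cand_ious = [iou(gt, anchors_per_level[l][i]) for l, i in candidates]
    std = statistics.stdev(cand_ious) if len(cand_ious) > 1 else 0.0
    threshold = statistics.mean(cand_ious) + std

    # Keep candidates above the threshold whose center lies inside the gt box.
    positives = []
    for (l, i), value in zip(candidates, cand_ious):
        ax, ay = center(anchors_per_level[l][i])
        inside = gt[0] < ax < gt[2] and gt[1] < ay < gt[3]
        if value > threshold and inside:
            positives.append((l, i))
    return positives, threshold


gt_box = (0.0, 0.0, 10.0, 10.0)
levels = [[(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0), (20.0, 20.0, 30.0, 30.0)]]
positives, threshold = atss_select(gt_box, levels, topk=3)
```

In this toy case only the anchor that coincides with the gt box clears the threshold, because the two poor candidates pull the mean down while the spread pushes the threshold up.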
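1. Two main characteristics of ATSS:
- It guarantees that all positive anchors lie around the ground truth.
- Most importantly, the threshold for positive samples is adjusted according to the characteristics of the different pyramid levels.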
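2. Contributions of ATSS
- It points out that the essential difference between anchor-based and anchor-free detectors is how positive and negative training samples are defined;
- It proposes adaptive training sample selection, which automatically chooses positive and negative samples according to statistical characteristics of the objects;
- It shows that tiling multiple anchors at every location of the image does not improve detection performance;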
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

# Helper functions used below (iou_similarity, batch_iou_similarity, bbox_center,
# check_points_inside_bboxes, compute_max_iou_anchor, compute_max_iou_gt) are
# assumed to be imported from the surrounding PaddleDetection code base.


class ATSSAssigner(nn.Layer):
    """Bridging the Gap Between Anchor-based and Anchor-free Detection
    via Adaptive Training Sample Selection
    """
    __shared__ = ['num_classes']

    def __init__(self,
                 topk=9,
                 num_classes=80,
                 force_gt_matching=False,
                 eps=1e-9):
        super(ATSSAssigner, self).__init__()
        self.topk = topk
        self.num_classes = num_classes
        self.force_gt_matching = force_gt_matching
        self.eps = eps

    def _gather_topk_pyramid(self, gt2anchor_distances, num_anchors_list,
                             pad_gt_mask):
        pad_gt_mask = pad_gt_mask.tile([1, 1, self.topk]).astype(paddle.bool)
        # Split the distances by pyramid level
        gt2anchor_distances_list = paddle.split(
            gt2anchor_distances, num_anchors_list, axis=-1)
        # Start offset of each level in the flattened anchor list
        num_anchors_index = np.cumsum(num_anchors_list).tolist()
        num_anchors_index = [0, ] + num_anchors_index[:-1]
        is_in_topk_list = []
        topk_idxs_list = []
        for distances, anchors_index in zip(gt2anchor_distances_list,
                                            num_anchors_index):
            num_anchors = distances.shape[-1]
            # The topk anchors with the smallest distance on this level
            topk_metrics, topk_idxs = paddle.topk(
                distances, self.topk, axis=-1, largest=False)
            topk_idxs_list.append(topk_idxs + anchors_index)
            # Padded gts point at index 0; the duplicates are removed below
            topk_idxs = paddle.where(pad_gt_mask, topk_idxs,
                                     paddle.zeros_like(topk_idxs))
            is_in_topk = F.one_hot(topk_idxs, num_anchors).sum(axis=-2)
            is_in_topk = paddle.where(is_in_topk > 1,
                                      paddle.zeros_like(is_in_topk),
                                      is_in_topk)
            is_in_topk_list.append(
                is_in_topk.astype(gt2anchor_distances.dtype))
        is_in_topk_list = paddle.concat(is_in_topk_list, axis=-1)
        topk_idxs_list = paddle.concat(topk_idxs_list, axis=-1)
        return is_in_topk_list, topk_idxs_list

    @paddle.no_grad()
    def forward(self,
                anchor_bboxes,
                num_anchors_list,
                gt_labels,
                gt_bboxes,
                pad_gt_mask,
                bg_index,
                gt_scores=None,
                pred_bboxes=None):
        """
        The ATSS matching steps are as follows:
        1. compute the IoU between all anchor bboxes and the gts
        2. compute the center distance between all anchor bboxes and the gts
        3. on each pyramid level, for each gt, select the k bboxes whose
           centers are closest to the gt center, so k*l bboxes in total are
           selected as candidates for each gt
        4. get the IoUs of these candidates, compute their mean and std,
           and set mean + std as the IoU threshold
        5. select the candidates whose IoU is greater than or equal to the
           threshold as positives
        6. limit the positive samples' centers to lie inside the gt
        7. if an anchor box is assigned to multiple gts, the one with the
           highest IoU is selected.
        Args:
            anchor_bboxes (Tensor, float32): pre-defined anchors, shape(L, 4),
                "xmin, xmax, ymin, ymax" format
            num_anchors_list (List): num of anchors in each level
            gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1)
            gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4)
            pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1)
            bg_index (int): background index
            gt_scores (Tensor|None, float32) Score of gt_bboxes,
                shape(B, n, 1), if None, then it will initialize with one_hot label
            pred_bboxes (Tensor, float32, optional): predicted bounding boxes, shape(B, L, 4)
        Returns:
            assigned_labels (Tensor): (B, L)
            assigned_bboxes (Tensor): (B, L, 4)
            assigned_scores (Tensor): (B, L, C), if pred_bboxes is not None, then output ious
        """
        assert gt_labels.ndim == gt_bboxes.ndim and gt_bboxes.ndim == 3

        num_anchors, _ = anchor_bboxes.shape
        batch_size, num_max_boxes, _ = gt_bboxes.shape

        # 1. compute the IoU between all anchor bboxes and the gts, [B, n, L]
        ious = iou_similarity(gt_bboxes.reshape([-1, 4]), anchor_bboxes)
        ious = ious.reshape([batch_size, -1, num_anchors])

        # 2. compute the center distance between all anchor bboxes and the gts, [B, n, L]
        gt_centers = bbox_center(gt_bboxes.reshape([-1, 4])).unsqueeze(1)
        anchor_centers = bbox_center(anchor_bboxes)
        gt2anchor_distances = (gt_centers - anchor_centers.unsqueeze(0)) \
            .norm(2, axis=-1).reshape([batch_size, -1, num_anchors])

        # 3. on each pyramid level, select the topk candidates for each gt
        # based on the center distance, [B, n, L]
        is_in_topk, topk_idxs = self._gather_topk_pyramid(
            gt2anchor_distances, num_anchors_list, pad_gt_mask)

        # 4. get the IoUs of these candidates and set mean + std as the IoU threshold
        iou_candidates = ious * is_in_topk
        iou_threshold = paddle.index_sample(
            iou_candidates.flatten(stop_axis=-2),
            topk_idxs.flatten(stop_axis=-2))
        iou_threshold = iou_threshold.reshape([batch_size, num_max_boxes, -1])
        iou_threshold = iou_threshold.mean(axis=-1, keepdim=True) + \
            iou_threshold.std(axis=-1, keepdim=True)

        # 5. keep the candidates whose IoU exceeds the threshold
        # (note that this code uses a strict `>` comparison)
        is_in_topk = paddle.where(
            iou_candidates > iou_threshold.tile([1, 1, num_anchors]),
            is_in_topk, paddle.zeros_like(is_in_topk))

        # 6. limit the positive samples' centers to lie inside the gt, [B, n, L]
        is_in_gts = check_points_inside_bboxes(anchor_centers, gt_bboxes)

        # select positive samples, [B, n, L]
        mask_positive = is_in_topk * is_in_gts * pad_gt_mask

        # 7. if an anchor box is assigned to multiple gts,
        # the one with the highest IoU is selected.
        mask_positive_sum = mask_positive.sum(axis=-2)
        if mask_positive_sum.max() > 1:
            mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile(
                [1, num_max_boxes, 1])
            is_max_iou = compute_max_iou_anchor(ious)
            mask_positive = paddle.where(mask_multiple_gts, is_max_iou,
                                         mask_positive)
            mask_positive_sum = mask_positive.sum(axis=-2)

        # 8. make sure every gt_bbox is matched to an anchor
        if self.force_gt_matching:
            is_max_iou = compute_max_iou_gt(ious) * pad_gt_mask
            mask_max_iou = (is_max_iou.sum(-2, keepdim=True) == 1).tile(
                [1, num_max_boxes, 1])
            mask_positive = paddle.where(mask_max_iou, is_max_iou,
                                         mask_positive)
            mask_positive_sum = mask_positive.sum(axis=-2)
        assigned_gt_index = mask_positive.argmax(axis=-2)

        # assigned targets
        batch_ind = paddle.arange(
            end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1)
        assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes
        assigned_labels = paddle.gather(
            gt_labels.flatten(), assigned_gt_index.flatten(), axis=0)
        assigned_labels = assigned_labels.reshape([batch_size, num_anchors])
        assigned_labels = paddle.where(
            mask_positive_sum > 0, assigned_labels,
            paddle.full_like(assigned_labels, bg_index))

        assigned_bboxes = paddle.gather(
            gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0)
        assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4])

        assigned_scores = F.one_hot(assigned_labels, self.num_classes)
        if pred_bboxes is not None:
            # assigned iou
            ious = batch_iou_similarity(gt_bboxes, pred_bboxes) * mask_positive
            ious = ious.max(axis=-2).unsqueeze(-1)
            assigned_scores *= ious
        elif gt_scores is not None:
            gather_scores = paddle.gather(
                gt_scores.flatten(), assigned_gt_index.flatten(), axis=0)
            gather_scores = gather_scores.reshape([batch_size, num_anchors])
            gather_scores = paddle.where(mask_positive_sum > 0, gather_scores,
                                         paddle.zeros_like(gather_scores))
            assigned_scores *= gather_scores.unsqueeze(-1)

        return assigned_labels, assigned_bboxes, assigned_scores
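Step 7 (an anchor claimed by several gts goes to the gt with the highest IoU) is easy to lose in the tensor code, so here is a small list-based sketch of the same idea. The function name `resolve_conflicts` and the nested-list layout (`[num_gts][num_anchors]`) are assumptions made for this illustration. As in the tensor code, the winner for a contested anchor is taken as the gt with the highest IoU over all gts.

```python
def resolve_conflicts(mask_positive, ious):
    """Give every contested anchor to the gt with the highest IoU.

    mask_positive and ious are nested lists of shape [num_gts][num_anchors].
    """
    num_gts = len(mask_positive)
    num_anchors = len(mask_positive[0])
    resolved = [row[:] for row in mask_positive]
    for j in range(num_anchors):
        # Number of gts that currently claim anchor j
        claims = sum(mask_positive[i][j] for i in range(num_gts))
        if claims > 1:
            best = max(range(num_gts), key=lambda i: ious[i][j])
            for i in range(num_gts):
                resolved[i][j] = 1 if i == best else 0
    return resolved


mask = [[1, 1, 0],
        [0, 1, 1]]
iou_matrix = [[0.8, 0.4, 0.1],
              [0.2, 0.6, 0.7]]
resolved = resolve_conflicts(mask, iou_matrix)
```

Anchor 1 is claimed by both gts here and ends up with the second gt, whose IoU (0.6) is higher than the first one's (0.4).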
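2.2 The Idea Behind the Task-aligned Assigner (TOOD)
TOOD proposes Task Alignment Learning (TAL) to explicitly pull the optimal anchors of the two tasks, classification and localization, closer together. It does so with a sample assignment strategy and a task-aligned loss. The sample assignment computes the degree of task alignment at each anchor, while the task-aligned loss gradually unifies the best anchors for classification and localization.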
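Like other recently proposed one-stage detectors, TOOD uses a Backbone-FPN-Head architecture. For efficiency and simplicity, TOOD places a single anchor at each location, as ATSS does.
As discussed, because the classification and localization tasks diverge, existing one-stage detectors suffer from task misalignment. The paper proposes to align the two tasks explicitly with T-head + TAL, see the figure above. T-head and TAL work together to improve the alignment of the two tasks.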
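Concretely, TOOD selects samples as follows:
- First, the T-head makes classification and localization predictions on top of the FPN features;
- Then, TAL computes task alignment signals based on the proposed task alignment metric;
- Finally, the T-head automatically adjusts its classification probabilities and localization predictions using the signals passed back from TAL.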
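1. Task-Aligned Head
The T-head proposed in the paper is shown in the figure below. It has a very simple structure: a feature extractor plus TAP (Task-Aligned Predictor).
To strengthen the interaction between classification and localization, the authors use the feature extractor to learn task-interactive features, marked by the blue box in the figure. This design not only facilitates task interaction but also provides multi-level, multi-scale features for the two tasks.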
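Let X denote the FPN features. The feature extractor computes the task-interactive features with N consecutive convolutional layers.
The feature extractor thus yields rich multi-scale features, which are fed into the two subsequent TAP modules to align classification and localization.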
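The formula for the task-interactive features is not reproduced in this text. The following is a reconstruction under the assumption that it follows the notation of the TOOD paper, where $\mathrm{conv}_k$ is the k-th convolutional layer, $\delta$ is a ReLU activation, and $X^{fpn}$ is the FPN feature written as X above. Treat it as a sketch and check it against the paper.

```latex
X^{inter}_k =
\begin{cases}
\delta\left(\mathrm{conv}_k\left(X^{fpn}\right)\right), & k = 1 \\
\delta\left(\mathrm{conv}_k\left(X^{inter}_{k-1}\right)\right), & k > 1
\end{cases}
\qquad \forall k \in \{1, 2, \dots, N\}
```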
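2. Task-Aligned Sample Assignment
To work well with NMS, the anchor assignment for training samples should satisfy the following rules:
- A well-aligned anchor should predict a high classification score together with a precise localization;
- A misaligned anchor should have a low classification score and be suppressed during NMS.
Based on these two rules, the authors design a new anchor alignment metric that explicitly measures the degree of alignment at the anchor level. The metric is integrated into both the sample assignment and the loss function to dynamically refine the predictions at each anchor.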
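Anchor alignment metric. Since the classification score and the IoU both characterize prediction quality, a high-order combination of the two is used to measure the degree of task alignment:
$$t = s^{\alpha} \times u^{\beta}$$
where s and u denote the classification score and the IoU value, and α and β control the influence of the two terms. The metric t therefore plays an important role in the joint optimization: it encourages the network to focus dynamically on high-quality anchors.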
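A tiny numeric example may help to see how the two exponents shape the metric. The function name `alignment_metric` is chosen for this sketch only; the default values alpha=1.0 and beta=6.0 are the ones used by the TaskAlignedAssigner constructor later in this section.

```python
def alignment_metric(score, iou, alpha=1.0, beta=6.0):
    """Task alignment metric t = s^alpha * u^beta for one anchor."""
    return (score ** alpha) * (iou ** beta)


# High score but poor localization
t_confident = alignment_metric(score=0.9, iou=0.5)
# Moderate score but good localization
t_precise = alignment_metric(score=0.5, iou=0.9)
```

With beta much larger than alpha, the IoU term dominates: the well-localized anchor gets a far larger t than the confident but poorly localized one.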
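Training sample assignment. As previous work has shown, the assignment of training samples is very important for training a detector.
To improve the alignment of the two tasks, TOOD focuses on task-aligned anchors and uses a simple assignment rule to select training samples: for each instance, the m anchors with the largest t values are selected as positive samples, and the remaining anchors are used as negative samples. Training then uses new loss functions designed specifically to align classification and localization.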
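The selection rule itself can be sketched in a few lines of plain Python. The name `select_top_m` and the nested-list layout (one list of t values per instance) are assumptions for this illustration. The sketch leaves out the later steps of the full assigner, such as the center-inside-gt check and the handling of anchors claimed by several instances.

```python
def select_top_m(alignment, m):
    """For each instance, return the sorted indices of the m anchors with the largest t."""
    positives = []
    for t_values in alignment:
        order = sorted(range(len(t_values)), key=lambda j: t_values[j], reverse=True)
        positives.append(sorted(order[:m]))
    return positives


# Two instances, four anchors each (made-up t values)
alignment = [[0.10, 0.70, 0.30, 0.05],
             [0.60, 0.20, 0.90, 0.40]]
positives = select_top_m(alignment, m=2)
```

All anchors not listed for an instance count as negatives for that instance.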
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

# Helper functions used below (iou_similarity, check_points_inside_bboxes,
# gather_topk_anchors, compute_max_iou_anchor) are assumed to be imported
# from the surrounding PaddleDetection code base.


class TaskAlignedAssigner(nn.Layer):
    def __init__(self, topk=13, alpha=1.0, beta=6.0, eps=1e-9):
        super(TaskAlignedAssigner, self).__init__()
        self.topk = topk
        self.alpha = alpha
        self.beta = beta
        self.eps = eps

    @paddle.no_grad()
    def forward(self,
                pred_scores,
                pred_bboxes,
                anchor_points,
                num_anchors_list,
                gt_labels,
                gt_bboxes,
                pad_gt_mask,
                bg_index,
                gt_scores=None):
        """
        The Task-Aligned Assigner works as follows:
        1. compute the alignment metric between all bboxes and the gts
        2. select the top-k bboxes as candidates for each gt
        3. limit the positive samples' centers to lie inside the gt (because
           an anchor-free detector can only predict positive distances)
        4. if an anchor is assigned to multiple gts, the one with the
           highest IoU is selected.
        Args:
            pred_scores (Tensor, float32): predicted class probability, shape(B, L, C)
            pred_bboxes (Tensor, float32): predicted bounding boxes, shape(B, L, 4)
            anchor_points (Tensor, float32): pre-defined anchors, shape(L, 2), "cxcy" format
            num_anchors_list (List): num of anchors in each level, shape(L)
            gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1)
            gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4)
            pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1)
            bg_index (int): background index
            gt_scores (Tensor|None, float32) Score of gt_bboxes, shape(B, n, 1)
        Returns:
            assigned_labels (Tensor): (B, L)
            assigned_bboxes (Tensor): (B, L, 4)
            assigned_scores (Tensor): (B, L, C)
        """
        assert pred_scores.ndim == pred_bboxes.ndim
        assert gt_labels.ndim == gt_bboxes.ndim and gt_bboxes.ndim == 3

        batch_size, num_anchors, num_classes = pred_scores.shape
        _, num_max_boxes, _ = gt_bboxes.shape

        # compute the IoU between the gts and the predicted boxes, [B, n, L]
        ious = iou_similarity(gt_bboxes, pred_bboxes)
        # gather the predicted class score of each gt's class
        pred_scores = pred_scores.transpose([0, 2, 1])
        batch_ind = paddle.arange(
            end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1)
        gt_labels_ind = paddle.stack(
            [batch_ind.tile([1, num_max_boxes]), gt_labels.squeeze(-1)],
            axis=-1)
        bbox_cls_scores = paddle.gather_nd(pred_scores, gt_labels_ind)
        # compute the alignment metric between the bboxes and the gts, [B, n, L]
        alignment_metrics = bbox_cls_scores.pow(self.alpha) * ious.pow(
            self.beta)

        # check the positive sample's center in gt, [B, n, L]
        is_in_gts = check_points_inside_bboxes(anchor_points, gt_bboxes)

        # select the top-k predicted bboxes as candidates for each gt
        is_in_topk = gather_topk_anchors(
            alignment_metrics * is_in_gts,
            self.topk,
            topk_mask=pad_gt_mask.tile([1, 1, self.topk]).astype(paddle.bool))

        # select positive samples, [B, n, L]
        mask_positive = is_in_topk * is_in_gts * pad_gt_mask

        # if an anchor is assigned to multiple gts,
        # the one with the highest IoU is selected, [B, n, L]
        mask_positive_sum = mask_positive.sum(axis=-2)
        if mask_positive_sum.max() > 1:
            mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile(
                [1, num_max_boxes, 1])
            is_max_iou = compute_max_iou_anchor(ious)
            mask_positive = paddle.where(mask_multiple_gts, is_max_iou,
                                         mask_positive)
            mask_positive_sum = mask_positive.sum(axis=-2)
        assigned_gt_index = mask_positive.argmax(axis=-2)

        # assigned targets
        assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes
        assigned_labels = paddle.gather(
            gt_labels.flatten(), assigned_gt_index.flatten(), axis=0)
        assigned_labels = assigned_labels.reshape([batch_size, num_anchors])
        assigned_labels = paddle.where(
            mask_positive_sum > 0, assigned_labels,
            paddle.full_like(assigned_labels, bg_index))

        assigned_bboxes = paddle.gather(
            gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0)
        assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4])

        assigned_scores = F.one_hot(assigned_labels, num_classes)
        # rescale the alignment metrics
        alignment_metrics *= mask_positive
        max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True)
        max_ious_per_instance = (ious * mask_positive).max(axis=-1,
                                                           keepdim=True)
        alignment_metrics = alignment_metrics / (
            max_metrics_per_instance + self.eps) * max_ious_per_instance
        alignment_metrics = alignment_metrics.max(-2).unsqueeze(-1)
        assigned_scores = assigned_scores * alignment_metrics

        return assigned_labels, assigned_bboxes, assigned_scores
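The final rescaling step of the assigner can be illustrated for a single instance with plain lists. The function name `rescale_alignment` and the flat-list layout are assumptions for this sketch. It mirrors the arithmetic of the tensor code: the metrics are masked to the positives, divided by the largest metric of the instance, and multiplied by the largest IoU among its positives.

```python
def rescale_alignment(metrics, ious, mask_positive, eps=1e-9):
    """Rescale one instance's alignment metrics so that the maximum equals its best positive IoU."""
    pos_metrics = [t * m for t, m in zip(metrics, mask_positive)]
    pos_ious = [u * m for u, m in zip(ious, mask_positive)]
    max_t = max(pos_metrics)
    max_u = max(pos_ious)
    return [t / (max_t + eps) * max_u for t in pos_metrics]


metrics = [0.10, 0.20, 0.05, 0.30]   # made-up t values for four anchors
ious = [0.50, 0.80, 0.40, 0.90]      # made-up IoUs for the same anchors
mask_positive = [1, 1, 1, 0]         # the last anchor is not a positive
rescaled = rescale_alignment(metrics, ious, mask_positive)
```

After rescaling, the best positive anchor of the instance receives a soft label equal to the largest IoU among the positives (0.8 here), the other positives receive proportionally smaller values, and non-positive anchors stay at zero.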