PP-YOLOE | PP-YOLOv2 Fully Upgraded to Anchor-Free, Surpassing YOLOX and YOLOv5 in Both Speed and Accuracy (Part 2) - Alibaba Cloud Developer Community

2 Sample Matching

2.1 The Idea Behind the ATSS Assigner

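The ATSS paper points out that the performance gap between one-stage anchor-based detectors and center-based anchor-free detectors comes mainly from how positive and negative samples are selected. Based on this observation, it proposes ATSS (Adaptive Training Sample Selection), which automatically selects suitable anchor boxes as positive samples according to the statistical characteristics of each GT. The method improves model performance substantially without adding computation or parameters.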

ATSS selects positive samples as follows:

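Compute the IoU between each gt bbox and all anchors across the multi-scale output levels.
Compute the L2 distance between the center of each gt bbox and the centers of all anchors across the multi-scale output levels.
For each output level and each gt bbox, find the topk (a hyperparameter, 9 by default) anchors in that level with the smallest L2 distance. With l output levels, every gt bbox therefore gets topk × l candidate positions.
For each gt bbox, compute the mean and the standard deviation of the IoUs of all its candidates; their sum is the adaptive threshold for that gt bbox.
For each gt bbox, select the candidates whose IoU exceeds the threshold. These positions are positive samples and are responsible for predicting that gt bbox.
If topk is set too large, some positive positions may fall outside the gt bbox, so these positives are filtered out and treated as background.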
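The selection steps above can be sketched for a single gt box in plain Python. This is only an illustrative sketch, not the PaddleDetection implementation: the helper names (`iou`, `center`, `atss_select`) and the toy anchors are hypothetical, boxes are assumed to be in (x1, y1, x2, y2) format, and the sample standard deviation is used as an assumption about how the spread is measured.

```python
import math
from statistics import mean, stdev


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Center point (cx, cy) of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def atss_select(gt, anchors_per_level, topk=9):
    """Return ([(level, index), ...], threshold) for a single gt box."""
    gcx, gcy = center(gt)
    candidates = []  # (level, anchor index, IoU with gt)
    for level, anchors in enumerate(anchors_per_level):
        # topk anchors of this level whose centers are closest to the gt center
        by_distance = sorted(
            range(len(anchors)),
            key=lambda i: math.hypot(center(anchors[i])[0] - gcx,
                                     center(anchors[i])[1] - gcy))
        for i in by_distance[:topk]:
            candidates.append((level, i, iou(gt, anchors[i])))

    values = [c[2] for c in candidates]
    # adaptive threshold: mean + standard deviation of the candidate IoUs
    spread = stdev(values) if len(values) > 1 else 0.0
    threshold = mean(values) + spread

    positives = []
    for level, i, value in candidates:
        cx, cy = center(anchors_per_level[level][i])
        # keep only candidates whose center lies inside the gt box
        inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
        if value > threshold and inside:
            positives.append((level, i))
    return positives, threshold


# Toy example: one gt box and two pyramid levels of hand-written anchors.
gt_box = (0, 0, 40, 40)
levels = [
    [(0, 0, 40, 40), (10, 10, 50, 50), (30, 30, 70, 70), (100, 100, 140, 140)],
    [(-20, -20, 60, 60), (20, 20, 100, 100), (200, 200, 280, 280)],
]
positives, threshold = atss_select(gt_box, levels, topk=3)
print(positives, round(threshold, 3))
```

In this toy setup only the anchor that coincides with the gt box clears the adaptive threshold, which illustrates how the threshold adapts to the IoU statistics of each gt rather than being a fixed value.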

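1. Two main properties of ATSS:

It guarantees that all positive anchors lie around the ground truth.
Most importantly, it fine-tunes the positive-sample threshold for each level according to that level's characteristics.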

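2. Contributions of ATSS

It points out that the essential difference between anchor-based and anchor-free detectors is how positive and negative training samples are defined;
It proposes adaptive training sample selection, which selects positive and negative samples automatically from the statistical characteristics of the objects;
It shows that tiling multiple anchors at each image location does not improve detection performance.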

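import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

# iou_similarity, batch_iou_similarity, bbox_center, check_points_inside_bboxes,
# compute_max_iou_anchor and compute_max_iou_gt are helper functions from the
# PaddleDetection code base and are not reproduced here.

class ATSSAssigner(nn.Layer):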
    """Bridging the Gap Between Anchor-based and Anchor-free Detection
     via Adaptive Training Sample Selection
    """
    __shared__ = ['num_classes']
    def __init__(self, topk=9, num_classes=80, force_gt_matching=False, eps=1e-9):
        super(ATSSAssigner, self).__init__()
        self.topk = topk
        self.num_classes = num_classes
        self.force_gt_matching = force_gt_matching
        self.eps = eps
    def _gather_topk_pyramid(self, gt2anchor_distances, num_anchors_list, pad_gt_mask):
        pad_gt_mask = pad_gt_mask.tile([1, 1, self.topk]).astype(paddle.bool)
        gt2anchor_distances_list = paddle.split(gt2anchor_distances, num_anchors_list, axis=-1)
        num_anchors_index = np.cumsum(num_anchors_list).tolist()
        num_anchors_index = [0, ] + num_anchors_index[:-1]
        is_in_topk_list = []
        topk_idxs_list = []
        for distances, anchors_index in zip(gt2anchor_distances_list, num_anchors_index):
            num_anchors = distances.shape[-1]
            topk_metrics, topk_idxs = paddle.topk(distances, self.topk, axis=-1, largest=False)
            topk_idxs_list.append(topk_idxs + anchors_index)
            topk_idxs = paddle.where(pad_gt_mask, topk_idxs, paddle.zeros_like(topk_idxs))
            is_in_topk = F.one_hot(topk_idxs, num_anchors).sum(axis=-2)
            is_in_topk = paddle.where(is_in_topk > 1, paddle.zeros_like(is_in_topk), is_in_topk)
            is_in_topk_list.append(is_in_topk.astype(gt2anchor_distances.dtype))
        is_in_topk_list = paddle.concat(is_in_topk_list, axis=-1)
        topk_idxs_list = paddle.concat(topk_idxs_list, axis=-1)
        return is_in_topk_list, topk_idxs_list
    @paddle.no_grad()
    def forward(self, anchor_bboxes, num_anchors_list, gt_labels, gt_bboxes, pad_gt_mask, bg_index, gt_scores=None, pred_bboxes=None):
        """
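        The ATSS assignment steps are:
        1. Compute the IoU between all anchor boxes and the gts
        2. Compute the center distance between all anchor boxes and the gts
        3. On each pyramid level, for each gt, select the k boxes whose centers
           are closest to the gt center, giving k*l candidate boxes per gt
        4. Take the IoUs of these candidates, compute their mean and std, and
           set mean + std as the IoU threshold
        5. Select the candidates whose IoU exceeds the threshold as positives
        6. Restrict the centers of positive samples to lie inside the gt
        7. If an anchor box is assigned to multiple gts, keep the one with the
           highest IoU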
        Args:
            anchor_bboxes (Tensor, float32): pre-defined anchors, shape(L, 4),
                    "xmin, ymin, xmax, ymax" format
            num_anchors_list (List): num of anchors in each level
            gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1)
            gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4)
            pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1)
            bg_index (int): background index
            gt_scores (Tensor|None, float32) Score of gt_bboxes,
                    shape(B, n, 1), if None, then it will initialize with one_hot label
            pred_bboxes (Tensor, float32, optional): predicted bounding boxes, shape(B, L, 4)
        Returns:
            assigned_labels (Tensor): (B, L)
            assigned_bboxes (Tensor): (B, L, 4)
            assigned_scores (Tensor): (B, L, C), if pred_bboxes is not None, then output ious
        """
        assert gt_labels.ndim == gt_bboxes.ndim and gt_bboxes.ndim == 3
        num_anchors, _ = anchor_bboxes.shape
        batch_size, num_max_boxes, _ = gt_bboxes.shape
        # 1. Compute the IoU between all anchor boxes and the gts, [B, n, L]
        ious = iou_similarity(gt_bboxes.reshape([-1, 4]), anchor_bboxes)
        ious = ious.reshape([batch_size, -1, num_anchors])
        # 2. Compute the center distance between all anchor boxes and the gts, [B, n, L]
        gt_centers = bbox_center(gt_bboxes.reshape([-1, 4])).unsqueeze(1)
        anchor_centers = bbox_center(anchor_bboxes)
        gt2anchor_distances = (gt_centers - anchor_centers.unsqueeze(0)).norm(2, axis=-1).reshape([batch_size, -1, num_anchors])
        # 3. On each pyramid level, for each gt, select the k boxes whose centers are
        # closest to the gt center (k*l candidates per gt in total), [B, n, L]
        is_in_topk, topk_idxs = self._gather_topk_pyramid(gt2anchor_distances, num_anchors_list, pad_gt_mask)
        # 4. Take the IoUs of these candidates, compute their mean and std, and set mean + std as the IoU threshold
        iou_candidates = ious * is_in_topk
        iou_threshold = paddle.index_sample(iou_candidates.flatten(stop_axis=-2), topk_idxs.flatten(stop_axis=-2))
        iou_threshold = iou_threshold.reshape([batch_size, num_max_boxes, -1])
        iou_threshold = iou_threshold.mean(axis=-1, keepdim=True) + iou_threshold.std(axis=-1, keepdim=True)
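        # 5. Keep only the candidates whose IoU exceeds the threshold
        is_in_topk = paddle.where(iou_candidates > iou_threshold.tile([1, 1, num_anchors]), is_in_topk, paddle.zeros_like(is_in_topk))
        # 6. Restrict the centers of positive samples to lie inside the gt, [B, n, L]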
        is_in_gts = check_points_inside_bboxes(anchor_centers, gt_bboxes)
        # Select positive samples, [B, n, L]
        mask_positive = is_in_topk * is_in_gts * pad_gt_mask
        # 7. If an anchor box is assigned to multiple gts, keep the one with the highest IoU
        mask_positive_sum = mask_positive.sum(axis=-2)
        if mask_positive_sum.max() > 1:
            mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile([1, num_max_boxes, 1])
            is_max_iou = compute_max_iou_anchor(ious)
            mask_positive = paddle.where(mask_multiple_gts, is_max_iou, mask_positive)
            mask_positive_sum = mask_positive.sum(axis=-2)
        # 8. Make sure every gt_bbox is matched to an anchor
        if self.force_gt_matching:
            is_max_iou = compute_max_iou_gt(ious) * pad_gt_mask
            mask_max_iou = (is_max_iou.sum(-2, keepdim=True) == 1).tile([1, num_max_boxes, 1])
            mask_positive = paddle.where(mask_max_iou, is_max_iou, mask_positive)
            mask_positive_sum = mask_positive.sum(axis=-2)
        assigned_gt_index = mask_positive.argmax(axis=-2)
        # Assign targets
        batch_ind = paddle.arange(end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1)
        assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes
        assigned_labels = paddle.gather(gt_labels.flatten(), assigned_gt_index.flatten(), axis=0)
        assigned_labels = assigned_labels.reshape([batch_size, num_anchors])
        assigned_labels = paddle.where(mask_positive_sum > 0, assigned_labels, paddle.full_like(assigned_labels, bg_index))
        assigned_bboxes = paddle.gather(gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0)
        assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4])
        assigned_scores = F.one_hot(assigned_labels, self.num_classes)
        if pred_bboxes is not None:
            # assigned iou
            ious = batch_iou_similarity(gt_bboxes, pred_bboxes) * mask_positive
            ious = ious.max(axis=-2).unsqueeze(-1)
            assigned_scores *= ious
        elif gt_scores is not None:
            gather_scores = paddle.gather(gt_scores.flatten(), assigned_gt_index.flatten(), axis=0)
            gather_scores = gather_scores.reshape([batch_size, num_anchors])
            gather_scores = paddle.where(mask_positive_sum > 0, gather_scores, paddle.zeros_like(gather_scores))
            assigned_scores *= gather_scores.unsqueeze(-1)
        return assigned_labels, assigned_bboxes, assigned_scores
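Step 7 of the assigner (an anchor claimed by several gts goes to the gt with the highest IoU) may be easier to follow without tensors. The sketch below is a plain-Python illustration under the assumption that, for a contested anchor, the winner is the gt with the highest IoU among all gts, which is how the masked `paddle.where` above appears to behave. The function name `resolve_conflicts` and the toy values are hypothetical.

```python
def resolve_conflicts(mask_positive, ious):
    """Resolve anchors that are positive for more than one gt.

    mask_positive[g][a] is 1 if anchor a is positive for gt g, else 0.
    ious[g][a] is the IoU between gt g and anchor a.
    Returns a new mask; the input is left untouched.
    """
    num_gts, num_anchors = len(mask_positive), len(mask_positive[0])
    resolved = [row[:] for row in mask_positive]
    for a in range(num_anchors):
        # an anchor is contested when more than one gt marks it as positive
        if sum(mask_positive[g][a] for g in range(num_gts)) > 1:
            best_gt = max(range(num_gts), key=lambda g: ious[g][a])
            for g in range(num_gts):
                resolved[g][a] = 1 if g == best_gt else 0
    return resolved


# Two gts, three anchors; anchor 1 is claimed by both gts.
mask = [[1, 1, 0],
        [0, 1, 1]]
iou_table = [[0.8, 0.3, 0.1],
             [0.2, 0.6, 0.7]]
resolved = resolve_conflicts(mask, iou_table)
print(resolved)
```

After resolution every anchor column contains at most one 1, so taking the argmax over the gt axis yields a unique assigned gt index per anchor.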

2.2 The Idea Behind the Task-Aligned Assigner (TOOD)

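TOOD proposes Task Alignment Learning (TAL) to explicitly pull the optimal anchors of the two tasks, classification and localization, closer together. It does so with a sample assignment scheme and a task-aligned loss: the assignment computes a task alignment degree for each anchor, while the task-aligned loss gradually unifies the best anchors for classification and localization.

Like other recent one-stage detectors, TOOD adopts a Backbone-FPN-Head architecture. For efficiency and simplicity, TOOD, like ATSS, places a single anchor at each location.

As discussed, because classification and localization diverge, existing one-stage detectors suffer from task misalignment. TOOD explicitly aligns the two tasks with T-head + TAL (see the figure above), which work together to improve the alignment.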

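Concretely, TOOD selects samples as follows:

First, the T-head makes classification and localization predictions on the FPN features;
Then, TAL computes task alignment signals based on the proposed task alignment metric;
Finally, the T-head automatically adjusts its classification probabilities and localization predictions using the signals passed back from TAL.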

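1. Task-Aligned Head

The proposed T-head, shown in the figure below, has a very simple structure: a feature extractor followed by TAP (Task-Aligned Predictor).

To strengthen the interaction between classification and localization, the authors use the feature extractor to learn task-interactive features, marked by the blue box in the figure. This design not only facilitates task interaction but also provides multi-level, multi-scale features for both tasks.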

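Let $X^{fpn}$ denote the FPN features. The feature extractor computes the task-interactive features with N consecutive conv layers, each followed by an activation function (as formulated in the TOOD paper):

$$X^{inter}_k = \begin{cases} \delta(\mathrm{conv}_k(X^{fpn})), & k = 1 \\ \delta(\mathrm{conv}_k(X^{inter}_{k-1})), & k > 1 \end{cases} \qquad k \in \{1, 2, \dots, N\}$$

where $\mathrm{conv}_k$ is the k-th conv layer and $\delta$ is the ReLU activation.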

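The feature extractor thus yields rich multi-scale features, which are fed into the two subsequent TAP modules to align classification and localization.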

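2. Task-Aligned Sample Assignment

To work well with NMS, the anchor assignment for training instances should satisfy the following rules:

A well-aligned anchor should predict a high classification score together with a precise localization;
A misaligned anchor should have a low classification score so that it is suppressed during NMS.

Based on these two rules, the authors design a new anchor alignment metric that explicitly measures the degree of alignment at the anchor level. The metric is integrated into both the sample assignment and the loss functions to dynamically refine the predictions of each anchor.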

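Anchor alignment metric. Since the classification score and the IoU both reflect prediction quality, a high-order combination of the two is used to measure the degree of task alignment:

$$t = s^{\alpha} \times u^{\beta}$$

where s and u denote the classification score and the IoU, and α and β control the influence of each. The metric t therefore plays a key role in the joint optimization, encouraging the network to focus dynamically on high-quality anchors.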
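A small numeric sketch may help show how the metric t = s^α × u^β trades off the two terms. The defaults α = 1.0 and β = 6.0 are taken from the `TaskAlignedAssigner` constructor shown in this article; the function name `alignment_metric` and the example scores are hypothetical.

```python
def alignment_metric(score, iou, alpha=1.0, beta=6.0):
    """Task alignment metric t = s^alpha * u^beta for one anchor/gt pair."""
    return (score ** alpha) * (iou ** beta)


# An anchor with a confident class score but a loose box ...
confident_but_loose = alignment_metric(score=0.9, iou=0.5)
# ... versus an anchor with a modest class score but a tight box.
modest_but_tight = alignment_metric(score=0.5, iou=0.9)

print(confident_but_loose, modest_but_tight)
```

With β much larger than α, the IoU term dominates, so the tightly localized anchor receives the larger t even though its classification score is lower.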

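Training sample assignment. As prior work has shown, training sample assignment is crucial to training a detector.

To improve the alignment of the two tasks, TOOD focuses on task-aligned anchors and uses a simple assignment rule: for each instance, the m anchors with the largest t values are selected as positive samples and the remaining anchors as negative samples. Training then uses new loss functions designed to align classification and localization.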
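The top-m rule for a single instance can be sketched in a few lines of plain Python. This is an illustration only: the function name `assign_by_topm` and the t values are hypothetical, and the sketch leaves out the center-inside-gt check and the multi-gt conflict handling that the full assigner also applies.

```python
def assign_by_topm(metrics, m):
    """Split anchor indices into positives (the m largest t) and negatives (the rest)."""
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    positives = sorted(ranked[:m])
    negatives = sorted(ranked[m:])
    return positives, negatives


# Alignment metric t of five anchors with respect to one gt instance.
t_values = [0.01, 0.27, 0.0, 0.15, 0.09]
pos, neg = assign_by_topm(t_values, m=2)
print(pos, neg)
```

In the `TaskAlignedAssigner` code in this article, m corresponds to the `topk` constructor argument, which defaults to 13.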

class TaskAlignedAssigner(nn.Layer):
    def __init__(self, topk=13, alpha=1.0, beta=6.0, eps=1e-9):
        super(TaskAlignedAssigner, self).__init__()
        self.topk = topk
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
    @paddle.no_grad()
    def forward(self, pred_scores, pred_bboxes, anchor_points, num_anchors_list, gt_labels, gt_bboxes, pad_gt_mask, bg_index, gt_scores=None):
        """
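        The Task-Aligned Assigner works as follows:
        1. Compute the alignment metric between all bboxes and the gts
        2. Select the top-k bboxes as candidates for each gt
        3. Restrict the centers of positive samples to lie inside the gt (because
           an anchor-free detector can only predict distances greater than 0)
        4. If an anchor is assigned to multiple gts, keep the one with the highest IoU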
        Args:
            pred_scores (Tensor, float32): predicted class probability, shape(B, L, C)
            pred_bboxes (Tensor, float32): predicted bounding boxes, shape(B, L, 4)
            anchor_points (Tensor, float32): pre-defined anchors, shape(L, 2), "cxcy" format
            num_anchors_list (List): num of anchors in each level, shape(L)
            gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1)
            gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 4)
            pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1)
            bg_index (int): background index
            gt_scores (Tensor|None, float32) Score of gt_bboxes, shape(B, n, 1)
        Returns:
            assigned_labels (Tensor): (B, L)
            assigned_bboxes (Tensor): (B, L, 4)
            assigned_scores (Tensor): (B, L, C)
        """
        assert pred_scores.ndim == pred_bboxes.ndim
        assert gt_labels.ndim == gt_bboxes.ndim and gt_bboxes.ndim == 3
        batch_size, num_anchors, num_classes = pred_scores.shape
        _, num_max_boxes, _ = gt_bboxes.shape
        # Compute the IoU between the gts and the predicted boxes, [B, n, L]
        ious = iou_similarity(gt_bboxes, pred_bboxes)
        # Gather the class scores of the predicted bboxes
        pred_scores = pred_scores.transpose([0, 2, 1])
        batch_ind = paddle.arange(end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1)
        gt_labels_ind = paddle.stack([batch_ind.tile([1, num_max_boxes]), gt_labels.squeeze(-1)], axis=-1)
        bbox_cls_scores = paddle.gather_nd(pred_scores, gt_labels_ind)
        # Compute the alignment metric between bboxes and gts, [B, n, L]
        alignment_metrics = bbox_cls_scores.pow(self.alpha) * ious.pow(self.beta)
        # check the positive sample's center in gt, [B, n, L]
        is_in_gts = check_points_inside_bboxes(anchor_points, gt_bboxes)
        # Select the top-k predicted bboxes as candidates for each gt
        is_in_topk = gather_topk_anchors(alignment_metrics * is_in_gts, self.topk, topk_mask=pad_gt_mask.tile([1, 1, self.topk]).astype(paddle.bool))
        # select positive sample, [B, n, L]
        mask_positive = is_in_topk * is_in_gts * pad_gt_mask
        # If an anchor is assigned to multiple gts, keep the one with the highest IoU, [B, n, L]
        mask_positive_sum = mask_positive.sum(axis=-2)
        if mask_positive_sum.max() > 1:
            mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile([1, num_max_boxes, 1])
            is_max_iou = compute_max_iou_anchor(ious)
            mask_positive = paddle.where(mask_multiple_gts, is_max_iou, mask_positive)
            mask_positive_sum = mask_positive.sum(axis=-2)
        assigned_gt_index = mask_positive.argmax(axis=-2)
        # assigned target
        assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes
        assigned_labels = paddle.gather(gt_labels.flatten(), assigned_gt_index.flatten(), axis=0)
        assigned_labels = assigned_labels.reshape([batch_size, num_anchors])
        assigned_labels = paddle.where(mask_positive_sum > 0, assigned_labels, paddle.full_like(assigned_labels, bg_index))
        assigned_bboxes = paddle.gather(gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0)
        assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4])
        assigned_scores = F.one_hot(assigned_labels, num_classes)
        # rescale alignment metrics
        alignment_metrics *= mask_positive
        max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True)
        max_ious_per_instance = (ious * mask_positive).max(axis=-1, keepdim=True)
        alignment_metrics = alignment_metrics / (max_metrics_per_instance + self.eps) * max_ious_per_instance
        alignment_metrics = alignment_metrics.max(-2).unsqueeze(-1)
        assigned_scores = assigned_scores * alignment_metrics
        return assigned_labels, assigned_bboxes, assigned_scores
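The "rescale alignment metrics" step at the end of `forward` can be read as follows: for each gt, the metrics of its positive anchors are scaled so that the largest one equals the largest IoU among those positives. The sketch below reproduces that arithmetic for a single gt in plain Python; the function name `normalize_alignment` and the numbers are hypothetical.

```python
def normalize_alignment(metrics, ious, positive, eps=1e-9):
    """Rescale the alignment metrics of one gt's positive anchors.

    After rescaling, the largest metric equals the largest IoU among the
    positives; anchors that are not positive get 0.
    """
    masked_t = [t if p else 0.0 for t, p in zip(metrics, positive)]
    masked_u = [u if p else 0.0 for u, p in zip(ious, positive)]
    max_t, max_u = max(masked_t), max(masked_u)
    return [t / (max_t + eps) * max_u for t in masked_t]


# Three anchors for one gt; anchor 1 is not a positive sample.
soft_labels = normalize_alignment(
    metrics=[0.2, 0.1, 0.4], ious=[0.7, 0.6, 0.8], positive=[1, 0, 1])
print(soft_labels)
```

The rescaled values are what the assigner multiplies into the one-hot class targets, so positives receive soft classification targets tied to their localization quality instead of a hard 1.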
