Object Detection Tricks | [Trick 9] NMS (Non-Maximum Suppression), with an introduction to variants such as merge-nms, and-nms, soft-nms and diou-nms

1. Main Steps of NMS



# Step 1: drop boxes whose objectness confidence (column 4) is below conf_thres
x = x[x[:, 4] > conf_thres]


# Step 2: drop boxes whose width or height falls outside (min_wh, max_wh)
min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
x = x[((x[:, 2:4] > min_wh) & (x[:, 2:4] < max_wh)).all(1)]



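if multi_label:  # multi-label: one detection per class whose score passes the threshold
    # (x[:, 5:] > conf_thres).nonzero(as_tuple=False): torch.Size([nums, 2])
    # Each of the nums rows is the (row, column) position of an element that satisfies the condition
    # i: row indices of the boxes that satisfy the condition
    # j: column indices, i.e. the class indices
    i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).t()
    # j indexes the 20 class columns directly, but x has 25 columns whose first 5 hold the box coordinates
    # and objectness, so j + 5 is needed as an offset
    # Then concatenate the box (xyxy), the confidence and the class index into a new x
    x = torch.cat((box[i], x[i, j + 5].unsqueeze(1), j.float().unsqueeze(1)), 1)
else:  # best class only: keep only the highest-scoring class of each box
    conf, j = x[:, 5:].max(1)
    x = torch.cat((box, conf.unsqueeze(1), j.float().unsqueeze(1)), 1)[conf > conf_thres]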
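The two branches above differ in how many rows each box can produce. Below is a minimal pure-Python sketch (no torch) of that expansion. It assumes the class scores have already been multiplied by the objectness. `expand_detections` is a hypothetical helper written only for this illustration.

```python
# Hypothetical helper (not from the original code): expands each box into
# (box, score, class_index) rows, mirroring the two branches above.
def expand_detections(boxes, cls_scores, conf_thres, multi_label=True):
    """boxes: list of (x1, y1, x2, y2); cls_scores: per-box list of
    obj_conf * cls_conf values, one per class."""
    rows = []
    for box, scores in zip(boxes, cls_scores):
        if multi_label:
            # One row per class whose score clears the threshold,
            # so a single box can yield several detections.
            for j, s in enumerate(scores):
                if s > conf_thres:
                    rows.append((box, s, j))
        else:
            # Best class only: keep the arg-max class if it clears the threshold.
            j = max(range(len(scores)), key=lambda k: scores[k])
            if scores[j] > conf_thres:
                rows.append((box, scores[j], j))
    return rows


boxes = [(0, 0, 10, 10), (5, 5, 20, 20)]
cls_scores = [[0.6, 0.4, 0.05], [0.05, 0.02, 0.9]]
print(expand_detections(boxes, cls_scores, 0.3, multi_label=True))   # 3 rows
print(expand_detections(boxes, cls_scores, 0.3, multi_label=False))  # 2 rows
```

With multi_label=True the first box yields two detections (classes 0 and 1). The best-class branch keeps only class 0 for that box.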


# Apply finite constraint: keep only rows whose values are all finite
if not torch.isfinite(x).all():
    x = x[torch.isfinite(x).all(1)]



# Batched NMS: x[box(xyxy), conf, max_index]
c = x[:, 5] * 0 if agnostic else x[:, 5]  # classes
boxes, scores = x[:, :4].clone() + c.view(-1, 1) * max_wh, x[:, 4]  # boxes (offset by class), scores
# Non-maximum suppression; returns the indices of the kept boxes
i = torchvision.ops.nms(boxes, scores, iou_thres)
i = i[:max_num]  # keep at most max_num detections
# Collect the final filtered result
output[xi] = x[i]
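The class offset in the snippet above is what turns a single NMS call into per-class NMS. The pure-Python sketch below (no torch) shows the effect. `iou` and `offset_box` are hypothetical helpers, and the sketch assumes every box coordinate lies in [0, max_wh). Shifting a box by class_index * max_wh moves each class into its own region, so two boxes of different classes can never overlap.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def offset_box(box, cls, max_wh=4096):
    """Shift a box diagonally by cls * max_wh pixels (hypothetical helper)."""
    d = cls * max_wh
    return (box[0] + d, box[1] + d, box[2] + d, box[3] + d)


a, b = (10, 10, 50, 50), (12, 12, 52, 52)        # two nearly identical boxes
print(iou(a, b))                                   # high overlap, about 0.82
print(iou(offset_box(a, 0), offset_box(b, 1)))     # different classes: 0.0
print(iou(offset_box(a, 2), offset_box(b, 2)))     # same class: unchanged
```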

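Note that torchvision.ops.nms takes the predicted boxes, their confidence scores and an IoU threshold. It has nothing to do with the ground-truth boxes. It returns the indices of the boxes that are kept.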
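To make this contract concrete, here is a greedy hard-NMS sketch in pure Python. Predicted boxes and scores go in, and the indices of the kept boxes come out, sorted by decreasing score. This is not torchvision's implementation, which runs as a compiled op. `iou` and `nms` are hypothetical helpers that follow the same rule of discarding boxes whose IoU with an already kept box exceeds the threshold.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy hard NMS: returns kept indices in decreasing order of score."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score
        keep.append(best)
        # Discard every remaining box whose IoU with the kept box exceeds the threshold
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7], 0.5))  # box 1 overlaps box 0 (IoU about 0.68) and is dropped
```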


def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
    """
    Performs non-maximum suppression (NMS) on the boxes according
    to their intersection-over-union (IoU).
    NMS iteratively removes lower scoring boxes which have an
    IoU greater than iou_threshold with another (higher scoring)
    box.
    If multiple boxes have the exact same score and satisfy the IoU
    criterion with respect to a reference box, the selected box is
    not guaranteed to be the same between CPU and GPU. This is similar
    to the behavior of argsort in PyTorch when repeated values are present.
    Args:
        boxes (Tensor[N, 4]): boxes to perform NMS on. They
            are expected to be in ``(x1, y1, x2, y2)`` format with ``0 <= x1 < x2`` and
            ``0 <= y1 < y2``.
        scores (Tensor[N]): scores for each one of the boxes
        iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
    Returns:
        Tensor: int64 tensor with the indices of the elements that have been kept
        by NMS, sorted in decreasing order of scores
    """
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)


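# Example from the source code:
# Batched NMS: x[box(xyxy), conf, max_index]
# c = x[:, 5] * 0 if agnostic else x[:, 5]  # classes
# boxes, scores = x[:, :4].clone() + c.view(-1, 1) * max_wh, x[:, 4]  # boxes (offset by class), scores
# My modified version:
# Non-maximum suppression using only the boxes and their confidence; returns the kept indices.
# Without the class offset, boxes of different classes can suppress each other (class-agnostic NMS).
boxes, scores = x[:, :4], x[:, 4]
i = torchvision.ops.nms(boxes, scores, iou_thres)
i = i[:max_num]  # keep at most max_num detections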







2. NMS Code Implementation



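import time

import torch
import torchvision

# xywh2xyxy and box_iou are helper functions from the YOLOv3 project's utils (not shown in this post)


# Filtering followed by non-maximum suppression
def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.6,
                        multi_label=True, classes=None, agnostic=False, max_num=100):
    """
    Performs Non-Maximum Suppression on inference results
    Args:
        prediction: [batch, num_anchors (summed over the 3 YOLO output layers), (x+y+w+h+1+num_classes)]
        conf_thres: first-pass filter; predictions scoring below conf_thres are discarded
        iou_thres: IoU threshold; a box whose IoU with a kept (higher-scoring) box exceeds iou_thres is suppressed
        multi_label: whether a box may receive multiple labels
        classes: optional list of class indices to keep
        agnostic: class-agnostic NMS (only used by the commented-out batched variant below)
        max_num: maximum number of detections kept per image
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class)
    """
    # Settings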
    merge = True  # merge for best mAP
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    time_limit = 10.0  # seconds to quit after
    t = time.time()
    nc = prediction[0].shape[1] - 5  # number of classes
    multi_label &= nc > 1  # multiple labels per box
    output = [None] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference: iterate over the images
        # Apply constraints
        # torch.Size([5040, 25]) -> torch.Size([3051, 25])
        x = x[x[:, 4] > conf_thres]  # confidence: filter out background by objectness confidence
        # torch.Size([3051, 25]) -> torch.Size([3051, 25])
        x = x[((x[:, 2:4] > min_wh) & (x[:, 2:4] < max_wh)).all(1)]  # width-height: filter out too small or too large boxes
        # If none remain process next image
        if not x.shape[0]:
            continue
        # Compute conf: x[:, xywh + conf + cls_score]
        x[..., 5:] *= x[..., 4:5]  # conf = obj_conf * cls_conf
        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])
        # Detections matrix nx6 (xyxy, conf, cls)
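        if multi_label:  # multi-label: one detection per class whose score passes the threshold
            # (x[:, 5:] > conf_thres).nonzero(as_tuple=False): torch.Size([nums, 2])
            # Each of the nums rows is the (row, column) position of an element that satisfies the condition
            # i: row indices of the boxes that satisfy the condition
            # j: column indices, i.e. the class indices
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).t()
            # j indexes the 20 class columns directly, but x has 25 columns whose first 5 hold the box
            # coordinates and objectness, so j + 5 is needed as an offset
            # Then concatenate the box (xyxy), the confidence and the class index into a new x
            x = torch.cat((box[i], x[i, j + 5].unsqueeze(1), j.float().unsqueeze(1)), 1)
        else:  # best class only: keep only the highest-scoring class of each box
            conf, j = x[:, 5:].max(1)
            x = torch.cat((box, conf.unsqueeze(1), j.float().unsqueeze(1)), 1)[conf > conf_thres]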
        # Filter by class (column 5 now holds the class index, which stays aligned with x in both branches)
        if classes:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]
        # Apply finite constraint: keep only rows whose values are all finite
        if not torch.isfinite(x).all():
            x = x[torch.isfinite(x).all(1)]
        # If none remain process next image
        n = x.shape[0]  # number of boxes
        if not n:
            continue
        # Sort by confidence
        x = x[x[:, 4].argsort(descending=True)]
        # Batched NMS: x[box(xyxy), conf, max_index]
        # c = x[:, 5] * 0 if agnostic else x[:, 5]  # classes
        # boxes, scores = x[:, :4].clone() + c.view(-1, 1) * max_wh, x[:, 4]  # boxes (offset by class), scores
        # Non-maximum suppression using the boxes and their confidence; returns the kept indices
        boxes, scores = x[:, :4], x[:, 4]
        i = torchvision.ops.nms(boxes, scores, iou_thres)
        i = i[:max_num]  # keep at most max_num detections
        # IoU-based refinement: merge overlapping boxes using a score-weighted mean
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            try:  # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
                iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
                # [11, 78] * [1, 78] = [11, 78]
                weights = iou * scores[None]  # box weights
                # torch.mm: matrix product of a and b
                # torch.mul: element-wise product of a and b, shape unchanged
                # torch.mm(weights, x[:, :4]).float(): [11, 78] * [78, 4] = [11, 4]
                # weights.sum(1, keepdim=True): torch.Size([11, 1])
                x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
                # i = i[iou.sum(1) > 1]  # require redundancy
            except:  # possible CUDA error https://github.com/ultralytics/yolov3/issues/1139
                print(x, i, x.shape, i.shape)
        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            break  # time limit exceeded
    return output

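Additional note: the function usually filters the predictions through the five steps above, and that alone is generally enough to obtain the final predictions. In particular, the per-class threshold filtering already cuts thousands of candidate boxes down to a few dozen or a few hundred positives, and a light NMS pass then picks out suitable predictions. At the end of the function, however, there is one more setting: whether to refine the results further with Merge NMS.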
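The Merge NMS step in the function above replaces each kept box i with a weighted mean. The weight matrix is weights(i, k) = [IoU(box_i, box_k) > iou_thres] * score_k, and the merged box is the matrix product weights @ boxes divided by the row sums. The following pure-Python sketch computes the same quantity row by row. `merge_kept_boxes` is a hypothetical helper, and it assumes positive scores and iou_thres < 1, so that each kept box always contributes to its own average.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_kept_boxes(boxes, scores, keep, iou_thres):
    """For each kept index i, return the score-weighted mean of all boxes whose
    IoU with boxes[i] exceeds iou_thres (boxes[i] itself has IoU 1)."""
    merged = []
    for i in keep:
        # Row i of the weight matrix: (IoU > iou_thres) * score
        w = [scores[k] if iou(boxes[i], boxes[k]) > iou_thres else 0.0
             for k in range(len(boxes))]
        total = sum(w)  # weights.sum(1, keepdim=True) for this row
        merged.append(tuple(
            sum(w[k] * boxes[k][c] for k in range(len(boxes))) / total
            for c in range(4)))
    return merged


boxes = [(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]
scores = [0.75, 0.25, 0.5]
print(merge_kept_boxes(boxes, scores, keep=[0, 2], iou_thres=0.4))
```

The first kept box moves a quarter of the way toward its lower-scoring duplicate. The isolated box is unchanged because it is the only member of its own group.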


3. NMS Variants and Their Implementations

3.1 hard_nms_batch



  • Simple usage:
i = torchvision.ops.boxes.batched_nms(pred[:, :4], pred[:, 4], pred[:, 5], nms_thres)
output[image_i] = pred[i]
  • Details:
def batched_nms(
    boxes: Tensor,
    scores: Tensor,
    idxs: Tensor,
    iou_threshold: float,
) -> Tensor:
    """
    Performs non-maximum suppression in a batched fashion.
    Each index value correspond to a category, and NMS
    will not be applied between elements of different categories.
    Args:
        boxes (Tensor[N, 4]): boxes where NMS will be performed. They
            are expected to be in ``(x1, y1, x2, y2)`` format with ``0 <= x1 < x2`` and
            ``0 <= y1 < y2``.
        scores (Tensor[N]): scores for each one of the boxes
        idxs (Tensor[N]): indices of the categories for each one of the boxes.
        iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
    Returns:
        Tensor: int64 tensor with the indices of the elements that have been kept by NMS, sorted
        in decreasing order of scores
    """
    # Benchmarks that drove the following thresholds are at
    # https://github.com/pytorch/vision/issues/1311#issuecomment-781329339
    # Ideally for GPU we'd use a higher threshold
    if boxes.numel() > 4_000 and not torchvision._is_tracing():
        return _batched_nms_vanilla(boxes, scores, idxs, iou_threshold)
    else:
        return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
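batched_nms above chooses between two strategies depending on the number of boxes. The sketch below re-implements both in pure Python to show that they keep the same boxes. The function names mirror torchvision's private helpers, but these are hypothetical simplifications, not its code. The coordinate trick offsets every box by idx * (max_coordinate + 1). This assumes non-negative coordinates, as the docstring requires, so that each category occupies a disjoint region.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy hard NMS returning kept indices in decreasing order of score."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_threshold]
    return keep


def batched_nms_vanilla(boxes, scores, idxs, iou_threshold):
    """Run NMS separately for each category, then sort the survivors by score."""
    keep = []
    for cls in set(idxs):
        members = [k for k in range(len(boxes)) if idxs[k] == cls]
        kept_local = nms([boxes[k] for k in members], [scores[k] for k in members], iou_threshold)
        keep.extend(members[m] for m in kept_local)
    return sorted(keep, key=lambda k: scores[k], reverse=True)


def batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold):
    """Shift each box by idx * (max_coordinate + 1) and run one NMS call."""
    max_coordinate = max(max(b) for b in boxes)
    shifted = [tuple(v + idxs[k] * (max_coordinate + 1) for v in boxes[k])
               for k in range(len(boxes))]
    return nms(shifted, scores, iou_threshold)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.85]
idxs = [0, 0, 1]  # box 2 duplicates box 0 but belongs to another category
print(batched_nms_vanilla(boxes, scores, idxs, 0.5))
print(batched_nms_coordinate_trick(boxes, scores, idxs, 0.5))
```

Both calls keep boxes 0 and 2. Box 1 is suppressed by box 0 of the same category, while box 2 survives despite being identical to box 0, because it belongs to another category.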

3.2 hard_nms



  • Simple usage:
i = torchvision.ops.boxes.nms(pred[:, :4], pred[:, 4], nms_thres)
output[image_i] = pred[i]

  • Details: this is the same torchvision.ops.nms function whose source is listed in Section 1.

3.3 and-nms




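# Sort by score in descending order to prepare for NMS  [43, 6]
pred = pred[pred[:, 4].argsort(descending=True)]
det_max = []  # stores the kept (highest-scoring) boxes, i.e. the targets
cls = pred[:, -1]
for c in cls.unique():  # loop over every distinct class
    dc = pred[cls == c]  # dc: all predictions in pred whose class is c
    n = len(dc)  # number of predicted boxes of class c
    # Hard-NMS logic plus a check for isolated boxes: a box with no overlapping box is removed (fewer false positives)
    if method == 'and':  # requires overlap, single boxes erased
        while len(dc) > 1:
            iou = bbox_iou(dc[0], dc[1:])  # iou with other boxes
            if iou.max() > 0.5:  # keep the top box only if another box overlaps it with IoU > 0.5
                det_max.append(dc[:1])
            # Remove the top box and its duplicates, then start the next round
            dc = dc[1:][iou < nms_thres]  # remove ious > threshold
# Concatenate the NMS results for this image and sort them by confidence in descending order
if len(det_max):
    det_max = torch.cat(det_max)  # concatenate, since the results were appended to det_max one by one
    output[image_i] = det_max[(-det_max[:, 4]).argsort()]  # sort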
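The and-nms loop above can be mirrored for a single class in pure Python. This is a sketch: `and_nms` and `iou` are hypothetical helpers, and overlap_thres stands for the hard-coded 0.5. The loop also has a side effect that the example makes visible. Because it stops when one box remains, that last box is always dropped, so an isolated detection never survives.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def and_nms(dets, nms_thres=0.5, overlap_thres=0.5):
    """dets: list of (box, score) for ONE class.
    A box is kept only if some lower-scoring box overlaps it with IoU > overlap_thres."""
    dc = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    while len(dc) > 1:
        ious = [iou(dc[0][0], d[0]) for d in dc[1:]]
        if max(ious) > overlap_thres:
            kept.append(dc[0])
        # Drop the top box and every box that duplicates it
        dc = [d for d, v in zip(dc[1:], ious) if v < nms_thres]
    return kept


dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
print(and_nms(dets))  # only the 0.9 box survives; the isolated 0.7 box is erased
```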

3.4 merge-nms





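# Sort by score in descending order to prepare for NMS  [43, 6]
pred = pred[pred[:, 4].argsort(descending=True)]
det_max = []  # stores the kept (highest-scoring) boxes, i.e. the targets
cls = pred[:, -1]
for c in cls.unique():  # loop over every distinct class
    dc = pred[cls == c]  # dc: all predictions in pred whose class is c
    n = len(dc)  # number of predicted boxes of class c
    # Hard NMS plus box-position smoothing (score-weighted average of overlapping boxes) for more accurate boxes
    if method == 'merge':  # weighted mixture box
        while len(dc):
            if len(dc) == 1:
                det_max.append(dc)
                break
            # Boxes whose IoU with the top box exceeds the threshold are treated as duplicates;
            # their confidences serve as the weights
            i = bbox_iou(dc[0], dc) > nms_thres  # i: boolean mask (includes dc[0] itself)
            weights = dc[i, 4:5]     # keep the rows where i is True
            # weights: [8, 1], dc[i, :4]: [8, 4], (weights * dc[i, :4]): [8, 4]
            # .sum(0): {Tensor: 4}, .sum(1): {Tensor: 8}, weights.sum(): {Tensor}
            # Average the duplicate boxes: weight each box by its confidence, sum, then divide by the sum of the weights
            # The core idea of merge: higher-scoring boxes count more, but the other boxes are not ignored
            dc[0, :4] = (weights * dc[i, :4]).sum(0) / weights.sum()  # weighted mean of the overlapping boxes
            # The merged box becomes the representative of this group of duplicates
            det_max.append(dc[:1])
            # i == 0 selects the low-overlap boxes: the duplicates are removed and the loop continues on the rest
            dc = dc[i == 0]
# Concatenate the NMS results for this image and sort them by confidence in descending order
if len(det_max):
    det_max = torch.cat(det_max)  # concatenate, since the results were appended to det_max one by one
    output[image_i] = det_max[(-det_max[:, 4]).argsort()]  # sort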
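The merge loop above can be sketched for one class in pure Python. `merge_nms` and `iou` are hypothetical helpers, and the threshold 0.4 is chosen only so that the example boxes (IoU about 0.47) form a group. Each round averages the top box with its duplicates, weighted by score, and then continues with the boxes that did not overlap it.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_nms(dets, nms_thres=0.4):
    """dets: list of (box, score) for ONE class. Returns (merged_box, score) pairs."""
    dc = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    while dc:
        if len(dc) == 1:
            kept.append(dc[0])
            break
        top_box, top_score = dc[0]
        # Duplicates of the top box (the top box itself has IoU 1 and is included)
        mask = [iou(top_box, b) > nms_thres for b, _ in dc]
        group = [d for d, m in zip(dc, mask) if m]
        total = sum(s for _, s in group)
        merged = tuple(sum(s * b[c] for b, s in group) / total for c in range(4))
        kept.append((merged, top_score))
        # Continue with the boxes that did not overlap the top box
        dc = [d for d, m in zip(dc, mask) if not m]
    return kept


dets = [((0, 0, 10, 10), 0.75), ((2, 2, 12, 12), 0.25), ((50, 50, 60, 60), 0.5)]
print(merge_nms(dets))
```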

3.5 soft-nms





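# Sort by score in descending order to prepare for NMS  [43, 6]
pred = pred[pred[:, 4].argsort(descending=True)]
det_max = []  # stores the kept (highest-scoring) boxes, i.e. the targets
cls = pred[:, -1]
for c in cls.unique():  # loop over every distinct class
    dc = pred[cls == c]  # dc: all predictions in pred whose class is c
    n = len(dc)  # number of predicted boxes of class c
    # soft-NMS: https://arxiv.org/abs/1704.04503
    # Inference time: 0.0030 s
    if method == 'soft_nms':
        sigma = 0.5  # soft-nms sigma parameter
        while len(dc):
            # if len(dc) == 1:  this is the original Ultralytics code; I made a small change
            #     det_max.append(dc)
            #     break
            # det_max.append(dc[:1])
            det_max.append(dc[:1])   # append the first row of dc, i.e. the target
            if len(dc) == 1:
                break
            iou = bbox_iou(dc[0], dc[1:])  # IoU between the target and the other boxes
            # Unlike hard NMS, which simply discards boxes (so shapes do not matter),
            # the scores are multiplied here
            dc = dc[1:]  # dc = all boxes after the target
            # dc must exclude the target so that its length matches iou for the element-wise product
            # dc[:, 4]: {Tensor: 36}
            # torch.exp(-iou ** 2 / sigma): {Tensor: 36}
            dc[:, 4] *= torch.exp(-iou ** 2 / sigma)  # score decay {Tensor: 36}
            # Another way of selecting boxes: those whose score is still high after decay overlap little,
            # because a higher overlap means a stronger decay and therefore a much smaller score
            dc = dc[dc[:, 4] > conf_thres]
            # Re-sort so that dc[0] is again the highest-scoring box after the decay
            dc = dc[dc[:, 4].argsort(descending=True)]
# Concatenate the NMS results for this image and sort them by confidence in descending order
if len(det_max):
    det_max = torch.cat(det_max)  # concatenate, since the results were appended to det_max one by one
    output[image_i] = det_max[(-det_max[:, 4]).argsort()]  # sort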
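With the Gaussian penalty used above, a box b that overlaps the current target M is not removed. Instead its score is multiplied by exp(-IoU(M, b)² / σ). For example, with σ = 0.5 and IoU = 0.8 the factor is exp(-1.28) ≈ 0.28. The following pure-Python sketch of single-class Soft-NMS re-selects the highest remaining score in every round. `soft_nms` and `iou` are hypothetical helpers, and score_thres plays the role of conf_thres.

```python
import math


def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(dets, sigma=0.5, score_thres=0.1):
    """Gaussian Soft-NMS for ONE class. dets: list of (box, score)."""
    dc = list(dets)
    kept = []
    while dc:
        # Select the current highest score (scores change after every decay)
        dc.sort(key=lambda d: d[1], reverse=True)
        top_box, top_score = dc.pop(0)
        kept.append((top_box, top_score))
        # Gaussian decay s <- s * exp(-IoU^2 / sigma) instead of setting s to 0
        dc = [(b, s * math.exp(-iou(top_box, b) ** 2 / sigma)) for b, s in dc]
        # Drop boxes whose decayed score falls below the threshold
        dc = [(b, s) for b, s in dc if s > score_thres]
    return kept


dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
for box, score in soft_nms(dets):
    print(box, round(score, 4))
```

Hard NMS with a 0.5 threshold would discard the second box, which has IoU ≈ 0.68 with the first. Soft-NMS keeps it, ranked last with a reduced score of about 0.32. This is why Soft-NMS is better suited to dense scenes.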

3.6 iou-nms




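# Sort by score in descending order to prepare for NMS  [43, 6]
pred = pred[pred[:, 4].argsort(descending=True)]
det_max = []  # stores the kept (highest-scoring) boxes, i.e. the targets
cls = pred[:, -1]
for c in cls.unique():  # loop over every distinct class
    dc = pred[cls == c]  # dc: all predictions in pred whose class is c
    n = len(dc)  # number of predicted boxes of class c
    # Inference time: 0.00299 s, about 3x that of the official implementation
    if method == 'iou_nms':  # hand-written hard NMS; single-class input only
        while dc.shape[0]:  # dc.shape[0]: number of remaining boxes of the current class
            det_max.append(dc[:1])  # the highest-scoring box (first after sorting) becomes the target
            if len(dc) == 1:  # exit: only one box left in dc
                break
            # dc[0]: target     dc[1:]: other boxes
            # Compute the IoU between the current best box and every remaining box; a high IoU means a duplicate that can be removed
            # To discard duplicates, keep only boxes whose IoU is below the threshold
            iou = bbox_iou(dc[0], dc[1:])  # plain IoU
            # remove target and iou > threshold
            # Remove the top box and its duplicates, then start the next round
            dc = dc[1:][iou < nms_thres]
# Concatenate the NMS results for this image and sort them by confidence in descending order
if len(det_max):
    det_max = torch.cat(det_max)  # concatenate, since the results were appended to det_max one by one
    output[image_i] = det_max[(-det_max[:, 4]).argsort()]  # sort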

3.7 diou_nms




# Sort by score in descending order to prepare for NMS  [43, 6]
pred = pred[pred[:, 4].argsort(descending=True)]
det_max = []  # stores the kept (highest-scoring) boxes, i.e. the targets
cls = pred[:, -1]
for c in cls.unique():  # loop over every distinct class
    dc = pred[cls == c]  # dc: all predictions in pred whose class is c
    n = len(dc)  # number of predicted boxes of class c
    # Differs from iou_nms only in how the overlap is computed
    if method == 'diou_nms':  # DIoU NMS  https://arxiv.org/pdf/1911.08287.pdf
        while dc.shape[0]:  # dc.shape[0]: number of remaining boxes of the current class
            det_max.append(dc[:1])  # the highest-scoring box (first after sorting) becomes the target
            if len(dc) == 1:  # exit: only one box left in dc
                break
            # dc[0]: target     dc[1:]: other boxes
            diou = bbox_iou(dc[0], dc[1:], DIoU=True)  # compute DIoU
            dc = dc[1:][diou < nms_thres]  # remove dious > threshold: keep True, drop False
# Concatenate the NMS results for this image and sort them by confidence in descending order
if len(det_max):
    det_max = torch.cat(det_max)  # concatenate, since the results were appended to det_max one by one
    output[image_i] = det_max[(-det_max[:, 4]).argsort()]  # sort

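The main difference is that IoU is replaced by DIoU. The same function is called, only with a different argument, so giou_nms and ciou_nms can be built the same way without any essential change.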
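DIoU subtracts a centre-distance penalty from IoU: DIoU = IoU - ρ²/c². Here ρ is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. The pure-Python sketch below shows the effect on suppression. `iou`, `diou` and `greedy_nms` are hypothetical helpers, and the greedy loop keeps boxes whose overlap is below the threshold, as the code above does. For two boxes shifted horizontally by half their width, IoU = 1/3 but DIoU ≈ 0.256. With a threshold of 0.3, IoU-NMS therefore drops the second box, while DIoU-NMS keeps it.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """DIoU = IoU - rho^2 / c^2 for two (x1, y1, x2, y2) boxes."""
    # Squared distance between the box centres
    rho2 = ((a[0] + a[2]) - (b[0] + b[2])) ** 2 / 4 + ((a[1] + a[3]) - (b[1] + b[3])) ** 2 / 4
    # Squared diagonal of the smallest enclosing box
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return iou(a, b) - rho2 / (cw ** 2 + ch ** 2)


def greedy_nms(dets, nms_thres, metric):
    """dets: list of (box, score) for ONE class; metric: iou or diou."""
    dc = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    while dc:
        top = dc.pop(0)
        kept.append(top)
        dc = [d for d in dc if metric(top[0], d[0]) < nms_thres]
    return kept


A, B = (0, 0, 10, 10), (5, 0, 15, 10)
print(iou(A, B), diou(A, B))
print(len(greedy_nms([(A, 0.9), (B, 0.8)], 0.3, iou)))   # IoU-NMS keeps 1 box
print(len(greedy_nms([(A, 0.9), (B, 0.8)], 0.3, diou)))  # DIoU-NMS keeps 2 boxes
```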

4. Complete Code for the NMS Variants



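import torch
import torchvision

# xywh2xyxy and bbox_iou are helper functions from the YOLOv3 project's utils (not shown in this post)


def non_max_suppression(prediction, conf_thres=0.1,
                        iou_thres=0.6, multi_label=True, method='iou_nms'):
    """
    Removes detections with lower object confidence score than 'conf_thres' and performs
    Non-Maximum Suppression to further filter detections.
    Args:
        prediction: [batch, num_anchors (summed over the 3 YOLO output layers), (x+y+w+h+1+num_classes)]
        conf_thres: first-pass filter; predictions scoring below conf_thres are discarded
        iou_thres: IoU threshold (nms_thres internally); a box whose IoU with the target exceeds it is suppressed
        multi_label: whether a box may receive multiple labels
        method: NMS method  (https://github.com/ultralytics/yolov3/issues/679)
            -hard_nms: plain (hard) NMS, official implementation (compiled C/C++ library), GPU-capable, single-class input only
            -hard_nms_batch: plain (hard) NMS, official implementation (compiled C/C++ library), GPU-capable, multi-class input supported
            -and_nms: hard-NMS logic plus a check for isolated boxes; boxes with no overlapping box are removed (fewer false positives)
            -merge_nms: hard NMS plus box-position smoothing (weighted average of overlapping boxes) for more accurate boxes
            -soft_nms: soft NMS; applies a decay function to the score instead of setting it to 0
            -iou_nms: plain (hard) NMS, single-class input only
            -diou_nms: plain (hard) NMS using DIoU instead of the usual IoU
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class)
    """
    nms_thres = iou_thres
    multi_cls = multi_label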
    # Box constraints
    min_wh, max_wh = 2, 4096  # (pixels) allowed range of box width and height [min_wh, max_wh]
    output = [None] * len(prediction)  # one entry per image in the batch; stores the final filtered boxes
    for image_i, pred in enumerate(prediction):
        # Start: pred = [12096, 25]
        # Filter 1: remove background by conf_thres (boxes with obj_conf < conf_thres = 0.1, i.e. very low confidence)
        pred = pred[pred[:, 4] > conf_thres]  # pred = [45, 25]
        # Filter 2: remove boxes that are too small or too large  x=[45, 25]
        pred = pred[(pred[:, 2:4] > min_wh).all(1) & (pred[:, 2:4] < max_wh).all(1)]
        # If no boxes remain after the first two filters, skip to the next image
        if len(pred) == 0:
            continue
        # Compute score
        pred[..., 5:] *= pred[..., 4:5]  # score = cls_conf * obj_conf
        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(pred[:, :4])
        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_cls or conf_thres < 0.01:
            # Filter 3: per-class score (obj_conf * cls_conf) > conf_thres  [43, 6]
            # A single box may contain several objects, so filter per class
            # nonzero: indices of the non-zero (True) entries;  a.t(): transpose so that rows and columns can be unpacked
            # i: box indices [43]   j: class indices [43]; two low-score entries were filtered out
            i, j = (pred[:, 5:] > conf_thres).nonzero(as_tuple=False).t()
            # pred = [43, xyxy+score+class] [43, 6]
            # unsqueeze(1): [43] => [43, 1] adds a column dimension
            # box[i]: [43,4] xyxy
            # pred[i, j + 5].unsqueeze(1): [43,1] score; for each i, take column j+5 (the cls_conf of class j)
            # j.float().unsqueeze(1): [43,1] class
            pred = torch.cat((box[i], pred[i, j + 5].unsqueeze(1), j.float().unsqueeze(1)), 1)
        else:  # best class only
            conf, j = pred[:, 5:].max(1)  # just take the highest-scoring class
            pred = torch.cat((box, conf.unsqueeze(1), j.float().unsqueeze(1)), 1)[conf > conf_thres]
        # If no boxes remain after filter 3, skip to the next image
        if len(pred) == 0:
            continue
        # Filter 4: optional and rarely has an effect [43, 6]; keep rows whose values are all finite
        pred = pred[torch.isfinite(pred).all(1)]
        # Sort by score in descending order to prepare for NMS  [43, 6]
        pred = pred[pred[:, 4].argsort(descending=True)]
        # Batched NMS
        # Batched NMS inference time: 0.054 s
        if method == 'hard_nms_batch':  # plain (hard) NMS: official implementation (compiled C/C++ library), GPU-capable, multi-class input supported
            # batched_nms args: 1) [43, xyxy]  2) [43, score]  3) [43, class]  4) IoU threshold nms_thres
            output[image_i] = pred[torchvision.ops.boxes.batched_nms(pred[:, :4], pred[:, 4], pred[:, 5], nms_thres)]
            # print("hard_nms_batch")
            continue  # this image is done
        # All other NMS methods below process one class at a time
        det_max = []  # stores the kept (highest-scoring) boxes, i.e. the targets
        cls = pred[:, -1]
        for c in cls.unique():  # loop over every distinct class
            dc = pred[cls == c]  # dc: all predictions in pred whose class is c
            n = len(dc)  # number of predicted boxes of class c
            if n == 1:
                det_max.append(dc)  # No NMS required if only 1 prediction
                continue
            elif n > 500:
                # limit to first 500 boxes: https://github.com/ultralytics/yolov3/issues/117
                # Dense scenes: NMS is slow (O(n^2)), so with too many boxes it becomes inefficient; keep at most 500
                dc = dc[:500]
            # Inference time: 0.001 s
            if method == 'hard_nms':  # plain (hard) NMS: single-class input only
                det_max.append(dc[torchvision.ops.boxes.nms(dc[:, :4], dc[:, 4], nms_thres)])
            # Hard-NMS logic plus a check for isolated boxes: a box with no overlapping box is removed (fewer false positives)
            elif method == 'and_nms':  # requires overlap, single boxes erased
                while len(dc) > 1:
                    iou = bbox_iou(dc[0], dc[1:])  # iou with other boxes
                    if iou.max() > 0.5:  # keep the top box only if another box overlaps it with IoU > 0.5
                        det_max.append(dc[:1])
                    dc = dc[1:][iou < nms_thres]  # remove ious > threshold
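            # Hard NMS plus box-position smoothing (score-weighted average of overlapping boxes) for more accurate boxes
            elif method == 'merge_nms':  # weighted mixture box
                while len(dc):
                    if len(dc) == 1:
                        det_max.append(dc)
                        break
                    # Boxes whose IoU with the top box exceeds the threshold are treated as duplicates;
                    # their confidences serve as the weights
                    i = bbox_iou(dc[0], dc) > nms_thres  # i: boolean mask (includes dc[0] itself)
                    weights = dc[i, 4:5]     # keep the rows where i is True
                    # weights: [8, 1], dc[i, :4]: [8, 4], (weights * dc[i, :4]): [8, 4]
                    # .sum(0): {Tensor: 4}, .sum(1): {Tensor: 8}, weights.sum(): {Tensor}
                    # Average the duplicate boxes: weight each box by its confidence, sum, then divide by the sum of the weights
                    # The core idea of merge: higher-scoring boxes count more, but the other boxes are not ignored
                    dc[0, :4] = (weights * dc[i, :4]).sum(0) / weights.sum()  # weighted mean of the overlapping boxes
                    # The merged box becomes the representative of this group of duplicates
                    det_max.append(dc[:1])
                    # i == 0 selects the low-overlap boxes: the duplicates are removed and the loop continues on the rest
                    dc = dc[i == 0]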
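            # Inference time: 0.0030 s
            elif method == 'soft_nms':  # soft-NMS      https://arxiv.org/abs/1704.04503
                sigma = 0.5  # soft-nms sigma parameter
                while len(dc):
                    # if len(dc) == 1:  this is the original Ultralytics code; I made a small change
                    #     det_max.append(dc)
                    #     break
                    # det_max.append(dc[:1])
                    det_max.append(dc[:1])   # append the first row of dc, i.e. the target
                    if len(dc) == 1:
                        break
                    iou = bbox_iou(dc[0], dc[1:])  # IoU between the target and the other boxes
                    # Unlike hard NMS, which simply discards boxes (so shapes do not matter),
                    # the scores are multiplied here
                    dc = dc[1:]  # dc = all boxes after the target
                    # dc must exclude the target so that its length matches iou for the element-wise product
                    # dc[:, 4]: {Tensor: 36}
                    # torch.exp(-iou ** 2 / sigma): {Tensor: 36}
                    dc[:, 4] *= torch.exp(-iou ** 2 / sigma)  # score decay {Tensor: 36}
                    # Another way of selecting boxes: keep those whose score is still high after decay
                    dc = dc[dc[:, 4] > conf_thres]
                    # Re-sort so that dc[0] is again the highest-scoring box after the decay
                    dc = dc[dc[:, 4].argsort(descending=True)]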
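            # Inference time: 0.00299 s, about 3x that of the official implementation
            elif method == 'iou_nms':  # Hard NMS, single-class input only
                while dc.shape[0]:  # dc.shape[0]: number of remaining boxes of the current class
                    det_max.append(dc[:1])  # the highest-scoring box (first after sorting) becomes the target
                    if len(dc) == 1:  # exit: only one box left in dc
                        break
                    # dc[0]: target     dc[1:]: other boxes
                    # Compute the IoU between the current best box and every remaining box; a high IoU means a duplicate that can be removed
                    # To discard duplicates, keep only boxes whose IoU is below the threshold
                    iou = bbox_iou(dc[0], dc[1:])  # plain IoU
                    # remove target and iou > threshold
                    # Remove the top box and its duplicates, then start the next round
                    dc = dc[1:][iou < nms_thres]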
            # Inference time: 0.00299 s
            elif method == 'diou_nms':  # DIoU NMS  https://arxiv.org/pdf/1911.08287.pdf
                while dc.shape[0]:  # dc.shape[0]: number of remaining boxes of the current class
                    det_max.append(dc[:1])  # the highest-scoring box (first after sorting) becomes the target
                    if len(dc) == 1:  # exit: only one box left in dc
                        break
                    # dc[0]: target     dc[1:]: other boxes
                    # The only change from iou_nms: compute DIoU instead of IoU
                    diou = bbox_iou(dc[0], dc[1:], DIoU=True)  # compute DIoU
                    dc = dc[1:][diou < nms_thres]  # remove dious > threshold: keep True, drop False
        # Concatenate the NMS results for this image and sort them by confidence in descending order
        if len(det_max):
            det_max = torch.cat(det_max)  # concatenate, since the results were appended to det_max one by one
            output[image_i] = det_max[(-det_max[:, 4]).argsort()]  # sort
    # output tensor [7, 6]
    return output


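  • Hard-NMS: directly removes neighboring boxes of the same class; handles densely packed objects poorly.
  • Soft-NMS: instead of removing neighboring same-class boxes, lowers their confidence with a function of IoU and filters them later with a confidence threshold; suited to scenes with dense objects.
  • or-nms: an unofficial implementation of hard NMS; CPU only.
  • vision-nms: the official implementation of hard NMS (compiled C/C++ library); supports GPU (CUDA); single-class input only.
  • vision-batched-nms: the official implementation of hard NMS (compiled C/C++ library); supports GPU (CUDA) and multi-class input.
  • and-nms: hard-NMS logic plus a check for isolated boxes; boxes with no overlapping box are removed (fewer false positives).
  • merge-nms: hard NMS plus box-position smoothing (weighted average of overlapping boxes) for more accurate boxes.
  • diou-nms: hard NMS with IoU replaced by DIoU; see the DIoU paper for why DIoU helps.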








