1. coco评价指标包调用

在训练完一个epoch数据之后，一般需要进行一个验证不过也可以全部训练完再进行验证。而在yolov3spp项目中，这里将预测结果送入官网写的计算coco评价指标包里，计算出coco评价指标：mAP@[IoU=0.50:0.95]、mAP@[IoU=0.50]、mAR@[IoU=0.50:0.95]等。而且这个工具包只需要在验证集进行调用即可，训练集训练与最后测试的过程是不需要用的。

对于这些工具类的文件代码，其实会掉包使用就好了。

调用流程如下：

# 遍历一遍验证集，将标签信息全部读取一遍，方便使用pycocotools来计算map，以下两条代码是等价的
# 根据val_dataset来获取img与target
# coco = get_coco_api_from_dataset(val_dataset)
coco = get_coco_api_from_dataset(data_loader.dataset)
coco_evaluator = CocoEvaluator(coco, iou_types)        # iou_types： ['bbox']
...
# 获取后处理后的预测信息，此处的output包含对这张图像的：预测边界框信息、预测类别、预测置信度
# 一张图像对应了这张图像的一个信息字典（每个字典包含这三个信息）
res = {img_id: output for img_id, output in zip(img_index, outputs)}
coco_evaluator.update(res)     # 对当前验证的batch进行更新保留
coco_evaluator.synchronize_between_processes()
coco_evaluator.accumulate()
coco_evaluator.summarize()
# 所需的评价指标：{list: 12}
result_info = coco_evaluator.coco_eval[iou_types[0]].stats.tolist()

对于这里的输出结果，result_info的内容，如下所示；

# COCO results:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.597
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.816
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.658
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.221
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.467
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.477
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.698
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.352
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.587
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.769

如果对这些指标不认识的，可以查看我曾经写过的一篇博文：目标检测中的评估指标：PR曲线、AP、mAP，下面简要介绍：

# 这两行分别表示IoU取不同时候的mAP结果(分别阈值取0.5与0.75时的结果)
APIoU=.50% AP at IoU=.50 (PASCAL VOC metric)   # 该指标就是pascol voc的评价指标
APIoU=.75% AP at IoU=.75 (strict metric)
# 表示当 IoU从0.5~0.95这段范围中，间隔为0.05，一共10个IoU值上mAP的均值，这个是coco数据集上最主要的评判指标
AP% AP at IoU=.50:.05:.95 (primary challenge metric)   # 该指标是coco数据集的评价指标

所以在yolospp代码中，这里取第一，第二，第八行：

# 平均mAP指标：coco数据集上最主要的评判指标
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] 
# IoU取0.5时的mAP结果：pascol voc的评价指标
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] 
 # 平均mAP指标：coco数据集上最主要的评判指标
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ]

如下所示：

# 遍历一遍验证集，将标签信息全部读取一遍，方便使用pycocotools来计算map
coco = get_coco_api_from_dataset(val_dataset)
# 每训练完一个epoch，都会进行验证一次，返回的结果就是输出的一些列coco的指标
result_info = train_util.evaluate(model, val_datasetloader, coco=coco, device=device)
# 获取验证结果
coco_mAP = result_info[0]
voc_mAP = result_info[1]
coco_mAR = result_info[8]

2. 日志工具代码调用

这里主要是介绍MetricLogger类的调用流程。

1）以验证过程为例，在evaluate函数使用如下：

# 构建MetricLogger类对象，选择用空格符来间隔信息，默认是\tab
metric_logger = utils.MetricLogger(delimiter="  ")
...
# 这里的img_index对于的是哪一张图像的索引
# iterable: data_loader 表示不断加载数据出来，一次取出一个batch_size的img
#           由于这里自定义了batch，设置了collate_fn函数，所以还会输出其他内容: targets, paths, shapes, img_index
# print_freq: 100 表示100轮打印一次参数
# header: 开头打印的内容，这里为"Test: "
for imgs, targets, paths, shapes, img_index in metric_logger.log_every(data_loader, 100, header):
    ...
    metric_logger.update(model_time=model_time, evaluator_time=evaluator_time)
# 收集所有进程的统计信息
metric_logger.synchronize_between_processes()
print("Averaged stats:", metric_logger)

验证过程中打印的内容：

Test:   [   0/1456]  eta: 0:06:45.600605  model_time: 0.1178 (0.1178)  evaluator_time: 0.0083 (0.0083)  time: 0.2786  data: 0.1483  max mem: 0
Test:   [ 100/1456]  eta: 0:02:30.134258  model_time: 0.1056 (0.1006)  evaluator_time: 0.0055 (0.0071)  time: 0.1101  data: 0.0001  max mem: 0
Test:   [ 200/1456]  eta: 0:02:48.761989  model_time: 0.3777 (0.1227)  evaluator_time: 0.0065 (0.0069)  time: 0.3621  data: 0.0001  max mem: 0
...
Test:  Total time: 0:08:30 (0.3506 s / it)
Averaged stats: model_time: 0.1293 (0.3195)  evaluator_time: 0.0044 (0.0064)

2）以训练过程为例，在train_one_epoch函数使用如下：

metric_logger = utils.MetricLogger(delimiter="  ")
metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}'))
header = 'Epoch: [{}]'.format(epoch)
for i, (imgs, targets, paths, _, _) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    ...
    now_lr = optimizer.param_groups[0]["lr"]
    metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
    metric_logger.update(lr=now_lr)
# 收集所有进程的统计信息
metric_logger.synchronize_between_processes()
print("lr and loss stats:", metric_logger)

训练过程中打印的内容：

Epoch: [0]  [   0/1430]  eta: 0:25:21.825325  lr: 0.000001  loss: 19.4350 (19.4350)  box_loss: 7.8804 (7.8804)  obj_loss: 4.9758 (4.9758)  class_loss: 6.5787 (6.5787)  time: 1.0642  data: 0.2971  max mem: 0
Epoch: [0]  [  50/1430]  eta: 0:12:22.615927  lr: 0.000051  loss: 14.0581 (18.1472)  box_loss: 5.0861 (7.0637)  obj_loss: 3.2172 (5.4319)  class_loss: 5.7547 (5.6516)  time: 0.5375  data: 0.0001  max mem: 0
Epoch: [0]  [ 100/1430]  eta: 0:11:26.523765  lr: 0.000101  loss: 11.8562 (16.4655)  box_loss: 4.0006 (6.1806)  obj_loss: 2.8863 (4.7584)  class_loss: 4.9694 (5.5265)  time: 0.5013  data: 0.0001  max mem: 0
...
Epoch: [0] Total time: 0:11:56 (0.5012 s / it)
lr and loss stats: lr: 0.001000  loss: 2.7556 (9.9921)  box_loss: 0.7867 (4.2556)  obj_loss: 1.0168 (2.7375)  class_loss: 0.9520 (2.9990)

补充说明：

在MetricLogger类，如果有想要增加变量，并其的更新信息进行打印，其实只需要进行update即可，也就是调用函数metric_logger.update(lr=now_lr)，这个函数会自动的将参数lr变成一个字典遍历，自动构建一个类来存储其不断更新的信息，部分代码见如下：

class MetricLogger(object):
    def __init__(self, delimiter="\t"):
        # 存储参数变量，且不断更新
        self.meters = defaultdict(SmoothedValue)
        self.delimiter = delimiter
    # 自动保留字典遍历，记录其更新信息
    def update(self, **kwargs):
        for k, v in kwargs.items():
            if isinstance(v, torch.Tensor):
                v = v.item()
            assert isinstance(v, (float, int))
            self.meters[k].update(v)
    # 之后是其他的定义，这里我就不贴上来了
    ...
    # 打印本身返回的内容
    def __str__(self):
        loss_str = []
        for name, meter in self.meters.items():
            loss_str.append(
                "{}: {}".format(name, str(meter))
            )
        return self.delimiter.join(loss_str)
class SmoothedValue(object):
    """Track a series of values and provide access to smoothed values over a
    window or the global series average.
    """
    def __init__(self, window_size=20, fmt=None):
        if fmt is None:
            fmt = "{value:.4f} ({global_avg:.4f})"
        self.deque = deque(maxlen=window_size)  # deque简单理解成加强版list
        self.total = 0.0
        self.count = 0
        self.fmt = fmt
    def update(self, value, n=1):
        self.deque.append(value)
        self.count += n
        self.total += value * n
    def synchronize_between_processes(self):
        """
        Warning: does not synchronize the deque!
        """
        if not is_dist_avail_and_initialized():
            return
        t = torch.tensor([self.count, self.total], dtype=torch.float64, device="cuda")
        dist.barrier()
        dist.all_reduce(t)
        t = t.tolist()
        self.count = int(t[0])
        self.total = t[1]
    @property
    def median(self):  # @property 是装饰器，这里可简单理解为增加median属性(只读)
        d = torch.tensor(list(self.deque))
        return d.median().item()
    @property
    def avg(self):
        d = torch.tensor(list(self.deque), dtype=torch.float32)
        return d.mean().item()
    @property
    def global_avg(self):
        return self.total / self.count
    @property
    def max(self):
        return max(self.deque)
    @property
    def value(self):
        return self.deque[-1]
    def __str__(self):
        return self.fmt.format(
            median=self.median,
            avg=self.avg,
            global_avg=self.global_avg,
            max=self.max,
            value=self.value)

总结：

在验证过程中，保留的变量是model_time, evaluator_time，所以会打印其更新信息；而在训练过程中，保留的变量是loss，box_loss，obj_loss，class_loss，lr，所以会打印其更新信息。而默认的输出信息有：eta，time，data与max mem。详情可以查看上述我贴上来的部分打印信息，对比就可以清楚了解。

3. Tensorboard工具调用

tensorboard是一个可视化工具，可以在网页上动态的查看当前epoch所获得的各项参数。常见的还有另外的一个可视化工具：visdom，对于visdom的安装与简单使用可以查看我之前的博文：visdom安装与基本用法

这里以yolov3spp中的使用过程为例，来介绍tensorboard的使用，具体的使用方法这里就不再细述。在yolov3spp中训练过程的tensorboard使用流程如下所示：

from torch.utils.tensorboard import SummaryWriter
# 构建一个可视化类
tb_writer = SummaryWriter(comment=opt.name)
# 写入tensorboard，可通过网页打开查看相关信息
if tb_writer:
    tags = ['train/giou_loss', 'train/obj_loss', 'train/cls_loss', 'train/loss', "learning_rate",
            "mAP@[IoU=0.50:0.95]", "mAP@[IoU=0.5]", "mAR@[IoU=0.50:0.95]"]
    # 这里的mloss包含了4个部分：边界框损失、置信度损失、分类损失、总损失
    # mloss + [lr, coco_mAP, voc_mAP, coco_mAR] 就一共有八个指标，对应着八个标签tag
    for x, tag in zip(mloss.tolist() + [lr, coco_mAP, voc_mAP, coco_mAR], tags):
        # 以epoch为横坐标，x为竖坐标，tag为标题来进行更新
        tb_writer.add_scalar(tag, x, epoch)

详细的参数定义如下：

def add_scalar(
    self, tag, scalar_value, global_step=None, walltime=None, new_style=False
):
    """Add scalar data to summary.
    Args:
        tag (string): Data identifier
        scalar_value (float or string/blobname): Value to save
        global_step (int): Global step value to record
        walltime (float): Optional override default walltime (time.time())
          with seconds after epoch of event
        new_style (boolean): Whether to use new style (tensor field) or old
          style (simple_value field). New style could lead to faster data loading.
    Examples::
        from torch.utils.tensorboard import SummaryWriter
        writer = SummaryWriter()
        x = range(100)
        for i in x:
            writer.add_scalar('y=2x', i * 2, i)
        writer.close()
    Expected result:
    .. image:: _static/img/tensorboard/add_scalar.png
       :scale: 50 %
    """
    torch._C._log_api_usage_once("tensorboard.logging.add_scalar")
    if self._check_caffe2_blob(scalar_value):
        from caffe2.python import workspace
        scalar_value = workspace.FetchBlob(scalar_value)
    summary = scalar(tag, scalar_value, new_style=new_style)
    self._get_file_writer().add_summary(summary, global_step, walltime)

动态结果查看：

补充：

在yolov3spp项目中，除了通过MetricLogger的工具类来迭代加载想控制台输出信息，用tensorboard向网页输出信息（可以动态的显示整个训练的曲线过程，功能比较多），此外还会使用最普通的txt来存储相关的信息，这个就是最基本的操作了，和logging类似而已。

# 将coco的验证指标保存在一个txt文件当中
with open(results_file, "a") as f:
    # 记录coco的12个指标加上训练总损失和lr
    result_info = [str(round(i, 4)) for i in result_info + [mloss.tolist()[-1]]] + [str(round(lr, 6))]
    txt = "epoch:{} {}".format(epoch, '  '.join(result_info))
    f.write(txt + "\n")

总结：

这篇内容介绍了coco工具类的使用，以及两个日志记录的工具。这些代码是我从项目中截取而来，对于完整的项目流程可以查看我的其他博客，或者自行查看源码。

目标检测的Tricks | 【Trick10】工具类文件调用（coco评价指标包、日志工具、Tensorboard工具...）

1. coco评价指标包调用

2. 日志工具代码调用

3. Tensorboard工具调用

热门文章

最新文章

相关课程

相关电子书

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

目标检测的Tricks | 【Trick10】工具类文件调用（coco评价指标包、日志工具、Tensorboard工具...）

1. coco评价指标包调用

2. 日志工具代码调用

3. Tensorboard工具调用

热门文章

最新文章

相关课程

相关电子书