I. The Concept of Object Detection
Object detection is a popular direction in computer vision and digital image processing, widely applied in robot navigation, intelligent video surveillance, industrial inspection, aerospace, and many other fields. By using computer vision to reduce reliance on human labor, it has significant practical value. Object detection has therefore become a research hotspot in both theory and application in recent years. It is an important branch of image processing and computer vision, the core component of intelligent surveillance systems, and a foundational algorithm for the broader field of identity recognition, playing a crucial role in downstream tasks such as face recognition, gait recognition, crowd counting, and instance segmentation.
The task of object detection is to find all objects of interest in an image and determine their locations and categories. Because different objects vary in shape and pose, and imaging is affected by illumination, occlusion, and other factors, object detection has long been one of the most challenging problems in computer vision.
II. Evaluation Metrics for Object Detection Algorithms
Object detection must predict both the precise location and the category of each object. To decide whether an object is detected correctly, we first check whether the predicted class confidence reaches a threshold, and then whether the overlap between the predicted box and the ground-truth box exceeds a specified threshold. Overlap is usually measured by IoU (Intersection over Union), the ratio of the intersection area of the predicted and ground-truth boxes to the area of their union. A larger IoU means a higher overlap between the two boxes and hence a more accurate detection.
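Written as a formula, with $B_p$ the predicted box and $B_{gt}$ the ground-truth box:

$$\mathrm{IoU}=\frac{\operatorname{area}(B_p\cap B_{gt})}{\operatorname{area}(B_p\cup B_{gt})}$$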
For a given predicted class, precision is the proportion of correctly predicted boxes among all predicted boxes, while recall is the proportion of correctly predicted boxes among all ground-truth boxes. The two metrics are computed as follows:
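$$\text{Precision}=\frac{TP}{TP+FP},\qquad \text{Recall}=\frac{TP}{TP+FN}$$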
Here TP is the number of true positives (correctly detected objects), FP is the number of false positives (incorrect detections), and FN is the number of false negatives (ground-truth objects that were missed).
AP (average precision) is, roughly speaking, the mean of the precision values along the PR curve; for the PR curve it is computed as an integral:
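$$AP=\int_0^1 P(R)\,dR$$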
In practice, we do not integrate the raw PR curve directly. Instead, the curve is first smoothed: at each point, the precision value is replaced by the maximum precision found to its right (i.e., at equal or greater recall), and the area under the resulting step function is summed.
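A minimal NumPy sketch of this interpolated-AP computation (a hypothetical helper for illustration, not part of the project code; it assumes `recall` and `precision` are same-length arrays sorted by ascending recall):

```python
import numpy as np

def interpolated_ap(recall, precision):
    """AP over a PR curve: smooth each precision to the max value at
    equal-or-greater recall, then integrate the resulting step function."""
    # pad so the curve spans recall 0..1
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # right-to-left pass: each precision becomes the max precision to its right
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum the rectangle areas where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# example: three points on a PR curve
print(interpolated_ap(np.array([0.2, 0.6, 1.0]),
                      np.array([1.0, 0.75, 0.6])))  # -> 0.74
```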
A performance comparison of deep convolutional neural network object detection algorithms:

| Detection framework | mAP | FPS |
| --- | --- | --- |
| R-FCN | 79.4 | 7 |
| Faster R-CNN | 76.4 | 5 |
| SSD500 | 76.8 | 19 |
| YOLO | 63.4 | 45 |
| YOLO v2 | 78.6 | 40 |
| YOLO v3 | 82.3 | 39 |
III. Object Detection Project in Practice
The training set used is the VOC dataset; the detection results are shown below.
The numbers give each detected object's relative coordinates within the image.
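For reference, converting such relative coordinates back to pixel positions is just a scaling by the image size (a hypothetical example; the box values and image size here are illustrative, not from the project):

```python
# a hypothetical relative box (x, y, w, h) in [0, 1] and the image size
rel_box = (0.25, 0.40, 0.30, 0.20)
img_w, img_h = 800, 600

# scale by the image dimensions to recover pixel coordinates
x_px = rel_box[0] * img_w   # 200.0
y_px = rel_box[1] * img_h   # 240.0
w_px = rel_box[2] * img_w   # 240.0
h_px = rel_box[3] * img_h   # 120.0
print(x_px, y_px, w_px, h_px)
```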
The object classes (the 20 PASCAL VOC categories) are as follows.
IV. Code
The project structure is as follows:
The keras_frcnn folder contains the classes and functions used to implement Faster R-CNN, and the testing folder below it holds the test code.
Part of the code is shown below, starting with the RoI pooling layer:
```python
from keras.layers import Layer
import keras.backend as K

if K.backend() == 'tensorflow':
    import tensorflow as tf


class RoiPoolingConv(Layer):
    '''ROI pooling layer for 2D inputs.
    See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual
    Recognition, K. He, X. Zhang, S. Ren, J. Sun
    # Arguments
        pool_size: int
            Size of pooling region to use. pool_size = 7 will result in a 7x7 region.
        num_rois: number of regions of interest to be used
    # Input shape
        list of two 4D tensors [X_img, X_roi] with shape:
        X_img:
            `(1, channels, rows, cols)` if dim_ordering='th'
            or 4D tensor with shape:
            `(1, rows, cols, channels)` if dim_ordering='tf'.
        X_roi:
            `(1, num_rois, 4)` list of rois, with ordering (x, y, w, h)
    # Output shape
        3D tensor with shape:
        `(1, num_rois, channels, pool_size, pool_size)`
    '''

    def __init__(self, pool_size, num_rois, **kwargs):
        self.dim_ordering = K.common.image_dim_ordering()
        assert self.dim_ordering in {'tf', 'th'}, 'dim_ordering must be in {tf, th}'
        self.pool_size = pool_size
        self.num_rois = num_rois
        super(RoiPoolingConv, self).__init__(**kwargs)

    def build(self, input_shape):
        if self.dim_ordering == 'th':
            self.nb_channels = input_shape[0][1]
        elif self.dim_ordering == 'tf':
            self.nb_channels = input_shape[0][3]

    def compute_output_shape(self, input_shape):
        if self.dim_ordering == 'th':
            return None, self.num_rois, self.nb_channels, self.pool_size, self.pool_size
        else:
            return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels

    def call(self, x, mask=None):
        assert (len(x) == 2)
        img = x[0]   # feature map
        rois = x[1]  # proposed regions, ordering (x, y, w, h)
        input_shape = K.shape(img)
        outputs = []

        for roi_idx in range(self.num_rois):
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]

            row_length = w / float(self.pool_size)
            col_length = h / float(self.pool_size)
            num_pool_regions = self.pool_size

            # NOTE: the RoiPooling implementation differs between theano and
            # tensorflow due to the lack of a resize op in theano. The theano
            # implementation is much less efficient and leads to long compile times.
            if self.dim_ordering == 'th':
                for jy in range(num_pool_regions):
                    for ix in range(num_pool_regions):
                        x1 = x + ix * row_length
                        x2 = x1 + row_length
                        y1 = y + jy * col_length
                        y2 = y1 + col_length

                        x1 = K.cast(x1, 'int32')
                        x2 = K.cast(x2, 'int32')
                        y1 = K.cast(y1, 'int32')
                        y2 = K.cast(y2, 'int32')

                        # make sure each pooling region is at least 1x1
                        x2 = x1 + K.maximum(1, x2 - x1)
                        y2 = y1 + K.maximum(1, y2 - y1)

                        new_shape = [input_shape[0], input_shape[1], y2 - y1, x2 - x1]
                        x_crop = img[:, :, y1:y2, x1:x2]
                        xm = K.reshape(x_crop, new_shape)
                        pooled_val = K.max(xm, axis=(2, 3))
                        outputs.append(pooled_val)

            elif self.dim_ordering == 'tf':
                x = K.cast(x, 'int32')
                y = K.cast(y, 'int32')
                w = K.cast(w, 'int32')
                h = K.cast(h, 'int32')
                # crop the RoI from the feature map and resize it to the pool size
                rs = tf.image.resize(img[:, y:y + h, x:x + w, :],
                                     (self.pool_size, self.pool_size))
                outputs.append(rs)

        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(
            final_output,
            (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

        # move channels before the spatial dims for theano ordering
        if self.dim_ordering == 'th':
            final_output = K.permute_dimensions(final_output, (0, 1, 4, 2, 3))

        return final_output

    def get_config(self):
        config = {'pool_size': self.pool_size, 'num_rois': self.num_rois}
        base_config = super(RoiPoolingConv, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```
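As a quick sanity check, the layer can be wired into a tiny model like this (a minimal sketch assuming a TensorFlow backend with 'tf' dim ordering; the feature-map size and RoI count are made-up values, not the project's configuration):

```python
from keras.layers import Input
from keras.models import Model

# hypothetical sizes: a 38x38x512 backbone feature map and 4 RoIs per image
feature_map = Input(shape=(38, 38, 512))
rois = Input(shape=(4, 4))  # (num_rois, 4) boxes as (x, y, w, h) in feature-map units

pooled = RoiPoolingConv(pool_size=7, num_rois=4)([feature_map, rois])
model = Model(inputs=[feature_map, rois], outputs=pooled)
model.summary()  # expected output shape: (None, 4, 7, 7, 512)
```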
The data generator code is as follows:
```python
from __future__ import absolute_import
import numpy as np
import cv2
import random
import copy
from . import data_augment
import threading
import itertools


def union(au, bu, area_intersection):
    area_a = (au[2] - au[0]) * (au[3] - au[1])
    area_b = (bu[2] - bu[0]) * (bu[3] - bu[1])
    area_union = area_a + area_b - area_intersection
    return area_union


def intersection(ai, bi):
    x = max(ai[0], bi[0])
    y = max(ai[1], bi[1])
    w = min(ai[2], bi[2]) - x
    h = min(ai[3], bi[3]) - y
    if w < 0 or h < 0:
        return 0
    return w * h


def iou(a, b):
    # a and b should be (x1, y1, x2, y2)
    if a[0] >= a[2] or a[1] >= a[3] or b[0] >= b[2] or b[1] >= b[3]:
        return 0.0
    area_i = intersection(a, b)
    area_u = union(a, b, area_i)
    return float(area_i) / float(area_u + 1e-6)


def get_new_img_size(width, height, img_min_side=600):
    if width <= height:
        f = float(img_min_side) / width
        resized_height = int(f * height)
        resized_width = img_min_side
    else:
        f = float(img_min_side) / height
        resized_width = int(f * width)
        resized_height = img_min_side
    return resized_width, resized_height


class SampleSelector:
    def __init__(self, class_count):
        # ignore classes that have zero samples
        self.classes = [b for b in class_count.keys() if class_count[b] > 0]
        self.class_cycle = itertools.cycle(self.classes)
        self.curr_class = next(self.class_cycle)

    def skip_sample_for_balanced_class(self, img_data):
        class_in_img = False
        for bbox in img_data['bboxes']:
            cls_name = bbox['class']
            if cls_name == self.curr_class:
                class_in_img = True
                self.curr_class = next(self.class_cycle)
                break
        if class_in_img:
            return False
        else:
            return True


def calc_rpn(C, img_data, width, height, resized_width, resized_height,
             img_length_calc_function):
    downscale = float(C.rpn_stride)
    anchor_sizes = C.anchor_box_scales
    anchor_ratios = C.anchor_box_ratios
    num_anchors = len(anchor_sizes) * len(anchor_ratios)

    # calculate the output map size based on the network architecture
    (output_width, output_height) = img_length_calc_function(resized_width, resized_height)

    n_anchratios = len(anchor_ratios)

    # initialise empty output objectives
    y_rpn_overlap = np.zeros((output_height, output_width, num_anchors))
    y_is_box_valid = np.zeros((output_height, output_width, num_anchors))
    y_rpn_regr = np.zeros((output_height, output_width, num_anchors * 4))

    num_bboxes = len(img_data['bboxes'])

    num_anchors_for_bbox = np.zeros(num_bboxes).astype(int)
    best_anchor_for_bbox = -1 * np.ones((num_bboxes, 4)).astype(int)
    best_iou_for_bbox = np.zeros(num_bboxes).astype(np.float32)
    best_x_for_bbox = np.zeros((num_bboxes, 4)).astype(int)
    best_dx_for_bbox = np.zeros((num_bboxes, 4)).astype(np.float32)

    # get the GT box coordinates, and resize to account for image resizing
    gta = np.zeros((num_bboxes, 4))
    for bbox_num, bbox in enumerate(img_data['bboxes']):
        gta[bbox_num, 0] = bbox['x1'] * (resized_width / float(width))
        gta[bbox_num, 1] = bbox['x2'] * (resized_width / float(width))
        gta[bbox_num, 2] = bbox['y1'] * (resized_height / float(height))
        gta[bbox_num, 3] = bbox['y2'] * (resized_height / float(height))

    # rpn ground truth
    for anchor_size_idx in range(len(anchor_sizes)):
        for anchor_ratio_idx in range(n_anchratios):
            anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
            anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]

            for ix in range(output_width):
                # x-coordinates of the current anchor box
                x1_anc = downscale * (ix + 0.5) - anchor_x / 2
                x2_anc = downscale * (ix + 0.5) + anchor_x / 2

                # ignore boxes that go across image boundaries
                if x1_anc < 0 or x2_anc > resized_width:
                    continue

                for jy in range(output_height):
                    # y-coordinates of the current anchor box
                    y1_anc = downscale * (jy + 0.5) - anchor_y / 2
                    y2_anc = downscale * (jy + 0.5) + anchor_y / 2

                    # ignore boxes that go across image boundaries
                    if y1_anc < 0 or y2_anc > resized_height:
                        continue

                    # bbox_type indicates whether an anchor should be a target
                    bbox_type = 'neg'

                    # this is the best IOU for the (x,y) coord and the current anchor
                    # note that this is different from the best IOU for a GT bbox
                    best_iou_for_loc = 0.0

                    for bbox_num in range(num_bboxes):
                        # get IOU of the current GT box and the current anchor box
                        curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2],
                                        gta[bbox_num, 1], gta[bbox_num, 3]],
                                       [x1_anc, y1_anc, x2_anc, y2_anc])
                        # calculate the regression targets if they will be needed
                        if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
                            cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
                            cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0
                            cxa = (x1_anc + x2_anc) / 2.0
                            cya = (y1_anc + y2_anc) / 2.0

                            tx = (cx - cxa) / (x2_anc - x1_anc)
                            ty = (cy - cya) / (y2_anc - y1_anc)
                            tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
                            th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))

                        if img_data['bboxes'][bbox_num]['class'] != 'bg':
                            # all GT boxes should be mapped to an anchor box,
                            # so we keep track of which anchor box was best
                            if curr_iou > best_iou_for_bbox[bbox_num]:
                                best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]
                                best_iou_for_bbox[bbox_num] = curr_iou
                                best_x_for_bbox[bbox_num, :] = [x1_anc, x2_anc, y1_anc, y2_anc]
                                best_dx_for_bbox[bbox_num, :] = [tx, ty, tw, th]

                            # we set the anchor to positive if the IOU is >0.7 (it does not matter
                            # if there was another better box, it just indicates overlap)
                            if curr_iou > C.rpn_max_overlap:
                                bbox_type = 'pos'
                                num_anchors_for_bbox[bbox_num] += 1
                                # we update the regression layer target if this IOU is the
                                # best for the current (x,y) and anchor position
                                if curr_iou > best_iou_for_loc:
                                    best_iou_for_loc = curr_iou
                                    best_regr = (tx, ty, tw, th)

                            # if the IOU is >0.3 and <0.7, it is ambiguous and not included
                            # in the objective
                            if C.rpn_min_overlap < curr_iou < C.rpn_max_overlap:
                                # gray zone between neg and pos
                                if bbox_type != 'pos':
                                    bbox_type = 'neutral'

                    # turn on or off outputs depending on IOUs
                    if bbox_type == 'neg':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                    elif bbox_type == 'neutral':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                    elif bbox_type == 'pos':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        start = 4 * (anchor_ratio_idx + n_anchratios * anchor_size_idx)
                        y_rpn_regr[jy, ix, start:start + 4] = best_regr

    # we ensure that every bbox has at least one positive RPN region
    for idx in range(num_anchors_for_bbox.shape[0]):
        if num_anchors_for_bbox[idx] == 0:
            # no box with an IOU greater than zero ...
            if best_anchor_for_bbox[idx, 0] == -1:
                continue
            y_is_box_valid[
                best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3]] = 1
            y_rpn_overlap[
                best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3]] = 1
            start = 4 * (best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3])
            y_rpn_regr[
                best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                start:start + 4] = best_dx_for_bbox[idx, :]

    y_rpn_overlap = np.transpose(y_rpn_overlap, (2, 0, 1))
    y_rpn_overlap = np.expand_dims(y_rpn_overlap, axis=0)

    y_is_box_valid = np.transpose(y_is_box_valid, (2, 0, 1))
    y_is_box_valid = np.expand_dims(y_is_box_valid, axis=0)

    y_rpn_regr = np.transpose(y_rpn_regr, (2, 0, 1))
    y_rpn_regr = np.expand_dims(y_rpn_regr, axis=0)

    pos_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 1,
                                       y_is_box_valid[0, :, :, :] == 1))
    neg_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 0,
                                       y_is_box_valid[0, :, :, :] == 1))

    num_pos = len(pos_locs[0])

    # one issue is that the RPN has many more negative than positive regions, so we
    # turn off some of the negative regions. We also limit it to 256 regions.
    num_regions = 256

    # integer division (//) so random.sample receives an int under Python 3
    if len(pos_locs[0]) > num_regions // 2:
        val_locs = random.sample(range(len(pos_locs[0])), len(pos_locs[0]) - num_regions // 2)
        y_is_box_valid[0, pos_locs[0][val_locs], pos_locs[1][val_locs], pos_locs[2][val_locs]] = 0
        num_pos = num_regions // 2

    if len(neg_locs[0]) + num_pos > num_regions:
        val_locs = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)
        y_is_box_valid[0, neg_locs[0][val_locs], neg_locs[1][val_locs], neg_locs[2][val_locs]] = 0

    y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1)
    y_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1)

    return np.copy(y_rpn_cls), np.copy(y_rpn_regr)


class threadsafe_iter:
    """Takes an iterator/generator and makes it thread-safe by
    serializing call to the `next` method of given iterator/generator.
    """

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):  # Python 3
        with self.lock:
            return next(self.it)

    next = __next__  # Python 2 compatibility


def threadsafe_generator(f):
    """A decorator that takes a generator function and makes it thread-safe."""
    def g(*a, **kw):
        return threadsafe_iter(f(*a, **kw))
    return g


def get_anchor_gt(all_img_data, class_count, C, img_length_calc_function, backend, mode='train'):
    # The following line is not useful with Python 3.5, it is kept for the legacy
    # all_img_data = sorted(all_img_data)

    sample_selector = SampleSelector(class_count)

    while True:
        if mode == 'train':
            np.random.shuffle(all_img_data)

        for img_data in all_img_data:
            try:
                if C.balanced_classes and sample_selector.skip_sample_for_balanced_class(img_data):
                    continue

                # read in image, and optionally add augmentation
                if mode == 'train':
                    img_data_aug, x_img = data_augment.augment(img_data, C, augment=True)
                else:
                    img_data_aug, x_img = data_augment.augment(img_data, C, augment=False)

                (width, height) = (img_data_aug['width'], img_data_aug['height'])
                (rows, cols, _) = x_img.shape

                assert cols == width
                assert rows == height

                # get image dimensions for resizing
                (resized_width, resized_height) = get_new_img_size(width, height, C.im_size)

                # resize the image so that the smallest side has length = 600px
                x_img = cv2.resize(x_img, (resized_width, resized_height),
                                   interpolation=cv2.INTER_CUBIC)

                try:
                    y_rpn_cls, y_rpn_regr = calc_rpn(C, img_data_aug, width, height,
                                                     resized_width, resized_height,
                                                     img_length_calc_function)
                except Exception:
                    continue

                # Zero-center by mean pixel, and preprocess image
                x_img = x_img[:, :, (2, 1, 0)]  # BGR -> RGB
                x_img = x_img.astype(np.float32)
                x_img[:, :, 0] -= C.img_channel_mean[0]
                x_img[:, :, 1] -= C.img_channel_mean[1]
                x_img[:, :, 2] -= C.img_channel_mean[2]
                x_img /= C.img_scaling_factor

                x_img = np.transpose(x_img, (2, 0, 1))
                x_img = np.expand_dims(x_img, axis=0)

                y_rpn_regr[:, y_rpn_regr.shape[1] // 2:, :, :] *= C.std_scaling

                if backend == 'tf':
                    x_img = np.transpose(x_img, (0, 2, 3, 1))
                    y_rpn_cls = np.transpose(y_rpn_cls, (0, 2, 3, 1))
                    y_rpn_regr = np.transpose(y_rpn_regr, (0, 2, 3, 1))

                yield np.copy(x_img), [np.copy(y_rpn_cls), np.copy(y_rpn_regr)], img_data_aug

            except Exception as e:
                print(e)
                continue
```