The Faster RCNN Algorithm
Faster RCNN unifies the four basic steps of object detection (candidate-box generation, feature extraction, classification, and bounding-box regression) in a single deep learning model. Candidate regions are produced by a Region Proposal Network (RPN) instead of the Selective Search (SS) algorithm used in Fast RCNN, while feature extraction, classification, and Bounding-Box regression still follow the Fast RCNN approach. Fusing proposal generation with the Fast RCNN back end yields one complete convolutional network, making Faster RCNN the first truly end-to-end detection model.
Faster RCNN consists of two main modules:
1. The RPN layer, which extracts candidate boxes;
2. The final classification and Bounding Box regression, which reuse the Fast RCNN detection module, i.e. RoI Pooling and the multi-task loss.
1 Algorithm Steps
Figure 1: Faster RCNN model structure
Figure 2: Faster RCNN training flow
1. First, the original image is fed into a convolutional neural network, and the feature map of the last convolutional layer serves as input to the subsequent layers. This feature map splits into two paths and is shared by the RPN layer and the RoI Pooling layer (the same RoI Pooling layer described in the previous article; see that article for details).
2. The RPN layer generates the candidate region boxes. It slides over the shared feature map and places a set of anchors (9 by default) at every spatial position; these anchors are scored and refined, and non-maximum suppression keeps roughly the top proposals per image (2000 during training and 1000 at test time in the torchvision implementation shown below). Its purpose is to replace the time-consuming Selective Search (SS) over the input image for finding suitable candidate boxes.
3. The candidate boxes from the RPN layer are fed into the RoI Pooling layer, which produces a fixed-size RoI Pooling feature map for every candidate region.
4. The final step is the same as in Fast RCNN: a softmax loss yields the classification probabilities and a Smooth L1 loss drives the bounding-box regression (sketched below). Note that the per-anchor shape example that follows actually describes the RPN head rather than this final stage: given a 32×32×256 feature map, the classification layer outputs, at every position, the foreground/background probabilities of the 9 candidate boxes (anchors), i.e. a (32, 32, 9×2) tensor, while the window-regression layer outputs the translation and scaling parameters of each of the 9 anchor windows, i.e. a (32, 32, 9×4) tensor.
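As a rough illustration of the two losses in the final step, here is a minimal sketch with dummy tensors; the RoI count, class count, and the absence of the library's exact sampling and normalization are all assumptions for illustration:

import torch
import torch.nn.functional as F

num_rois, num_classes = 128, 21
cls_scores = torch.randn(num_rois, num_classes)       # classifier outputs per RoI
labels = torch.randint(0, num_classes, (num_rois,))   # sampled RoI labels
bbox_pred = torch.randn(num_rois, 4)                  # predicted deltas for the target class
bbox_targets = torch.randn(num_rois, 4)               # regression targets

loss_cls = F.cross_entropy(cls_scores, labels)        # softmax classification loss
loss_box = F.smooth_l1_loss(bbox_pred, bbox_targets)  # Smooth L1 box-regression loss
loss = loss_cls + loss_box                            # multi-task objective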
Faster RCNN network source code (torchvision implementation):
class FasterRCNN(GeneralizedRCNN):
    def __init__(self, backbone, num_classes=None,
                 # transform parameters
                 min_size=800, max_size=1333,
                 image_mean=None, image_std=None,
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,
                 rpn_nms_thresh=0.7,
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                 box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,
                 box_batch_size_per_image=512, box_positive_fraction=0.25,
                 bbox_reg_weights=None):

        out_channels = backbone.out_channels

        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
            rpn_anchor_generator = AnchorGenerator(
                anchor_sizes, aspect_ratios
            )
        if rpn_head is None:
            rpn_head = RPNHead(
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )

        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(
                featmap_names=['0', '1', '2', '3'],
                output_size=7,
                sampling_ratio=2)

        if box_head is None:
            resolution = box_roi_pool.output_size[0]
            representation_size = 1024
            box_head = TwoMLPHead(
                out_channels * resolution ** 2,
                representation_size)

        if box_predictor is None:
            representation_size = 1024
            box_predictor = FastRCNNPredictor(
                representation_size,
                num_classes)

        roi_heads = RoIHeads(
            # Box
            box_roi_pool, box_head, box_predictor,
            box_fg_iou_thresh, box_bg_iou_thresh,
            box_batch_size_per_image, box_positive_fraction,
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)

        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
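A minimal usage sketch for this class via torchvision's ready-made constructor (keyword arguments vary somewhat across torchvision versions; the dummy image and the default 91 COCO classes are assumptions for illustration):

import torch
import torchvision

# Faster RCNN with a ResNet-50 + FPN backbone; weights are left randomly initialized here
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=91)
model.eval()

images = [torch.rand(3, 600, 800)]   # one dummy image, pixel values in [0, 1]
with torch.no_grad():
    outputs = model(images)          # list of dicts with 'boxes', 'labels', 'scores'
print(outputs[0]['boxes'].shape)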
2 The RPN Network
Classic detection methods are very slow at generating detection boxes. Faster RCNN drops the traditional sliding-window and SS approaches and generates detection boxes directly with the RPN. This is a major advantage of Faster RCNN, greatly accelerating proposal generation:
Figure 3: RPN network structure
The figure above shows the concrete structure of the RPN. The network splits into two branches: the upper branch classifies the anchors as positive or negative via softmax, while the lower branch computes bounding-box regression offsets for the anchors to obtain accurate proposals. The final Proposal layer then combines the positive anchors with their corresponding regression offsets to produce proposals, discarding those that are too small or extend beyond the image boundary. By the time the network reaches the Proposal layer, it has effectively completed target localization.
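The regression branch does not predict box coordinates directly; it predicts the (tx, ty, tw, th) parameterization from the Faster RCNN paper, which the Proposal layer decodes against each anchor. A minimal standalone sketch of that decoding (function name and tensor layout are illustrative, not torchvision's API):

import torch

def decode_deltas(anchors, deltas):
    """Apply predicted (tx, ty, tw, th) deltas to anchors given as (x1, y1, x2, y2)."""
    wa = anchors[:, 2] - anchors[:, 0]      # anchor widths
    ha = anchors[:, 3] - anchors[:, 1]      # anchor heights
    xa = anchors[:, 0] + 0.5 * wa           # anchor centers
    ya = anchors[:, 1] + 0.5 * ha

    x = deltas[:, 0] * wa + xa              # shift center by tx * anchor width
    y = deltas[:, 1] * ha + ya              # shift center by ty * anchor height
    w = wa * torch.exp(deltas[:, 2])        # scale width by exp(tw)
    h = ha * torch.exp(deltas[:, 3])        # scale height by exp(th)

    return torch.stack([x - 0.5 * w, y - 0.5 * h,
                        x + 0.5 * w, y + 0.5 * h], dim=1)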
Core code of the RPN network:
class RegionProposalNetwork(torch.nn.Module):
    # ...... (__init__ and helper methods omitted)
    def forward(self, images, features, targets=None):
        features = list(features.values())
        objectness, pred_bbox_deltas = self.head(features)
        anchors = self.anchor_generator(images, features)

        num_images = len(anchors)
        num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
        num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness,
                                              images.image_sizes, num_anchors_per_level)

        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg,
            }
        return boxes, losses
Code of the RPNHead (the network's head):
import torch.nn as nn
import torch.nn.functional as F

class RPNHead(nn.Module):
    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        # 3x3 conv sliding over the shared feature map
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        # 1x1 conv: one objectness logit per anchor at each position
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        # 1x1 conv: 4 box-regression deltas per anchor at each position
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

    def forward(self, x):
        logits = []
        bbox_reg = []
        for feature in x:  # one feature map per pyramid level
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
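A quick shape check of the head above (a sketch; the batch size and the 256-channel 38×50 feature map are assumed for illustration):

import torch

head = RPNHead(in_channels=256, num_anchors=9)
feats = [torch.randn(1, 256, 38, 50)]   # one feature level, batch of 1
logits, bbox_reg = head(feats)
print(logits[0].shape)    # torch.Size([1, 9, 38, 50])  -- one score per anchor
print(bbox_reg[0].shape)  # torch.Size([1, 36, 38, 50]) -- 4 deltas per anchor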
3 Anchors
Anchors are simply a set of predefined reference rectangles. In Faster RCNN's RPN, every feature-map position carries n of them (9 by default), covering 3 scales with roughly 3 aspect ratios (1:1, 1:2, and 2:1), as shown in the figure below. Anchors are how the detector incorporates the familiar multi-scale strategy.
Figure 4: Anchor examples
As shown in the figure below, the RPN walks over the feature maps computed by the conv layers and assigns these 9 anchors to every position as initial detection boxes. The boxes obtained this way are quite inaccurate, but two subsequent rounds of bounding box regression correct their positions.
Figure 5: Anchor generation
In essence, the RPN lays a dense grid of candidate anchors over the original image and then uses a CNN to judge which anchors contain an object (positive) and which do not (negative). At this stage it is merely a binary classification!
As an example, take an 800×600 input image passed through VGG16, which downsamples by a factor of 16; placing 9 anchors at every position of the final feature map gives:
Figure 6: Generate Anchors
In total that is ceil(800/16) × ceil(600/16) × 9 = 50 × 38 × 9 = 17100 anchors.
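A quick check of that count (a sketch assuming ceil-mode rounding at stride 16):

import math

stride = 16
feat_w = math.ceil(800 / stride)        # 50
feat_h = math.ceil(600 / stride)        # 38 (600/16 = 37.5, rounded up)
num_anchors = feat_w * feat_h * 9       # 9 anchors per feature-map position
print(num_anchors)                      # 17100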
Anchor generation code from torchvision's AnchorGenerator:
class AnchorGenerator(nn.Module):
    # ......
    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"):
        # type: (List[int], List[float], int, Device)  # noqa: F821
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1 / h_ratios

        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)

        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
        return base_anchors.round()

    def set_cell_anchors(self, dtype, device):
        # type: (int, Device) -> None  # noqa: F821
        # ......
        cell_anchors = [
            self.generate_anchors(
                sizes,
                aspect_ratios,
                dtype,
                device
            )
            for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
        ]
        self.cell_anchors = cell_anchors
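For intuition, the following standalone sketch replays the same math with the three scales and three aspect ratios used in the original paper, producing the 9 base anchors centered at the origin (the scale/ratio values are that paper's choices, not torchvision's defaults):

import torch

scales = torch.tensor([128., 256., 512.])
aspect_ratios = torch.tensor([0.5, 1.0, 2.0])

h_ratios = torch.sqrt(aspect_ratios)
w_ratios = 1 / h_ratios
ws = (w_ratios[:, None] * scales[None, :]).view(-1)   # 9 anchor widths
hs = (h_ratios[:, None] * scales[None, :]).view(-1)   # 9 anchor heights
base_anchors = (torch.stack([-ws, -hs, ws, hs], dim=1) / 2).round()
print(base_anchors)   # 9 boxes (x1, y1, x2, y2) centered at the origin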
4 Classification
The Classification stage takes the proposal feature maps already obtained and, through fully connected layers plus softmax, determines which category each proposal belongs to (person, car, TV, and so on), outputting the cls_prob probability vector. At the same time it applies bounding box regression once more to obtain a position offset bbox_pred for each proposal, regressing a more precise detection box.
Figure 7: Network structure of the Classification stage
After RoI Pooling produces the 7×7 (= 49-bin) proposal feature maps, they are sent into the subsequent network, which does the following two things (a shape-level sketch follows the list):
1. Classifies the proposals via fully connected layers and softmax;
2. Performs bounding box regression on the proposals once more to obtain higher-precision boxes.
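The sketch referenced above mirrors the TwoMLPHead / FastRCNNPredictor structure from the torchvision source shown earlier; the 256 channels, 128 RoIs, and 21 classes are assumed for illustration:

import torch
import torch.nn as nn

num_rois, channels, resolution, num_classes = 128, 256, 7, 21

roi_feats = torch.randn(num_rois, channels, resolution, resolution)  # RoI Pooling output

# two fully connected layers, as in TwoMLPHead
fc = nn.Sequential(
    nn.Flatten(),
    nn.Linear(channels * resolution ** 2, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
)
x = fc(roi_feats)

cls_score = nn.Linear(1024, num_classes)(x)      # per-class scores -> softmax -> cls_prob
bbox_pred = nn.Linear(1024, num_classes * 4)(x)  # per-class box deltas (bbox_pred)
print(cls_score.shape, bbox_pred.shape)          # [128, 21], [128, 84]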
Advantages:
1. Higher detection accuracy;
2. The RPN replaces the SS algorithm, achieving a truly end-to-end model.