1 Principles
1.1 SPP (Spatial Pyramid Pooling)
The SPP module was proposed by Kaiming He et al. in their 2015 paper *Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition*.
SPP stands for Spatial Pyramid Pooling. The structure was designed mainly to solve two problems:
- It avoids the image distortion caused by cropping or scaling image regions to a fixed size;
- It eliminates the CNN's redundant feature extraction over overlapping image regions, greatly speeding up candidate-box (proposal) generation and saving computation.
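The reference implementations below come from YOLOv5's models/common.py and rely on the repo's `Conv` block (Conv2d + BatchNorm2d + SiLU) plus the `warnings`/`torch` imports. To run the snippets outside the YOLOv5 repo, a minimal stand-in might look like this (a simplified sketch, not the repo's exact autopad version):

```python
import warnings

import torch
import torch.nn as nn


class Conv(nn.Module):
    # Simplified stand-in for YOLOv5's Conv block: Conv2d + BatchNorm2d + SiLU
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)  # 'same' padding for odd k
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```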
```python
class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
```
1.2 SPPF (Spatial Pyramid Pooling - Fast)
SPPF was proposed by YOLOv5's author Glenn Jocher as a reworking of SPP. It runs much faster than SPP, hence the name SPP-Fast.
```python
class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))
```
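Why is SPPF(k=5) equivalent to SPP(k=(5, 9, 13))? With stride 1, cascading 5×5 max pools grows the effective window: two in a row cover 9×9, three cover 13×13, and since taking a max of maxes equals a max over the union of windows, the results match exactly. A quick numerical check (tensor sizes here are arbitrary):

```python
x = torch.randn(1, 8, 32, 32)
p5 = nn.MaxPool2d(5, stride=1, padding=2)
p9 = nn.MaxPool2d(9, stride=1, padding=4)
p13 = nn.MaxPool2d(13, stride=1, padding=6)

print(torch.equal(p5(p5(x)), p9(x)))       # True: two 5x5 pools == one 9x9 pool
print(torch.equal(p5(p5(p5(x))), p13(x)))  # True: three 5x5 pools == one 13x13 pool
```

SPPF reuses the two intermediate pooling results, replacing the expensive 9×9 and 13×13 pools with cheap repeated 5×5 pools; that is where the speedup comes from.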
1.3 SimSPPF (Simplified SPPF)
This module comes from Meituan's YOLOv6. It differs from SPPF only in the activation function (ReLU instead of SiLU). In a quick test, a single ConvBNReLU block ran about 18% faster than ConvBNSiLU (a rough benchmark sketch follows the code below).
```python
class SimConv(nn.Module):
    '''Normal Conv with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class SimSPPF(nn.Module):
    '''Simplified SPPF with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=5):
        super().__init__()
        c_ = in_channels // 2  # hidden channels
        self.cv1 = SimConv(in_channels, c_, 1, 1)
        self.cv2 = SimConv(c_ * 4, out_channels, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))
```
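The 18% figure will vary with hardware, batch size, and input resolution. A rough micro-benchmark along these lines (a sketch with arbitrary shapes, not the original test setup) can reproduce the comparison:

```python
import time


def bench(act, n=200):
    # Conv + BN + activation block, timed in inference mode
    m = nn.Sequential(
        nn.Conv2d(64, 64, 3, padding=1, bias=False),
        nn.BatchNorm2d(64),
        act,
    ).eval()
    x = torch.randn(8, 64, 56, 56)
    with torch.no_grad():
        for _ in range(20):  # warm-up iterations
            m(x)
        t0 = time.perf_counter()
        for _ in range(n):
            m(x)
    return (time.perf_counter() - t0) / n * 1e3  # ms per forward pass


print(f'ConvBNReLU: {bench(nn.ReLU()):.2f} ms')
print(f'ConvBNSiLU: {bench(nn.SiLU()):.2f} ms')
```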
1.4 ASPP (Atrous Spatial Pyramid Pooling)
Inspired by SPP, the semantic segmentation model DeepLabv2 introduced the ASPP (Atrous Spatial Pyramid Pooling) module, which applies multiple parallel atrous convolution layers with different sampling rates. The features extracted at each sampling rate are processed in separate branches and then fused to produce the final result. By varying the dilation rate, the module builds kernels with different receptive fields to capture multi-scale object information. The structure itself is fairly simple.
ASPP was first proposed in DeepLab, and later DeepLab versions refined it, e.g., by adding BN layers and depthwise separable convolutions, but the basic idea stayed the same.
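The key property of atrous (dilated) convolution is that a 3×3 kernel with dilation d covers an effective receptive field of 3 + 2(d − 1) while keeping the same parameter count, and with padding = d the spatial size is preserved. A small demonstration (shapes chosen arbitrarily):

```python
x = torch.randn(1, 512, 64, 64)
for d in (1, 6, 12, 18):
    conv = nn.Conv2d(512, 256, 3, stride=1, padding=d, dilation=d)
    # spatial size stays 64x64; effective kernel grows to 1+2d
    print(f'dilation={d}: out={tuple(conv(x).shape)}, effective kernel={3 + 2 * (d - 1)}')
```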
```python
import torch.nn.functional as F


# without BN version
class ASPP(nn.Module):
    def __init__(self, in_channel=512, out_channel=256):
        super(ASPP, self).__init__()
        self.mean = nn.AdaptiveAvgPool2d((1, 1))  # global average pooling to output size (1, 1)
        self.conv = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block1 = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block6 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=6, dilation=6)
        self.atrous_block12 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=12, dilation=12)
        self.atrous_block18 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=18, dilation=18)
        self.conv_1x1_output = nn.Conv2d(out_channel * 5, out_channel, 1, 1)

    def forward(self, x):
        size = x.shape[2:]
        # image-level branch: global pooling, 1x1 conv, then upsample back to the input size
        image_features = self.mean(x)
        image_features = self.conv(image_features)
        # F.upsample is deprecated; F.interpolate is the current API
        image_features = F.interpolate(image_features, size=size, mode='bilinear', align_corners=False)
        atrous_block1 = self.atrous_block1(x)
        atrous_block6 = self.atrous_block6(x)
        atrous_block12 = self.atrous_block12(x)
        atrous_block18 = self.atrous_block18(x)
        net = self.conv_1x1_output(torch.cat([image_features, atrous_block1, atrous_block6,
                                              atrous_block12, atrous_block18], dim=1))
        return net
```
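A quick sanity check of the module above (input size is arbitrary): all five branches preserve the spatial resolution, so the fused output matches the input spatially with out_channel channels.

```python
aspp = ASPP(in_channel=512, out_channel=256)
x = torch.randn(2, 512, 32, 32)
print(aspp(x).shape)  # torch.Size([2, 256, 32, 32])
```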