[Paper Reproduction] Rebuilding the YoloBody of YOLOv5-L (Slim-neck by GSConv)


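Preface

  This article is hands-on: it rebuilds the main body of the YOLOv5-L network and therefore skips the theory. Readers interested in the theory can refer to paper [1] and build the network body following the method described there. The network architecture used here is extracted from bubbliiiing's yoloV5 implementation.

  The rebuild combines the flow/parameter diagram in the paper with the YoloBody part of bubbliiiing's yoloV5. You can swap this YoloBody code into that project and run your own experiments. The full code is given near the end of the article.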

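Breaking Down the Modules

  In the figure below, the table on the left shows the method proposed in the paper, and the one on the right shows the YoloBody part of the official YOLOv5. Comparing the two and following the connections given in the paper, we can pull out GSConv and VoV-GSCSP and experiment with each separately to see how it is used and what it does (SPPF, Concat and Upsample behave the same as in the original):

[Figure: left, the layer table from the paper; right, the YoloBody of the official YOLOv5]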

The GSConv Module

import torch
import torch.nn as nn
import torch.nn.functional as F


def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x))
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))


class Conv(nn.Module):
    # Standard convolution: Conv2d -> BatchNorm2d -> Mish
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = Mish() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        # Used once BatchNorm has been fused into the convolution
        return self.act(self.conv(x))


class GSConv(nn.Module):
    # GSConv https://github.com/AlanLi1997/slim-neck-by-gsconv
    def __init__(self, c1, c2, k=1, s=1, g=1, act=True):
        super().__init__()
        c_ = c2 // 2  # each branch produces half of the output channels
        self.cv1 = Conv(c1, c_, k, s, None, g, act)   # dense convolution, carries the stride
        self.cv2 = Conv(c_, c_, 5, 1, None, c_, act)  # 5x5 depthwise convolution (groups = c_)

    def forward(self, x):
        x1 = self.cv1(x)
        x2 = torch.cat((x1, self.cv2(x1)), 1)
        # shuffle
        # y = x2.reshape(x2.shape[0], 2, x2.shape[1] // 2, x2.shape[2], x2.shape[3])
        # y = y.permute(0, 2, 1, 3, 4)
        # return y.reshape(y.shape[0], -1, y.shape[3], y.shape[4])
        b, n, h, w = x2.shape
        b_n = b * n // 2
        y = x2.reshape(b_n, 2, h * w)
        y = y.permute(1, 0, 2)
        y = y.reshape(2, -1, n // 2, h, w)
        return torch.cat((y[0], y[1]), 1)


if __name__ == "__main__":
    base_channels = 64
    P3Shape = (1, 256, 80, 80)
    P3 = torch.ones(P3Shape)
    gsc = GSConv(base_channels * 4, base_channels * 4, 3, 2)  # k=3, s=2
    P = gsc(P3)
    print(P.shape)  # torch.Size([1, 256, 40, 40])
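
The shuffle at the end of `GSConv.forward` is easier to follow on a tiny tensor. The sketch below copies the active reshape/permute lines and the commented-out variant into two standalone helpers (`gs_shuffle` and `interleave_shuffle` are names introduced here for illustration, not part of the original code) and fills each channel with its own index, so the output reveals the channel order each one produces.

```python
import torch


def gs_shuffle(x2: torch.Tensor) -> torch.Tensor:
    """Active shuffle from GSConv.forward, copied into a helper for inspection."""
    b, n, h, w = x2.shape
    y = x2.reshape(b * n // 2, 2, h * w)
    y = y.permute(1, 0, 2)
    y = y.reshape(2, -1, n // 2, h, w)
    return torch.cat((y[0], y[1]), 1)


def interleave_shuffle(x2: torch.Tensor) -> torch.Tensor:
    """Commented-out variant from GSConv.forward (reshape -> permute -> reshape)."""
    b, n, h, w = x2.shape
    y = x2.reshape(b, 2, n // 2, h, w)
    y = y.permute(0, 2, 1, 3, 4)
    return y.reshape(b, -1, h, w)


# Fill channel c with the value c so the output reveals the channel order.
x = torch.arange(8, dtype=torch.float32).reshape(1, 8, 1, 1).repeat(2, 1, 3, 3)

active_order = [int(v) for v in gs_shuffle(x)[0, :, 0, 0]]
variant_order = [int(v) for v in interleave_shuffle(x)[0, :, 0, 0]]
print(active_order)   # even-indexed channels first, then odd-indexed ones
print(variant_order)  # channels from the two halves alternate
```

With 8 channels, where channels 0 to 3 stand for the dense branch and 4 to 7 for the depthwise branch, the active shuffle puts the even-indexed channels first and the odd-indexed ones second, so each half of the output holds channels from both branches. The commented-out variant instead alternates the two branches channel by channel. Both are fixed permutations that leave the tensor shape unchanged.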

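  Running the code above shows how GSConv treats its input: the input here has 256 channels, and with k=3, s=2 the height H and width W are halved, whereas with the default k=1, s=1 they stay unchanged.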
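
The halving follows from the standard output-size formula of a convolution with dilation 1, `out = (in + 2p - k) // s + 1`, combined with the `autopad` rule used in the listing. The plain-Python sketch below checks both cases; `conv_out_size` is a helper name introduced here for illustration.

```python
def autopad(k, p=None):
    """Same padding rule as in the GSConv listing (integer kernel sizes only here)."""
    return k // 2 if p is None else p


def conv_out_size(size, k=1, s=1, p=None):
    """Output height/width of a Conv2d with dilation 1: (in + 2p - k) // s + 1."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1


# cv1 of GSConv carries the stride; cv2 is a 5x5 depthwise conv with stride 1.
for k, s in [(3, 2), (1, 1)]:
    after_cv1 = conv_out_size(80, k, s)
    after_cv2 = conv_out_size(after_cv1, 5, 1)
    print(f"k={k}, s={s}: 80 -> {after_cv1} -> {after_cv2}")
```

Only `cv1` can change the spatial size, because `cv2` always runs with stride 1 and padding 2.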

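The VoV-GSCSP Module

  VoV-GSCSP is built on top of GSConv. From the flow/parameter table in the paper, the only thing we need to get right when using VoV-GSCSP is its input and output channel counts.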

# Relies on torch, nn, Conv and GSConv from the previous listing.
class GSBottleneck(nn.Module):
    # GS Bottleneck https://github.com/AlanLi1997/slim-neck-by-gsconv
    def __init__(self, c1, c2, k=3, s=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        # for lighting
        self.conv_lighting = nn.Sequential(
            GSConv(c1, c_, 1, 1),
            GSConv(c_, c2, 3, 1, act=False))
        self.shortcut = Conv(c1, c2, 1, 1, act=False)

    def forward(self, x):
        return self.conv_lighting(x) + self.shortcut(x)


class VoVGSCSP(nn.Module):
    # VoVGSCSP module with GSBottleneck
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        # self.gc1 = GSConv(c_, c_, 1, 1)
        # self.gc2 = GSConv(c_, c_, 1, 1)
        # self.gsb = GSBottleneck(c_, c_, 1, 1)
        self.gsb = nn.Sequential(*(GSBottleneck(c_, c_, e=1.0) for _ in range(n)))
        self.res = Conv(c_, c_, 3, 1, act=False)  # defined but not used in forward
        self.cv3 = Conv(2 * c_, c2, 1)  # fuses the two concatenated branches

    def forward(self, x):
        x1 = self.gsb(self.cv1(x))  # bottleneck branch
        y = self.cv2(x)             # plain 1x1 branch
        return self.cv3(torch.cat((y, x1), dim=1))


if __name__ == "__main__":
    base_channels = 64
    P3Shape = (1, 256, 80, 80)
    P3 = torch.ones(P3Shape)
    VOV = VoVGSCSP(base_channels * 4, base_channels * 4)
    P = VOV(P3)
    print(P.shape)  # torch.Size([1, 256, 80, 80])
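
Since only the input and output channel counts of VoV-GSCSP matter to the caller, it can help to trace the hidden widths by hand. The sketch below mirrors the constructor arithmetic of `GSConv`, `GSBottleneck(c_, c_, e=1.0)` and `VoVGSCSP` in plain Python, with no tensors involved; `gsconv_out` and `vov_channels` are hypothetical helpers introduced here, not part of the original code.

```python
def gsconv_out(c2):
    """GSConv(c1, c2) emits 2 * (c2 // 2) channels: cv1 output concatenated with cv2 output."""
    return 2 * (c2 // 2)


def vov_channels(c1, c2, e=0.5):
    """Channel widths inside VoVGSCSP(c1, c2, e=e) using GSBottleneck(c_, c_, e=1.0)."""
    c_ = int(c2 * e)                 # hidden width of both branches
    gs1 = gsconv_out(c_)             # first GSConv inside the bottleneck
    gs2 = gsconv_out(c_)             # second GSConv inside the bottleneck
    return {
        "branch width": c_,
        "after first GSConv": gs1,
        "bottleneck out": gs2,
        "concat": c_ + gs2,          # what actually reaches cv3
        "cv3 expects": 2 * c_,       # what cv3 was constructed for
        "cv3 out": c2,
    }


widths = vov_channels(256, 256)
print(widths)
print(vov_channels(6, 6))  # odd hidden width: concat and cv3 no longer agree
```

One consequence visible in this arithmetic: because GSConv builds its output from two halves of size `c2 // 2`, the widths only line up when the hidden width `c_` is even. All the configurations used in this article satisfy that.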

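Building the YoloBody

  With the basic modules above in place, consider the path from the three feature maps that YOLOv5 produces after the backbone (feat1, feat2 and feat3, i.e. P3, P4 and P5) to the final network outputs. At every stage the height H and width W stay the same as those of the corresponding P level, so combining the paper's flow/parameter diagram with these input/output relationships gives the flow/parameter diagram below.

[Figure: flow/parameter diagram of the rebuilt neck]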
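
The data flow can be checked with plain shape arithmetic before building any modules. The sketch below follows the order of operations in the `YoloBody.forward` given later in this article, and assumes the `l` configuration (`base_channels = 64`) with a 640x640 input, so the backbone outputs are taken to be P3 = (256, 80, 80), P4 = (512, 40, 40) and P5 = (1024, 20, 20). The helper functions are stand-ins introduced here for illustration; they track shapes only.

```python
def gsconv(shape, c2, s=1):
    """GSConv sets the channel count to c2 and divides H, W by the stride (even sizes)."""
    _, h, w = shape
    return (c2, h // s, w // s)


def vov(shape, c2):
    """VoVGSCSP only changes the channel count."""
    return (c2,) + shape[1:]


def upsample(shape):
    """Nearest-neighbour upsampling with scale factor 2."""
    c, h, w = shape
    return (c, 2 * h, 2 * w)


def concat(a, b):
    """Channel-wise concatenation; spatial sizes must match."""
    assert a[1:] == b[1:], f"spatial mismatch: {a} vs {b}"
    return (a[0] + b[0],) + a[1:]


bc = 64  # base_channels for phi = 'l'
p3, p4, p5 = (bc * 4, 80, 80), (bc * 8, 40, 40), (bc * 16, 20, 20)

p5 = gsconv(p5, bc * 8)  # SPPF keeps (1024, 20, 20); P5GSConv reduces to 512
p4 = gsconv(vov(concat(p4, upsample(p5)), bc * 8), bc * 4)
head1 = vov(concat(p3, upsample(p4)), bc * 4)
head2 = vov(concat(gsconv(head1, bc * 4, s=2), p4), bc * 8)
head3 = vov(concat(gsconv(head2, bc * 8, s=2), p5), bc * 16)
print(head1, head2, head3)
```

The three results are the inputs to the three detection heads, one per scale.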

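  Following the flow/parameter table above, I left out the S=3 setting in my rebuild (my GPU does not have enough memory). In addition, I have not yet been able to find the parameters for the part circled in red in the paper's flow diagram below. Add it yourself if you need it, keeping the input and output unchanged.

[Figures: the paper's flow diagram, with the part lacking parameters circled in red]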

The YoloBody Code

from torch import nn

# SPPF, Concat, GSConv and VoVGSCSP are assumed to live in nets/CSPdarknet.py
# next to CSPDarknet (add the classes from the listings above to that file).
from nets.CSPdarknet import CSPDarknet, SPPF, Concat, GSConv, VoVGSCSP


class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes, phi, pretrained=False, input_shape=[640, 640]):
        super(YoloBody, self).__init__()
        depth_dict = {'n': 0.33, 's': 0.33, 'm': 0.67, 'l': 1.00, 'x': 1.33, }
        width_dict = {'n': 0.25, 's': 0.50, 'm': 0.75, 'l': 1.00, 'x': 1.25, }
        dep_mul, wid_mul = depth_dict[phi], width_dict[phi]
        base_channels = int(wid_mul * 64)  # 64
        base_depth = max(round(dep_mul * 3), 1)  # 3
        # -----------------------------------------------#
        #   Input image is 640, 640, 3
        #   Initial base channel count is 64
        #   Shape comments below assume phi = 'l'
        # -----------------------------------------------#
        self.backbone = CSPDarknet(base_channels, base_depth, phi, pretrained)
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.concat = Concat(dimension=1)
        self.SPPF = SPPF(base_channels * 16, base_channels * 16)  # 1024 ---> 1024
        self.P5GSConv = GSConv(base_channels * 16, base_channels * 8)  # 1,1024,20,20 ---> 1,512,20,20
        self.P4VoV = VoVGSCSP(base_channels * 16, base_channels * 8)  # 1,1024,40,40 ---> 1,512,40,40
        # Alternative with three stacked blocks:
        # self.P4VoV = nn.Sequential(VoVGSCSP(base_channels * 16, base_channels * 8),
        #                            VoVGSCSP(base_channels * 8, base_channels * 8),
        #                            VoVGSCSP(base_channels * 8, base_channels * 8))
        self.P4GSConv = GSConv(base_channels * 8, base_channels * 4)  # 1,512,40,40 ---> 1,256,40,40
        self.Head1VoV = VoVGSCSP(base_channels * 8, base_channels * 4)  # 1,512,80,80 ---> 1,256,80,80
        # Alternative with three stacked blocks:
        # self.Head1VoV = nn.Sequential(VoVGSCSP(base_channels * 8, base_channels * 4),
        #                               VoVGSCSP(base_channels * 4, base_channels * 4),
        #                               VoVGSCSP(base_channels * 4, base_channels * 4))
        self.P3GSConv = GSConv(base_channels * 4, base_channels * 4, 3, 2)  # 1,256,80,80 ---> 1,256,40,40
        self.Head2VoV = VoVGSCSP(base_channels * 8, base_channels * 8)  # 1,512,40,40 ---> 1,512,40,40
        # Alternative with three stacked blocks:
        # self.Head2VoV = nn.Sequential(VoVGSCSP(base_channels * 8, base_channels * 8),
        #                               VoVGSCSP(base_channels * 8, base_channels * 8),
        #                               VoVGSCSP(base_channels * 8, base_channels * 8))
        self.Head2GSConv = GSConv(base_channels * 8, base_channels * 8, 3, 2)  # 1,512,40,40 ---> 1,512,20,20
        self.Head3VoV = VoVGSCSP(base_channels * 16, base_channels * 16)  # 1,1024,20,20 ---> 1,1024,20,20
        # Alternative with three stacked blocks:
        # self.Head3VoV = nn.Sequential(VoVGSCSP(base_channels * 16, base_channels * 16),
        #                               VoVGSCSP(base_channels * 16, base_channels * 16),
        #                               VoVGSCSP(base_channels * 16, base_channels * 16))
        self.yolo_head_P3 = nn.Conv2d(base_channels * 4, len(anchors_mask[2]) * (5 + num_classes), 1)
        self.yolo_head_P4 = nn.Conv2d(base_channels * 8, len(anchors_mask[1]) * (5 + num_classes), 1)
        self.yolo_head_P5 = nn.Conv2d(base_channels * 16, len(anchors_mask[0]) * (5 + num_classes), 1)

    def forward(self, x):
        P3, P4, P5 = self.backbone(x)
        # Top-down path
        P5 = self.SPPF(P5)
        P5 = self.P5GSConv(P5)
        P5_Up = self.upsample(P5)
        P4 = self.concat([P4, P5_Up])
        P4 = self.P4VoV(P4)
        P4 = self.P4GSConv(P4)
        P4_Up = self.upsample(P4)
        P3 = self.concat([P3, P4_Up])
        head1 = self.Head1VoV(P3)
        # Bottom-up path
        P3 = self.P3GSConv(head1)
        P34_Cat = self.concat([P3, P4])
        head2 = self.Head2VoV(P34_Cat)
        PHG = self.Head2GSConv(head2)
        PHG_Cat = self.concat([PHG, P5])
        head3 = self.Head3VoV(PHG_Cat)
        # Detection heads
        Out1 = self.yolo_head_P3(head1)  # 1,255,80,80
        Out2 = self.yolo_head_P4(head2)  # 1,255,40,40
        Out3 = self.yolo_head_P5(head3)  # 1,255,20,20
        return Out3, Out2, Out1


# if __name__ == "__main__":
#     import torch  # needed only for this smoke test
#     anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
#     num_classes = 80
#     phi = 's'
#     model = YoloBody(anchors_mask, num_classes, phi, pretrained=False)
#     x = torch.ones((1, 3, 640, 640))
#     Out3, Out2, Out1 = model(x)
#     print(Out3.shape, Out2.shape, Out1.shape)
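
The 255 in the output-shape comments comes from the channel count of each head convolution, `len(anchors_mask[i]) * (5 + num_classes)`. A minimal arithmetic check, assuming 80 classes and three anchors per scale as in the commented-out test above:

```python
anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
num_classes = 80

# Each anchor predicts 4 box terms + 1 objectness score + num_classes class scores.
per_anchor = 5 + num_classes
head_channels = [len(mask) * per_anchor for mask in anchors_mask]
print(head_channels)  # [255, 255, 255]

# Grid sizes of Out3, Out2, Out1 for a 640x640 input (strides 32, 16, 8).
grids = [640 // stride for stride in (32, 16, 8)]
print(grids)  # [20, 40, 80]
```

Changing `num_classes` changes only the head channel count; the grid sizes depend only on the input resolution.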

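Closing

  My abilities are limited, so please point out any mistakes you find in this article. Thank you for reading, and I hope it is helpful. The code is available in my repository: combine bubbliiiing's code with my rebuilt YoloBody.

The original PPT versions of the parameter flow diagrams in this article are available by private message.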

References:

[1] "Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles"

[2] github.com/alanli1997/…

[3] link.juejin.cn/?target=htt…

