YOLOv4 in PyTorch: Model Code and a Gallery of Results - Alibaba Cloud Developer Community

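I. Preface

Before the article proper, a few words of preamble.

Two months have passed since I finished writing up the YOLOv4 architecture. I spent that time writing code on and off and then debugging it. The YOLOv4 model now builds and runs, and it produced some rather interesting results.

First, some impressions from the learning process:

First, YOLO is not actually that hard. It is a bit like the engineering drawing course from university: each sub-component, each line, and each annotation is fairly simple to understand, but put together they form a very complex organic whole.

Second, debugging takes more time than coding. All the code was essentially written within half a month, but debugging took a month and a half. The debugging process was genuinely good for learning both YOLO and Python, and took my understanding a level deeper.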

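II. A Gallery of YOLOv4 Results

I have put the source code at the end because it is rather long. In the spirit of putting knowledge to use, let's first look at what YOLO can do.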

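① Image 1: street scene photo

YOLO recognized the Truck and the Person well, but the truck at the far left was not detected, possibly because it is only partly in frame.

② Image 2: street scene photo

A fairly simple composition; recognition is effortless.

③ Image 3: street scene photo

With vehicles this densely packed, not every car gets its own box, but the overall region is still fairly accurate.

The "not every car gets a box" problem mentioned above is actually one of YOLO's known weaknesses, called label rewriting. Search for Poly-YOLO for details.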
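To make label rewriting concrete, here is a minimal sketch in plain Python. It assumes a simplified label assignment in which each ground-truth box is written into the grid cell that contains its center, with one slot per (cell, anchor) pair. The function name `assign_labels` and the sample boxes are hypothetical, and real training code also chooses the anchor by IoU, which is left out here.

```python
def assign_labels(boxes, grid_size, img_size, anchor_idx=0):
    """Write each box into the grid cell that contains its center.

    boxes: list of (name, cx, cy) with centers in pixels.
    Returns (labels, overwritten); labels maps (row, col, anchor) -> name.
    """
    cell = img_size / grid_size  # pixels per grid cell
    labels = {}
    overwritten = []
    for name, cx, cy in boxes:
        key = (int(cy // cell), int(cx // cell), anchor_idx)
        if key in labels:
            overwritten.append(labels[key])  # the earlier label is lost
        labels[key] = name
    return labels, overwritten

# Three hypothetical cars in a 608x608 image; car_a and car_b sit close together.
cars = [("car_a", 300.0, 300.0), ("car_b", 310.0, 305.0), ("car_c", 100.0, 500.0)]

for grid in (19, 76):
    labels, lost = assign_labels(cars, grid_size=grid, img_size=608)
    print(grid, sorted(labels.values()), lost)
```

On the coarse 19x19 grid the two neighbouring cars land in the same cell and one label replaces the other, while on the 76x76 grid they fall into different cells. Under this simplified assignment, that is why tightly packed objects are the ones most likely to go missing.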

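Next, something a little different.

④ Anime image

Ulquiorra from the anime Bleach was recognized as a potted plant. (To be fair, he does look a bit like one.)

⑤ Game wallpaper

Shadow Fiend was recognized as Cake (could there be Shadow Fiend-shaped cakes among the training samples?)

⑥ Art image from Douyin (will be removed on request)

This one was a bit unexpected: it was still recognized as a person.

⑦ Art image from Douyin (will be removed on request)

I think this image nicely shows how powerful YOLO is: its output is richer than what my own brain picked out at first glance, and the boxes are accurate too!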

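III. Source Code of the YOLO Model

Following the YOLO architecture, I split the model code into three parts: Backbone, Neck, and Head.

My understanding of the code is written in the source comments.

Before the code, one caveat: the source here covers only the model part of YOLO. To actually reproduce the results above, you also need:

① auxiliary Python code that reads an image, draws the bounding boxes, and writes out the new image;

② a trained neural network, i.e. pretrained weights, imported into the model.

(I completed these two parts with help from others, so they are not included here.)

One more remark on ②, importing the pretrained weights: if you try running without importing them, the output is a chaotic mess like the one below.

This is easy to understand: the network has not learned from many samples, so it has no idea which results should be output as Person and which as Cake. It shows how important pretraining is for a neural network model.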
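Importing pretrained weights mostly comes down to matching parameter names and shapes between the checkpoint and the freshly built model. The sketch below illustrates that idea with plain dictionaries standing in for state dicts (shapes are tuples, values are placeholders). `match_weights` and the sample entries are hypothetical and only show the matching logic, not the actual loading code.

```python
def match_weights(model_shapes, pretrained, rename):
    """Pick pretrained entries whose renamed key exists and whose shape matches.

    model_shapes: {name: shape} for the freshly built model.
    pretrained:   {name: (shape, values)} standing in for a loaded checkpoint.
    rename:       function mapping a model name to the checkpoint name.
    """
    matched, skipped = {}, []
    for name, shape in model_shapes.items():
        key = rename(name)
        if key in pretrained and pretrained[key][0] == shape:
            matched[name] = pretrained[key][1]  # copy the trained values
        else:
            skipped.append(name)                # stays randomly initialized
    return matched, skipped

model_shapes = {"conv1.conv.weight": (32, 3, 3, 3), "extra.weight": (8, 8)}
pretrained = {"backbone.conv1.conv.weight": ((32, 3, 3, 3), "trained-values")}

matched, skipped = match_weights(model_shapes, pretrained, lambda n: "backbone." + n)
print(matched)
print(skipped)
```

Any parameter that ends up in `skipped` keeps its random initialization, which is exactly the situation that produces chaotic detections.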

Backbone:

import torch
import torch.nn.functional as F
import torch.nn as nn
import math
import numpy as np
class Mish(nn.Module):  # Mish activation: x * tanh(softplus(x))
    def __init__(self):
        super(Mish,self).__init__()
    def forward(self,x):
        return x*torch.tanh(F.softplus(x))
class CBM(nn.Module):  # CBM block: Conv + BatchNorm + Mish
    def __init__(self,in_channels, out_channels, kernel_size, stride=1):
        super(CBM,self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, kernel_size//2, bias=False)  # convolution; padding = kernel_size//2 keeps H and W when stride=1
        self.bn = nn.BatchNorm2d(out_channels)
        self.activation = Mish()
    def forward(self,x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.activation(x)
        return x
class Res_Unit(nn.Module): # Res Unit: recall its structure = CBM*2 + residual addition
    def __init__(self,channels,hidden_channels = None):
        super(Res_Unit,self).__init__()
        if hidden_channels is None:
            hidden_channels = channels
        self.block = nn.Sequential(
            CBM(channels, hidden_channels, 1) , # the first CBM uses a 1x1 kernel; it can reduce channels to make computation cheaper
            CBM(hidden_channels,channels, 3)
        )
    def forward(self,x):
        return x+self.block(x)   # residual connection
class CSPNet(nn.Module):
# -----------CSPNet diagram---------------------------------------------------------------------------------------
#
#    CSPX ==>  [downsample]--> [CBM(split_conv0)]--------------------->[concat]->[CBM(concat_downsample)]
#                          |                                                ↑
#                          |-> [CBM(split_conv1)]-->[Res Unit(block_conv)]--|
#
# -----------CSPNet diagram---------------------------------------------------------------------------------------
    def __init__(self,in_channels, out_channels, num_Res_Unit, first):    # num_Res_Unit is the number of Res_Units inside this CSPNet
        super(CSPNet,self).__init__()
        self.downsample_conv = CBM(in_channels, out_channels, 3, stride=2)  # every CSP module starts with a stride-2 downsample
        if first:     # the first CSP differs from the later ones: later CSPs halve the channels of the two branches
            self.split_conv0 = CBM(out_channels, out_channels, 1) # the upper CBM of the CSPX module in the diagram; its input is the downsample output
            self.split_conv1 = CBM(out_channels, out_channels, 1) # the first CBM on the lower branch of the CSPX module
            self.block_conv = nn.Sequential(
                Res_Unit(channels=out_channels, hidden_channels=out_channels//2),
                CBM(out_channels,out_channels,1)  # the first CSPNet is CSP1, so there is exactly one Res_Unit here
            ) # the Res_Unit on the lower branch of the CSPX module, plus the CBM after it
            self.concat_downsample = CBM(out_channels*2, out_channels,1)  # 1x1 CBM after the concat (despite the name, it does not change H or W); the concat doubles the input to out_channels*2
        else:
            self.split_conv0 = CBM(out_channels, out_channels//2, 1)  # for later CSP modules each branch is halved in channels, hence //2
            self.split_conv1 = CBM(out_channels, out_channels//2, 1)
            self.block_conv = nn.Sequential(
                *[Res_Unit(channels=out_channels//2) for _ in range(num_Res_Unit)]
                ,CBM(out_channels//2, out_channels//2, 1)
            )  # *[...] unpacks the list into num_Res_Unit positional arguments
            self.concat_downsample = CBM(out_channels, out_channels,1)  # 1x1 CBM after the concat
    def forward(self,x):   # x is the input tensor
        x = self.downsample_conv(x)
        x0 = self.split_conv0(x)
        x1 = self.split_conv1(x)
        x1 = self.block_conv(x1)
        x = torch.cat([x1,x0],dim = 1)
        x = self.concat_downsample(x)
        return x
#======================build the backbone=============================================
class Yolo_backbone(nn.Module):
    def __init__(self):
        super(Yolo_backbone, self).__init__()
#---------build the CSPDarknet53 structure-------------------------
        self.conv1 = CBM(3, 32, kernel_size=3, stride=1) # the first CBM: the raw image has 3 input channels, output is 32 channels
        self.structure = nn.ModuleList([
            CSPNet(32, 64, num_Res_Unit = 1, first= True),
            CSPNet(64, 128, num_Res_Unit = 2, first=False),
            CSPNet(128, 256, num_Res_Unit = 8, first=False),
            CSPNet(256, 512, num_Res_Unit = 8, first=False),
            CSPNet(512, 1024, num_Res_Unit = 4, first=False),
        ]) # each CSP doubles the number of output channels
#-----------------------------------------------------------------
#-------------network initialization------------------------------
        for m in self.modules():
            if isinstance(m,nn.Conv2d) :  # conv weight init (He-style normal); other initialization schemes exist
                n = m.kernel_size[0]* m.kernel_size[1]* m.out_channels
                m.weight.data.normal_(0,math.sqrt(2./n))
            if isinstance(m,nn.BatchNorm2d):    # BN layer init
                m.weight.data.fill_(1)
                m.bias.data.zero_()
    def forward(self, x):
        x = self.conv1(x)
        x = self.structure[0](x)
        x = self.structure[1](x)
        out1 = self.structure[2](x)   # per the overall YOLO architecture, three outputs start here, in the order out1, out2, out3
        out2 = self.structure[3](out1)
        out3 = self.structure[4](out2)
        return out1,out2,out3
#==============backbone construction complete======================
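def load_model_path(model,pth):  # model: the network; pth: path to the pretrained weight file
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model_dict = model.state_dict() # names and tensors of all weights in the model go into the dict model_dict
    # In PyTorch, a torch.nn.Module's state_dict stores the weights and biases learned during training
    # At this point model_dict holds the still-untrained weights
    pretrain_dict = torch.load(pth,map_location=device)  # load the already trained weights
    matched_dict = {}
    for k,v in model_dict.items():
        if k.find('backbone') == -1:
            key = 'backbone.'+k.replace('structure','stages').replace('block_conv','blocks_conv').replace('concat_downsample','concat_conv')
            # The trained checkpoint names its weights slightly differently from our untrained model, so the names are mapped here
            if key in pretrain_dict and np.shape(pretrain_dict[key]) == np.shape(v):
                matched_dict[k] = pretrain_dict[key]  # take the trained tensor (the original assigned the untrained v here)
    # About the loop above: print the names in pretrain_dict first and you will see almost all start with "backbone"; these are the trained backbone weights
    # model_dict is the untrained model's weights; key locates the matching entry in pretrain_dict, and the trained tensor is stored in matched_dict
    model_dict.update(matched_dict)  # update our own YOLO weights (i.e. model_dict)
    model.load_state_dict(model_dict)
    return model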
def YOLO_backbone_pretrain(pretrain):  # build the backbone and load the pretrained weights into it
    model = Yolo_backbone()
    load_model_path(model, pretrain)
    return model
if __name__ == "__main__":
    backbone = YOLO_backbone_pretrain('pth/yolo4_weights_my.pth')  # put the path of the pretrained weight file here
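As a sanity check on the backbone, the spatial size and channel count of each stage can be worked out by hand with the standard convolution output-size formula, out = floor((n + 2p - k) / s) + 1. The sketch below assumes a 608x608 input (the size implied by the 76/38/19 grids used later). `conv_out` and `STAGES` are illustrative names, not part of the model code.

```python
def conv_out(n, kernel, stride):
    """Output size of a conv with padding = kernel // 2, as used in CBM."""
    pad = kernel // 2
    return (n + 2 * pad - kernel) // stride + 1

# out_channels of the five CSPNet stages in Yolo_backbone
STAGES = [64, 128, 256, 512, 1024]

size = conv_out(608, kernel=3, stride=1)  # conv1 keeps the resolution
shapes = []
for channels in STAGES:
    size = conv_out(size, kernel=3, stride=2)  # downsample_conv halves H and W
    shapes.append((channels, size, size))

# The last three stages correspond to out1, out2, out3
for name, shape in zip(("out1", "out2", "out3"), shapes[-3:]):
    print(name, shape)
```

Under that assumption the three backbone outputs come out as 256 channels at 76x76, 512 at 38x38, and 1024 at 19x19, which are the channel counts the neck expects.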

Neck:

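I consider this the most complex and most error-prone part of YOLO, and the one that needs the most patience. You must follow the YOLO architecture diagram strictly.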

import torch
import torch.nn as nn
from collections import OrderedDict
from YOLO_backbone import *
def CBL(channel_in,channel_out,kernel_size,stride=1):  # CBL block used in the neck: Conv + BatchNorm + LeakyReLU
    if kernel_size:
        pad = (kernel_size-1)//2
    else:
        pad = 0
    return nn.Sequential(OrderedDict([
                             ('conv',nn.Conv2d(channel_in,channel_out,kernel_size=kernel_size,stride=stride,padding=pad)),
                             ('bn',nn.BatchNorm2d(channel_out)),
                             ('leaky_relu',nn.LeakyReLU(0.1))
                         ]))
class SPP(nn.Module):   # SPP (spatial pyramid pooling) module
    def __init__(self,pool_sizes=[13,9,5]):  # 13, 9, 5 are empirical values
        super(SPP, self).__init__()
        self.maxpools = nn.ModuleList([nn.MaxPool2d(pool_size, 1, pool_size//2) for pool_size in pool_sizes])
    def forward(self,x):
        features = [maxpool(x) for maxpool in self.maxpools]
        features = torch.cat(features+[x],dim=1)   # note: with stride 1 and padding pool_size//2 each pooled map keeps the same H and W as x, so they can be concatenated along the channel dim (torch.cat does not broadcast)
        return features
class upsample(nn.Module):  # conv + upsample module
    def __init__(self, in_channels, out_channels):
        super(upsample,self).__init__()
        self.up_sample = nn.Sequential(
            CBL(in_channels,out_channels,1),
            nn.Upsample(scale_factor=2,mode='nearest')
        )
    def forward(self,x):
        return self.up_sample(x)
def CBL_3(channel_list,channel_in): # CBL*3 module (channel counts are passed in by the caller)
    m = nn.Sequential(
        CBL(channel_in, channel_list[0],1),
        CBL(channel_list[0],channel_list[1],3),
        CBL(channel_list[1],channel_list[0],1)
    )
    return m
def CBL_5(channel_list,channel_in): # CBL*5 module
    m= nn.Sequential(
        CBL(channel_in, channel_list[0],1),   # 1x1 conv, reduces channels
        CBL(channel_list[0],channel_list[1],3),  # 3x3 conv, expands channels
        CBL(channel_list[1],channel_list[0],1),# reduce
        CBL(channel_list[0], channel_list[1], 3),# expand
        CBL(channel_list[1], channel_list[0], 1)# reduce
    )
    return m
def YOLO_before_head(channel_list,channel_in):  # the CBL + conv module that sits right before each head
    m = nn.Sequential(
        CBL(channel_in, channel_list[0],3),
        nn.Conv2d(channel_list[0],channel_list[1],1)
    )
    return m
#-------------------build the whole neck network from the sub-components defined above-------------------------------
class YOLO_neck_body(nn.Module):
     def __init__(self, num_anchors, num_classes):
         super(YOLO_neck_body, self).__init__()
         # self.backbone = YOLO_backbone_pretrain('pth/yolo4_weights_my.pth')  # !!! this also reports a name conflict in the pretrained data, so it is commented out for now and replaced by the line below
         self.backbone = Yolo_backbone()
         # The following needs to be read alongside the neck architecture diagram
         # First, part ①
         self.CBL31_1 = CBL_3([512,1024],1024)  # channel counts as specified by YOLO
         self.SPP_1 = SPP()
         self.CBL32_1 = CBL_3([512,1024],2048)
         # Part ②
         self.upsample_2 = upsample(512,256)  # channel counts again follow the YOLO architecture. Note: upsample already contains the CBL in front of it
         self.CBL_2 = CBL(512,256,1)
         self.CBL5_2 = CBL_5([256,512],512)
         # Part ③
         self.upsample_3 = upsample(256,128)
         self.CBL_3 = CBL(256,128,1)     # the first argument, 256, was originally 512
         self.CBL5_3 = CBL_5([128,256],256)
         # The first head output (the 76*76 one)
         final_out_channels = num_anchors*(5+num_classes)   # equals 255 for 3 anchors and 80 classes
         self.yolo_head76 = YOLO_before_head([256,final_out_channels],128)
         # Part ④
         self.downsample_4 = CBL(128,256,3,stride=2)  # there is a downsample before the CBL of ④, hence stride=2 (defined here but not called in forward below)
         self.CBL5_4 = CBL_5([256,512],256)
         self.downsample2_4 = CBL(256,256,3,stride=2)  # R5 is downsampled after leaving ④
         self.yolo_head38 = YOLO_before_head([512,final_out_channels],256) # the second head, 38*38
         # Part ⑤
         self.downsample_5 = CBL(256,512,3,stride=2)# likewise, a downsample before the CBL of ⑤, hence stride=2
         self.CBL5_5 = CBL_5([512,1024],512)  # defined here but not called in forward below
         self.CBL_5 = CBL(1024,512,3,stride=1)
         self.yolo_head19 = YOLO_before_head([1024,final_out_channels],512)  # the third head, 19*19
     def forward(self,x):
         # forward must follow the order of the YOLO architecture strictly; this is fairly delicate work
         x2,x1,x0 = self.backbone(x)   # the three backbone outputs; note that in backbone order they come out as x2 -> x1 -> x0
         # R1 first: the topmost path
         R1 = self.CBL31_1(x0)
         R1 = self.SPP_1(R1)
         R1_before_upsample = self.CBL32_1(R1)  # branch off here; R6 uses it later
         R1 = self.upsample_2(R1_before_upsample)
         print('Start neck debug')
         print('R1_before_upsample:',R1_before_upsample.shape)
         print('R1:',R1.shape)
         # R2: the second path from the top
         R2 = self.CBL_2(x1) # bring in x1, still following the YOLO architecture
         print('R2:',R2.shape)
         R1_and_2 = torch.cat([R1,R2],dim=1)
         print('R1_and_2:',R1_and_2.shape)
         R1_and_2 = self.CBL5_2(R1_and_2)
         print('R1_and_2:',R1_and_2.shape)
         R1_and_2 = self.upsample_3(R1_and_2)   # upsample includes the CBL in front of it
         print('R1_and_2:',R1_and_2.shape)
         # R3: the third path
         R3 = self.CBL_3(x2) # bring in x2
         R1_and_2_and_3 = torch.cat([R3,R1_and_2],dim=1)
         R1_and_2_and_3 = self.CBL5_3(R1_and_2_and_3)
         print('R3:',R3.shape)
         print('R1_and_2_and_3:',R1_and_2_and_3.shape)
         # R4: the fourth path
         R4 = R1_and_2_and_3
         print('R4:',R4.shape)
         # R5: the fifth path
         R5 = torch.cat([R4,R1_and_2],dim=1)
         R5 = self.CBL5_4(R5)
         R5_beforehead = self.downsample2_4(R5) # tap point for the 38*38 output
         print('R5_beforehead:',R5_beforehead.shape)
         # R6: the sixth path
         R5 = self.downsample_5(R5_beforehead)
         print('R5:',R5.shape)
         print('R1_before_upsample:',R1_before_upsample.shape)
         R6 = torch.cat([R5,R1_before_upsample],dim=1)
         R6 = self.CBL_5(R6)
         print('R6:',R6.shape)
         # Outputs
         print('Final check of the neck tensors before they enter the heads')
         print('76*76:',R1_and_2_and_3.shape)
         print('38*38:',R5_beforehead.shape)
         print('19*19:',R6.shape)
         out76 = self.yolo_head76(R1_and_2_and_3)
         out38 = self.yolo_head38(R5_beforehead)
         out19 = self.yolo_head19(R6)
         return out76,out38,out19
if __name__ == '__main__':
    model = YOLO_neck_body(3,80)
    # load_model_path(model,'pth/yolo4_weights_my.pth')
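Two of the channel counts in the neck are easy to verify by hand: the 2048 input channels of `CBL32_1` after SPP, and the 255 output channels of each head. The sketch below does the arithmetic in plain Python, assuming a 608x608 input so that the SPP input is 19x19. `pool_out` is an illustrative helper, not part of the model.

```python
def pool_out(n, kernel, stride=1):
    """Output size of MaxPool2d(kernel, stride, padding=kernel // 2)."""
    pad = kernel // 2
    return (n + 2 * pad - kernel) // stride + 1

spp_in_channels, spp_in_size = 512, 19  # output of CBL31_1 for a 608x608 input
pool_sizes = [13, 9, 5]

pooled_sizes = [pool_out(spp_in_size, k) for k in pool_sizes]
spp_out_channels = spp_in_channels * (len(pool_sizes) + 1)  # three pooled maps + x itself

num_anchors, num_classes = 3, 80
final_out_channels = num_anchors * (5 + num_classes)  # 4 box values + 1 objectness + classes

print(pooled_sizes)        # every pooling branch keeps the input resolution
print(spp_out_channels)    # input channels expected by CBL32_1
print(final_out_channels)  # output channels of each head
```

Because every pooling branch preserves the spatial size, the four tensors differ only in content, and concatenating them along the channel dimension quadruples the channel count.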

Head:

import torch.nn as nn
import torch.nn.functional as F
import torch
def yolo_decode(output,num_class, anchors, num_anchors, scale):
    # see YoloLayer for the meaning of anchors
    # output = [B, A*n_ch, H, W] is the tensor produced by a head; the order and layout of its data are determined by the neck structure
    device = output.device  # build the grids on the same device as the head output
    n_ch = 4+1+num_class  # tx, ty, tw, th, obj, then the class scores
    A = num_anchors  # num_anchors = 3
    B = output.size(0) # dim 0 of output is the batch size, i.e. the number of images processed at once
    H = output.size(2) # dim 2 of output is H, the number of grid rows = 19, 38 or 76
    W = output.size(3) # dim 3 of output is W, the number of grid columns = 19, 38 or 76
    output = output.view(B,A,n_ch,H,W).permute(0,1,3,4,2).contiguous() # reorder the data to [B, A, H, W, n_ch] so that bx, by, bw, bh are easy to slice out next
    tx = output[...,0] # n_ch = 85 = [tx, ty, tw, th, obj, 80 COCO classes]
    ty = output[...,1]
    tw = output[...,2]
    th = output[...,3]
    obj = output[...,4]
    cls = output[...,5:]
    print('tw:',tw.shape)
    print('th:',th.shape)
    # bx, by, bw, bh are the center position and the width/height of the bounding box; tx, ty, tw, th are the values learned by the network (each is a tensor) and must be decoded into bx, by, bw, bh.
    # Note: bx, by, bw, bh here are measured in grid cells (of the 19*19, 38*38 or 76*76 grid), not in pixels
    obj = torch.sigmoid(obj)
    cls = torch.sigmoid(cls)
    grid_x = torch.arange(W,dtype=torch.float).repeat(1,A,H,1).to(device)  # [1, A, H, W], value = column index
    grid_y = torch.arange(H, dtype=torch.float).repeat(1, A, W, 1).permute(0, 1, 3, 2).to(device)  # [1, A, H, W], value = row index
    print('Start head debug')
    print('tx:',tx.shape)
    print('ty:',ty.shape)
    print('grid_x:',grid_x.shape)
    print('grid_y:',grid_y.shape)
    bx = grid_x + torch.sigmoid(tx)
    by = grid_y + torch.sigmoid(ty)
    for i in range(num_anchors):
        tw[:,i,:,:] = (torch.exp(tw[:,i,:,:])*scale -0.5*(scale-1))*anchors[i*2]  # compute bw: the anchor index is dim 1 (A in output), so i indexes dim 1; anchors is a flat [w, h, w, h, ...] list, so the width of anchor i is at i*2
        th[:,i,:,:] = (torch.exp(th[:,i,:,:])*scale - 0.5*(scale-1))*anchors[i*2+1]  # same for bh (the original wrote this result into tw by mistake)
    bx = (bx / W).unsqueeze(-1)  # normalize by the grid size, which is convenient for training
    by = (by / H).unsqueeze(-1)  # same as above
    bw = (tw / W).unsqueeze(-1)  # same as above
    bh = (th / H).unsqueeze(-1)  # same as above
    print('bx:',bx.shape)
    print('by:',by.shape)
    print('bw:',bw.shape)
    print('bh:',bh.shape)
    box = torch.cat((bx,by,bw,bh),dim=-1).reshape(B,A*H*W,4)
    obj = obj.unsqueeze(-1).reshape(B,A*H*W,1) # indexing dropped the last dimension, so add it back
    cls = cls.reshape(B,A*H*W,num_class)
    head_output = torch.cat([box, obj, cls],dim = -1)
    return head_output
class YoloLayer(nn.Module):
    def __init__(self,img_size, anchor_mask=[],num_class =80, anchors = [],num_anchors=9, stride =32, scale = 1):
        super(YoloLayer,self).__init__()
        self.anchor_mask = anchor_mask  # anchor_mask, [[6,7,8],[3,4,5],[0,1,2]], is a 2D list
        self.num_class = num_class  # COCO has 80 classes
        self.anchors = anchors  # 3 anchors per head, 9 in total: anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192,243, 459, 401]
        # 9 (w, h) pairs: 12,16  19,36  40,28;  36,75  76,55  72,146;  142,110  192,243  459,401
        self.num_anchors = num_anchors  # 3 anchors per head, 9 in total, so num_anchors=9
        self.anchor_step = len(anchors) // num_anchors   # anchor_step = 18/9 = 2
        self.stride = stride  # stride is the number of pixels per grid cell; stride * number of cells = 608
        self.scale = scale   # scale is usually 1
        self.feature_length = [img_size[0]//8,img_size[0]//16,img_size[0]//32]
        self.img_size = img_size
    def forward(self,output):
        if self.training:   # in training mode, return the raw head output without decoding
            return output
        masked_anchors = []
        for m in [0,1,2,3,4,5,6,7,8]:
            masked_anchors += self.anchors[m*self.anchor_step:(m+1)*self.anchor_step]  # collect all anchors (every index, not only those of one mask)
        masked_anchors = [anchor/self.stride for anchor in masked_anchors]  # as I understand it, this is a unit conversion: pixels divided by pixels per grid cell gives grid cells, which is what yolo_decode expects
        # NOTE: yolo_decode reads only the first len(self.anchor_mask) (w, h) pairs of masked_anchors
        data = yolo_decode(output, self.num_class, masked_anchors, len(self.anchor_mask),scale = self.scale)
        return data
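To see what the decoding does for a single prediction, the formulas in `yolo_decode` can be replayed in plain Python for one grid cell. This is a minimal sketch: `decode_cell` is a hypothetical helper, the raw values (all zeros) are made up, and the anchor (142, 110) with stride 32 is taken from the anchor list quoted in the `YoloLayer` comments on the assumption that it belongs to the 19x19 head.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_cell(tx, ty, tw, th, col, row, anchor_px, stride, grid, scale=1.0):
    """Decode one raw prediction into a box normalized to [0, 1]."""
    anchor_w = anchor_px[0] / stride  # anchor size in grid cells
    anchor_h = anchor_px[1] / stride
    bx = (col + sigmoid(tx)) / grid   # cell offset plus position inside the cell
    by = (row + sigmoid(ty)) / grid
    bw = (math.exp(tw) * scale - 0.5 * (scale - 1)) * anchor_w / grid
    bh = (math.exp(th) * scale - 0.5 * (scale - 1)) * anchor_h / grid
    return bx, by, bw, bh

# Hypothetical raw outputs for the cell at column 9, row 4 of the 19x19 grid.
bx, by, bw, bh = decode_cell(0.0, 0.0, 0.0, 0.0, col=9, row=4,
                             anchor_px=(142, 110), stride=32, grid=19)

# Multiply by the image size to get back to pixels.
print(round(bx * 608, 1), round(by * 608, 1), round(bw * 608, 1), round(bh * 608, 1))
```

With all raw values at zero, the sigmoid places the center in the middle of the cell and the exponential leaves the anchor size unchanged, so the decoded box is simply the anchor centered on that cell.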

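IV. Closing Remarks

I was honestly surprised by the image results above at first. Mathematically, a YOLO neural network is essentially just nonlinear operations (Leaky ReLU) and is not complicated. I did not expect that combining and stacking them could achieve such complex functionality!

As it happens, a book I have been reading recently contains a theory that explains this phenomenon. I pass it along:

If a computer repeatedly applies extremely simple rules, it can develop extraordinarily complex models that can explain every phenomenon in nature; the principle governing the universe is nothing more than a few lines of program code.
- Stephen Wolfram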
