0. Overview
In this lab we will implement a residual neural network (ResNet) with PyTorch and train and test it on a somewhat more challenging image dataset, CIFAR-10.
The lab consists of the following parts:
(1) Get familiar with the new dataset, CIFAR-10, and compare its classification difficulty with MNIST;
(2) Learn about residual neural networks, in particular the concept of a block;
(3) Build a residual neural network and use it to train and test on CIFAR-10.
Ref: https://arxiv.org/pdf/1512.03385.pdf
https://zhuanlan.zhihu.com/p/106764370
1. Dataset Introduction
Sample images from the CIFAR-10 dataset and its 10 classes are shown below:
Official description and download page: http://www.cs.toronto.edu/~kriz/cifar.html
1.1 Dataset Preparation
We first prepare the CIFAR-10 dataset, in much the same way as we did for MNIST.
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

print(torch.manual_seed(1))
<torch._C.Generator object at 0x0000015FF0CD6B50>
batch_size = 250   # batch size for the training and test sets, i.e. the number of samples processed per batch

# training set
train_set = torchvision.datasets.CIFAR10('./dataset_cifar10', train=True, download=True,
                                         transform=torchvision.transforms.Compose([
                                             torchvision.transforms.ToTensor(),
                                             torchvision.transforms.Normalize(
                                                 (0.4914, 0.4822, 0.4465),
                                                 (0.2023, 0.1994, 0.2010))
                                         ]))

# test set
test_set = torchvision.datasets.CIFAR10('./dataset_cifar10', train=False, download=True,
                                        transform=torchvision.transforms.Compose([
                                            torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize(
                                                (0.4914, 0.4822, 0.4465),
                                                (0.2023, 0.1994, 0.2010))
                                        ]))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=True)
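As a quick sanity check (a minimal sketch; the class-name tuple below follows the standard CIFAR-10 label order from the official page and is introduced here only for illustration), we can grab one batch from the training loader and inspect its shape and labels:

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')    # standard CIFAR-10 label order (0-9)
images, labels = next(iter(train_loader))
print(images.shape)                                    # expected: torch.Size([250, 3, 32, 32]), i.e. 32x32 RGB images
print(labels.shape)                                    # expected: torch.Size([250])
print([classes[l.item()] for l in labels[:5]])         # class names of the first 5 samples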
1.2 Comparing Classification Difficulty: CIFAR-10 vs MNIST
Below we train and test the convolutional neural network defined in Lab 2 on CIFAR-10. Note that because CIFAR-10 images differ slightly in format from MNIST (32x32 RGB instead of 28x28 grayscale), the number of input channels of conv1 and the input length of fc1 have both been adjusted. The adjusted network has more parameters than the original one, which in principle should increase its learning capacity.
The original convolutional network reached a test accuracy of about 98.9% on MNIST. The comparison below shows that CIFAR-10 is considerably harder to classify than MNIST.
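The change of fc1 follows from simple shape arithmetic: a 32x32 CIFAR-10 image becomes 28x28 after conv1 (kernel 5, no padding), 14x14 after pooling, 10x10 after conv2, and 5x5 after the second pooling, hence 12*5*5; for a 28x28 MNIST image the same chain gives 24 -> 12 -> 8 -> 4, hence 12*4*4. A minimal sketch to verify this (using throwaway conv layers, not the ones defined below):

t = torch.randn(1, 3, 32, 32)                           # one dummy CIFAR-10 image
t = F.max_pool2d(F.relu(nn.Conv2d(3, 6, 5)(t)), 2)      # -> (1, 6, 14, 14)
t = F.max_pool2d(F.relu(nn.Conv2d(6, 12, 5)(t)), 2)     # -> (1, 12, 5, 5)
print(t.shape)                                          # torch.Size([1, 12, 5, 5])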
class CNN5(nn.Module):
    def __init__(self):
        super(CNN5, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)   # in_channels changed from 1 to 3
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12*5*5, out_features=120)             # in_features changed from 12*4*4 to 12*5*5
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # conv1
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        # conv2
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = t.reshape(batch_size, 12*5*5)   # dim1 changed from 12*4*4 to 12*5*5
        # fc1
        t = self.fc1(t)
        t = F.relu(t)
        # fc2
        t = self.fc2(t)
        t = F.relu(t)
        # output layer
        t = self.out(t)
        return t
network = CNN5()
network.cuda()
loss_func = nn.CrossEntropyLoss()                       # loss function
optimizer = optim.SGD(network.parameters(), lr=0.1)     # optimizer

def get_num_correct(preds, labels):                     # count how many predictions are correct
    return preds.argmax(dim=1).eq(labels).sum().item()
Start training.
total_epochs = 10
for epoch in range(total_epochs):
    total_loss = 0
    total_train_correct = 0
    for batch in train_loader:
        images, labels = batch
        images = images.cuda()
        labels = labels.cuda()

        preds = network(images)
        loss = loss_func(preds, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_train_correct += get_num_correct(preds, labels)

    print("epoch:", epoch,
          "correct times:", total_train_correct,
          "training accuracy:", "%.3f" % (total_train_correct/len(train_set)*100), "%",
          "total_loss:", "%.3f" % total_loss)
epoch: 0 correct times: 12042 training accuracy: 24.084 % total_loss: 412.031
epoch: 1 correct times: 19445 training accuracy: 38.890 % total_loss: 337.925
epoch: 2 correct times: 22616 training accuracy: 45.232 % total_loss: 304.583
epoch: 3 correct times: 24415 training accuracy: 48.830 % total_loss: 286.545
epoch: 4 correct times: 25799 training accuracy: 51.598 % total_loss: 271.240
epoch: 5 correct times: 26835 training accuracy: 53.670 % total_loss: 261.250
epoch: 6 correct times: 27906 training accuracy: 55.812 % total_loss: 249.286
epoch: 7 correct times: 28499 training accuracy: 56.998 % total_loss: 242.985
epoch: 8 correct times: 29324 training accuracy: 58.648 % total_loss: 235.174
epoch: 9 correct times: 29982 training accuracy: 59.964 % total_loss: 227.551
Test results (test accuracy around 56%).
total_test_correct = 0
total_loss = 0
for batch in test_loader:
    images, labels = batch
    images = images.cuda()
    labels = labels.cuda()

    preds = network(images)
    loss = loss_func(preds, labels)

    total_loss += loss.item()   # use .item() so we accumulate a plain number rather than a tensor with its graph
    total_test_correct += get_num_correct(preds, labels)

print("correct times:", total_test_correct,
      "test accuracy:", "%.3f" % (total_test_correct/len(test_set)*100), "%",
      "total_loss:", "%.3f" % total_loss)
correct times: 5671 test accuracy: 56.710 % total_loss: 49.122
2. Residual Neural Networks
2.1 Residual Network Basics
The results above show that CIFAR-10 is much harder to classify than MNIST. In principle, one effective way to raise the accuracy is to increase the depth (number of layers) of the network, for example going from the 5-layer CNN5 above to roughly 20 layers. The deeper the network, the richer and more abstract the features it can extract.
However, it turns out that deeper is not always better. The figure below shows the training and test results of two plain deep convolutional neural networks (plain CNNs) on CIFAR-10, with depths of 20 and 56 layers respectively.
Image source: Kaiming He et al., “Deep Residual Learning for Image Recognition”, 2015.
We deepen a network hoping that the deeper network will perform better than, or at least as well as, the shallower one, but in practice this is not what happens. The results show that the 56-layer network performs clearly worse than the 20-layer network on both the training set and the test set. This phenomenon is called the degradation problem, and it arises because optimization becomes harder as the network gets deeper.
Deep residual networks (ResNet) were proposed precisely to solve this problem, and their introduction was a milestone in computer vision. The key to how residual networks address degradation is the identity mapping. What is an identity mapping? Let us look at a simple example:
In the figure above, the network on the right can be seen as the shallow network on the left with the three boxed layers added. If we want the deeper network on the right to be at least as accurate as the shallow network on the left, the three extra layers should output exactly what they receive. Denoting their input by 𝑥 and their output by 𝐻(𝑥), matching the accuracy of the two networks requires 𝐻(𝑥) = 𝑥, which is exactly the identity mapping. The original motivation of ResNet is to give the network this identity-mapping ability, so that when the network is made deeper, its performance is at least no worse than that of the shallower network.
However, a plain deep CNN has difficulty fitting the underlying identity mapping 𝐻(𝑥) = 𝑥. If instead we design the network as 𝐻(𝑥) = 𝐹(𝑥) + 𝑥 (i.e. build the identity mapping directly into the network), the problem turns into learning a residual function 𝐹(𝑥) = 𝐻(𝑥) − 𝑥. When 𝐹(𝑥) = 0 we recover the identity mapping, and in practice 𝐹(𝑥) turns out to be much easier to learn than 𝐻(𝑥).
A residual block is compared with an ordinary (plain) structure in the figure below:
Compared with the plain structure, the residual block has the extra curved path on the right, called a shortcut connection. It "skips" the output of the previous layer (or layers) forward to the current layer and adds it to the current layer's result just before the ReLU activation; the sum is then passed through the activation as the layer's final output. A deep residual network is built by stacking many such residual blocks.
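As a minimal sketch of this idea (a hypothetical ResidualSketch module used only for illustration; the actual blocks are built in Section 2.2), the forward computation of a residual block can be written as ReLU(F(x) + x):

class ResidualSketch(nn.Module):                               # illustration only, not used later
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        f = self.conv2(F.relu(self.conv1(x)))                  # the residual function F(x)
        return F.relu(f + x)                                   # add the shortcut before the final ReLU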
The figure below shows a complete ResNet (the rightmost network, with 33 convolutional layers and 1 fully connected layer). For comparison, the figure also shows VGG-19 and a plain CNN without shortcut connections.
In “Deep Residual Learning for Image Recognition”, five ResNet architectures are proposed, with 18, 34, 50, 101, and 152 layers. Their details are shown below:
We will now build all five ResNets. Although their depths differ, they share a common trait: each one is composed of just two kinds of simple residual blocks.
(1) The first kind of residual block (used by ResNet18 and ResNet34) contains two convolutional layers with kernel_size=3, and the number of output channels stays unchanged inside the block;
(2) The second kind of residual block (used by ResNet50, ResNet101, and ResNet152) contains three convolutional layers with kernel_size 1, 3, and 1 respectively, and the last layer expands the number of output channels to 4 times the original.
2.2 Building the Two Types of Residual Blocks
(1) The first kind of residual block: BasicBlock;
(2) The second kind of residual block: BottleneckBlock.
class BasicBlock(nn.Module):
    channel_expansion = 1   # {final output channels after expansion} / {output channels before expansion (blk_mid_channels)}

    def __init__(self, blk_in_channels, blk_mid_channels, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=blk_in_channels,      # blk_in_channels: input channels of the first conv layer in the block
                               out_channels=blk_mid_channels,    # blk_mid_channels: output channels of the first conv layer in the block
                               kernel_size=3, padding=1, stride=stride)   # stride can be set freely
        self.bn1 = nn.BatchNorm2d(blk_mid_channels)
        self.conv2 = nn.Conv2d(in_channels=blk_mid_channels,                           # input channels of the second conv layer in the block
                               out_channels=blk_mid_channels*self.channel_expansion,   # final output channels after expansion
                               kernel_size=3, padding=1, stride=1)                     # stride is always 1
        self.bn2 = nn.BatchNorm2d(blk_mid_channels*self.channel_expansion)

        # Implement the shortcut connection:
        # if the block input x and the conv2/bn2 output have the same shape: add them directly
        # if the block input x and the conv2/bn2 output have different shapes: apply an extra conv/bn transform to x on the shortcut connection
        if stride != 1 or blk_in_channels != self.channel_expansion*blk_mid_channels:   # different shapes
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels=blk_in_channels,
                          out_channels=self.channel_expansion*blk_mid_channels,   # adjust the number of channels
                          kernel_size=1, padding=0, stride=stride),               # adjust the spatial dimensions
                nn.BatchNorm2d(self.channel_expansion*blk_mid_channels)
            )
        else:   # same shape
            self.shortcut = nn.Sequential()

    def forward(self, t):
        # conv1
        out = self.conv1(t)
        out = self.bn1(out)
        out = F.relu(out)
        ################### Please finish the following code ###################
        # conv2 & shortcut
        out = self.conv2(out)
        out = self.bn2(out)
        out = out + self.shortcut(t)
        out = F.relu(out)
        ########################################################################
        return out
Code explanation: BasicBlock contains two convolutional layers and two batch-normalization layers; the code for conv2 can be filled in by following the pattern of conv1. The key part is the shortcut: the input passes through the two alternating conv/bn layers, and just before the final activation the result is added to the shortcut branch; the sum is then fed into the activation as the block's output. For this addition to be possible, the block input t must first go through self.shortcut(t) so that its shape matches the conv output. The completed code is therefore: out = self.conv2(out); out = self.bn2(out); out = out + self.shortcut(t); out = F.relu(out).
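As a quick check of the completed BasicBlock (a minimal sketch; the channel counts and input size are arbitrary illustrative values), a stride-2 block should halve the spatial size and change the channel count, with the 1x1 conv on the shortcut making the addition compatible:

blk = BasicBlock(blk_in_channels=32, blk_mid_channels=64, stride=2)
x = torch.randn(4, 32, 16, 16)
print(blk(x).shape)   # expected: torch.Size([4, 64, 8, 8])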
class BottleneckBlock(nn.Module):
    channel_expansion = 4   # {final output channels after expansion} / {output channels before expansion (blk_mid_channels)}

    def __init__(self, blk_in_channels, blk_mid_channels, stride=1):
        super(BottleneckBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=blk_in_channels,      # blk_in_channels: input channels of the first conv layer in the block
                               out_channels=blk_mid_channels,    # blk_mid_channels: output channels of the first conv layer in the block
                               kernel_size=1, padding=0, stride=1)   # stride is always 1
        self.bn1 = nn.BatchNorm2d(blk_mid_channels)
        self.conv2 = nn.Conv2d(in_channels=blk_mid_channels,     # input channels of the second conv layer in the block
                               out_channels=blk_mid_channels,    # output channels of the second conv layer in the block
                               kernel_size=3, padding=1, stride=stride)   # stride can be set freely
        self.bn2 = nn.BatchNorm2d(blk_mid_channels)
        self.conv3 = nn.Conv2d(in_channels=blk_mid_channels,                           # input channels of the third conv layer in the block
                               out_channels=blk_mid_channels*self.channel_expansion,   # final output channels after expansion
                               kernel_size=1, padding=0, stride=1)                     # stride is always 1
        self.bn3 = nn.BatchNorm2d(blk_mid_channels*self.channel_expansion)

        # Implement the shortcut connection:
        # if the block input x and the conv3/bn3 output have the same shape: add them directly
        # if the block input x and the conv3/bn3 output have different shapes: apply an extra conv/bn transform to x on the shortcut connection
        if stride != 1 or blk_in_channels != blk_mid_channels*self.channel_expansion:   # different shapes
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels=blk_in_channels,
                          out_channels=blk_mid_channels*self.channel_expansion,   # adjust the number of channels
                          kernel_size=1, padding=0, stride=stride),               # adjust the spatial dimensions
                nn.BatchNorm2d(blk_mid_channels*self.channel_expansion)
            )
        else:   # same shape
            self.shortcut = nn.Sequential()

    def forward(self, t):
        ################### Please finish the following code ###################
        # conv1
        out = self.conv1(t)
        out = self.bn1(out)
        out = F.relu(out)
        # conv2
        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu(out)
        # conv3 & shortcut
        out = self.conv3(out)
        out = self.bn3(out)
        out = out + self.shortcut(t)
        out = F.relu(out)
        ########################################################################
        return out
Code explanation: BottleneckBlock contains three convolutional layers and three batch-normalization layers; fill in the code for conv1, conv2, and conv3. For conv1 and conv2 the order is convolution, then batch normalization, then activation. For conv3 the shortcut must also be filled in: just before the final activation, the output of bn3 is added to the original input after it has been reshaped by the shortcut branch, and the sum is fed into the activation as the block's output. As before, the input t must pass through self.shortcut(t) so that it can be added to the conv output.
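A similar shape check for BottleneckBlock (again with arbitrary illustrative values) shows the 4x channel expansion:

blk = BottleneckBlock(blk_in_channels=64, blk_mid_channels=64, stride=1)
x = torch.randn(4, 64, 8, 8)
print(blk(x).shape)   # expected: torch.Size([4, 256, 8, 8]) -- channels expanded by 4, spatial size unchanged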
2.3 Building the Complete ResNet
Next we build the complete residual network from BasicBlock and BottleneckBlock.
Later we will train and test ResNet18 on CIFAR-10 as an example:
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes):
        super(ResNet, self).__init__()
        self.residual_layers = 4                        # each "residual layer" contains several blocks and corresponds to one row of the table above (conv2_x, conv3_x, conv4_x or conv5_x)
        self.blk1_in_channels = 32                      # the table says 64, but since large networks take too long to train we halve all channel counts
        self.blk_mid_channels = [32, 64, 128, 256]      # original channel counts: [64, 128, 256, 512]
        self.blk_channels = [self.blk1_in_channels] + self.blk_mid_channels   # [32, 32, 64, 128, 256]
        self.blk_stride = [1, 2, 2, 2]                  # stride of each residual layer
        self.blk_channel_expansion = block.channel_expansion

        # the first conv layer (outside the residual layers)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.blk_channels[0],
                               kernel_size=3, padding=1, stride=1)
        self.bn1 = nn.BatchNorm2d(self.blk1_in_channels)

        # residual layers (packed into self.layers)
        self.layers = nn.Sequential()
        for i in range(self.residual_layers):
            blk_in_channels = self.blk_channels[i] if i == 0 else self.blk_channels[i]*block.channel_expansion
            blk_mid_channels = self.blk_channels[i+1]
            self.layers.add_module(f"residule_layer{i}",
                                   self._make_layer(block=block,                   # block type: BasicBlock or BottleneckBlock
                                                    blk_in_channels=blk_in_channels,
                                                    blk_mid_channels=blk_mid_channels,
                                                    num_blocks=num_blocks[i],      # number of blocks in this residual layer
                                                    stride=self.blk_stride[i]))

        # the final fully connected layer
        self.linear = nn.Linear(in_features=self.blk_channels[self.residual_layers]*block.channel_expansion,
                                out_features=num_classes)

    def _make_layer(self, block, blk_in_channels, blk_mid_channels, num_blocks, stride):
        block_list = []
        stride_list = [stride] + [1]*(num_blocks-1)   # stride of each block
        for block_idx in range(num_blocks):
            if block_idx != 0:   # for every block after the first one in a residual layer: adjust its blk_in_channels
                blk_in_channels = blk_mid_channels*block.channel_expansion
            block_list.append(block(blk_in_channels=blk_in_channels,
                                    blk_mid_channels=blk_mid_channels,
                                    stride=stride_list[block_idx]))
        return nn.Sequential(*block_list)   # return one residual layer

    def forward(self, t):
        ################### Please finish the following code ###################
        # conv1
        out = self.conv1(t)
        out = self.bn1(out)
        out = F.relu(out)
        # "residual layers" (packed into self.layers)
        out = self.layers(out)
        # average pooling
        out = F.avg_pool2d(out, 4)   # shape of "out" before pooling (ResNet18): (batch_size, 256, 4, 4)
        # linear layer
        # out = out.reshape(XXX, XXX)
        # out = self.linear(out)
        out = out.reshape(batch_size, 256)
        out = self.linear(out)
        ########################################################################
        return out
Code explanation: the first conv layer of ResNet (outside the residual layers) is written out directly. The key line to fill in is out = out.reshape(batch_size, 256). ResNet processes a whole batch of images at once, so the first dimension is batch_size. For the second dimension: the previous step applies 4x4 average pooling to a tensor of shape (batch_size, 256, 4, 4), producing an output of shape (batch_size, 256, 1, 1); flattening each image's tensor gives a 256-dimensional vector, so the whole batch becomes (batch_size, 256).
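For reference, a sketch of the shape progression inside ResNet18's forward pass for one (250, 3, 32, 32) batch, following the strides above:

# after conv1/bn1/relu       : (250, 32, 32, 32)
# after the residual layers  : (250, 256, 4, 4)    # strides [1, 2, 2, 2] halve the spatial size three times
# after F.avg_pool2d(out, 4) : (250, 256, 1, 1)
# after reshape              : (250, 256)

Note that the fixed reshape(batch_size, 256) ties the model to the global batch_size and to this halved-width ResNet18; a shape-agnostic alternative (an assumption, not the code used above) is out = out.reshape(out.size(0), -1), which infers both dimensions from the tensor itself.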
Complete the following code to build the five ResNet variants.
################### Please finish the following code ###################
# def ResNet18():
#     return ResNet(block=XXX, num_blocks=XXX, num_classes=XXX)
def ResNet18():
    return ResNet(block=BasicBlock, num_blocks=[2,2,2,2], num_classes=10)

# def ResNet34():
#     return ResNet(block=XXX, num_blocks=XXX, num_classes=XXX)
def ResNet34():
    return ResNet(block=BasicBlock, num_blocks=[3,4,6,3], num_classes=10)

# def ResNet50():
#     return ResNet(block=XXX, num_blocks=XXX, num_classes=XXX)
def ResNet50():
    return ResNet(block=BottleneckBlock, num_blocks=[3,4,6,3], num_classes=10)

# def ResNet101():
#     return ResNet(block=XXX, num_blocks=XXX, num_classes=XXX)
def ResNet101():
    return ResNet(block=BottleneckBlock, num_blocks=[3,4,23,3], num_classes=10)

# def ResNet152():
#     return ResNet(block=XXX, num_blocks=XXX, num_classes=XXX)
def ResNet152():
    return ResNet(block=BottleneckBlock, num_blocks=[3,8,36,3], num_classes=10)
########################################################################
Code explanation: ResNet(block=, num_blocks=, num_classes=) takes three parameters. The first, block, receives a class name and selects which residual block to use, BasicBlock or BottleneckBlock. The second, num_blocks, receives a Python list specifying how many residual blocks conv2_x, conv3_x, conv4_x, and conv5_x each contain. The third, num_classes, receives an integer giving the number of output classes; for this task it is always 10. ResNet18 and ResNet34 use BasicBlock, so block=BasicBlock, and from the architecture details given earlier num_blocks is [2,2,2,2] and [3,4,6,3] respectively. ResNet50, ResNet101, and ResNet152 use BottleneckBlock, so block=BottleneckBlock, and num_blocks is [3,4,6,3], [3,4,23,3], and [3,8,36,3] respectively.
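As a rough size comparison of the five variants (a minimal sketch; remember that all channel widths here are halved relative to the paper, so these counts will not match the original ResNets), we can count the trainable parameters:

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

for name, builder in [("ResNet18", ResNet18), ("ResNet34", ResNet34), ("ResNet50", ResNet50),
                      ("ResNet101", ResNet101), ("ResNet152", ResNet152)]:
    print(name, count_parameters(builder()))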
Before training, we use the following code to quickly check whether the network structure and the output shape match our expectations.
def test_output_shape():
    net = ResNet18()
    x = torch.randn(batch_size, 3, 32, 32)   # simulated input
    y = net(x)
    print(net)         # inspect the network structure
    print("")
    print(y.shape)     # inspect the output shape

test_output_shape()
ResNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (residule_layer0): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential()
      )
      (1): BasicBlock(
        (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential()
      )
    )
    (residule_layer1): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential(
          (0): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2))
          (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential()
      )
    )
    (residule_layer2): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential()
      )
    )
    (residule_layer3): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (shortcut): Sequential()
      )
    )
  )
  (linear): Linear(in_features=256, out_features=10, bias=True)
)

torch.Size([250, 10])
From the printed result we can see that the whole network contains 8 shortcut connections, matching the ResNet18 diagram above. On the 3rd, 5th, and 7th shortcut connections, the block input x and the conv2/bn2 output have different shapes, so an extra conv/bn transform of x is added; these shortcut connections are drawn as dashed lines in the ResNet18 diagram.
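The same observation can be verified programmatically (a minimal sketch): count the blocks and check how many of their shortcut branches contain a conv/bn transform.

net = ResNet18()
shortcuts = [m.shortcut for m in net.modules() if isinstance(m, BasicBlock)]
print("shortcut connections:", len(shortcuts))                          # expected: 8
print("with conv/bn transform:", sum(len(s) > 0 for s in shortcuts))    # expected: 3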
2.4 Training and Testing
We now train and test the completed ResNet18 on the CIFAR-10 dataset.
network = ResNet18()
network = network.cuda()   # move the model to the GPU
loss_func = nn.CrossEntropyLoss()                              # loss function: cross-entropy loss
optimizer = torch.optim.Adam(network.parameters(), lr=0.001)   # optimizer

def get_num_correct(preds, labels):                            # count how many predictions are correct
    return preds.argmax(dim=1).eq(labels).sum().item()
total_epochs = 5   # training takes a while, so this time we train for only 5 epochs to see the result
for epoch in range(total_epochs):
    total_loss = 0
    total_train_correct = 0
    for batch in train_loader:               # fetch one batch
        # read the sample data
        images, labels = batch
        images = images.cuda()               # move the data to the GPU
        labels = labels.cuda()               # move the labels to the GPU

        # forward pass and loss computation
        preds = network(images)
        loss = loss_func(preds, labels)

        # reset the gradients
        optimizer.zero_grad()
        # backward pass
        loss.backward()
        # update the parameters
        optimizer.step()

        total_loss += loss.item()
        total_train_correct += get_num_correct(preds, labels)

    print("epoch: ", epoch,
          "correct times:", total_train_correct,
          "training accuracy:", "%.3f" % (total_train_correct/len(train_set)*100), "%",
          "total_loss:", "%.3f" % total_loss)
epoch: 0 correct times: 26564 training accuracy: 53.128 % total_loss: 258.822
epoch: 1 correct times: 35462 training accuracy: 70.924 % total_loss: 163.276
epoch: 2 correct times: 39420 training accuracy: 78.840 % total_loss: 120.100
epoch: 3 correct times: 41693 training accuracy: 83.386 % total_loss: 94.823
epoch: 4 correct times: 43504 training accuracy: 87.008 % total_loss: 74.431
Save the model.
torch.save(network.cpu(), "resnet18.pt")
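Saving the whole module object as above works, but it ties the saved file to the current class definition. A common alternative (optional here, shown only as a sketch) is to save just the parameters and rebuild the model when loading:

# torch.save(network.state_dict(), "resnet18_state.pt")        # save the parameters only
# network = ResNet18()
# network.load_state_dict(torch.load("resnet18_state.pt"))     # rebuild the model, then load the parameters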
Test the model on the test set.
network = ResNet18()
network = torch.load("resnet18.pt")   # load the saved model (on the CPU)
num_correct = 0

for i, batch in enumerate(test_loader):
    images, labels = batch
    preds = network(images)

    if i == 0:
        print("preds.shape: ", preds.shape)                               # check the shape of preds
        pred_labels = torch.max(preds, dim=1)[1].data.numpy().squeeze()   # predicted labels of this batch
        print("pred_labels.shape: ", pred_labels.shape)
        print("predicted labels (first 10 samples): ", pred_labels[:10])  # print the predicted labels of the first 10 samples
        print("real labels (first 10 samples): ", labels[:10])            # print the true labels of the first 10 samples

    num_correct += get_num_correct(preds, labels)
preds.shape:  torch.Size([250, 10])
pred_labels.shape:  (250,)
predicted labels (first 10 samples):  [0 5 0 0 0 5 6 5 4 2]
real labels (first 10 samples):  tensor([0, 5, 0, 0, 0, 5, 6, 5, 4, 2])
test_accuracy = num_correct/10000
print("test accuracy: ", test_accuracy)
test accuracy: 0.8038