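1. Background
Task objective:
Classify "fashion" images into 10 categories using the FashionMNIST dataset (https://github.com/zalandoresearch/fashion-mnist ). The figure above shows several sample images from FashionMNIST, where each small image corresponds to one sample.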
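Dataset description:
FashionMNIST comes pre-split into a training set and a test set, with 60,000 training images and 10,000 test images. Each image is a single-channel grayscale image of 28×28 pixels and belongs to one of 10 classes.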
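For reference, the ten integer labels correspond to the class names documented in the FashionMNIST repository. A small lookup table like the one below makes printed predictions easier to read; the names `CLASS_NAMES` and `label_to_name` are illustrative choices, not part of any library.

```python
# Class names in label order (0 to 9), as documented by FashionMNIST
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Map an integer label (0-9) to its human-readable class name."""
    return CLASS_NAMES[int(label)]


print(label_to_name(0), "|", label_to_name(9))
```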
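2. Code Implementation
2.1 Loading and Preprocessing the Data
There are two ways to obtain the data:
(1) Download and use a dataset built into PyTorch (through torchvision). These are usually common datasets such as MNIST and CIFAR10, and this approach is well suited to quickly testing whether an idea works.
(2) Download the data from a website, stored in CSV format, then read it in and convert it to the expected format. This approach requires building your own Dataset and is the one to focus on.
Data transformations can be handled with torchvision: for example, resizing all images to a uniform size (so they can be fed to the network) and converting the data to tensors.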
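# Imports and hyperparameters used by this snippet
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

batch_size = 256
num_workers = 0

# Data loading
# First define the data transforms
image_size = 28
data_transform = transforms.Compose([
    # Whether this step is needed depends on how the data is read;
    # it is not needed with the built-in dataset
    # transforms.ToPILImage(),
    transforms.Resize(image_size),
    transforms.ToTensor()
])

# Reading option 1: use the dataset bundled with torchvision
# (the download takes a while)
train_data = datasets.FashionMNIST(root='./', train=True, download=True,
                                   transform=data_transform)
test_data = datasets.FashionMNIST(root='./', train=False, download=True,
                                  transform=data_transform)

# Define the DataLoaders used to fetch batches during training and testing
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True,
                          num_workers=num_workers, drop_last=True)
# Unlike the training loader, the test loader does not set drop_last
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False,
                         num_workers=num_workers)

# Visualize one sample to check that the data was read correctly
image, label = next(iter(train_loader))
print(image.shape, label.shape)
plt.imshow(image[0][0], cmap="gray")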
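The second approach (a custom Dataset over CSV data) is not shown above, so here is a minimal sketch of what it could look like. It assumes a CSV layout in which the first column is the label and the remaining 784 columns are pixel values in [0, 255]. The class name `FMDataset` is hypothetical, and a small random table stands in for what `pd.read_csv(...).values` would return on a real file.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class FMDataset(Dataset):
    """Wraps a table with the label in column 0 and 784 pixel columns
    after it (an assumed CSV layout, not confirmed by the article)."""

    def __init__(self, table):
        self.labels = table[:, 0].astype(np.int64)
        # Reshape each row of 784 pixels into a 1 x 28 x 28 image
        self.images = table[:, 1:].reshape(-1, 1, 28, 28).astype(np.float32)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Scale pixels from [0, 255] to [0, 1], like transforms.ToTensor()
        image = torch.from_numpy(self.images[idx] / 255.0)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return image, label


# Synthetic stand-in for the array read from a real CSV file
rng = np.random.default_rng(0)
table = np.concatenate(
    [rng.integers(0, 10, size=(8, 1)), rng.integers(0, 256, size=(8, 784))],
    axis=1,
)

csv_data = FMDataset(table)
csv_loader = DataLoader(csv_data, batch_size=4, shuffle=False)
images, labels = next(iter(csv_loader))
print(images.shape, labels.shape)
```

Because the tensors are built directly from the array, no transforms.ToPILImage() step is involved here; it would only be needed if torchvision transforms that expect PIL images (such as Resize on older versions) were applied to the raw arrays.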
2.2 Model Design
We again use the same CNN as before. For convenience, the layers are grouped with nn.Sequential, which keeps the forward method short.
import torch.nn as nn


# Build a CNN; the model is moved to the GPU for training later
class CNNnet(nn.Module):
    def __init__(self):
        super(CNNnet, self).__init__()
        self.conv = nn.Sequential(
            # Grayscale images have in_channels = 1
            nn.Conv2d(1, 32, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            # Guard against overfitting
            nn.Dropout(0.3),
            nn.Conv2d(32, 64, 5),
            nn.ReLU(),
            # The first argument (2) is the kernel_size
            nn.MaxPool2d(2, stride=2),
            nn.Dropout(0.3)
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        # Flatten the 64 feature maps of size 4 x 4 into one vector per sample
        x = x.view(-1, 64 * 4 * 4)
        x = self.fc(x)
        # x = nn.functional.normalize(x)
        return x
Printing the network gives the following structure:
CNNnet(
  (conv): Sequential(
    (0): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Dropout(p=0.3, inplace=False)
    (4): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
    (5): ReLU()
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Dropout(p=0.3, inplace=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=1024, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=10, bias=True)
  )
)
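The value in_features=1024 (that is, 64 * 4 * 4) follows from the input size and the layer parameters. The sketch below traces the spatial size through the convolution and pooling layers using the standard output-size formula; the helper name `conv_out` is illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                   # input image is 28 x 28
size = conv_out(size, kernel=5)             # Conv2d(1, 32, 5)    -> 24
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2)     -> 12
size = conv_out(size, kernel=5)             # Conv2d(32, 64, 5)   -> 8
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2)     -> 4

flat_features = 64 * size * size            # 64 channels of 4 x 4
print(size, flat_features)                  # 4 1024
```

If the input size or a kernel size changes, the first nn.Linear layer and the view call in forward have to be updated to match.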
The complete code is listed at the end, together with the training results.
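2.3 Loss Function and Optimizer
CrossEntropyLoss accepts integer class labels directly, so there is no need to convert them to one-hot vectors by hand before computing the CE loss.
Make sure the labels start from 0, and do not add a softmax layer to the model, because torch.nn.CrossEntropyLoss already applies softmax (in the form of log-softmax) internally. The parts of a PyTorch training pipeline are therefore not independent and have to be considered together.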
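To make the point about softmax concrete, the following sketch checks on random logits that nn.CrossEntropyLoss gives the same value as log-softmax followed by the negative log-likelihood loss, and the same value as an explicit one-hot computation. The tensors are synthetic and only serve as an illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw model outputs, no softmax applied
target = torch.tensor([0, 3, 9, 1])    # integer class labels starting from 0

# Built-in loss: takes raw logits and integer labels
ce = nn.CrossEntropyLoss()(logits, target)

# Equivalent: log-softmax followed by negative log-likelihood
log_probs = F.log_softmax(logits, dim=1)
manual = F.nll_loss(log_probs, target)

# Equivalent: explicit one-hot labels, averaged over the batch
one_hot = F.one_hot(target, num_classes=10).float()
via_one_hot = -(one_hot * log_probs).sum(dim=1).mean()

print(torch.allclose(ce, manual), torch.allclose(ce, via_one_hot))
```

Adding an extra softmax layer at the end of the model would apply softmax twice, which flattens the predicted distribution and typically slows down training.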
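For the training and testing loops, pay attention to how the two differ:
Setting the model state (model.train() versus model.eval())
Whether the optimizer needs to be initialized (gradients reset)
Whether the loss needs to be propagated back through the network
Whether the optimizer needs to update the parameters at every step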
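The differences listed above can be sketched as a pair of step functions. This is a minimal illustration on a tiny stand-in model and random data, not the article's CNN; the names `train_step` and `eval_step` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Tiny stand-in model with Dropout so that train/eval modes differ
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.3),
                      nn.Linear(16, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 8)
y = torch.randint(0, 10, (32,))


def train_step(inputs, target):
    model.train()                  # training state: Dropout is active
    optimizer.zero_grad()          # reset gradients from the previous step
    loss = criterion(model(inputs), target)
    loss.backward()                # propagate the loss back through the network
    optimizer.step()               # update the parameters
    return loss.item()


def eval_step(inputs, target):
    model.eval()                   # evaluation state: Dropout is disabled
    with torch.no_grad():          # no gradients, no backward, no optimizer
        loss = criterion(model(inputs), target)
    return loss.item()


before = [p.clone() for p in model.parameters()]
train_loss = train_step(x, y)
after_train = [p.clone() for p in model.parameters()]
eval_loss_1 = eval_step(x, y)
eval_loss_2 = eval_step(x, y)
after_eval = [p.clone() for p in model.parameters()]
print(train_loss, eval_loss_1, eval_loss_2)
```

Only the training step changes the parameters, and with Dropout disabled the evaluation loss is the same on repeated calls.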
2.4 Saving the Model
After training, the model can be saved with torch.save. It can also be saved during training (this will be covered next time).
save_path = "./FahionModel.pkl"
torch.save(model, save_path)
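Saving the whole model object with torch.save pickles the module, so loading it later requires the class definition to be importable. A commonly used alternative is to save only the parameters through state_dict. The sketch below shows that variant on a small stand-in layer and a temporary file; the file name `weights.pt` is just an example.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # stand-in for the trained CNN

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pt")
    # Save only the parameters, not the pickled module object
    torch.save(model.state_dict(), path)

    # To restore, rebuild the architecture first, then load the weights
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

restored.eval()           # switch to evaluation mode before inference
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(),
                    restored.state_dict().values())
)
print(same)
```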
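3. Fixing Minor Errors
Error 1:
pic should be Tensor or ndarray. Got <class 'PIL.Image.Image'>. I ran into this after the dataset failed to download in Spyder and I copied over the copy that Jupyter Notebook had downloaded. The actual cause is that the built-in dataset already yields PIL images, so transforms.ToPILImage() should not be included in transforms.Compose when using it.
Note that if the dataset cannot be downloaded, you can download it yourself first and load it from that path, or build your own Dataset from local CSV data.
See also: TypeError: pic should be PIL Image or ndarray. Got <class 'numpy.ndarray'>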
Error 2:
RuntimeError: DataLoader worker (pid(s) 19384, 11964, 16832, 19252) exited unexpectedly
Following the blog post "[Solution] RuntimeError: DataLoader worker (pid(s) 27292) exited unexpectedly in PyTorch", changing num_workers from 4 to 0 fixed it.
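As a small illustration of that fix, the sketch below builds a DataLoader with num_workers=0 over a synthetic dataset. With this setting every batch is loaded in the main process, so there are no worker subprocesses that could exit. One possible cause of the error on Windows, stated here as an assumption about this case, is creating worker processes from code that is not guarded by `if __name__ == '__main__':`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real dataset: 10 blank 28 x 28 images
dataset = TensorDataset(torch.zeros(10, 1, 28, 28),
                        torch.zeros(10, dtype=torch.long))

# num_workers=0: batches are loaded in the main process
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

batch_sizes = [images.shape[0] for images, _ in loader]
print(batch_sizes)   # [4, 4, 2]
```

The trade-off is that data loading is no longer parallel, which is usually acceptable for a dataset as small as FashionMNIST.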
Error 3:
You may also run into this problem; see "Solving RuntimeError: CUDA error: out of memory".
4. Complete Code
See the comments for details.
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 22 19:20:28 2021

@author: 86493
"""
import os

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import matplotlib.pyplot as plt

# Configure the training environment and hyperparameters
# Configure the GPU; there are two ways to do this
## Option 1: use os.environ
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
## Option 2: use device, then call .to(device) on whatever should live on the GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Other hyperparameters such as batch_size
batch_size = 256
num_workers = 0
lr = 1e-3       # learning rate passed to Adam
epochs = 10     # number of training epochs

# Data loading
# First define the data transforms
image_size = 28
data_transform = transforms.Compose([
    # Whether this step is needed depends on how the data is read;
    # it is not needed with the built-in dataset
    # transforms.ToPILImage(),
    transforms.Resize(image_size),
    transforms.ToTensor()
])

# Reading option 1: use the dataset bundled with torchvision
# (the download takes a while)
train_data = datasets.FashionMNIST(root='./', train=True, download=True,
                                   transform=data_transform)
test_data = datasets.FashionMNIST(root='./', train=False, download=True,
                                  transform=data_transform)

# Define the DataLoaders used to fetch batches during training and testing
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True,
                          num_workers=num_workers, drop_last=True)
# Unlike the training loader, the test loader does not set drop_last
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False,
                         num_workers=num_workers)

# Visualize one sample to check that the data was read correctly
image, label = next(iter(train_loader))
print(image.shape, label.shape)
# plt.imshow(image[0][0], cmap="gray")


# Build a CNN; the model is moved to the GPU for training below
class CNNnet(nn.Module):
    def __init__(self):
        super(CNNnet, self).__init__()
        self.conv = nn.Sequential(
            # Grayscale images have in_channels = 1
            nn.Conv2d(1, 32, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            # Guard against overfitting
            nn.Dropout(0.3),
            nn.Conv2d(32, 64, 5),
            nn.ReLU(),
            # The first argument (2) is the kernel_size
            nn.MaxPool2d(2, stride=2),
            nn.Dropout(0.3)
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(-1, 64 * 4 * 4)
        x = self.fc(x)
        # x = nn.functional.normalize(x)
        return x


# model = CNNnet()
# print(model)

model = CNNnet()
# Use the device chosen above so the script also runs without a GPU
model = model.to(device)
# For multi-GPU training this can be written as:
# model = nn.DataParallel(model).cuda()

# Loss function
criterion = nn.CrossEntropyLoss()
# Per-class weights must be passed as a float tensor, for example:
# criterion = nn.CrossEntropyLoss(
#     weight=torch.tensor([1, 1, 1, 1, 3, 1, 1, 1, 1, 1], dtype=torch.float))

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=lr)


def train(epoch):
    model.train()
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        # 1. Prepare the data
        inputs, target = data
        # Move to the GPU
        inputs, target = inputs.to(device), target.to(device)

        # 2. Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, target)

        # 3. Backward pass
        optimizer.zero_grad()
        loss.backward()

        # 4. Update the parameters
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 30 == 29:
            print('[%d, %5d] loss: %.3f' % (
                epoch + 1, batch_idx + 1, running_loss / 30))
            running_loss = 0.0


def test():
    # Switch to evaluation mode so that Dropout is disabled
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            # Index of the maximum in each row (one row per sample);
            # torch.max returns both the maximum and its index
            _, predicted = torch.max(outputs, dim=1)
            # labels has N entries, one per sample
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('accuracy on test set :%d %% ' % (100 * correct / total))
    return correct / total


if __name__ == '__main__':
    epoch_list = []
    acc_list = []

    for epoch in range(epochs):
        train(epoch)
        acc = test()
        epoch_list.append(epoch)
        acc_list.append(acc)

    plt.plot(epoch_list, acc_list)
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.show()
Printed output:
torch.Size([256, 1, 28, 28]) torch.Size([256])
[1,    30] loss: 1.229
[1,    60] loss: 0.725
[1,    90] loss: 0.648
[1,   120] loss: 0.585
[1,   150] loss: 0.537
[1,   180] loss: 0.492
[1,   210] loss: 0.493
accuracy on test set :82 %
[2,    30] loss: 0.445
[2,    60] loss: 0.420
[2,    90] loss: 0.435
[2,   120] loss: 0.413
[2,   150] loss: 0.409
[2,   180] loss: 0.379
[2,   210] loss: 0.390
accuracy on test set :85 %
[3,    30] loss: 0.371
[3,    60] loss: 0.360
[3,    90] loss: 0.369
[3,   120] loss: 0.358
[3,   150] loss: 0.350
[3,   180] loss: 0.340
[3,   210] loss: 0.344
accuracy on test set :86 %
[4,    30] loss: 0.330
[4,    60] loss: 0.327
[4,    90] loss: 0.314
[4,   120] loss: 0.325
[4,   150] loss: 0.322
[4,   180] loss: 0.303
[4,   210] loss: 0.315
accuracy on test set :87 %
[5,    30] loss: 0.312
[5,    60] loss: 0.300
[5,    90] loss: 0.289
[5,   120] loss: 0.308
[5,   150] loss: 0.297
[5,   180] loss: 0.295
[5,   210] loss: 0.276
accuracy on test set :87 %
[6,    30] loss: 0.295
[6,    60] loss: 0.280
[6,    90] loss: 0.296
[6,   120] loss: 0.273
[6,   150] loss: 0.300
[6,   180] loss: 0.285
[6,   210] loss: 0.267
accuracy on test set :88 %
[7,    30] loss: 0.272
[7,    60] loss: 0.270
[7,    90] loss: 0.258
[7,   120] loss: 0.272
[7,   150] loss: 0.265
[7,   180] loss: 0.276
[7,   210] loss: 0.271
accuracy on test set :88 %
[8,    30] loss: 0.264
[8,    60] loss: 0.254
[8,    90] loss: 0.255
[8,   120] loss: 0.249
[8,   150] loss: 0.252
[8,   180] loss: 0.253
[8,   210] loss: 0.259
accuracy on test set :88 %
[9,    30] loss: 0.239
[9,    60] loss: 0.245
[9,    90] loss: 0.252
[9,   120] loss: 0.245
[9,   150] loss: 0.259
[9,   180] loss: 0.227
[9,   210] loss: 0.247
accuracy on test set :88 %
[10,    30] loss: 0.240
[10,    60] loss: 0.232
[10,    90] loss: 0.247
[10,   120] loss: 0.234
[10,   150] loss: 0.236
[10,   180] loss: 0.234
[10,   210] loss: 0.238
accuracy on test set :89 %