Pytorch 基于VGG-16的服饰识别（使用Fashion-MNIST数据集）

2022-11-08 674

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： Pytorch 基于VGG-16的服饰识别（使用Fashion-MNIST数据集）

✅作者简介：人工智能专业本科在读，喜欢计算机与编程，写博客记录自己的学习历程。
🍎个人主页：小嗷犬的博客
🍊个人信条：为天地立心，为生民立命，为往圣继绝学，为万世开太平。
🥭本文内容：Pytorch 基于VGG的服饰识别（使用Fashion-MNIST数据集）
更多内容请见👇

Python sklearn实现K-means鸢尾花聚类

Pytorch 基于LeNet的手写数字识别

Pytorch 基于AlexNet的服饰识别（使用Fashion-MNIST数据集）

介绍

使用到的库：

Pytorch

matplotlib

d2l

d2l 为斯坦福大学李沐教授打包的一个库，其中包含一些深度学习中常用的函数方法。

安装：

pip install matplotlib
pip install d2l

Pytorch 环境请自行配置。
数据集：
Fashion-MNIST 是一个替代 MNIST 手写数字集的图像数据集。它是由 Zalando（一家德国的时尚科技公司）旗下的研究部门提供。其涵盖了来自 10 种类别的共 7 万个不同商品的正面图片。
Fashion-MNIST 的大小、格式和训练集/测试集划分与原始的 MNIST 完全一致。60000/10000 的训练测试数据划分，28x28 的灰度图片。你可以直接用它来测试你的机器学习和深度学习算法性能，且不需要改动任何的代码。

下载地址：
本文使用 Pytorch 自动下载。
VGG-16 网络是14年牛津大学计算机视觉组和 Google DeepMind 公司研究员一起研发的深度网络模型。该网络一共有16个训练参数的网络，它的兄弟版本如下图所示，清晰的展示了每一级别的参数量，从11层的网络一直到19层的网络。VGG-16 网络取得了 ILSVRC 2014 比赛分类项目的第2名，定位项目的第1名。VGGNet 网络结构简洁，迁移到其他图片数据上的泛化性能非常好。VGGNet 现在依然经常被用来提取图像特征，该网络训练后的模型参数在其官网上开源了，可以用来在图像分类任务上进行在训练，即：提供了非常好的初始化权重，使用较为广泛。结构图如下：

1.导入相关库

import torch
from torch import nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from d2l import torch as d2l

2.定义 VGG-16 网络结构

# 定义VGG块
def vgg_block(num_convs, in_channels, out_channels):
    """
    Args:
        num_convs (int): 卷积层的数量
        in_channels (int): 输入通道的数量
        out_channels (int): 输出通道的数量
    """
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# 定义VGG网络
def vgg(conv_arch):
    """
    Args:
        conv_arch (tuple): 每个VGG块里卷积层个数和输出通道数
    """
    conv_blks = []
    in_channels = 1
    # 卷积层部分
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # 全连接层部分
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

# 生成网络
conv_arch = ((2, 16), (2, 32), (3, 64), (3, 128), (3, 128))
net = vgg(conv_arch)

这里由于设备限制，笔者将各层网络的输出通道数将为了原来的 1/4 ，可以满足 Fashion-MNIST 的任务，降低了网络的复杂度。

3.下载并配置数据集和加载器

由于 VGG-16 是为处理 ImageNet 数据集设计的，所以输入图片尺寸应为 224*224，这里我们将 28*28 的 Fashion-MNIST 图片拉大到 224*224。

# 下载并配置数据集
trans = [transforms.ToTensor()]
trans.insert(0, transforms.Resize(224))
trans = transforms.Compose(trans)
train_dataset = datasets.FashionMNIST(root='./dataset', train=True,
                                      transform=trans, download=True)
test_dataset = datasets.FashionMNIST(root='./dataset', train=False,
                                     transform=trans, download=True)

# 配置数据加载器
batch_size = 64
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size, shuffle=True)

4.定义训练函数

训练完成后会保存模型，可以修改模型的保存路径。

def train(net, train_iter, test_iter, epochs, lr, device):
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print(f'Training on:[{device}]')
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(epochs):
        # 训练损失之和，训练准确率之和，样本数
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (num_batches // 30) == 0 or i == num_batches - 1:
                print(f'Epoch: {epoch+1}, Step: {i+1}, Loss: {train_l:.4f}')
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        print(
            f'Train Accuracy: {train_acc*100:.2f}%, Test Accuracy: {test_acc*100:.2f}%')
    print(f'{metric[2] * epochs / timer.sum():.1f} examples/sec '
          f'on: [{str(device)}]')
    torch.save(net.state_dict(),
               f"./model/VGG-16_Epoch{epochs}_Accuracy{test_acc*100:.2f}%.pth")

5.训练模型（或加载模型）

如果环境正确配置了CUDA，则会由GPU进行训练。
加载模型需要根据自身情况修改路径。

epochs, lr = 10, 0.05
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# 训练模型
train(net, train_loader, test_loader, epochs, lr, device)
# 加载保存的模型
# net.load_state_dict(torch.load("./model/VGG-16_Epoch10_Accuracy92.08%.pth"))

6.可视化展示

def show_predict():
    # 预测结果图像可视化
    net.to(device)
    loader = DataLoader(dataset=test_dataset, batch_size=1, shuffle=True)
    plt.figure(figsize=(12, 8))
    name = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
            'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
    for i in range(9):
        (images, labels) = next(iter(loader))
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        title = f"Predicted: {name[int(predicted[0])]}, True: {name[int(labels[0])]}"
        plt.subplot(3, 3, i + 1)
        plt.imshow(images.cpu()[0].squeeze())
        plt.title(title)
        plt.xticks([])
        plt.yticks([])
    plt.show()


show_predict()

7.预测图

结果来自训练轮数 epochs=10，准确率 Accuracy=92.08%的 VGG-16 模型：

Pytorch 基于VGG-16的服饰识别（使用Fashion-MNIST数据集）

介绍

1.导入相关库

2.定义 VGG-16 网络结构

3.下载并配置数据集和加载器

4.定义训练函数

5.训练模型（或加载模型）

6.可视化展示

7.预测图

热门文章

最新文章

相关电子书

推荐镜像

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

Pytorch 基于VGG-16的服饰识别（使用Fashion-MNIST数据集）

介绍

1.导入相关库

2.定义 VGG-16 网络结构

3.下载并配置数据集和加载器

4.定义训练函数

5.训练模型（或加载模型）

6.可视化展示

7.预测图

热门文章

最新文章

相关电子书

推荐镜像