PyTorch 2.2 Chinese Official Tutorial (IV)(4)


Previous part: PyTorch 2.2 Chinese Official Tutorial (IV)(3): https://developer.aliyun.com/article/1482494


Training the model

Now, let's write a general function to train a model. Here, we will illustrate:

  • Scheduling the learning rate
  • Saving the best model

In the following, the parameter scheduler is an LR scheduler object from torch.optim.lr_scheduler.

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    # Create a temporary directory to save training checkpoints
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')
        torch.save(model.state_dict(), best_model_params_path)
        best_acc = 0.0
        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)
            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()   # Set model to evaluate mode
                running_loss = 0.0
                running_corrects = 0
                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
                    # zero the parameter gradients
                    optimizer.zero_grad()
                    # forward
                    # track history if only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()
                    # statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
                if phase == 'train':
                    scheduler.step()
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
                # deep copy the model
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_params_path)
            print()
        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:4f}')
        # load best model weights
        model.load_state_dict(torch.load(best_model_params_path))
    return model 
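The function above references dataloaders, dataset_sizes, and device, which were set up in the previous part of this tutorial (linked at the top). For reference, a minimal sketch of those globals as the tutorial defines them, assuming the data_transforms dictionary from the data-loading section:

# Minimal sketch of the globals assumed by train_model; these are defined
# in the previous part of this tutorial. data_transforms comes from the
# data-loading section there.
import os
import torch
from torchvision import datasets

data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")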

Visualizing the model predictions

A generic function to display predictions for a few images:

def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])
                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training) 
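Both visualization functions call an imshow helper that was defined in the earlier data-loading part of this tutorial. For completeness, a minimal sketch, assuming the ImageNet normalization constants used in data_transforms:

import numpy as np
import matplotlib.pyplot as plt

def imshow(inp, title=None):
    """Display a (C, H, W) tensor as an image, undoing the normalization."""
    inp = inp.numpy().transpose((1, 2, 0))   # CHW -> HWC
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean                   # de-normalize
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)                         # pause so the plot updates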

Finetuning the ConvNet

Load a pretrained model and reset the final fully connected layer.

model_ft = models.resnet18(weights='IMAGENET1K_V1')
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to ``nn.Linear(num_ftrs, len(class_names))``.
model_ft.fc = nn.Linear(num_ftrs, 2)
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1) 
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|##########| 44.7M/44.7M [00:00<00:00, 146MB/s] 
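The StepLR schedule constructed above multiplies the learning rate by gamma every step_size epochs. A standalone sketch (using a throwaway parameter and optimizer, both hypothetical) makes the resulting schedule concrete:

# Illustration of StepLR(step_size=7, gamma=0.1): the learning rate is
# 0.001 for epochs 0-6, 0.0001 for epochs 7-13, 1e-05 for epochs 14-20, ...
import torch
from torch import nn, optim
from torch.optim import lr_scheduler

dummy_param = nn.Parameter(torch.zeros(1))  # hypothetical throwaway parameter
opt = optim.SGD([dummy_param], lr=0.001, momentum=0.9)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)

for epoch in range(15):
    print(epoch, sched.get_last_lr())  # [0.001], ..., [0.0001], ..., [1e-05]
    opt.step()
    sched.step()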

Train and evaluate

It should take around 15-25 minutes on CPU. On GPU, though, it takes less than a minute.

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25) 
Epoch 0/24
----------
train Loss: 0.4785 Acc: 0.7582
val Loss: 0.2864 Acc: 0.8758
Epoch 1/24
----------
train Loss: 0.5262 Acc: 0.8074
val Loss: 0.5643 Acc: 0.7778
Epoch 2/24
----------
train Loss: 0.4336 Acc: 0.8156
val Loss: 0.2852 Acc: 0.9020
Epoch 3/24
----------
train Loss: 0.6358 Acc: 0.7582
val Loss: 0.4226 Acc: 0.8627
Epoch 4/24
----------
train Loss: 0.4319 Acc: 0.8525
val Loss: 0.3289 Acc: 0.8824
Epoch 5/24
----------
train Loss: 0.4856 Acc: 0.7869
val Loss: 0.3162 Acc: 0.8758
Epoch 6/24
----------
train Loss: 0.3984 Acc: 0.8197
val Loss: 0.4864 Acc: 0.8235
Epoch 7/24
----------
train Loss: 0.3621 Acc: 0.8238
val Loss: 0.2516 Acc: 0.8889
Epoch 8/24
----------
train Loss: 0.2331 Acc: 0.9016
val Loss: 0.2395 Acc: 0.9085
Epoch 9/24
----------
train Loss: 0.2571 Acc: 0.9016
val Loss: 0.2579 Acc: 0.9281
Epoch 10/24
----------
train Loss: 0.3528 Acc: 0.8320
val Loss: 0.2281 Acc: 0.9150
Epoch 11/24
----------
train Loss: 0.3108 Acc: 0.8320
val Loss: 0.2832 Acc: 0.9020
Epoch 12/24
----------
train Loss: 0.2189 Acc: 0.8975
val Loss: 0.2734 Acc: 0.8824
Epoch 13/24
----------
train Loss: 0.2872 Acc: 0.8648
val Loss: 0.2274 Acc: 0.9281
Epoch 14/24
----------
train Loss: 0.2745 Acc: 0.8689
val Loss: 0.2712 Acc: 0.8954
Epoch 15/24
----------
train Loss: 0.3152 Acc: 0.8689
val Loss: 0.3225 Acc: 0.8954
Epoch 16/24
----------
train Loss: 0.2069 Acc: 0.9016
val Loss: 0.2486 Acc: 0.9085
Epoch 17/24
----------
train Loss: 0.2447 Acc: 0.9016
val Loss: 0.2282 Acc: 0.9281
Epoch 18/24
----------
train Loss: 0.2709 Acc: 0.8811
val Loss: 0.2590 Acc: 0.9020
Epoch 19/24
----------
train Loss: 0.1959 Acc: 0.9139
val Loss: 0.2282 Acc: 0.9150
Epoch 20/24
----------
train Loss: 0.2432 Acc: 0.8852
val Loss: 0.2623 Acc: 0.9150
Epoch 21/24
----------
train Loss: 0.2643 Acc: 0.8770
val Loss: 0.2776 Acc: 0.9150
Epoch 22/24
----------
train Loss: 0.2973 Acc: 0.8770
val Loss: 0.2362 Acc: 0.9020
Epoch 23/24
----------
train Loss: 0.2859 Acc: 0.8648
val Loss: 0.2551 Acc: 0.9085
Epoch 24/24
----------
train Loss: 0.3264 Acc: 0.8811
val Loss: 0.2317 Acc: 0.9150
Training complete in 1m 3s
Best val Acc: 0.928105 
visualize_model(model_ft) 

ConvNet as a fixed feature extractor

Here we need to freeze the entire network except the final layer. We set requires_grad = False on the frozen parameters so that no gradients are computed for them in backward().

You can read more about this in the autograd documentation, in the note on excluding subgraphs from backward.

model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')
for param in model_conv.parameters():
    param.requires_grad = False
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1) 
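After this setup, only the parameters of the new fc layer should still require gradients. A quick sanity check (a sketch, run against the model_conv built above):

# Sanity check: with the backbone frozen, only the new fc layer's
# parameters should still require gradients.
trainable = [name for name, p in model_conv.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
num_trainable = sum(p.numel() for p in model_conv.parameters() if p.requires_grad)
print(f'{num_trainable} trainable parameters')  # 512*2 + 2 = 1026 for resnet18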

Train and evaluate

On CPU this will take about half the time of the previous scenario. This is expected, since gradients do not need to be computed for most of the network; the forward pass, however, still does.

model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25) 
Epoch 0/24
----------
train Loss: 0.6996 Acc: 0.6516
val Loss: 0.2014 Acc: 0.9346
Epoch 1/24
----------
train Loss: 0.4233 Acc: 0.8033
val Loss: 0.2656 Acc: 0.8758
Epoch 2/24
----------
train Loss: 0.4603 Acc: 0.7869
val Loss: 0.1847 Acc: 0.9477
Epoch 3/24
----------
train Loss: 0.3096 Acc: 0.8566
val Loss: 0.1747 Acc: 0.9477
Epoch 4/24
----------
train Loss: 0.4427 Acc: 0.8156
val Loss: 0.1630 Acc: 0.9477
Epoch 5/24
----------
train Loss: 0.5505 Acc: 0.7828
val Loss: 0.1643 Acc: 0.9477
Epoch 6/24
----------
train Loss: 0.3004 Acc: 0.8607
val Loss: 0.1744 Acc: 0.9542
Epoch 7/24
----------
train Loss: 0.4083 Acc: 0.8361
val Loss: 0.1892 Acc: 0.9412
Epoch 8/24
----------
train Loss: 0.4483 Acc: 0.7910
val Loss: 0.1984 Acc: 0.9477
Epoch 9/24
----------
train Loss: 0.3335 Acc: 0.8279
val Loss: 0.1942 Acc: 0.9412
Epoch 10/24
----------
train Loss: 0.2413 Acc: 0.8934
val Loss: 0.2001 Acc: 0.9477
Epoch 11/24
----------
train Loss: 0.3107 Acc: 0.8689
val Loss: 0.1801 Acc: 0.9412
Epoch 12/24
----------
train Loss: 0.3032 Acc: 0.8689
val Loss: 0.1669 Acc: 0.9477
Epoch 13/24
----------
train Loss: 0.3587 Acc: 0.8525
val Loss: 0.1900 Acc: 0.9477
Epoch 14/24
----------
train Loss: 0.2771 Acc: 0.8893
val Loss: 0.2317 Acc: 0.9216
Epoch 15/24
----------
train Loss: 0.3064 Acc: 0.8852
val Loss: 0.1909 Acc: 0.9477
Epoch 16/24
----------
train Loss: 0.4243 Acc: 0.8238
val Loss: 0.2227 Acc: 0.9346
Epoch 17/24
----------
train Loss: 0.3297 Acc: 0.8238
val Loss: 0.1916 Acc: 0.9412
Epoch 18/24
----------
train Loss: 0.4235 Acc: 0.8238
val Loss: 0.1766 Acc: 0.9477
Epoch 19/24
----------
train Loss: 0.2500 Acc: 0.8934
val Loss: 0.2003 Acc: 0.9477
Epoch 20/24
----------
train Loss: 0.2413 Acc: 0.8934
val Loss: 0.1821 Acc: 0.9477
Epoch 21/24
----------
train Loss: 0.3762 Acc: 0.8115
val Loss: 0.1842 Acc: 0.9412
Epoch 22/24
----------
train Loss: 0.3485 Acc: 0.8566
val Loss: 0.2166 Acc: 0.9281
Epoch 23/24
----------
train Loss: 0.3625 Acc: 0.8361
val Loss: 0.1747 Acc: 0.9412
Epoch 24/24
----------
train Loss: 0.3840 Acc: 0.8320
val Loss: 0.1768 Acc: 0.9412
Training complete in 0m 31s
Best val Acc: 0.954248 
visualize_model(model_conv)
plt.ioff()
plt.show() 

Inference on custom images

Use the trained model to make predictions on custom images, and visualize the predicted class labels along with the images.

def visualize_model_predictions(model,img_path):
    was_training = model.training
    model.eval()
    img = Image.open(img_path)
    img = data_transforms['val'](img)
    img = img.unsqueeze(0)
    img = img.to(device)
    with torch.no_grad():
        outputs = model(img)
        _, preds = torch.max(outputs, 1)
        ax = plt.subplot(2,2,1)
        ax.axis('off')
        ax.set_title(f'Predicted: {class_names[preds[0]]}')
        imshow(img.cpu().data[0])
        model.train(mode=was_training) 
visualize_model_predictions(
    model_conv,
    img_path='data/hymenoptera_data/val/bees/72100438_73de9f17af.jpg'
)
plt.ioff()
plt.show() 
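visualize_model_predictions relies on Image from PIL and on the data_transforms dictionary built in the earlier data-loading part of this tutorial. For reference, a minimal sketch of the 'val' pipeline as that part defines it:

# Minimal sketch of the 'val' transform assumed above (defined in the
# data-loading part of this tutorial).
from PIL import Image
from torchvision import transforms

data_transforms = {
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}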

Further learning

If you would like to learn more about the applications of transfer learning, check out our Quantized Transfer Learning for Computer Vision Tutorial.

Total running time of the script: (1 minute 36.689 seconds)

Download Python source code: transfer_learning_tutorial.py

Download Jupyter notebook: transfer_learning_tutorial.ipynb

Gallery generated by Sphinx-Gallery
