PyTorch 2.2 Chinese Official Tutorial (IV)(4)


Previous part: PyTorch 2.2 Chinese Official Tutorial (IV)(3): https://developer.aliyun.com/article/1482494


Training the model

Now, let's write a general function to train a model. Here, we will illustrate:

  • Scheduling the learning rate
  • Saving the best model

In the following, the parameter scheduler is an LR scheduler object from torch.optim.lr_scheduler.

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    # Create a temporary directory to save training checkpoints
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')
        torch.save(model.state_dict(), best_model_params_path)
        best_acc = 0.0
        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)
            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()   # Set model to evaluate mode
                running_loss = 0.0
                running_corrects = 0
                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
                    # zero the parameter gradients
                    optimizer.zero_grad()
                    # forward
                    # track history if only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()
                    # statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
                if phase == 'train':
                    scheduler.step()
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
                # deep copy the model
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_params_path)
            print()
        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:4f}')
        # load best model weights
        model.load_state_dict(torch.load(best_model_params_path))
    return model 
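The function above references dataloaders, dataset_sizes, and device, which were set up in the previous part of this tutorial (linked at the top). For reference, a minimal sketch of those globals as the tutorial defines them, assuming the data_transforms dictionary from the data-loading section:

# Minimal sketch of the globals assumed by train_model; these are defined
# in the previous part of this tutorial. data_transforms comes from the
# data-loading section there.
import os
import torch
from torchvision import datasets

data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")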

Visualizing the model predictions

A generic function to display predictions for a few images:

def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])
                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training) 
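Both visualization functions call an imshow helper that was defined in the earlier data-loading part of this tutorial. For completeness, a minimal sketch, assuming the ImageNet normalization constants used in data_transforms:

import numpy as np
import matplotlib.pyplot as plt

def imshow(inp, title=None):
    """Display a (C, H, W) tensor as an image, undoing the normalization."""
    inp = inp.numpy().transpose((1, 2, 0))   # CHW -> HWC
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean                   # de-normalize
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)                         # pause so the plot updates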

Finetuning the ConvNet

Load a pretrained model and reset the final fully connected layer.

model_ft = models.resnet18(weights='IMAGENET1K_V1')
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to ``nn.Linear(num_ftrs, len(class_names))``.
model_ft.fc = nn.Linear(num_ftrs, 2)
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1) 
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|##########| 44.7M/44.7M [00:00<00:00, 146MB/s] 
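The StepLR schedule constructed above multiplies the learning rate by gamma every step_size epochs. A standalone sketch (using a throwaway parameter and optimizer, both hypothetical) makes the resulting schedule concrete:

# Illustration of StepLR(step_size=7, gamma=0.1): the learning rate is
# 0.001 for epochs 0-6, 0.0001 for epochs 7-13, 1e-05 for epochs 14-20, ...
import torch
from torch import nn, optim
from torch.optim import lr_scheduler

dummy_param = nn.Parameter(torch.zeros(1))  # hypothetical throwaway parameter
opt = optim.SGD([dummy_param], lr=0.001, momentum=0.9)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)

for epoch in range(15):
    print(epoch, sched.get_last_lr())  # [0.001], ..., [0.0001], ..., [1e-05]
    opt.step()
    sched.step()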

Train and evaluate

It should take around 15-25 minutes on CPU. On GPU, though, it takes less than a minute.

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25) 
Epoch 0/24
----------
train Loss: 0.4785 Acc: 0.7582
val Loss: 0.2864 Acc: 0.8758
Epoch 1/24
----------
train Loss: 0.5262 Acc: 0.8074
val Loss: 0.5643 Acc: 0.7778
Epoch 2/24
----------
train Loss: 0.4336 Acc: 0.8156
val Loss: 0.2852 Acc: 0.9020
Epoch 3/24
----------
train Loss: 0.6358 Acc: 0.7582
val Loss: 0.4226 Acc: 0.8627
Epoch 4/24
----------
train Loss: 0.4319 Acc: 0.8525
val Loss: 0.3289 Acc: 0.8824
Epoch 5/24
----------
train Loss: 0.4856 Acc: 0.7869
val Loss: 0.3162 Acc: 0.8758
Epoch 6/24
----------
train Loss: 0.3984 Acc: 0.8197
val Loss: 0.4864 Acc: 0.8235
Epoch 7/24
----------
train Loss: 0.3621 Acc: 0.8238
val Loss: 0.2516 Acc: 0.8889
Epoch 8/24
----------
train Loss: 0.2331 Acc: 0.9016
val Loss: 0.2395 Acc: 0.9085
Epoch 9/24
----------
train Loss: 0.2571 Acc: 0.9016
val Loss: 0.2579 Acc: 0.9281
Epoch 10/24
----------
train Loss: 0.3528 Acc: 0.8320
val Loss: 0.2281 Acc: 0.9150
Epoch 11/24
----------
train Loss: 0.3108 Acc: 0.8320
val Loss: 0.2832 Acc: 0.9020
Epoch 12/24
----------
train Loss: 0.2189 Acc: 0.8975
val Loss: 0.2734 Acc: 0.8824
Epoch 13/24
----------
train Loss: 0.2872 Acc: 0.8648
val Loss: 0.2274 Acc: 0.9281
Epoch 14/24
----------
train Loss: 0.2745 Acc: 0.8689
val Loss: 0.2712 Acc: 0.8954
Epoch 15/24
----------
train Loss: 0.3152 Acc: 0.8689
val Loss: 0.3225 Acc: 0.8954
Epoch 16/24
----------
train Loss: 0.2069 Acc: 0.9016
val Loss: 0.2486 Acc: 0.9085
Epoch 17/24
----------
train Loss: 0.2447 Acc: 0.9016
val Loss: 0.2282 Acc: 0.9281
Epoch 18/24
----------
train Loss: 0.2709 Acc: 0.8811
val Loss: 0.2590 Acc: 0.9020
Epoch 19/24
----------
train Loss: 0.1959 Acc: 0.9139
val Loss: 0.2282 Acc: 0.9150
Epoch 20/24
----------
train Loss: 0.2432 Acc: 0.8852
val Loss: 0.2623 Acc: 0.9150
Epoch 21/24
----------
train Loss: 0.2643 Acc: 0.8770
val Loss: 0.2776 Acc: 0.9150
Epoch 22/24
----------
train Loss: 0.2973 Acc: 0.8770
val Loss: 0.2362 Acc: 0.9020
Epoch 23/24
----------
train Loss: 0.2859 Acc: 0.8648
val Loss: 0.2551 Acc: 0.9085
Epoch 24/24
----------
train Loss: 0.3264 Acc: 0.8811
val Loss: 0.2317 Acc: 0.9150
Training complete in 1m 3s
Best val Acc: 0.928105 
visualize_model(model_ft) 

ConvNet as a fixed feature extractor

Here we need to freeze the entire network except the final layer. We set requires_grad = False on the frozen parameters so that no gradients are computed for them in backward().

You can read more about this in the autograd documentation, in the note on excluding subgraphs from backward.

model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')
for param in model_conv.parameters():
    param.requires_grad = False
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1) 
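After this setup, only the parameters of the new fc layer should still require gradients. A quick sanity check (a sketch, run against the model_conv built above):

# Sanity check: with the backbone frozen, only the new fc layer's
# parameters should still require gradients.
trainable = [name for name, p in model_conv.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
num_trainable = sum(p.numel() for p in model_conv.parameters() if p.requires_grad)
print(f'{num_trainable} trainable parameters')  # 512*2 + 2 = 1026 for resnet18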

Train and evaluate

On CPU this will take about half the time of the previous scenario. This is expected, since gradients do not need to be computed for most of the network; the forward pass, however, still does.

model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25) 
Epoch 0/24
----------
train Loss: 0.6996 Acc: 0.6516
val Loss: 0.2014 Acc: 0.9346
Epoch 1/24
----------
train Loss: 0.4233 Acc: 0.8033
val Loss: 0.2656 Acc: 0.8758
Epoch 2/24
----------
train Loss: 0.4603 Acc: 0.7869
val Loss: 0.1847 Acc: 0.9477
Epoch 3/24
----------
train Loss: 0.3096 Acc: 0.8566
val Loss: 0.1747 Acc: 0.9477
Epoch 4/24
----------
train Loss: 0.4427 Acc: 0.8156
val Loss: 0.1630 Acc: 0.9477
Epoch 5/24
----------
train Loss: 0.5505 Acc: 0.7828
val Loss: 0.1643 Acc: 0.9477
Epoch 6/24
----------
train Loss: 0.3004 Acc: 0.8607
val Loss: 0.1744 Acc: 0.9542
Epoch 7/24
----------
train Loss: 0.4083 Acc: 0.8361
val Loss: 0.1892 Acc: 0.9412
Epoch 8/24
----------
train Loss: 0.4483 Acc: 0.7910
val Loss: 0.1984 Acc: 0.9477
Epoch 9/24
----------
train Loss: 0.3335 Acc: 0.8279
val Loss: 0.1942 Acc: 0.9412
Epoch 10/24
----------
train Loss: 0.2413 Acc: 0.8934
val Loss: 0.2001 Acc: 0.9477
Epoch 11/24
----------
train Loss: 0.3107 Acc: 0.8689
val Loss: 0.1801 Acc: 0.9412
Epoch 12/24
----------
train Loss: 0.3032 Acc: 0.8689
val Loss: 0.1669 Acc: 0.9477
Epoch 13/24
----------
train Loss: 0.3587 Acc: 0.8525
val Loss: 0.1900 Acc: 0.9477
Epoch 14/24
----------
train Loss: 0.2771 Acc: 0.8893
val Loss: 0.2317 Acc: 0.9216
Epoch 15/24
----------
train Loss: 0.3064 Acc: 0.8852
val Loss: 0.1909 Acc: 0.9477
Epoch 16/24
----------
train Loss: 0.4243 Acc: 0.8238
val Loss: 0.2227 Acc: 0.9346
Epoch 17/24
----------
train Loss: 0.3297 Acc: 0.8238
val Loss: 0.1916 Acc: 0.9412
Epoch 18/24
----------
train Loss: 0.4235 Acc: 0.8238
val Loss: 0.1766 Acc: 0.9477
Epoch 19/24
----------
train Loss: 0.2500 Acc: 0.8934
val Loss: 0.2003 Acc: 0.9477
Epoch 20/24
----------
train Loss: 0.2413 Acc: 0.8934
val Loss: 0.1821 Acc: 0.9477
Epoch 21/24
----------
train Loss: 0.3762 Acc: 0.8115
val Loss: 0.1842 Acc: 0.9412
Epoch 22/24
----------
train Loss: 0.3485 Acc: 0.8566
val Loss: 0.2166 Acc: 0.9281
Epoch 23/24
----------
train Loss: 0.3625 Acc: 0.8361
val Loss: 0.1747 Acc: 0.9412
Epoch 24/24
----------
train Loss: 0.3840 Acc: 0.8320
val Loss: 0.1768 Acc: 0.9412
Training complete in 0m 31s
Best val Acc: 0.954248 
visualize_model(model_conv)
plt.ioff()
plt.show() 

Inference on custom images

Use the trained model to make predictions on custom images, and visualize the predicted class labels along with the images.

def visualize_model_predictions(model,img_path):
    was_training = model.training
    model.eval()
    img = Image.open(img_path)
    img = data_transforms['val'](img)
    img = img.unsqueeze(0)
    img = img.to(device)
    with torch.no_grad():
        outputs = model(img)
        _, preds = torch.max(outputs, 1)
        ax = plt.subplot(2,2,1)
        ax.axis('off')
        ax.set_title(f'Predicted: {class_names[preds[0]]}')
        imshow(img.cpu().data[0])
        model.train(mode=was_training) 
visualize_model_predictions(
    model_conv,
    img_path='data/hymenoptera_data/val/bees/72100438_73de9f17af.jpg'
)
plt.ioff()
plt.show() 
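visualize_model_predictions relies on Image from PIL and on the data_transforms dictionary built in the earlier data-loading part of this tutorial. For reference, a minimal sketch of the 'val' pipeline as that part defines it:

# Minimal sketch of the 'val' transform assumed above (defined in the
# data-loading part of this tutorial).
from PIL import Image
from torchvision import transforms

data_transforms = {
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}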

Further learning

If you would like to learn more about the applications of transfer learning, check out our Quantized Transfer Learning for Computer Vision Tutorial.

Total running time of the script: (1 minute 36.689 seconds)

Download Python source code: transfer_learning_tutorial.py

Download Jupyter notebook: transfer_learning_tutorial.ipynb

Gallery generated by Sphinx-Gallery
