PyTorch 2.2 中文官方教程(一)(3)https://developer.aliyun.com/article/1482479
优化模型参数
原文:
pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
译者:飞龙
注意
点击这里下载完整示例代码
学习基础知识 || 快速入门 || 张量 || 数据集和数据加载器 || 转换 || 构建模型 || 自动求导 || 优化 || 保存和加载模型
现在我们有了模型和数据,是时候通过优化其参数在数据上训练、验证和测试我们的模型了。训练模型是一个迭代过程;在每次迭代中,模型对输出进行猜测,计算其猜测的错误(损失),收集关于其参数的错误的导数(正如我们在上一节中看到的),并使用梯度下降优化这些参数。要了解此过程的更详细步骤,请查看这个关于3Blue1Brown 的反向传播视频。
先决条件代码
我们加载了前几节关于数据集和数据加载器和构建模型的代码。
import torch from torch import nn from torch.utils.data import DataLoader from torchvision import datasets from torchvision.transforms import ToTensor training_data = datasets.FashionMNIST( root="data", train=True, download=True, transform=ToTensor() ) test_data = datasets.FashionMNIST( root="data", train=False, download=True, transform=ToTensor() ) train_dataloader = DataLoader(training_data, batch_size=64) test_dataloader = DataLoader(test_data, batch_size=64) class NeuralNetwork(nn.Module): def __init__(self): super().__init__() self.flatten = nn.Flatten() self.linear_relu_stack = nn.Sequential( nn.Linear(28*28, 512), nn.ReLU(), nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10), ) def forward(self, x): x = self.flatten(x) logits = self.linear_relu_stack(x) return logits model = NeuralNetwork()
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz 0%| | 0/26421880 [00:00<?, ?it/s] 0%| | 65536/26421880 [00:00<01:12, 365539.83it/s] 1%| | 229376/26421880 [00:00<00:38, 684511.48it/s] 3%|3 | 884736/26421880 [00:00<00:12, 2030637.83it/s] 12%|#1 | 3080192/26421880 [00:00<00:03, 6027159.86it/s] 31%|### | 8060928/26421880 [00:00<00:01, 16445259.09it/s] 42%|####1 | 11075584/26421880 [00:00<00:00, 16871356.21it/s] 64%|######3 | 16908288/26421880 [00:01<00:00, 24452744.90it/s] 76%|#######6 | 20086784/26421880 [00:01<00:00, 24276135.68it/s] 98%|#########7| 25788416/26421880 [00:01<00:00, 32055536.22it/s] 100%|##########| 26421880/26421880 [00:01<00:00, 18270891.31it/s] Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz 0%| | 0/29515 [00:00<?, ?it/s] 100%|##########| 29515/29515 [00:00<00:00, 326183.74it/s] Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz 0%| | 0/4422102 [00:00<?, ?it/s] 1%|1 | 65536/4422102 [00:00<00:12, 361771.90it/s] 5%|5 | 229376/4422102 [00:00<00:06, 680798.45it/s] 21%|## | 917504/4422102 [00:00<00:01, 2100976.96it/s] 70%|####### | 3112960/4422102 [00:00<00:00, 6040440.05it/s] 100%|##########| 4422102/4422102 [00:00<00:00, 6047736.61it/s] Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz 0%| | 0/5148 [00:00<?, ?it/s] 100%|##########| 5148/5148 [00:00<00:00, 36846889.06it/s] Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
超参数
超参数是可调参数,让您控制模型优化过程。不同的超参数值可能会影响模型训练和收敛速度(了解更多关于超参数调整)
我们为训练定义以下超参数:
- Epoch 的数量 - 数据集迭代的次数
- 批量大小 - 在更新参数之前通过网络传播的数据样本数量
- 学习率 - 每个批次/epoch 更新模型参数的量。较小的值会导致学习速度较慢,而较大的值可能会导致训练过程中出现不可预测的行为。
learning_rate = 1e-3 batch_size = 64 epochs = 5
优化循环
一旦设置了超参数,我们就可以通过优化循环训练和优化我们的模型。优化循环的每次迭代称为epoch。
每个 epoch 包括两个主要部分:
- 训练循环 - 迭代训练数据集并尝试收敛到最佳参数。
- 验证/测试循环 - 迭代测试数据集以检查模型性能是否改善。
让我们简要了解一下训练循环中使用的一些概念。跳转到完整实现以查看优化循环。
损失函数
当给定一些训练数据时,我们未经训练的网络可能不会给出正确答案。损失函数衡量获得的结果与目标值的不相似程度,我们希望在训练过程中最小化损失函数。为了计算损失,我们使用给定数据样本的输入进行预测,并将其与真实数据标签值进行比较。
常见的损失函数包括nn.MSELoss(均方误差)用于回归任务,以及nn.NLLLoss(负对数似然)用于分类。nn.CrossEntropyLoss结合了nn.LogSoftmax
和nn.NLLLoss
。
我们将模型的输出 logits 传递给nn.CrossEntropyLoss
,它将对 logits 进行归一化并计算预测错误。
# Initialize the loss function loss_fn = nn.CrossEntropyLoss()
优化器
优化是调整模型参数以减少每个训练步骤中模型误差的过程。优化算法定义了如何执行这个过程(在这个例子中我们使用随机梯度下降)。所有的优化逻辑都封装在optimizer
对象中。在这里,我们使用 SGD 优化器;此外,PyTorch 还有许多不同的优化器可供选择,如 ADAM 和 RMSProp,适用于不同类型的模型和数据。
我们通过注册需要训练的模型参数并传入学习率超参数来初始化优化器。
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
在训练循环中,优化分为三个步骤:
- 调用
optimizer.zero_grad()
来重置模型参数的梯度。梯度默认会累加;为了防止重复计算,我们在每次迭代时明确将其归零。 - 通过调用
loss.backward()
来反向传播预测损失。PyTorch 会将损失相对于每个参数的梯度存储起来。 - 一旦我们有了梯度,我们调用
optimizer.step()
来根据反向传播中收集的梯度调整参数。
完整实现
我们定义train_loop
循环优化代码,并定义test_loop
评估模型在测试数据上的性能。
def train_loop(dataloader, model, loss_fn, optimizer): size = len(dataloader.dataset) # Set the model to training mode - important for batch normalization and dropout layers # Unnecessary in this situation but added for best practices model.train() for batch, (X, y) in enumerate(dataloader): # Compute prediction and loss pred = model(X) loss = loss_fn(pred, y) # Backpropagation loss.backward() optimizer.step() optimizer.zero_grad() if batch % 100 == 0: loss, current = loss.item(), batch * batch_size + len(X) print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]") def test_loop(dataloader, model, loss_fn): # Set the model to evaluation mode - important for batch normalization and dropout layers # Unnecessary in this situation but added for best practices model.eval() size = len(dataloader.dataset) num_batches = len(dataloader) test_loss, correct = 0, 0 # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True with torch.no_grad(): for X, y in dataloader: pred = model(X) test_loss += loss_fn(pred, y).item() correct += (pred.argmax(1) == y).type(torch.float).sum().item() test_loss /= num_batches correct /= size print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
我们初始化损失函数和优化器,并将其传递给train_loop
和test_loop
。可以增加 epoch 的数量来跟踪模型的性能改进。
loss_fn = nn.CrossEntropyLoss() optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) epochs = 10 for t in range(epochs): print(f"Epoch {t+1}\n-------------------------------") train_loop(train_dataloader, model, loss_fn, optimizer) test_loop(test_dataloader, model, loss_fn) print("Done!")
Epoch 1 ------------------------------- loss: 2.298730 [ 64/60000] loss: 2.289123 [ 6464/60000] loss: 2.273286 [12864/60000] loss: 2.269406 [19264/60000] loss: 2.249603 [25664/60000] loss: 2.229407 [32064/60000] loss: 2.227368 [38464/60000] loss: 2.204261 [44864/60000] loss: 2.206193 [51264/60000] loss: 2.166651 [57664/60000] Test Error: Accuracy: 50.9%, Avg loss: 2.166725 Epoch 2 ------------------------------- loss: 2.176750 [ 64/60000] loss: 2.169595 [ 6464/60000] loss: 2.117500 [12864/60000] loss: 2.129272 [19264/60000] loss: 2.079674 [25664/60000] loss: 2.032928 [32064/60000] loss: 2.050115 [38464/60000] loss: 1.985236 [44864/60000] loss: 1.987887 [51264/60000] loss: 1.907162 [57664/60000] Test Error: Accuracy: 55.9%, Avg loss: 1.915486 Epoch 3 ------------------------------- loss: 1.951612 [ 64/60000] loss: 1.928685 [ 6464/60000] loss: 1.815709 [12864/60000] loss: 1.841552 [19264/60000] loss: 1.732467 [25664/60000] loss: 1.692914 [32064/60000] loss: 1.701714 [38464/60000] loss: 1.610632 [44864/60000] loss: 1.632870 [51264/60000] loss: 1.514263 [57664/60000] Test Error: Accuracy: 58.8%, Avg loss: 1.541525 Epoch 4 ------------------------------- loss: 1.616448 [ 64/60000] loss: 1.582892 [ 6464/60000] loss: 1.427595 [12864/60000] loss: 1.487950 [19264/60000] loss: 1.359332 [25664/60000] loss: 1.364817 [32064/60000] loss: 1.371491 [38464/60000] loss: 1.298706 [44864/60000] loss: 1.336201 [51264/60000] loss: 1.232145 [57664/60000] Test Error: Accuracy: 62.2%, Avg loss: 1.260237 Epoch 5 ------------------------------- loss: 1.345538 [ 64/60000] loss: 1.327798 [ 6464/60000] loss: 1.153802 [12864/60000] loss: 1.254829 [19264/60000] loss: 1.117322 [25664/60000] loss: 1.153248 [32064/60000] loss: 1.171765 [38464/60000] loss: 1.110263 [44864/60000] loss: 1.154467 [51264/60000] loss: 1.070921 [57664/60000] Test Error: Accuracy: 64.1%, Avg loss: 1.089831 Epoch 6 ------------------------------- loss: 1.166889 [ 64/60000] loss: 1.170514 [ 6464/60000] loss: 0.979435 [12864/60000] loss: 1.113774 [19264/60000] loss: 0.973411 [25664/60000] loss: 1.015192 [32064/60000] loss: 1.051113 [38464/60000] loss: 0.993591 [44864/60000] loss: 1.039709 [51264/60000] loss: 0.971077 [57664/60000] Test Error: Accuracy: 65.8%, Avg loss: 0.982440 Epoch 7 ------------------------------- loss: 1.045165 [ 64/60000] loss: 1.070583 [ 6464/60000] loss: 0.862304 [12864/60000] loss: 1.022265 [19264/60000] loss: 0.885213 [25664/60000] loss: 0.919528 [32064/60000] loss: 0.972762 [38464/60000] loss: 0.918728 [44864/60000] loss: 0.961629 [51264/60000] loss: 0.904379 [57664/60000] Test Error: Accuracy: 66.9%, Avg loss: 0.910167 Epoch 8 ------------------------------- loss: 0.956964 [ 64/60000] loss: 1.002171 [ 6464/60000] loss: 0.779057 [12864/60000] loss: 0.958409 [19264/60000] loss: 0.827240 [25664/60000] loss: 0.850262 [32064/60000] loss: 0.917320 [38464/60000] loss: 0.868384 [44864/60000] loss: 0.905506 [51264/60000] loss: 0.856353 [57664/60000] Test Error: Accuracy: 68.3%, Avg loss: 0.858248 Epoch 9 ------------------------------- loss: 0.889765 [ 64/60000] loss: 0.951220 [ 6464/60000] loss: 0.717035 [12864/60000] loss: 0.911042 [19264/60000] loss: 0.786085 [25664/60000] loss: 0.798370 [32064/60000] loss: 0.874939 [38464/60000] loss: 0.832796 [44864/60000] loss: 0.863254 [51264/60000] loss: 0.819742 [57664/60000] Test Error: Accuracy: 69.5%, Avg loss: 0.818780 Epoch 10 ------------------------------- loss: 0.836395 [ 64/60000] loss: 0.910220 [ 6464/60000] loss: 0.668506 [12864/60000] loss: 0.874338 [19264/60000] loss: 0.754805 [25664/60000] loss: 0.758453 [32064/60000] loss: 0.840451 [38464/60000] loss: 0.806153 [44864/60000] loss: 0.830360 [51264/60000] loss: 0.790281 [57664/60000] Test Error: Accuracy: 71.0%, Avg loss: 0.787271 Done! • 151
进一步阅读
脚本的总运行时间:(2 分钟 0.365 秒)
下载 Python 源代码:optimization_tutorial.py
下载 Jupyter 笔记本:optimization_tutorial.ipynb
保存和加载模型
原文:
pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html
译者:飞龙
注意
点击这里下载完整示例代码
学习基础知识 || 快速入门 || 张量 || 数据集和数据加载器 || 转换 || 构建模型 || 自动求导 || 优化 || 保存和加载模型
在本节中,我们将看看如何通过保存、加载和运行模型预测来持久化模型状态。
import torch import torchvision.models as models
保存和加载模型权重
PyTorch 模型将学习到的参数存储在内部状态字典中,称为 state_dict
。这些可以通过 torch.save
方法进行持久化:
model = models.vgg16(weights='IMAGENET1K_V1') torch.save(model.state_dict(), 'model_weights.pth')
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/vgg16-397923af.pth 0%| | 0.00/528M [00:00<?, ?B/s] 2%|2 | 12.7M/528M [00:00<00:04, 133MB/s] 5%|4 | 25.9M/528M [00:00<00:03, 136MB/s] 8%|7 | 40.3M/528M [00:00<00:03, 143MB/s] 10%|# | 54.0M/528M [00:00<00:03, 141MB/s] 13%|#2 | 67.4M/528M [00:00<00:03, 138MB/s] 15%|#5 | 81.8M/528M [00:00<00:03, 142MB/s] 18%|#8 | 96.2M/528M [00:00<00:03, 145MB/s] 21%|## | 110M/528M [00:00<00:03, 145MB/s] 24%|##3 | 124M/528M [00:00<00:02, 147MB/s] 26%|##6 | 139M/528M [00:01<00:02, 148MB/s] 29%|##9 | 153M/528M [00:01<00:02, 149MB/s] 32%|###1 | 168M/528M [00:01<00:02, 150MB/s] 35%|###4 | 182M/528M [00:01<00:02, 151MB/s] 37%|###7 | 197M/528M [00:01<00:02, 123MB/s] 40%|###9 | 210M/528M [00:01<00:02, 127MB/s] 42%|####2 | 223M/528M [00:01<00:02, 113MB/s] 44%|####4 | 234M/528M [00:01<00:02, 112MB/s] 47%|####6 | 248M/528M [00:01<00:02, 119MB/s] 50%|####9 | 262M/528M [00:02<00:02, 128MB/s] 52%|#####2 | 275M/528M [00:02<00:02, 129MB/s] 55%|#####4 | 288M/528M [00:02<00:01, 132MB/s] 57%|#####7 | 302M/528M [00:02<00:01, 136MB/s] 60%|#####9 | 316M/528M [00:02<00:01, 140MB/s] 63%|######2 | 331M/528M [00:02<00:01, 144MB/s] 65%|######5 | 345M/528M [00:02<00:01, 146MB/s] 68%|######8 | 360M/528M [00:02<00:01, 148MB/s] 71%|####### | 374M/528M [00:02<00:01, 149MB/s] 74%|#######3 | 389M/528M [00:02<00:00, 150MB/s] 76%|#######6 | 403M/528M [00:03<00:00, 151MB/s] 79%|#######9 | 418M/528M [00:03<00:00, 151MB/s] 82%|########1 | 432M/528M [00:03<00:00, 151MB/s] 85%|########4 | 447M/528M [00:03<00:00, 152MB/s] 87%|########7 | 461M/528M [00:03<00:00, 152MB/s] 90%|######### | 476M/528M [00:03<00:00, 152MB/s] 93%|#########2| 490M/528M [00:03<00:00, 152MB/s] 96%|#########5| 505M/528M [00:03<00:00, 151MB/s] 98%|#########8| 519M/528M [00:03<00:00, 151MB/s] 100%|##########| 528M/528M [00:03<00:00, 142MB/s]
要加载模型权重,您需要首先创建相同模型的实例,然后使用 load_state_dict()
方法加载参数。
model = models.vgg16() # we do not specify ``weights``, i.e. create untrained model model.load_state_dict(torch.load('model_weights.pth')) model.eval()
VGG( (features): Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU(inplace=True) (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU(inplace=True) (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (6): ReLU(inplace=True) (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (8): ReLU(inplace=True) (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (11): ReLU(inplace=True) (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (13): ReLU(inplace=True) (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (15): ReLU(inplace=True) (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (18): ReLU(inplace=True) (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (20): ReLU(inplace=True) (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (22): ReLU(inplace=True) (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (25): ReLU(inplace=True) (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (27): ReLU(inplace=True) (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (29): ReLU(inplace=True) (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (avgpool): AdaptiveAvgPool2d(output_size=(7, 7)) (classifier): Sequential( (0): Linear(in_features=25088, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.5, inplace=False) (3): Linear(in_features=4096, out_features=4096, bias=True) (4): ReLU(inplace=True) (5): Dropout(p=0.5, inplace=False) (6): Linear(in_features=4096, out_features=1000, bias=True) ) )
注意
在进行推理之前,请务必调用 model.eval()
方法,将丢弃和批量归一化层设置为评估模式。如果不这样做,将导致不一致的推理结果。
保存和加载带有形状的模型
在加载模型权重时,我们需要首先实例化模型类,因为类定义了网络的结构。我们可能希望将此类的结构与模型一起保存,这样我们可以将 model
(而不是 model.state_dict()
)传递给保存函数:
torch.save(model, 'model.pth')
我们可以像这样加载模型:
model = torch.load('model.pth')
注意
此方法在序列化模型时使用 Python pickle 模块,因此在加载模型时依赖于实际的类定义。
相关教程
脚本的总运行时间:(0 分钟 9.335 秒)
下载 Python 源代码:saveloadrun_tutorial.py
下载 Jupyter 笔记本:saveloadrun_tutorial.ipynb