PyTorch 2.2 中文官方教程(三)(1)https://developer.aliyun.com/article/1482487
数据集和数据加载器
Dataset
和DataLoader
类封装了从存储中提取数据并将其以批量形式暴露给训练循环的过程。
Dataset
负责访问和处理单个数据实例。
DataLoader
从Dataset
中获取数据实例(自动或使用您定义的采样器),将它们收集到批次中,并返回给您的训练循环消费。DataLoader
适用于所有类型的数据集,无论它们包含的数据类型是什么。
在本教程中,我们将使用 TorchVision 提供的 Fashion-MNIST 数据集。我们使用torchvision.transforms.Normalize()
来将图像块内容的分布归零并进行归一化,并下载训练和验证数据拆分。
import torch import torchvision import torchvision.transforms as transforms # PyTorch TensorBoard support from torch.utils.tensorboard import SummaryWriter from datetime import datetime transform = transforms.Compose( [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]) # Create datasets for training & validation, download if necessary training_set = torchvision.datasets.FashionMNIST('./data', train=True, transform=transform, download=True) validation_set = torchvision.datasets.FashionMNIST('./data', train=False, transform=transform, download=True) # Create data loaders for our datasets; shuffle for training, not for validation training_loader = torch.utils.data.DataLoader(training_set, batch_size=4, shuffle=True) validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=4, shuffle=False) # Class labels classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot') # Report split sizes print('Training set has {} instances'.format(len(training_set))) print('Validation set has {} instances'.format(len(validation_set)))
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz 0%| | 0/26421880 [00:00<?, ?it/s] 0%| | 65536/26421880 [00:00<01:12, 364219.97it/s] 1%| | 229376/26421880 [00:00<00:38, 686138.70it/s] 4%|3 | 950272/26421880 [00:00<00:11, 2201377.51it/s] 14%|#4 | 3801088/26421880 [00:00<00:02, 7581352.34it/s] 37%|###7 | 9797632/26421880 [00:00<00:00, 16849344.06it/s] 59%|#####9 | 15663104/26421880 [00:01<00:00, 26145189.61it/s] 71%|#######1 | 18776064/26421880 [00:01<00:00, 23360633.32it/s] 93%|#########2| 24543232/26421880 [00:01<00:00, 26387177.79it/s] 100%|##########| 26421880/26421880 [00:01<00:00, 19446710.50it/s] Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz 0%| | 0/29515 [00:00<?, ?it/s] 100%|##########| 29515/29515 [00:00<00:00, 326274.86it/s] Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz 0%| | 0/4422102 [00:00<?, ?it/s] 1%|1 | 65536/4422102 [00:00<00:11, 364622.91it/s] 5%|5 | 229376/4422102 [00:00<00:06, 684813.81it/s] 21%|##1 | 950272/4422102 [00:00<00:01, 2200476.22it/s] 85%|########5 | 3768320/4422102 [00:00<00:00, 7506714.24it/s] 100%|##########| 4422102/4422102 [00:00<00:00, 6115026.62it/s] Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz 0%| | 0/5148 [00:00<?, ?it/s] 100%|##########| 5148/5148 [00:00<00:00, 35867569.75it/s] Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw Training set has 60000 instances Validation set has 10000 instances
像往常一样,让我们通过可视化数据来进行健全性检查:
import matplotlib.pyplot as plt import numpy as np # Helper function for inline image display def matplotlib_imshow(img, one_channel=False): if one_channel: img = img.mean(dim=0) img = img / 2 + 0.5 # unnormalize npimg = img.numpy() if one_channel: plt.imshow(npimg, cmap="Greys") else: plt.imshow(np.transpose(npimg, (1, 2, 0))) dataiter = iter(training_loader) images, labels = next(dataiter) # Create a grid from the images and show them img_grid = torchvision.utils.make_grid(images) matplotlib_imshow(img_grid, one_channel=True) print(' '.join(classes[labels[j]] for j in range(4)))
Sandal Sneaker Coat Sneaker
模型
在这个例子中,我们将使用 LeNet-5 的变体模型 - 如果您观看了本系列中的先前视频,这应该是熟悉的。
import torch.nn as nn import torch.nn.functional as F # PyTorch models inherit from torch.nn.Module class GarmentClassifier(nn.Module): def __init__(self): super(GarmentClassifier, self).__init__() self.conv1 = nn.Conv2d(1, 6, 5) self.pool = nn.MaxPool2d(2, 2) self.conv2 = nn.Conv2d(6, 16, 5) self.fc1 = nn.Linear(16 * 4 * 4, 120) self.fc2 = nn.Linear(120, 84) self.fc3 = nn.Linear(84, 10) def forward(self, x): x = self.pool(F.relu(self.conv1(x))) x = self.pool(F.relu(self.conv2(x))) x = x.view(-1, 16 * 4 * 4) x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return x model = GarmentClassifier()
损失函数
在这个例子中,我们将使用交叉熵损失。为了演示目的,我们将创建一批虚拟输出和标签值,将它们通过损失函数运行,并检查结果。
loss_fn = torch.nn.CrossEntropyLoss() # NB: Loss functions expect data in batches, so we're creating batches of 4 # Represents the model's confidence in each of the 10 classes for a given input dummy_outputs = torch.rand(4, 10) # Represents the correct class among the 10 being tested dummy_labels = torch.tensor([1, 5, 3, 7]) print(dummy_outputs) print(dummy_labels) loss = loss_fn(dummy_outputs, dummy_labels) print('Total loss for this batch: {}'.format(loss.item()))
tensor([[0.7026, 0.1489, 0.0065, 0.6841, 0.4166, 0.3980, 0.9849, 0.6701, 0.4601, 0.8599], [0.7461, 0.3920, 0.9978, 0.0354, 0.9843, 0.0312, 0.5989, 0.2888, 0.8170, 0.4150], [0.8408, 0.5368, 0.0059, 0.8931, 0.3942, 0.7349, 0.5500, 0.0074, 0.0554, 0.1537], [0.7282, 0.8755, 0.3649, 0.4566, 0.8796, 0.2390, 0.9865, 0.7549, 0.9105, 0.5427]]) tensor([1, 5, 3, 7]) Total loss for this batch: 2.428950071334839
优化器
在这个例子中,我们将使用带有动量的简单随机梯度下降。
尝试对这个优化方案进行一些变化可能会有帮助:
- 学习率确定了优化器采取的步长大小。不同的学习率对训练结果的准确性和收敛时间有什么影响?
- 动量在多个步骤中将优化器推向最强梯度的方向。改变这个值会对你的结果产生什么影响?
- 尝试一些不同的优化算法,比如平均 SGD、Adagrad 或 Adam。你的结果有什么不同?
# Optimizers specified in the torch.optim package optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
训练循环
下面是一个执行一个训练周期的函数。它枚举来自 DataLoader 的数据,并在每次循环中执行以下操作:
- 从 DataLoader 获取一批训练数据
- 将优化器的梯度置零
- 执行推断 - 也就是为输入批次从模型获取预测
- 计算该批次预测与数据集标签之间的损失
- 计算学习权重的反向梯度
- 告诉优化器执行一个学习步骤 - 即根据我们选择的优化算法,根据这一批次的观察梯度调整模型的学习权重
- 它报告每 1000 批次的损失。
- 最后,它报告了最后 1000 批次的平均每批次损失,以便与验证运行进行比较
def train_one_epoch(epoch_index, tb_writer): running_loss = 0. last_loss = 0. # Here, we use enumerate(training_loader) instead of # iter(training_loader) so that we can track the batch # index and do some intra-epoch reporting for i, data in enumerate(training_loader): # Every data instance is an input + label pair inputs, labels = data # Zero your gradients for every batch! optimizer.zero_grad() # Make predictions for this batch outputs = model(inputs) # Compute the loss and its gradients loss = loss_fn(outputs, labels) loss.backward() # Adjust learning weights optimizer.step() # Gather data and report running_loss += loss.item() if i % 1000 == 999: last_loss = running_loss / 1000 # loss per batch print(' batch {} loss: {}'.format(i + 1, last_loss)) tb_x = epoch_index * len(training_loader) + i + 1 tb_writer.add_scalar('Loss/train', last_loss, tb_x) running_loss = 0. return last_loss
每轮活动
每轮我们都要做一些事情:
- 通过检查在训练中未使用的一组数据上的相对损失来执行验证,并报告此结果
- 保存模型的副本
在这里,我们将在 TensorBoard 中进行报告。这将需要转到命令行启动 TensorBoard,并在另一个浏览器选项卡中打开它。
# Initializing in a separate cell so we can easily add more epochs to the same run timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') writer = SummaryWriter('runs/fashion_trainer_{}'.format(timestamp)) epoch_number = 0 EPOCHS = 5 best_vloss = 1_000_000. for epoch in range(EPOCHS): print('EPOCH {}:'.format(epoch_number + 1)) # Make sure gradient tracking is on, and do a pass over the data model.train(True) avg_loss = train_one_epoch(epoch_number, writer) running_vloss = 0.0 # Set the model to evaluation mode, disabling dropout and using population # statistics for batch normalization. model.eval() # Disable gradient computation and reduce memory consumption. with torch.no_grad(): for i, vdata in enumerate(validation_loader): vinputs, vlabels = vdata voutputs = model(vinputs) vloss = loss_fn(voutputs, vlabels) running_vloss += vloss avg_vloss = running_vloss / (i + 1) print('LOSS train {} valid {}'.format(avg_loss, avg_vloss)) # Log the running loss averaged per batch # for both training and validation writer.add_scalars('Training vs. Validation Loss', { 'Training' : avg_loss, 'Validation' : avg_vloss }, epoch_number + 1) writer.flush() # Track best performance, and save the model's state if avg_vloss < best_vloss: best_vloss = avg_vloss model_path = 'model_{}_{}'.format(timestamp, epoch_number) torch.save(model.state_dict(), model_path) epoch_number += 1
EPOCH 1: batch 1000 loss: 1.6334228584356607 batch 2000 loss: 0.8325267538074403 batch 3000 loss: 0.7359380583595484 batch 4000 loss: 0.6198329215242994 batch 5000 loss: 0.6000315657821484 batch 6000 loss: 0.555109024874866 batch 7000 loss: 0.5260250487388112 batch 8000 loss: 0.4973462742221891 batch 9000 loss: 0.4781935699362075 batch 10000 loss: 0.47880298678041433 batch 11000 loss: 0.45598648857555235 batch 12000 loss: 0.4327470133750467 batch 13000 loss: 0.41800182418141046 batch 14000 loss: 0.4115047634313814 batch 15000 loss: 0.4211296908891527 LOSS train 0.4211296908891527 valid 0.414460688829422 EPOCH 2: batch 1000 loss: 0.3879808729066281 batch 2000 loss: 0.35912817339546743 batch 3000 loss: 0.38074520684120944 batch 4000 loss: 0.3614532373107213 batch 5000 loss: 0.36850082185724753 batch 6000 loss: 0.3703581801643886 batch 7000 loss: 0.38547042514081115 batch 8000 loss: 0.37846584360170527 batch 9000 loss: 0.3341486988377292 batch 10000 loss: 0.3433013284947956 batch 11000 loss: 0.35607743899174965 batch 12000 loss: 0.3499939931873523 batch 13000 loss: 0.33874178926000603 batch 14000 loss: 0.35130289171106416 batch 15000 loss: 0.3394507191307202 LOSS train 0.3394507191307202 valid 0.3581162691116333 EPOCH 3: batch 1000 loss: 0.3319729989422485 batch 2000 loss: 0.29558994361863006 batch 3000 loss: 0.3107374766407593 batch 4000 loss: 0.3298987646112146 batch 5000 loss: 0.30858693152241906 batch 6000 loss: 0.33916381367447684 batch 7000 loss: 0.3105102765217889 batch 8000 loss: 0.3011080777524912 batch 9000 loss: 0.3142058177240979 batch 10000 loss: 0.31458891937109 batch 11000 loss: 0.31527258940579483 batch 12000 loss: 0.31501667268342864 batch 13000 loss: 0.3011875962628328 batch 14000 loss: 0.30012811454350596 batch 15000 loss: 0.31833117976446373 LOSS train 0.31833117976446373 valid 0.3307691514492035 EPOCH 4: batch 1000 loss: 0.2786161053752294 batch 2000 loss: 0.27965198021690596 batch 3000 loss: 0.28595415444140965 batch 4000 loss: 0.292985666413857 batch 5000 loss: 0.3069892351147719 batch 6000 loss: 0.29902250939945224 batch 7000 loss: 0.2863366014406201 batch 8000 loss: 0.2655441066541243 batch 9000 loss: 0.3045048695363293 batch 10000 loss: 0.27626545656517554 batch 11000 loss: 0.2808379335970967 batch 12000 loss: 0.29241049340573955 batch 13000 loss: 0.28030834131941446 batch 14000 loss: 0.2983542350126445 batch 15000 loss: 0.3009556676162611 LOSS train 0.3009556676162611 valid 0.41686952114105225 EPOCH 5: batch 1000 loss: 0.2614263167564495 batch 2000 loss: 0.2587047562422049 batch 3000 loss: 0.2642477260621345 batch 4000 loss: 0.2825975873669813 batch 5000 loss: 0.26987933717705165 batch 6000 loss: 0.2759250026817317 batch 7000 loss: 0.26055969463163275 batch 8000 loss: 0.29164007206353565 batch 9000 loss: 0.2893096504513578 batch 10000 loss: 0.2486029507305684 batch 11000 loss: 0.2732803234480907 batch 12000 loss: 0.27927226484491985 batch 13000 loss: 0.2686819267635074 batch 14000 loss: 0.24746483912148323 batch 15000 loss: 0.27903492261294194 LOSS train 0.27903492261294194 valid 0.31206756830215454
加载模型的保存版本:
saved_model = GarmentClassifier() saved_model.load_state_dict(torch.load(PATH))
加载模型后,它已准备好用于您需要的任何操作 - 更多训练,推断或分析。
请注意,如果您的模型具有影响模型结构的构造函数参数,您需要提供它们并将模型配置为与保存时的状态相同。
其他资源
- PyTorch 中的数据工具文档,包括 Dataset 和 DataLoader
- 关于在 GPU 训练中使用固定内存的说明
- TorchVision,TorchText和TorchAudio中可用数据集的文档
- PyTorch 中可用的损失函数的文档
- torch.optim 包的文档,其中包括优化器和相关工具,如学习率调度
- 有关保存和加载模型的详细教程
- pytorch.org 的教程部分包含广泛的训练任务教程,包括不同领域的分类,生成对抗网络,强化学习等
脚本的总运行时间:(5 分钟 4.557 秒)
下载 Python 源代码:trainingyt.py
下载 Jupyter 笔记本:trainingyt.ipynb
使用 Captum 进行模型理解
原文:
pytorch.org/tutorials/beginner/introyt/captumyt.html
译者:飞龙
注意
点击这里下载完整示例代码
介绍 || 张量 || 自动微分 || 构建模型 || TensorBoard 支持 || 训练模型 || 模型理解
请跟随下面的视频或YouTube进行操作。在这里下载笔记本和相应的文件这里。
www.youtube.com/embed/Am2EF9CLu-g
Captum(拉丁语中的“理解”)是一个建立在 PyTorch 上的开源、可扩展的模型可解释性库。
随着模型复杂性的增加和由此产生的不透明性,模型可解释性方法变得越来越重要。模型理解既是一个活跃的研究领域,也是一个在使用机器学习的各行业中实际应用的重点领域。Captum 提供了最先进的算法,包括集成梯度,为研究人员和开发人员提供了一种简单的方法来理解哪些特征对模型的输出有贡献。
在Captum.ai网站上提供了完整的文档、API 参考和一系列关于特定主题的教程。
介绍
Captum 对模型可解释性的方法是以归因为基础的。Captum 提供了三种类型的归因:
- 特征归因试图解释特定输出,以输入的特征生成它。例如,解释电影评论是积极的还是消极的,以评论中的某些词语为例。
- 层归因研究了模型隐藏层在特定输入后的活动。检查卷积层对输入图像的空间映射输出是层归因的一个例子。
- 神经元归因类似于层归因,但专注于单个神经元的活动。
在这个互动笔记本中,我们将查看特征归因和层归因。
每种归因类型都有多个归因算法与之相关。许多归因算法可分为两大类:
- 基于梯度的算法计算模型输出、层输出或神经元激活相对于输入的反向梯度。集成梯度(用于特征)、层梯度*激活和神经元电导都是基于梯度的算法。
- 基于扰动的算法检查模型、层或神经元对输入变化的响应。输入扰动可能是有方向的或随机的。遮挡、特征消融和特征置换都是基于扰动的算法。
我们将在下面检查这两种类型的算法。
特别是涉及大型模型时,以一种易于将其与正在检查的输入特征相关联的方式可视化归因数据可能是有价值的。虽然可以使用 Matplotlib、Plotly 或类似工具创建自己的可视化,但 Captum 提供了专门针对其归因的增强工具:
captum.attr.visualization
模块(如下导入为viz
)提供了有用的函数,用于可视化与图像相关的归因。- Captum Insights是一个易于使用的 API,位于 Captum 之上,提供了一个可视化小部件,其中包含了针对图像、文本和任意模型类型的现成可视化。
这两种可视化工具集将在本笔记本中进行演示。前几个示例将重点放在计算机视觉用例上,但最后的 Captum Insights 部分将演示在多模型、视觉问答模型中的归因可视化。
安装
在开始之前,您需要具有 Python 环境:
- Python 版本 3.6 或更高
- 对于 Captum Insights 示例,需要 Flask 1.1 或更高版本以及 Flask-Compress(建议使用最新版本)
- PyTorch 版本 1.2 或更高(建议使用最新版本)
- TorchVision 版本 0.6 或更高(建议使用最新版本)
- Captum(建议使用最新版本)
- Matplotlib 版本 3.3.4,因为 Captum 目前使用的 Matplotlib 函数在后续版本中已更名其参数
要在 Anaconda 或 pip 虚拟环境中安装 Captum,请使用下面适合您环境的命令:
使用conda
:
conda install pytorch torchvision captum flask-compress matplotlib=3.3.4 -c pytorch
使用pip
:
pip install torch torchvision captum matplotlib==3.3.4 Flask-Compress
在您设置的环境中重新启动此笔记本,然后您就可以开始了!
PyTorch 2.2 中文官方教程(三)(3)https://developer.aliyun.com/article/1482489