PyTorch 2.2 Official Tutorials in Chinese (Part 4)(3)

Overview: PyTorch 2.2 Official Tutorials in Chinese (Part 4)

Previous part: PyTorch 2.2 Official Tutorials in Chinese (Part 4)(2): https://developer.aliyun.com/article/1482493


Defining your model

In this tutorial, we will be using Mask R-CNN, which is based on top of Faster R-CNN. Faster R-CNN is a model that predicts both bounding boxes and class scores for potential objects in the image.

Mask R-CNN adds an extra branch to Faster R-CNN, which also predicts segmentation masks for each instance.

There are two common situations where one might want to modify one of the available models in the TorchVision Model Zoo. The first is when we want to start from a pre-trained model and just finetune the last layer. The other is when we want to replace the backbone of the model with a different one (for faster predictions, for example).

Let's see in the following sections how we would do one or the other.

1 - Finetuning from a pretrained model

Let's suppose that you want to start from a model pre-trained on COCO and want to finetune it for your particular classes. Here is a possible way of doing it:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) 
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|##########| 160M/160M [00:01<00:00, 144MB/s] 

2 - Modifying the model to add a different backbone

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
# ``FasterRCNN`` needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280
# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),)
)
# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# ``OrderedDict[Tensor]``, and in ``featmap_names`` you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
    featmap_names=['0'],
    output_size=7,
    sampling_ratio=2
)
# put the pieces together inside a Faster-RCNN model
model = FasterRCNN(
    backbone,
    num_classes=2,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler
) 
Downloading: "https://download.pytorch.org/models/mobilenet_v2-7ebf99e0.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/mobilenet_v2-7ebf99e0.pth
100%|##########| 13.6M/13.6M [00:00<00:00, 131MB/s] 

Object detection and instance segmentation model for the PennFudan dataset

In our case, we want to finetune from a pre-trained model, given that our dataset is very small, so we will be following approach number 1.

Here we also want to compute the instance segmentation masks, so we will be using Mask R-CNN:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_features_mask,
        hidden_layer,
        num_classes
    )
    return model 

That's it, this will make the model ready to be trained and evaluated on your custom dataset.

Putting everything together

In references/detection/, we have a number of helper functions to simplify training and evaluating detection models. Here, we will use references/detection/engine.py and references/detection/utils.py. Just download everything under references/detection to your folder and use them here. On Linux, if you have wget, you can download them with the following commands:

import os

os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py") 

Since v0.15.0, torchvision provides a new Transforms API to easily write data augmentation pipelines for object detection and segmentation tasks.

Let's write some helper functions for data augmentation / transformation:

import torch
from torchvision.transforms import v2 as T
def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms) 

Testing the forward() method (optional)

Before iterating over the dataset, it's good to see what the model expects during training and inference time on sample data.

import utils
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
dataset = PennFudanDataset('data/PennFudanPed', get_transform(train=True))
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=4,
    collate_fn=utils.collate_fn
)
# For Training
images, targets = next(iter(data_loader))
images = list(image for image in images)
targets = [{k: v for k, v in t.items()} for t in targets]
output = model(images, targets)  # Returns losses and detections
print(output)
# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)  # Returns predictions
print(predictions[0]) 
{'loss_classifier': tensor(0.0689, grad_fn=<NllLossBackward0>), 'loss_box_reg': tensor(0.0268, grad_fn=<DivBackward0>), 'loss_objectness': tensor(0.0055, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>), 'loss_rpn_box_reg': tensor(0.0036, grad_fn=<DivBackward0>)}
{'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward0>), 'labels': tensor([], dtype=torch.int64), 'scores': tensor([], grad_fn=<IndexBackward0>)} 

Now let's write the main function which performs the training and the validation:

from engine import train_one_epoch, evaluate
# train on the GPU or on the CPU, if a GPU is not available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# our dataset has two classes only - background and person
num_classes = 2
# use our dataset and defined transformations
dataset = PennFudanDataset('data/PennFudanPed', get_transform(train=True))
dataset_test = PennFudanDataset('data/PennFudanPed', get_transform(train=False))
# split the dataset in train and test set
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])
# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=4,
    collate_fn=utils.collate_fn
)
data_loader_test = torch.utils.data.DataLoader(
    dataset_test,
    batch_size=1,
    shuffle=False,
    num_workers=4,
    collate_fn=utils.collate_fn
)
# get the model using our helper function
model = get_model_instance_segmentation(num_classes)
# move model to the right device
model.to(device)
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(
    params,
    lr=0.005,
    momentum=0.9,
    weight_decay=0.0005
)
# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer,
    step_size=3,
    gamma=0.1
)
# let's train it just for 2 epochs
num_epochs = 2
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
print("That's it!") 
Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
100%|##########| 170M/170M [00:01<00:00, 147MB/s]
Epoch: [0]  [ 0/60]  eta: 0:02:32  lr: 0.000090  loss: 3.8792 (3.8792)  loss_classifier: 0.4863 (0.4863)  loss_box_reg: 0.2543 (0.2543)  loss_mask: 3.1288 (3.1288)  loss_objectness: 0.0043 (0.0043)  loss_rpn_box_reg: 0.0055 (0.0055)  time: 2.5479  data: 0.2985  max mem: 2783
Epoch: [0]  [10/60]  eta: 0:00:52  lr: 0.000936  loss: 1.7038 (2.3420)  loss_classifier: 0.3913 (0.3626)  loss_box_reg: 0.2683 (0.2687)  loss_mask: 1.1038 (1.6881)  loss_objectness: 0.0204 (0.0184)  loss_rpn_box_reg: 0.0049 (0.0043)  time: 1.0576  data: 0.0315  max mem: 3158
Epoch: [0]  [20/60]  eta: 0:00:39  lr: 0.001783  loss: 0.9972 (1.5790)  loss_classifier: 0.2425 (0.2735)  loss_box_reg: 0.2683 (0.2756)  loss_mask: 0.3489 (1.0043)  loss_objectness: 0.0127 (0.0184)  loss_rpn_box_reg: 0.0051 (0.0072)  time: 0.9143  data: 0.0057  max mem: 3158
Epoch: [0]  [30/60]  eta: 0:00:28  lr: 0.002629  loss: 0.5966 (1.2415)  loss_classifier: 0.0979 (0.2102)  loss_box_reg: 0.2580 (0.2584)  loss_mask: 0.2155 (0.7493)  loss_objectness: 0.0119 (0.0165)  loss_rpn_box_reg: 0.0057 (0.0071)  time: 0.9036  data: 0.0065  max mem: 3158
Epoch: [0]  [40/60]  eta: 0:00:18  lr: 0.003476  loss: 0.5234 (1.0541)  loss_classifier: 0.0737 (0.1749)  loss_box_reg: 0.2241 (0.2505)  loss_mask: 0.1796 (0.6080)  loss_objectness: 0.0055 (0.0135)  loss_rpn_box_reg: 0.0047 (0.0071)  time: 0.8759  data: 0.0064  max mem: 3158
Epoch: [0]  [50/60]  eta: 0:00:09  lr: 0.004323  loss: 0.3642 (0.9195)  loss_classifier: 0.0435 (0.1485)  loss_box_reg: 0.1648 (0.2312)  loss_mask: 0.1585 (0.5217)  loss_objectness: 0.0025 (0.0113)  loss_rpn_box_reg: 0.0047 (0.0069)  time: 0.8693  data: 0.0065  max mem: 3158
Epoch: [0]  [59/60]  eta: 0:00:00  lr: 0.005000  loss: 0.3504 (0.8381)  loss_classifier: 0.0379 (0.1339)  loss_box_reg: 0.1343 (0.2178)  loss_mask: 0.1585 (0.4690)  loss_objectness: 0.0011 (0.0102)  loss_rpn_box_reg: 0.0048 (0.0071)  time: 0.8884  data: 0.0066  max mem: 3158
Epoch: [0] Total time: 0:00:55 (0.9230 s / it)
creating index...
index created!
Test:  [ 0/50]  eta: 0:00:23  model_time: 0.2550 (0.2550)  evaluator_time: 0.0066 (0.0066)  time: 0.4734  data: 0.2107  max mem: 3158
Test:  [49/50]  eta: 0:00:00  model_time: 0.1697 (0.1848)  evaluator_time: 0.0057 (0.0078)  time: 0.1933  data: 0.0034  max mem: 3158
Test: Total time: 0:00:10 (0.2022 s / it)
Averaged stats: model_time: 0.1697 (0.1848)  evaluator_time: 0.0057 (0.0078)
Accumulating evaluation results...
DONE (t=0.02s).
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.686
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.974
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.802
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.322
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.611
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.708
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.314
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.738
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.739
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.727
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.750
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.697
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.979
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.871
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.339
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.332
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.314
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.736
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.737
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.600
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.709
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.744
Epoch: [1]  [ 0/60]  eta: 0:01:12  lr: 0.005000  loss: 0.3167 (0.3167)  loss_classifier: 0.0377 (0.0377)  loss_box_reg: 0.1232 (0.1232)  loss_mask: 0.1439 (0.1439)  loss_objectness: 0.0022 (0.0022)  loss_rpn_box_reg: 0.0097 (0.0097)  time: 1.2113  data: 0.2601  max mem: 3158
Epoch: [1]  [10/60]  eta: 0:00:45  lr: 0.005000  loss: 0.3185 (0.3209)  loss_classifier: 0.0377 (0.0376)  loss_box_reg: 0.1053 (0.1058)  loss_mask: 0.1563 (0.1684)  loss_objectness: 0.0012 (0.0017)  loss_rpn_box_reg: 0.0064 (0.0073)  time: 0.9182  data: 0.0290  max mem: 3158
Epoch: [1]  [20/60]  eta: 0:00:36  lr: 0.005000  loss: 0.2989 (0.2902)  loss_classifier: 0.0338 (0.0358)  loss_box_reg: 0.0875 (0.0952)  loss_mask: 0.1456 (0.1517)  loss_objectness: 0.0009 (0.0017)  loss_rpn_box_reg: 0.0050 (0.0058)  time: 0.8946  data: 0.0062  max mem: 3158
Epoch: [1]  [30/60]  eta: 0:00:27  lr: 0.005000  loss: 0.2568 (0.2833)  loss_classifier: 0.0301 (0.0360)  loss_box_reg: 0.0836 (0.0912)  loss_mask: 0.1351 (0.1482)  loss_objectness: 0.0008 (0.0018)  loss_rpn_box_reg: 0.0031 (0.0061)  time: 0.8904  data: 0.0065  max mem: 3158
Epoch: [1]  [40/60]  eta: 0:00:17  lr: 0.005000  loss: 0.2630 (0.2794)  loss_classifier: 0.0335 (0.0363)  loss_box_reg: 0.0804 (0.0855)  loss_mask: 0.1381 (0.1497)  loss_objectness: 0.0020 (0.0022)  loss_rpn_box_reg: 0.0030 (0.0056)  time: 0.8667  data: 0.0065  max mem: 3158
Epoch: [1]  [50/60]  eta: 0:00:08  lr: 0.005000  loss: 0.2729 (0.2829)  loss_classifier: 0.0365 (0.0375)  loss_box_reg: 0.0685 (0.0860)  loss_mask: 0.1604 (0.1515)  loss_objectness: 0.0022 (0.0022)  loss_rpn_box_reg: 0.0031 (0.0056)  time: 0.8834  data: 0.0064  max mem: 3158
Epoch: [1]  [59/60]  eta: 0:00:00  lr: 0.005000  loss: 0.2930 (0.2816)  loss_classifier: 0.0486 (0.0381)  loss_box_reg: 0.0809 (0.0847)  loss_mask: 0.1466 (0.1511)  loss_objectness: 0.0012 (0.0021)  loss_rpn_box_reg: 0.0042 (0.0056)  time: 0.8855  data: 0.0064  max mem: 3158
Epoch: [1] Total time: 0:00:53 (0.8890 s / it)
creating index...
index created!
Test:  [ 0/50]  eta: 0:00:23  model_time: 0.2422 (0.2422)  evaluator_time: 0.0061 (0.0061)  time: 0.4774  data: 0.2283  max mem: 3158
Test:  [49/50]  eta: 0:00:00  model_time: 0.1712 (0.1832)  evaluator_time: 0.0051 (0.0066)  time: 0.1911  data: 0.0036  max mem: 3158
Test: Total time: 0:00:10 (0.2001 s / it)
Averaged stats: model_time: 0.1712 (0.1832)  evaluator_time: 0.0051 (0.0066)
Accumulating evaluation results...
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.791
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.981
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.961
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.809
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.361
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.800
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.838
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.745
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.984
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.902
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.504
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.769
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.341
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.782
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.782
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.709
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.797
That's it! 

So after one epoch of training, we obtain a COCO-style mAP > 50, and a mask mAP of 65.

But what do the predictions look like? Let's take one image from the dataset and verify.

import matplotlib.pyplot as plt
import torch
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes, draw_segmentation_masks
image = read_image("data/PennFudanPed/PNGImages/FudanPed00046.png")
eval_transform = get_transform(train=False)
model.eval()
with torch.no_grad():
    x = eval_transform(image)
    # convert RGBA -> RGB and move to device
    x = x[:3, ...].to(device)
    predictions = model([x, ])
    pred = predictions[0]
image = (255.0 * (image - image.min()) / (image.max() - image.min())).to(torch.uint8)
image = image[:3, ...]
pred_labels = [f"pedestrian: {score:.3f}" for label, score in zip(pred["labels"], pred["scores"])]
pred_boxes = pred["boxes"].long()
output_image = draw_bounding_boxes(image, pred_boxes, pred_labels, colors="red")
masks = (pred["masks"] > 0.7).squeeze(1)
output_image = draw_segmentation_masks(output_image, masks, alpha=0.5, colors="blue")
plt.figure(figsize=(12, 12))
plt.imshow(output_image.permute(1, 2, 0)) 

<matplotlib.image.AxesImage object at 0x7f48881f2830> 

The results look good!

Wrapping up

In this tutorial, you have learned how to create your own training pipeline for object detection models on a custom dataset. For that, you wrote a torch.utils.data.Dataset class that returns the images together with the ground-truth bounding boxes and segmentation masks. You also leveraged a Mask R-CNN model pre-trained on COCO train2017 in order to perform transfer learning on this new dataset.
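
As a reminder of the interface such a Dataset must expose (the full PennFudanDataset implementation lives in the earlier part of this tutorial), here is a minimal, illustrative skeleton; the class name and the dummy shapes are placeholders, not the tutorial's own code:

import torch
from torch.utils.data import Dataset

class DetectionDatasetSkeleton(Dataset):
    """Skeleton only: shows the (image, target) structure expected by the
    torchvision detection models; the real dataset reads images and masks
    from disk."""

    def __init__(self, transforms=None):
        self.transforms = transforms

    def __getitem__(self, idx):
        image = torch.zeros(3, 224, 224)  # placeholder [3, H, W] image tensor
        target = {
            "boxes": torch.zeros((0, 4), dtype=torch.float32),       # [N, 4] boxes in (x1, y1, x2, y2)
            "labels": torch.zeros((0,), dtype=torch.int64),          # [N] class labels
            "masks": torch.zeros((0, 224, 224), dtype=torch.uint8),  # [N, H, W] binary masks
        }
        if self.transforms is not None:
            image, target = self.transforms(image, target)
        return image, target

    def __len__(self):
        return 1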

For a more complete example, which includes multi-machine / multi-GPU training, check references/detection/train.py, which is present in the torchvision repository.

Total running time of the script: (2 minutes 27.747 seconds)

Download Python source code: torchvision_tutorial.py

Download Jupyter notebook: torchvision_tutorial.ipynb

Gallery generated by Sphinx-Gallery

Transfer Learning for Computer Vision Tutorial

Original: pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

Translator: 飞龙

License: CC BY-NC-SA 4.0

Note

Click here to download the full example code

Author: Sasank Chilamkurthy

In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning. You can read more about transfer learning in the cs231n notes.

Quoting these notes,

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

These two major transfer learning scenarios look as follows:

  • Finetuning the ConvNet: Instead of random initialization, we initialize the network with a pretrained network, like one that was trained on the imagenet 1000 dataset. The rest of the training looks as usual.
  • ConvNet as fixed feature extractor: Here, we freeze the weights of the whole network except the final fully connected layer. That last fully connected layer is replaced with a new one with random weights, and only this layer is trained (a minimal sketch of both scenarios follows this list).
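
To make the two scenarios concrete before the full walkthrough below, here is a minimal sketch, assuming a torchvision ResNet-18 and a 2-class task; it is not the tutorial's own code, and the variable names model_ft and model_conv are illustrative:

import torch.nn as nn
import torchvision

# Scenario 1: finetuning - every parameter stays trainable; only the final
# fully connected layer is swapped out for the new 2-class task (ants, bees).
model_ft = torchvision.models.resnet18(weights="DEFAULT")
model_ft.fc = nn.Linear(model_ft.fc.in_features, 2)

# Scenario 2: fixed feature extractor - freeze all pretrained weights, then
# replace the head; only the new layer's parameters require gradients.
model_conv = torchvision.models.resnet18(weights="DEFAULT")
for param in model_conv.parameters():
    param.requires_grad = False
model_conv.fc = nn.Linear(model_conv.fc.in_features, 2)  # new layer is trainable by default

In the fixed-extractor case only the new head's gradients are computed, which makes each epoch noticeably cheaper than full finetuning.
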
# License: BSD
# Author: Sasank Chilamkurthy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory
cudnn.benchmark = True
plt.ion()   # interactive mode 
<contextlib.ExitStack object at 0x7f6aede85450> 

Load Data

We will use the torchvision and torch.utils.data packages for loading the data.

The problem we're going to solve today is to train a model to classify ants and bees. We have about 120 training images each for ants and bees. There are 75 validation images for each class. Usually, this is a very small dataset to generalize upon, if trained from scratch. Since we are using transfer learning, we should be able to generalize reasonably well.

This dataset is a very small subset of imagenet.

Note

Download the data from here and extract it to the current directory (a scripted alternative is sketched right after the data-loading code below).

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 
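
If you prefer to fetch and extract the archive from a script rather than following the manual download in the note above, a minimal sketch follows; the URL is the one linked by the upstream PyTorch tutorial and should be treated as an assumption here:

import urllib.request
import zipfile

# Assumed archive URL from the upstream tutorial; adjust if it has moved.
url = "https://download.pytorch.org/tutorial/hymenoptera_data.zip"
urllib.request.urlretrieve(url, "hymenoptera_data.zip")

# Extract into data/ so that data/hymenoptera_data/train and .../val exist,
# matching the data_dir used in the code above.
with zipfile.ZipFile("hymenoptera_data.zip") as zf:
    zf.extractall("data")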

Visualize a few images

Let's visualize a few training images so as to understand the data augmentations.

def imshow(inp, title=None):
    """Display image for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))
# Make a grid from batch
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes]) 

Figure: a grid of four training images with the title ['ants', 'ants', 'ants', 'ants']


Next part: PyTorch 2.2 Official Tutorials in Chinese (Part 4)(4): https://developer.aliyun.com/article/1482496
