Defining your model
In this tutorial, we will be using Mask R-CNN, which is based on top of Faster R-CNN. Faster R-CNN is a model that predicts both bounding boxes and class scores for potential objects in the image.
Mask R-CNN adds an extra branch to Faster R-CNN, which also predicts segmentation masks for each instance.
There are two common situations where one might want to modify one of the available models in the TorchVision Model Zoo. The first is when we want to start from a pre-trained model and just finetune the last layer. The other is when we want to replace the backbone of the model with a different one (for faster predictions, for example).
Let's see how we would do one or the other in the following sections.
1 - Finetuning from a pretrained model
Let's suppose that you want to start from a model pre-trained on COCO and want to finetune it for your particular classes. Here is a possible way of doing it:
```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
```
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth 0%| | 0.00/160M [00:00<?, ?B/s] 8%|7 | 12.1M/160M [00:00<00:01, 127MB/s] 16%|#6 | 25.8M/160M [00:00<00:01, 137MB/s] 25%|##4 | 39.6M/160M [00:00<00:00, 140MB/s] 34%|###3 | 53.6M/160M [00:00<00:00, 143MB/s] 42%|####2 | 67.5M/160M [00:00<00:00, 144MB/s] 51%|##### | 81.4M/160M [00:00<00:00, 145MB/s] 60%|#####9 | 95.4M/160M [00:00<00:00, 145MB/s] 68%|######8 | 109M/160M [00:00<00:00, 145MB/s] 77%|#######7 | 123M/160M [00:00<00:00, 146MB/s] 86%|########5 | 137M/160M [00:01<00:00, 146MB/s] 95%|#########4| 151M/160M [00:01<00:00, 146MB/s] 100%|##########| 160M/160M [00:01<00:00, 144MB/s]
2 - Modifying the model to add a different backbone
```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
# ``FasterRCNN`` needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),)
)

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# ``OrderedDict[Tensor]``, and in ``featmap_names`` you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
    featmap_names=['0'],
    output_size=7,
    sampling_ratio=2
)

# put the pieces together inside a Faster-RCNN model
model = FasterRCNN(
    backbone,
    num_classes=2,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler
)
```
Downloading: "https://download.pytorch.org/models/mobilenet_v2-7ebf99e0.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/mobilenet_v2-7ebf99e0.pth 0%| | 0.00/13.6M [00:00<?, ?B/s] 92%|#########1| 12.5M/13.6M [00:00<00:00, 131MB/s] 100%|##########| 13.6M/13.6M [00:00<00:00, 131MB/s]
Object detection and instance segmentation model for the PennFudan Dataset
In our case, we want to finetune from a pre-trained model, given that our dataset is very small, so we will be following approach number 1.
Here we also want to compute the instance segmentation masks, so we will be using Mask R-CNN:
```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor


def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_features_mask,
        hidden_layer,
        num_classes
    )

    return model
```
That's it. This will make `model` ready to be trained and evaluated on your custom dataset.
Putting everything together
In `references/detection/`, we have a number of helper functions to simplify training and evaluating detection models. Here, we will be using `references/detection/engine.py` and `references/detection/utils.py`. Just download everything under `references/detection` to your folder and use them here. On Linux, if you have `wget`, you can download them using the following commands:
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py") os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py") os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py") os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py") os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py")
```
0
```
Since v0.15.0, torchvision provides a new Transforms API to easily write data augmentation pipelines for object detection and segmentation tasks.
Let's write some helper functions for data augmentation / transformation:
```python
from torchvision.transforms import v2 as T


def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms)
```
Testing the `forward()` method (optional)
Before iterating over the dataset, it's good to see what the model expects during training and inference time on sample data.
```python
import utils

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
dataset = PennFudanDataset('data/PennFudanPed', get_transform(train=True))
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=4,
    collate_fn=utils.collate_fn
)

# For Training
images, targets = next(iter(data_loader))
images = list(image for image in images)
targets = [{k: v for k, v in t.items()} for t in targets]
output = model(images, targets)  # Returns losses and detections
print(output)

# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)  # Returns predictions
print(predictions[0])
```
```
{'loss_classifier': tensor(0.0689, grad_fn=<NllLossBackward0>), 'loss_box_reg': tensor(0.0268, grad_fn=<DivBackward0>), 'loss_objectness': tensor(0.0055, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>), 'loss_rpn_box_reg': tensor(0.0036, grad_fn=<DivBackward0>)}
{'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward0>), 'labels': tensor([], dtype=torch.int64), 'scores': tensor([], grad_fn=<IndexBackward0>)}
```
Let's now write the main function which performs the training and the validation:
```python
from engine import train_one_epoch, evaluate

# train on the GPU or on the CPU, if a GPU is not available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# our dataset has two classes only - background and person
num_classes = 2
# use our dataset and defined transformations
dataset = PennFudanDataset('data/PennFudanPed', get_transform(train=True))
dataset_test = PennFudanDataset('data/PennFudanPed', get_transform(train=False))

# split the dataset in train and test set
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=4,
    collate_fn=utils.collate_fn
)

data_loader_test = torch.utils.data.DataLoader(
    dataset_test,
    batch_size=1,
    shuffle=False,
    num_workers=4,
    collate_fn=utils.collate_fn
)

# get the model using our helper function
model = get_model_instance_segmentation(num_classes)

# move model to the right device
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(
    params,
    lr=0.005,
    momentum=0.9,
    weight_decay=0.0005
)

# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer,
    step_size=3,
    gamma=0.1
)

# let's train it just for 2 epochs
num_epochs = 2

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)

print("That's it!")
```
Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth 0%| | 0.00/170M [00:00<?, ?B/s] 8%|7 | 12.7M/170M [00:00<00:01, 134MB/s] 16%|#5 | 26.8M/170M [00:00<00:01, 142MB/s] 24%|##3 | 40.7M/170M [00:00<00:00, 144MB/s] 32%|###2 | 54.8M/170M [00:00<00:00, 145MB/s] 41%|#### | 68.9M/170M [00:00<00:00, 146MB/s] 49%|####8 | 83.0M/170M [00:00<00:00, 147MB/s] 57%|#####7 | 97.1M/170M [00:00<00:00, 147MB/s] 65%|######5 | 111M/170M [00:00<00:00, 147MB/s] 74%|#######3 | 125M/170M [00:00<00:00, 148MB/s] 82%|########2 | 140M/170M [00:01<00:00, 148MB/s] 90%|######### | 154M/170M [00:01<00:00, 148MB/s] 99%|#########8| 168M/170M [00:01<00:00, 148MB/s] 100%|##########| 170M/170M [00:01<00:00, 147MB/s] Epoch: [0] [ 0/60] eta: 0:02:32 lr: 0.000090 loss: 3.8792 (3.8792) loss_classifier: 0.4863 (0.4863) loss_box_reg: 0.2543 (0.2543) loss_mask: 3.1288 (3.1288) loss_objectness: 0.0043 (0.0043) loss_rpn_box_reg: 0.0055 (0.0055) time: 2.5479 data: 0.2985 max mem: 2783 Epoch: [0] [10/60] eta: 0:00:52 lr: 0.000936 loss: 1.7038 (2.3420) loss_classifier: 0.3913 (0.3626) loss_box_reg: 0.2683 (0.2687) loss_mask: 1.1038 (1.6881) loss_objectness: 0.0204 (0.0184) loss_rpn_box_reg: 0.0049 (0.0043) time: 1.0576 data: 0.0315 max mem: 3158 Epoch: [0] [20/60] eta: 0:00:39 lr: 0.001783 loss: 0.9972 (1.5790) loss_classifier: 0.2425 (0.2735) loss_box_reg: 0.2683 (0.2756) loss_mask: 0.3489 (1.0043) loss_objectness: 0.0127 (0.0184) loss_rpn_box_reg: 0.0051 (0.0072) time: 0.9143 data: 0.0057 max mem: 3158 Epoch: [0] [30/60] eta: 0:00:28 lr: 0.002629 loss: 0.5966 (1.2415) loss_classifier: 0.0979 (0.2102) loss_box_reg: 0.2580 (0.2584) loss_mask: 0.2155 (0.7493) loss_objectness: 0.0119 (0.0165) loss_rpn_box_reg: 0.0057 (0.0071) time: 0.9036 data: 0.0065 max mem: 3158 Epoch: [0] [40/60] eta: 0:00:18 lr: 0.003476 loss: 0.5234 (1.0541) loss_classifier: 0.0737 (0.1749) loss_box_reg: 0.2241 (0.2505) loss_mask: 0.1796 (0.6080) loss_objectness: 0.0055 (0.0135) loss_rpn_box_reg: 0.0047 (0.0071) time: 0.8759 data: 0.0064 max mem: 3158 Epoch: [0] [50/60] eta: 0:00:09 lr: 0.004323 loss: 0.3642 (0.9195) loss_classifier: 0.0435 (0.1485) loss_box_reg: 0.1648 (0.2312) loss_mask: 0.1585 (0.5217) loss_objectness: 0.0025 (0.0113) loss_rpn_box_reg: 0.0047 (0.0069) time: 0.8693 data: 0.0065 max mem: 3158 Epoch: [0] [59/60] eta: 0:00:00 lr: 0.005000 loss: 0.3504 (0.8381) loss_classifier: 0.0379 (0.1339) loss_box_reg: 0.1343 (0.2178) loss_mask: 0.1585 (0.4690) loss_objectness: 0.0011 (0.0102) loss_rpn_box_reg: 0.0048 (0.0071) time: 0.8884 data: 0.0066 max mem: 3158 Epoch: [0] Total time: 0:00:55 (0.9230 s / it) creating index... index created! Test: [ 0/50] eta: 0:00:23 model_time: 0.2550 (0.2550) evaluator_time: 0.0066 (0.0066) time: 0.4734 data: 0.2107 max mem: 3158 Test: [49/50] eta: 0:00:00 model_time: 0.1697 (0.1848) evaluator_time: 0.0057 (0.0078) time: 0.1933 data: 0.0034 max mem: 3158 Test: Total time: 0:00:10 (0.2022 s / it) Averaged stats: model_time: 0.1697 (0.1848) evaluator_time: 0.0057 (0.0078) Accumulating evaluation results... DONE (t=0.02s). Accumulating evaluation results... DONE (t=0.02s). 
IoU metric: bbox Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.686 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.974 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.802 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.322 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.611 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.708 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.314 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.738 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.739 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.727 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.750 IoU metric: segm Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.697 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.979 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.871 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.339 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.332 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.314 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.736 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.737 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.600 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.709 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.744 Epoch: [1] [ 0/60] eta: 0:01:12 lr: 0.005000 loss: 0.3167 (0.3167) loss_classifier: 0.0377 (0.0377) loss_box_reg: 0.1232 (0.1232) loss_mask: 0.1439 (0.1439) loss_objectness: 0.0022 (0.0022) loss_rpn_box_reg: 0.0097 (0.0097) time: 1.2113 data: 0.2601 max mem: 3158 Epoch: [1] [10/60] eta: 0:00:45 lr: 0.005000 loss: 0.3185 (0.3209) loss_classifier: 0.0377 (0.0376) loss_box_reg: 0.1053 (0.1058) loss_mask: 0.1563 (0.1684) loss_objectness: 0.0012 (0.0017) loss_rpn_box_reg: 0.0064 (0.0073) time: 0.9182 data: 0.0290 max mem: 3158 Epoch: [1] [20/60] eta: 0:00:36 lr: 0.005000 loss: 0.2989 (0.2902) loss_classifier: 0.0338 (0.0358) loss_box_reg: 0.0875 (0.0952) loss_mask: 0.1456 (0.1517) loss_objectness: 0.0009 (0.0017) loss_rpn_box_reg: 0.0050 (0.0058) time: 0.8946 data: 0.0062 max mem: 3158 Epoch: [1] [30/60] eta: 0:00:27 lr: 0.005000 loss: 0.2568 (0.2833) loss_classifier: 0.0301 (0.0360) loss_box_reg: 0.0836 (0.0912) loss_mask: 0.1351 (0.1482) loss_objectness: 0.0008 (0.0018) loss_rpn_box_reg: 0.0031 (0.0061) time: 0.8904 data: 0.0065 max mem: 3158 Epoch: [1] [40/60] eta: 0:00:17 lr: 0.005000 loss: 0.2630 (0.2794) loss_classifier: 0.0335 (0.0363) loss_box_reg: 0.0804 (0.0855) loss_mask: 0.1381 (0.1497) loss_objectness: 0.0020 (0.0022) loss_rpn_box_reg: 0.0030 (0.0056) time: 0.8667 data: 0.0065 max mem: 3158 Epoch: [1] [50/60] eta: 0:00:08 lr: 0.005000 loss: 0.2729 (0.2829) loss_classifier: 0.0365 (0.0375) loss_box_reg: 0.0685 (0.0860) loss_mask: 0.1604 (0.1515) loss_objectness: 0.0022 (0.0022) loss_rpn_box_reg: 0.0031 (0.0056) time: 0.8834 data: 0.0064 max mem: 3158 Epoch: [1] [59/60] eta: 0:00:00 lr: 0.005000 loss: 0.2930 (0.2816) loss_classifier: 0.0486 (0.0381) loss_box_reg: 0.0809 (0.0847) loss_mask: 
0.1466 (0.1511) loss_objectness: 0.0012 (0.0021) loss_rpn_box_reg: 0.0042 (0.0056) time: 0.8855 data: 0.0064 max mem: 3158 Epoch: [1] Total time: 0:00:53 (0.8890 s / it) creating index... index created! Test: [ 0/50] eta: 0:00:23 model_time: 0.2422 (0.2422) evaluator_time: 0.0061 (0.0061) time: 0.4774 data: 0.2283 max mem: 3158 Test: [49/50] eta: 0:00:00 model_time: 0.1712 (0.1832) evaluator_time: 0.0051 (0.0066) time: 0.1911 data: 0.0036 max mem: 3158 Test: Total time: 0:00:10 (0.2001 s / it) Averaged stats: model_time: 0.1712 (0.1832) evaluator_time: 0.0051 (0.0066) Accumulating evaluation results... DONE (t=0.01s). Accumulating evaluation results... DONE (t=0.01s). IoU metric: bbox Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.791 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.981 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.961 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.809 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.361 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.826 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.826 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.800 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.838 IoU metric: segm Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.745 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.984 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.902 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.504 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.769 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.341 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.782 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.782 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.709 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.797 That's it!
So after one epoch of training, we obtain a COCO-style mAP > 50, and a mask mAP of 65.
But what do the predictions look like? Let's take one image in the dataset and verify.
```python
import matplotlib.pyplot as plt

from torchvision.utils import draw_bounding_boxes, draw_segmentation_masks

image = read_image("data/PennFudanPed/PNGImages/FudanPed00046.png")
eval_transform = get_transform(train=False)

model.eval()
with torch.no_grad():
    x = eval_transform(image)
    # convert RGBA -> RGB and move to device
    x = x[:3, ...].to(device)
    predictions = model([x, ])
    pred = predictions[0]

image = (255.0 * (image - image.min()) / (image.max() - image.min())).to(torch.uint8)
image = image[:3, ...]
pred_labels = [f"pedestrian: {score:.3f}" for label, score in zip(pred["labels"], pred["scores"])]
pred_boxes = pred["boxes"].long()
output_image = draw_bounding_boxes(image, pred_boxes, pred_labels, colors="red")

masks = (pred["masks"] > 0.7).squeeze(1)
output_image = draw_segmentation_masks(output_image, masks, alpha=0.5, colors="blue")

plt.figure(figsize=(12, 12))
plt.imshow(output_image.permute(1, 2, 0))
```
```
<matplotlib.image.AxesImage object at 0x7f48881f2830>
```
The results look good!
Wrapping up
In this tutorial, you have learned how to create your own training pipeline for object detection models on a custom dataset. For that, you wrote a `torch.utils.data.Dataset` class that returns the images and the ground truth boxes and segmentation masks. You also leveraged a Mask R-CNN model pre-trained on COCO train2017 in order to perform transfer learning on this new dataset.
For a more complete example, which includes multi-machine / multi-GPU training, check `references/detection/train.py`, which is present in the torchvision repository.
Total running time of the script: (2 minutes 27.747 seconds)
Download Python source code: torchvision_tutorial.py
Download Jupyter notebook: torchvision_tutorial.ipynb
Transfer Learning for Computer Vision Tutorial
Original: pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
Translator: 飞龙
Note
Click here to download the full example code
In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning. You can read more about transfer learning in the cs231n notes. Quoting these notes:
In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.
These two major transfer learning scenarios look as follows:
- Finetuning the ConvNet: Instead of random initialization, we initialize the network with a pretrained network, like the one trained on the imagenet 1000 dataset. The rest of the training looks as usual.
- ConvNet as fixed feature extractor: Here, we freeze the weights of the whole network except the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained (a minimal sketch of this scenario follows right after this list).
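As a rough sketch of the second scenario, assuming a ResNet-18 backbone and a two-class problem such as the ants/bees dataset used later (the exact model and class count here are illustrative, not prescribed by the tutorial at this point):

```python
import torch.nn as nn
import torchvision

# Load a network pretrained on ImageNet and freeze all of its weights.
model_conv = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for param in model_conv.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; parameters of newly constructed
# modules have requires_grad=True by default, so only this layer is trained.
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)  # 2 output classes, e.g. ants and bees
```

In this setup, only `model_conv.fc.parameters()` would need to be passed to the optimizer.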
```python
# License: BSD
# Author: Sasank Chilamkurthy

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

cudnn.benchmark = True
plt.ion()   # interactive mode
```
```
<contextlib.ExitStack object at 0x7f6aede85450>
```
Load Data
We will use torchvision and torch.utils.data packages for loading the data.
The problem we're going to solve today is to train a model to classify ants and bees. We have about 120 training images each for ants and bees, and 75 validation images for each class. Usually, this is a very small dataset to generalize upon if trained from scratch. Since we are using transfer learning, we should be able to generalize reasonably well.
This dataset is a very small subset of imagenet.
Note
Download the data from here and extract it to the current directory.
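If the download link is not usable in this rendering, the dataset can also be fetched and unpacked programmatically. A small sketch, assuming the archive is still hosted at download.pytorch.org/tutorial/hymenoptera_data.zip (adjust the URL if it has moved):

```python
import os
import urllib.request
import zipfile

url = "https://download.pytorch.org/tutorial/hymenoptera_data.zip"
if not os.path.isdir("data/hymenoptera_data"):
    os.makedirs("data", exist_ok=True)
    # download the archive and extract it under data/
    zip_path, _ = urllib.request.urlretrieve(url, "data/hymenoptera_data.zip")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall("data")  # creates data/hymenoptera_data/{train,val}/{ants,bees}
```

After extraction, `ImageFolder` below expects one sub-folder per class under each of `train/` and `val/`.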
```python
# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
Visualize a few images
Let's visualize a few training images so as to understand the data augmentations.
```python
def imshow(inp, title=None):
    """Display image for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])
```
![['ants', 'ants', 'ants', 'ants']](…/Images/be538c850b645a41a7a77ff388954e14.png)