Without further ado, let's go straight to the results! The figure above shows the final output.
1. Defining the dataset
The reference scripts for training object detection, instance segmentation and person keypoint detection make it easy to add support for new custom datasets. The dataset should inherit from the standard torch.utils.data.Dataset class and implement __len__ and __getitem__.
The only specificity we require is that the dataset's __getitem__ should return the following (a minimal hand-built example follows the list):
image: a PIL Image of size (H, W)
target: a dict containing the following fields
boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H
labels (Int64Tensor[N]): the label for each bounding box. 0 always represents the background class.
image_id (Int64Tensor[1]): an image identifier. It should be unique between all the images in the dataset, and is used during evaluation
area (Tensor[N]): the area of the bounding box. This is used during evaluation with the COCO metric, to separate the metric scores between small, medium and large boxes.
iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
(optionally) masks (UInt8Tensor[N, H, W]): the segmentation masks for each one of the objects
(optionally) keypoints (FloatTensor[N, K, 3]): for each one of the N objects, it contains the K keypoints in [x, y, visibility] format, defining the object. visibility=0 means that the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint depends on the data representation, and you should probably adapt references/detection/transforms.py for your new keypoint representation
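To make this contract concrete, here is a minimal hand-built target for a single object, matching the field types above (the numbers are purely illustrative, not from any real image):

import torch

# one box in [x0, y0, x1, y1] format
boxes = torch.tensor([[50.0, 30.0, 120.0, 200.0]])
target = {
    "boxes": boxes,
    "labels": torch.ones((1,), dtype=torch.int64),   # class 1; 0 is background
    "image_id": torch.tensor([0]),
    "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
    "iscrowd": torch.zeros((1,), dtype=torch.int64),
}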
If your dataset complies with the above, it will work for both training and evaluation, and the evaluation scripts from pycocotools will be used (installable with pip install pycocotools). The following command installs the library:
pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
import os
import numpy as np
import torch
from PIL import Image


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
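As a quick sanity check (assuming the PennFudanPed folder has been downloaded and extracted next to the script), you can instantiate the dataset without transforms and inspect one sample:

dataset = PennFudanDataset('PennFudanPed', transforms=None)
img, target = dataset[0]
print(img.size)               # (width, height) of the PIL image
print(target["boxes"])        # one [x0, y0, x1, y1] row per pedestrian
print(target["masks"].shape)  # (num_objs, H, W) binary masks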
2. Defining the model
The overall network architecture (Mask R-CNN) is roughly as shown in the figure above.
There are two common situations where you might want to modify one of the available models in the torchvision model zoo. The first is when we want to start from a pre-trained model and just finetune the last layer. The other is when we want to replace the backbone of the model with a different one (for instance, for faster predictions).
1.从预先训练的模型进行微调
Let's suppose you want to start from a model pre-trained on COCO and finetune it for your particular classes. Here is a possible way of doing it:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
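Before training, a smoke test with random input can confirm the new head is wired up correctly; the predictions themselves are meaningless here, but the output structure should already be in place:

import torch

model.eval()
with torch.no_grad():
    # one random 300x400 RGB image; real inputs would be actual photos
    predictions = model([torch.rand(3, 300, 400)])
# each prediction is a dict with 'boxes', 'labels' and 'scores'
print(predictions[0]["boxes"].shape)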
2. Modifying the model to add a different backbone
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be ['0']. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
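If you are unsure what out_channels should be for another backbone, one way to find out is to run a dummy tensor through it and read off the channel dimension (a quick check, assuming a standard 3-channel input):

import torch

with torch.no_grad():
    features = backbone(torch.rand(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 1280, 7, 7]) -> 1280 channels for mobilenet_v2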
An instance segmentation model for the PennFudan dataset:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor


def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model
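A quick way to confirm the head replacement worked is to build the model and print the new predictor modules (a trivial check; the comments describe the expected output):

model = get_model_instance_segmentation(num_classes=2)
print(model.roi_heads.box_predictor)   # FastRCNNPredictor sized for 2 classes
print(model.roi_heads.mask_predictor)  # MaskRCNNPredictor with 2 output masks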
3. The main function that performs training and validation:
import torch

from engine import train_one_epoch, evaluate
import utils


def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)
    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")
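Note that main() relies on a get_transform helper that is not defined in this post; like engine and utils, it comes from torchvision's references/detection scripts. A minimal sketch of such a helper, assuming that folder is on your path and its transforms module is importable as T (older versions of the reference scripts use T.ToTensor() instead of the two conversions below):

import transforms as T  # from torchvision's references/detection folder


def get_transform(train):
    # convert the PIL image to a float tensor in [0, 1]
    transforms = [T.PILToTensor(), T.ConvertImageDtype(torch.float)]
    if train:
        # randomly flip the image together with its boxes and masks
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)


if __name__ == "__main__":
    main()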
4. Results
So after one epoch of training, we obtain a COCO-style box mAP of 60.6 and a mask mAP of 70.4.
But what do the predictions actually look like? Let's take an image from the dataset and check.
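A sketch of such a check, assuming model, device and dataset_test from main() are still in scope:

import torch
from PIL import Image

# pick one image from the test set
img, _ = dataset_test[0]

model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])

# show the input image and the highest-scoring predicted mask
Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy()).show()
Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy()).show()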
5. Conclusion
In this tutorial, you learned how to create your own training pipeline for an instance segmentation model on a custom dataset. For that, you wrote a torch.utils.data.Dataset class that returns the images, the ground-truth boxes and the segmentation masks. You also leveraged a Mask R-CNN model pre-trained on COCO train2017 in order to perform transfer learning on this new dataset.