Transformers 4.37 中文文档(五)(1)https://developer.aliyun.com/article/1565241
训练 DETR 模型
在前几节中,您已经完成了大部分繁重的工作,现在您已经准备好训练您的模型了!即使在调整大小后,此数据集中的图像仍然相当大。这意味着微调此模型将需要至少一个 GPU。
训练包括以下步骤:
- 使用与预处理中相同的检查点加载模型 AutoModelForObjectDetection。
- 在 TrainingArguments 中定义您的训练超参数。
- 将训练参数传递给 Trainer,以及模型、数据集、图像处理器和数据整理器。
- 调用 train()来微调您的模型。
在从用于预处理的相同检查点加载模型时,请记住传递您从数据集元数据中创建的label2id
和id2label
映射。此外,我们指定ignore_mismatched_sizes=True
以用新的替换现有的分类头。
>>> from transformers import AutoModelForObjectDetection >>> model = AutoModelForObjectDetection.from_pretrained( ... checkpoint, ... id2label=id2label, ... label2id=label2id, ... ignore_mismatched_sizes=True, ... )
在 TrainingArguments 中使用output_dir
指定保存模型的位置,然后根据需要配置超参数。重要的是不要删除未使用的列,因为这将删除图像列。没有图像列,您无法创建pixel_values
。因此,将remove_unused_columns
设置为False
。如果希望通过将其推送到 Hub 来共享您的模型,请将push_to_hub
设置为True
(您必须登录到 Hugging Face 才能上传您的模型)。
>>> from transformers import TrainingArguments >>> training_args = TrainingArguments( ... output_dir="detr-resnet-50_finetuned_cppe5", ... per_device_train_batch_size=8, ... num_train_epochs=10, ... fp16=True, ... save_steps=200, ... logging_steps=50, ... learning_rate=1e-5, ... weight_decay=1e-4, ... save_total_limit=2, ... remove_unused_columns=False, ... push_to_hub=True, ... )
最后,将所有内容汇总,并调用 train():
>>> from transformers import Trainer >>> trainer = Trainer( ... model=model, ... args=training_args, ... data_collator=collate_fn, ... train_dataset=cppe5["train"], ... tokenizer=image_processor, ... ) >>> trainer.train()
如果在training_args
中将push_to_hub
设置为True
,则训练检查点将被推送到 Hugging Face Hub。在训练完成后,通过调用 push_to_hub()方法将最终模型也推送到 Hub。
>>> trainer.push_to_hub()
评估
目标检测模型通常使用一组COCO 风格指标进行评估。您可以使用现有的指标实现之一,但在这里,您将使用来自torchvision
的指标来评估推送到 Hub 的最终模型。
要使用torchvision
评估器,您需要准备一个真实的 COCO 数据集。构建 COCO 数据集的 API 要求数据以特定格式存储,因此您需要首先将图像和注释保存到磁盘上。就像您为训练准备数据时一样,来自cppe5["test"]
的注释需要进行格式化。但是,图像应保持原样。
评估步骤需要一些工作,但可以分为三个主要步骤。首先,准备cppe5["test"]
集:格式化注释并将数据保存到磁盘上。
>>> import json >>> # format annotations the same as for training, no need for data augmentation >>> def val_formatted_anns(image_id, objects): ... annotations = [] ... for i in range(0, len(objects["id"])): ... new_ann = { ... "id": objects["id"][i], ... "category_id": objects["category"][i], ... "iscrowd": 0, ... "image_id": image_id, ... "area": objects["area"][i], ... "bbox": objects["bbox"][i], ... } ... annotations.append(new_ann) ... return annotations >>> # Save images and annotations into the files torchvision.datasets.CocoDetection expects >>> def save_cppe5_annotation_file_images(cppe5): ... output_json = {} ... path_output_cppe5 = f"{os.getcwd()}/cppe5/" ... if not os.path.exists(path_output_cppe5): ... os.makedirs(path_output_cppe5) ... path_anno = os.path.join(path_output_cppe5, "cppe5_ann.json") ... categories_json = [{"supercategory": "none", "id": id, "name": id2label[id]} for id in id2label] ... output_json["images"] = [] ... output_json["annotations"] = [] ... for example in cppe5: ... ann = val_formatted_anns(example["image_id"], example["objects"]) ... output_json["images"].append( ... { ... "id": example["image_id"], ... "width": example["image"].width, ... "height": example["image"].height, ... "file_name": f"{example['image_id']}.png", ... } ... ) ... output_json["annotations"].extend(ann) ... output_json["categories"] = categories_json ... with open(path_anno, "w") as file: ... json.dump(output_json, file, ensure_ascii=False, indent=4) ... for im, img_id in zip(cppe5["image"], cppe5["image_id"]): ... path_img = os.path.join(path_output_cppe5, f"{img_id}.png") ... im.save(path_img) ... return path_output_cppe5, path_anno
接下来,准备一个可以与cocoevaluator
一起使用的CocoDetection
类的实例。
>>> import torchvision >>> class CocoDetection(torchvision.datasets.CocoDetection): ... def __init__(self, img_folder, image_processor, ann_file): ... super().__init__(img_folder, ann_file) ... self.image_processor = image_processor ... def __getitem__(self, idx): ... # read in PIL image and target in COCO format ... img, target = super(CocoDetection, self).__getitem__(idx) ... # preprocess image and target: converting target to DETR format, ... # resizing + normalization of both image and target) ... image_id = self.ids[idx] ... target = {"image_id": image_id, "annotations": target} ... encoding = self.image_processor(images=img, annotations=target, return_tensors="pt") ... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension ... target = encoding["labels"][0] # remove batch dimension ... return {"pixel_values": pixel_values, "labels": target} >>> im_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5") >>> path_output_cppe5, path_anno = save_cppe5_annotation_file_images(cppe5["test"]) >>> test_ds_coco_format = CocoDetection(path_output_cppe5, im_processor, path_anno)
最后,加载指标并运行评估。
>>> import evaluate >>> from tqdm import tqdm >>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5") >>> module = evaluate.load("ybelkada/cocoevaluate", coco=test_ds_coco_format.coco) >>> val_dataloader = torch.utils.data.DataLoader( ... test_ds_coco_format, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn ... ) >>> with torch.no_grad(): ... for idx, batch in enumerate(tqdm(val_dataloader)): ... pixel_values = batch["pixel_values"] ... pixel_mask = batch["pixel_mask"] ... labels = [ ... {k: v for k, v in t.items()} for t in batch["labels"] ... ] # these are in DETR format, resized + normalized ... # forward pass ... outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask) ... orig_target_sizes = torch.stack([target["orig_size"] for target in labels], dim=0) ... results = im_processor.post_process(outputs, orig_target_sizes) # convert outputs of model to Pascal VOC format (xmin, ymin, xmax, ymax) ... module.add(prediction=results, reference=labels) ... del batch >>> results = module.compute() >>> print(results) Accumulating evaluation results... DONE (t=0.08s). IoU metric: bbox Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.352 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.681 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.292 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.208 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.274 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.484 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.501 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
通过调整 TrainingArguments 中的超参数,这些结果可以进一步改善。试一试吧!
推断
现在您已经微调了一个 DETR 模型,对其进行了评估,并将其上传到 Hugging Face Hub,您可以将其用于推断。尝试使用您微调的模型进行推断的最简单方法是在 Pipeline 中使用它。使用您的模型实例化一个用于目标检测的流水线,并将图像传递给它:
>>> from transformers import pipeline >>> import requests >>> url = "https://i.imgur.com/2lnWoly.jpg" >>> image = Image.open(requests.get(url, stream=True).raw) >>> obj_detector = pipeline("object-detection", model="devonho/detr-resnet-50_finetuned_cppe5") >>> obj_detector(image)
如果您愿意,也可以手动复制流水线的结果:
>>> image_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5") >>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5") >>> with torch.no_grad(): ... inputs = image_processor(images=image, return_tensors="pt") ... outputs = model(**inputs) ... target_sizes = torch.tensor([image.size[::-1]]) ... results = image_processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0] >>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]): ... box = [round(i, 2) for i in box.tolist()] ... print( ... f"Detected {model.config.id2label[label.item()]} with confidence " ... f"{round(score.item(), 3)} at location {box}" ... ) Detected Coverall with confidence 0.566 at location [1215.32, 147.38, 4401.81, 3227.08] Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.9]
让我们绘制结果:
>>> draw = ImageDraw.Draw(image) >>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]): ... box = [round(i, 2) for i in box.tolist()] ... x, y, x2, y2 = tuple(box) ... draw.rectangle((x, y, x2, y2), outline="red", width=1) ... draw.text((x, y), model.config.id2label[label.item()], fill="white") >>> image
在一张新图片上的目标检测结果
Transformers 4.37 中文文档(五)(3)https://developer.aliyun.com/article/1565243