Yolo-V5目标检测项目实战-阿里云开发者社区

引言

本文将一步一步的指导训练 Yolo-v5并进行推断来计算血细胞并定位它们。

我曾试图用 Yolo v3-v4做一个目标检测模型，在显微镜下用血液涂抹的图像上计算红细胞、白细胞和血小板，但是我没有得到我想要的准确度，而且这个模型也没有投入生产。

最近，我偶然看到了来自 Ultralytics 的 Yolo-v5模型的发布，它使用 PyTorch 构建。由于之前的失败，我一开始有点怀疑，但是在阅读了他们 Github repo 中的使用手册后，这次我非常自信，我想试一试。

而且它的效果很神奇，Yolo-v5很容易训练，也很容易推理。

所以这篇文章总结了我在血细胞计数数据集上使用 Yolo-v5模型的实践经验。

Ultralytics 最近推出了 Yolo-v5。目前，Yolo 的前三个版本是由 Joseph Redmon 创造的。但是较新的版本比其他版本有更高的平均精度和更快的推理时间。与此同时，它是在 PyTorch 的基础上构建的，这使得训练和推理过程非常快，结果也非常好。

那么，让我们来分解一下我们训练过程中的步骤。

数据ー预处理(Yolo-v5兼容)
模型ー训练
推理

1. 数据 - 预处理(Yolo-v5兼容)

我使用了 Github 上提供的数据集 BCCD 数据集，这个数据集有血迹模糊的显微镜图像，并且在 XML 文件中提供了相应的边界框标注。

Dataset Structure:
- BCCD
  - Annotations
    - BloodImage_00000.xml
    - BloodImage_00001.xml
    ...
- JpegImages
    - BloodImage_00001.jpg
    - BloodImage_00001.jpg
    ...

示例图像及其标注：

样本输入图像

XML 文件中的标签

将标注映射为图像中的边界框后，将得到如下结果：

但是为了训练 Yolo-v5模型，我们需要组织我们的数据集结构，它需要图像(.jpg/.png，等等)和它以.txt保存的相应标签。

Yolo-v5 Dataset Structure:
- BCCD
  - Images
    - Train (.jpg files)
    - Valid (.jpg files)
- Labels
     - Train (.txt files)
     - Valid (.txt files)

然后. txt 文件格式应该是：

txt 文件的结构:

- 每个对象一行。

- 每行为 class x_center y_center width height 格式。

- Box 坐标必须是标准化的 xywh 格式(从0-1开始)。如果你的 box 是以像素为单位，用图像宽度除以 x_center 和 width，用图像 height 除以 y_center 和 height。

- 类号为零索引(从0开始)。

一个示例标签，包括类1(RBC)和类2(WBC) ，以及它们各自的 x_center、 y_center、width 和 height (全部标准化为0-1) ，看起来像下面这个。

Yolo-v5 Dataset Structure:
- BCCD
  - Images
    - Train (.jpg files)
    - Valid (.jpg files)
- Labels
     - Train (.txt files)
     - Valid (.txt files)

labels.txt 示例

因此，让我们看看如何在上面指定的结构中预处理数据。

我们的第一步应该是解析所有 XML 文件中的数据，并将它们存储在数据框架中以便进一步处理。因此，我们运行下面的代码来实现它。

# Dataset Extraction from github
!git clone 'https://github.com/Shenggan/BCCD_Dataset.git'
import os, sys, random, shutil
import xml.etree.ElementTree as ET
from glob import glob
import pandas as pd
from shutil import copyfile
import pandas as pd
from sklearn import preprocessing, model_selection
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import patches
import numpy as np
annotations = sorted(glob('/content/BCCD_Dataset/BCCD/Annotations/*.xml'))
df = []
cnt = 0
for file in annotations:
  prev_filename = file.split('/')[-1].split('.')[0] + '.jpg'
  filename = str(cnt) + '.jpg'
  row = []
  parsedXML = ET.parse(file)
  for node in parsedXML.getroot().iter('object'):
    blood_cells = node.find('name').text
    xmin = int(node.find('bndbox/xmin').text)
    xmax = int(node.find('bndbox/xmax').text)
    ymin = int(node.find('bndbox/ymin').text)
    ymax = int(node.find('bndbox/ymax').text)
    row = [prev_filename, filename, blood_cells, xmin, xmax, ymin, ymax]
    df.append(row)
  cnt += 1
data = pd.DataFrame(df, columns=['prev_filename', 'filename', 'cell_type', 'xmin', 'xmax', 'ymin', 'ymax'])
data[['prev_filename','filename', 'cell_type', 'xmin', 'xmax', 'ymin', 'ymax']].to_csv('/content/blood_cell_detection.csv', index=False)
data.head(10)

数据帧应该是这样的：

保存此文件后，我们需要进行更改，以将其转换为与 Yolo-v5兼容的格式。

REQUIRED DATAFRAME STRUCTURE
- filename : contains the name of the image
- cell_type: denotes the type of the cell
- xmin: x-coordinate of the bottom left part of the image
- xmax: x-coordinate of the top right part of the image
- ymin: y-coordinate of the bottom left part of the image
- ymax: y-coordinate of the top right part of the image
- labels : Encoded cell-type (Yolo - label input-1)
- width : width of that bbox
- height : height of that bbox
- x_center : bbox center (x-axis)
- y_center : bbox center (y-axis)
- x_center_norm : x_center normalized (0-1) (Yolo - label input-2)
- y_center_norm : y_center normalized (0-1) (Yolo - label input-3)
- width_norm : width normalized (0-1) (Yolo - label input-4)
- height_norm : height normalized (0-1) (Yolo - label input-5)

我编写了一些代码，将现有的数据帧转换为上面代码片段中指定的结构。

img_width = 640
img_height = 480
def width(df):
  return int(df.xmax - df.xmin)
def height(df):
  return int(df.ymax - df.ymin)
def x_center(df):
  return int(df.xmin + (df.width/2))
def y_center(df):
  return int(df.ymin + (df.height/2))
def w_norm(df):
  return df/img_width
def h_norm(df):
  return df/img_height
df = pd.read_csv('/content/blood_cell_detection.csv')
le = preprocessing.LabelEncoder()
le.fit(df['cell_type'])
print(le.classes_)
labels = le.transform(df['cell_type'])
df['labels'] = labels
df['width'] = df.apply(width, axis=1)
df['height'] = df.apply(height, axis=1)
df['x_center'] = df.apply(x_center, axis=1)
df['y_center'] = df.apply(y_center, axis=1)
df['x_center_norm'] = df['x_center'].apply(w_norm)
df['width_norm'] = df['width'].apply(w_norm)
df['y_center_norm'] = df['y_center'].apply(h_norm)
df['height_norm'] = df['height'].apply(h_norm)
df.head(30)

经过预处理我们的数据帧看起来像这样，这里我们可以看到一个图像文件存在许多行(例如 blooddimage _ 0000.jpg) ，现在我们需要收集所有的(labels，x_center_norm，y_center_norm，width_norm，height_norm)值为单个图像文件，并将其保存为.txt 文件。

预处理

现在，我们将数据集分成训练集和验证集，并保存相应的图像，以及保存在.txt文件中的标签。为此，我写了一小段代码片段。

df_train, df_valid = model_selection.train_test_split(df, test_size=0.1, random_state=13, shuffle=True)
print(df_train.shape, df_valid.shape)
os.mkdir('/content/bcc/')
os.mkdir('/content/bcc/images/')
os.mkdir('/content/bcc/images/train/')
os.mkdir('/content/bcc/images/valid/')
os.mkdir('/content/bcc/labels/')
os.mkdir('/content/bcc/labels/train/')
os.mkdir('/content/bcc/labels/valid/')
def segregate_data(df, img_path, label_path, train_img_path, train_label_path):
  filenames = []
  for filename in df.filename:
    filenames.append(filename)
  filenames = set(filenames)
  for filename in filenames:
    yolo_list = []
    for _,row in df[df.filename == filename].iterrows():
      yolo_list.append([row.labels, row.x_center_norm, row.y_center_norm, row.width_norm, row.height_norm])
    yolo_list = np.array(yolo_list)
    txt_filename = os.path.join(train_label_path,str(row.prev_filename.split('.')[0])+".txt")
    # Save the .img & .txt files to the corresponding train and validation folders
    np.savetxt(txt_filename, yolo_list, fmt=["%d", "%f", "%f", "%f", "%f"])
    shutil.copyfile(os.path.join(img_path,row.prev_filename), os.path.join(train_img_path,row.prev_filename))
## Apply function ## 
src_img_path = "/content/BCCD_Dataset/BCCD/JPEGImages/"
src_label_path = "/content/BCCD_Dataset/BCCD/Annotations/"
train_img_path = "/content/bcc/images/train"
train_label_path = "/content/bcc/labels/train"
valid_img_path = "/content/bcc/images/valid"
valid_label_path = "/content/bcc/labels/valid"
segregate_data(df_train, src_img_path, src_label_path, train_img_path, train_label_path)
segregate_data(df_valid, src_img_path, src_label_path, valid_img_path, valid_label_path)
print("No. of Training images", len(os.listdir('/content/bcc/images/train')))
print("No. of Training labels", len(os.listdir('/content/bcc/labels/train')))
print("No. of valid images", len(os.listdir('/content/bcc/images/valid')))
print("No. of valid labels", len(os.listdir('/content/bcc/labels/valid')))

在运行代码之后，我们应该像预期的那样有了文件夹结构，并准备好对模型进行训练。

No. of Training images 364 
No. of Training labels 364
No. of valid images 270 
No. of valid labels 270
&&
- BCCD
  - Images
    - Train (364 .jpg files)
    - Valid (270 .jpg files)
- Labels
     - Train (364 .txt files)
     - Valid (270 .txt files)

数据预处理已完成。

2. 模型ー训练

要开始训练过程，我们需要克隆官方的 Yolo-v5的权重和配置文件。

!git clone  'https://github.com/ultralytics/yolov5.git'
!pip install -qr '/content/yolov5/requirements.txt'  # install dependencies
## Create a yaml file and move it into the yolov5 folder ##
shutil.copyfile('/content/bcc.yaml', '/content/yolov5/bcc.yaml')

然后安装 requirements.txt 文件中指定的必需包。

Yolov5 结构

bcc.yaml：

现在我们需要创建一个 Yaml 文件，其中包含训练和验证目录、类的数量以及标签名称。稍后我们需要把 yaml 文件放到我们克隆的 yolov5目录中。

## Contents inside the .yaml file
train: /content/bcc/images/train
val: /content/bcc/images/valid
nc: 3 
names: ['Platelets', 'RBC', 'WBC']

model’s — YAML :

现在我们需要从./models 文件夹中选择一个模型(small, medium, large, xlarge)。

下图描述了现有模型的各种特征，例如参数数量等。您可以根据手头任务的复杂性选择任何模型，默认情况下，它们都可以作为模型文件夹中的 yaml 文件。

来自 Ultralytics 的 Yolo 模型参数

模型的 YAML 文件

现在我们需要编辑我们选择的模型的 *.yaml 文件。我们只需要在本例中替换类的数量，以便与模型的 YAML 文件中的类的数量相匹配。为了简单起见，我选择 yolov5s.yaml 来加快处理速度。

## parameters
nc: 3 # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23]  # P3/8
- [30,61, 62,45, 59,119]  # P4/16
...............

注意：如果我们没有替换模型的 YAML 文件中的 nc (我已经替换了) ，那么这个步骤不是强制的，它将自动覆盖我们之前创建的 nc 值(bcc.yaml) ，在训练模型时，您将看到这一行，这证明我们不必修改它。

“Overriding ./yolov5/models/yolov5s.yaml nc=80 with nc=3”

模型训练参数：

我们需要配置的训练参数，如 epoch，batch_size等

Training Parameters
!python
- <'location of train.py file'>
- --img <'width of image'>
- --batch <'batch size'>
- --epochs <'no of epochs'>
- --data <'location of the .yaml file'>
- --cfg <'Which yolo configuration you want'>(yolov5s/yolov5m/yolov5l/yolov5x).yaml | (small, medium, large, xlarge)
- --name <'Name of the best model to save after training'>

另外，如果我们愿意的话，我们可以用 tensorboard 查看日志文件。

# Start tensorboard (optional)
%load_ext tensorboard
%tensorboard --logdir runs/
!python yolov5/train.py --img 640 --batch 8 --epochs 100 \
    --data bcc.yaml --cfg models/yolov5s.yaml --name BCCM

这将启动训练过程，需要一段时间才能完成。

我发布了一些我的训练过程的片段

训练过程摘录

METRICS FROM TRAINING PROCESS
No.of classes, No.of images, No.of targets, Precision (P), Recall (R), mean Average Precision (map)
- Class | Images | Targets | P | R | mAP@.5 | mAP@.5:.95: |
- all   | 270    |     489 |    0.0899 |       0.827 |      0.0879 |      0.0551

因此，通过 P(精度)、 R(召回率)和 mAP (平均平均精度)的值，我们可以知道我们的模型是否正常。即使我只训练了这个模型100个 epoch，它的表现还是很棒的。

Tensorboard 可视化

模型训练结束

3. 推理

现在是测试我们的模型，看看它是如何做出预测的激动人心的时刻。但是我们需要遵循一些简单的步骤。

推理参数

Inference Parameters
!python
- <'location of detect.py file'>
- --source <'location of image/ folder to predict'>
- --weight <'location of the saved best weights'>
- --output <'location to store the outputs after prediction'>
- --img-size <'Image size of the trained model'>
(Optional)
- --conf-thres <"default=0.4", 'object confidence threshold')>
- --iou-thres <"default=0.5" , 'threshold for NMS')>    
- --device <'cuda device or cpu')>
- --view-img <'display results')>
- --save-txt <'saves the bbox co-ordinates results to *.txt')>
- --classes <'filter by class: --class 0, or --class 0 2 3')>
## And there are other more customization availble, check them in the detect.py file. ##

运行下面的代码，对文件夹/图像进行预测。

## TO PREDICT IMAGES IN A FOLDER ##
!python yolov5/detect.py --source /content/bcc/images/valid/ 
  --weights '/content/drive/My Drive/Machine Learning Projects/YOLO/best_yolov5_BCCM.pt'
  --output '/content/inference/output'
## TO PREDICT A SINGLE IMAGE FILE ##
output = !python yolov5/detect.py --source /content/bcc/images/valid/BloodImage_00000.jpg 
  --weights '/content/drive/My Drive/Machine Learning Projects/YOLO/best_yolov5_BCCM.pt'
print(output)

结果是好的

输出样本

解释来自.txt 文件的输出: (可选读取)

为了以防万一，假设你正在做人脸检测和人脸识别，并且想要在你的处理过程中更近一步，假设你想使用 bbox 坐标裁剪 opencv 的人脸，并将他们发送到人脸识别模型，在这种情况下，我们不仅需要输出像上面的图形，而且我们需要每个人脸的坐标。有什么办法吗？答案是肯定的，继续读下去。

(我只使用了人脸检测和识别作为一个例子，Yolo-V5也可以用来做到这一点)

此外，我们还可以将输出保存到一个.txt 文件中，该文件包含一些输入图像的 bbox 坐标。

# class x_center_norm y_center_norm width_norm height_norm #
  1     0.718         0.829         0.143      0.193
  ...

运行下面的代码，获取.txt 文件中的输出,

!python yolov5/detect.py --source /content/bcc/images/valid/BloodImage_00000.jpg 
        --weights '/content/runs/exp0_BCCM/weights/best_BCCM.pt' 
        --view-img 
        --save-txt

成功运行代码后，我们可以看到输出存储在推理文件夹中,

输出标签

很好，现在.txt 文件的输出格式是：

  [ class x_center_norm y_center_norm width_norm height_norm ]
         "we need to convert it to the form specified below"
               [ class, x_min, y_min, width, height ]

[ class, X_center_norm, y_center_norm, Width_norm, Height_norm ] , 我们需要将其转换为 [ class, x_min, y_min, width, height ] , (也是反规范化的) ，以便于绘制。

要做到这一点，只需运行下面执行上述转换的代码。

# Plotting bbox ffrom the .txt file output from yolo #
## Provide the location of the output .txt file ##
a_file = open("/content/inference/output/BloodImage_00000.txt", "r")
# Stripping data from the txt file into a list #
list_of_lists = []
for line in a_file:
  stripped_line = line.strip()
  line_list = stripped_line.split()
  list_of_lists.append(line_list)
a_file.close()
# Conversion of str to int #
stage1 = []
for i in range(0, len(list_of_lists)):
  test_list = list(map(float, list_of_lists[i])) 
  stage1.append(test_list)
# Denormalizing # 
stage2 = []
mul = [1,640,480,640,480] #[constant, image_width, image_height, image_width, image_height]
for x in stage1:
  c,xx,yy,w,h = x[0]*mul[0], x[1]*mul[1], x[2]*mul[2], x[3]*mul[3], x[4]*mul[4]    
  stage2.append([c,xx,yy,w,h])
# Convert (x_center, y_center, width, height) --> (x_min, y_min, width, height) #
stage_final = []
for x in stage2:
  c,xx,yy,w,h = x[0]*1, (x[1]-(x[3]/2)) , (x[2]-(x[4]/2)), x[3]*1, x[4]*1  
  stage_final.append([c,xx,yy,w,h])
fig = plt.figure()
import cv2
#add axes to the image
ax = fig.add_axes([0,0,1,1])
# read and plot the image
## Location of the input image which is sent to model's prediction ##
image = plt.imread('/content/BCCD_Dataset/BCCD/JPEGImages/BloodImage_00000.jpg')
plt.imshow(image)
# iterating over the image for different objects
for x in stage_final:
  class_ = int(x[0])
  xmin = x[1]
  ymin = x[2]
  width = x[3]
  height = x[4]
  xmax = width + xmin
  ymax = height + ymin
    # assign different color to different classes of objects
  if class_ == 1:
    edgecolor = 'r'
    ax.annotate('RBC', xy=(xmax-40,ymin+20))
  elif class_ == 2:
    edgecolor = 'b'
    ax.annotate('WBC', xy=(xmax-40,ymin+20))
  elif class_ == 0:
    edgecolor = 'g'
    ax.annotate('Platelets', xy=(xmax-40,ymin+20))
    # add bounding boxes to the image
  rect = patches.Rectangle((xmin,ymin), width, height, edgecolor = edgecolor, facecolor = 'none')
  ax.add_patch(rect)

然后输出绘制的图像看起来像这样。

前面代码的输出

4. 从模型到生产化

为了以防万一，如果您希望将模型移动到生产环境或部署到任何地方，则必须遵循以下步骤。

首先，安装依赖项来运行 yolov5，我们需要一些来自 yolov5文件夹的文件，并将它们添加到 python 系统路径目录中以加载 utils。所以把它们拷贝到你需要的地方，然后把它们移动到你需要的地方。

所以在下面的图片1，我已经打包了一些文件夹和文件，你可以下载他们，并保持在一个单独的文件夹，如图片2。

图1

图2 生产中加载的必要文件

现在我们需要告诉 python 编译器将上面的文件夹位置添加到 account 中，这样当我们运行程序时，它将在运行时加载模型和函数。

在下面的代码片段中，在第9行，我添加了 sys.path... 命令，并在其中指定了移动这些文件的文件夹位置，您可以用自己的文件夹替换它。

然后运行这些代码开始预测。

import os, sys, random
from glob import glob
import matplotlib.pyplot as plt
%matplotlib inline
!pip install -qr '/content/drive/My Drive/Machine Learning Projects/YOLO/SOURCE/requirements.txt'  # install dependencies
## Add the path where you have stored the neccessary supporting files to run detect.py ##
## Replace this with your path.##
sys.path.insert(0, '/content/drive/My Drive/Machine Learning Projects/YOLO/SOURCE/') 
print(sys.path)
cwd = os.getcwd()
print(cwd)
## Single Image prediction
## Beware the contents in the output folder will be deleted for every prediction
output = !python '/content/drive/My Drive/Machine Learning Projects/YOLO/SOURCE/detect.py' 
          --source '/content/BloodImage_00026.jpg' 
          --weights '/content/drive/My Drive/Machine Learning Projects/YOLO/SOURCE/best_BCCM.pt' 
           --output '/content/OUTPUTS/' --device 'cpu'
print(output)
img = plt.imread('/content/OUTPUTS/BloodImage_00026.jpg')
plt.imshow(img)
## Folder Prediction
output = !python '/content/drive/My Drive/Machine Learning Projects/YOLO/SOURCE/detect.py' 
          --source '/content/inputs/' 
          --weights '/content/drive/My Drive/Machine Learning Projects/YOLO/SOURCE/best_BCCM.pt' 
          --output '/content/OUTPUTS/' --device 'cpu'
print(output)