[Dataset Preprocessing for Object Detection] Defining Your Own Dataset by Subclassing Dataset [Code Included] (Part 2)

2023-02-14


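Summary: In deep learning training, designing an effective convolutional neural network architecture matters, but handling the data matters even more. Training data must be preprocessed before training. In object detection, for example, you first split the data into a training set and a test set, then process the labels and bounding boxes before they can be fed to the network. This article uses the VOC dataset format as an example and preprocesses a dataset so it can be fed to an object detection network for training. [Code included]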

Bbox processing

box_data = np.zeros((len(box), 5))  # all-zero matrix with the same shape as box: one row per box, 5 columns

Out[30]:

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

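Because the image was scaled earlier, the bboxes must change accordingly, so compute the bboxes after scaling [scale the first four columns of the original box, which hold the coordinates]

box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx  # scale the x coordinates of the original boxes, then add the horizontal padding offset
box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy  # scale the y coordinates of the original boxes, then add the vertical padding offset

This gives the bbox information for the scaled image: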

array([[ 26,  19,  96, 150,   0],
       [168,   0, 279, 151,   0],
       [ 14, 158, 120, 298,   0],
       [186, 162, 250, 297,   0]])
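
To make the scaling step concrete, here is a minimal sketch of the same mapping as a standalone function. The function name `letterbox_boxes` and the 600x400 example image are assumptions made for illustration; the formulas for `scale`, `nw`, `nh`, `dx` and `dy` follow the article's code. Unlike the article, which writes back into an integer array (so results are truncated), this sketch keeps floating-point coordinates.

```python
import numpy as np

def letterbox_boxes(box, iw, ih, w, h):
    """Map boxes from an (iw, ih) image onto a (w, h) padded canvas.

    `box` has one row per box: x_min, y_min, x_max, y_max, class.
    Hypothetical helper, not part of the article's code.
    """
    scale = min(w / iw, h / ih)                 # keep the aspect ratio
    nw, nh = int(iw * scale), int(ih * scale)   # size of the resized image
    dx, dy = (w - nw) // 2, (h - nh) // 2       # padding on the left / top
    out = box.astype(np.float64)                # astype returns a copy
    out[:, [0, 2]] = out[:, [0, 2]] * nw / iw + dx
    out[:, [1, 3]] = out[:, [1, 3]] * nh / ih + dy
    return out, (nw, nh, dx, dy)

# Assumed example: a 600x400 image placed on a 300x300 network input.
box = np.array([[100, 50, 300, 200, 0]])
scaled, (nw, nh, dx, dy) = letterbox_boxes(box, iw=600, ih=400, w=300, h=300)
print(nw, nh, dx, dy)   # 300 200 0 50
print(scaled)           # x halves, y halves and shifts down by 50
```

The image is resized to 300x200 and centered vertically, so every y coordinate is shifted by the 50-pixel top padding while x coordinates are only scaled.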

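Next, process the coordinates further so that the scaled coordinates neither go negative nor overflow the input boundary

# Clamp the top-left coordinates to prevent negative values
box[:, 0:2][box[:, 0:2] < 0] = 0
# Clamp the bottom-right coordinates so they do not exceed the input boundary
box[:, 2][box[:, 2] > w] = w   # box[:, 2] > w is a boolean mask marking the rows whose column 2 (x_max) exceeds w
box[:, 3][box[:, 3] > h] = h

# Compute the box sizes after scaling
box_w = box[:, 2] - box[:, 0]  # column 2 minus column 0 gives the box width
box_h = box[:, 3] - box[:, 1]

box_w and box_h each hold 4 values here because my original label has 4 bounding boxes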

In [40]: box_w,box_h

Out[40]: (array([ 70, 111, 106, 64]), array([131, 151, 140, 135]))

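Keep only the valid bounding boxes [a valid box is one whose width and height are both greater than 1 pixel]

box = box[np.logical_and(box_w > 1, box_h > 1)]  # logical AND selects the valid boxes
box_data = np.zeros((len(box), 5))
# Copy the valid boxes into box_data, re-created here so its row count matches the number of valid boxes
box_data[:len(box)] = box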
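
The clamping and filtering steps can be tried in isolation. The sketch below is one possible way to package them; the function name `clip_and_filter`, the `min_size` parameter and the three sample boxes are assumptions for illustration, while the threshold of 1 pixel comes from the article.

```python
import numpy as np

def clip_and_filter(box, w, h, min_size=1):
    """Clamp boxes to a (w, h) canvas and drop boxes that are too small.

    Hypothetical helper; `box` rows are x_min, y_min, x_max, y_max, class.
    """
    box = box.copy()                             # leave the caller's array untouched
    box[:, 0:2] = np.maximum(box[:, 0:2], 0)     # no negative top-left corner
    box[:, 2] = np.minimum(box[:, 2], w)         # x_max stays inside the canvas
    box[:, 3] = np.minimum(box[:, 3], h)         # y_max stays inside the canvas
    box_w = box[:, 2] - box[:, 0]
    box_h = box[:, 3] - box[:, 1]
    keep = np.logical_and(box_w > min_size, box_h > min_size)
    return box[keep]

boxes = np.array([
    [-5, 10, 120, 310, 0],   # sticks out on the left and bottom, gets clamped
    [299, 20, 305, 80, 1],   # only 1 px wide after clamping, gets dropped
    [50, 60, 200, 220, 2],   # fully inside, unchanged
])
valid = clip_and_filter(boxes, w=300, h=300)
print(valid)
```

Using `np.maximum` and `np.minimum` is equivalent to the masked assignments in the article; it is simply a slightly more compact way to write the same clamp.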

Now let's go back to __getitem__:

def __getitem__(self, index):

        if self.is_train:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)

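The img and y here are the results returned by the get_data method we defined, i.e. the image_data and box_data produced above

Take out the box coordinates [excluding the class column]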

boxes = np.array(y[:, :4], dtype=np.float32)

In [49]: boxes

Out[49]:

array([[ 26.,  19.,  96., 150.],
       [168.,   0., 279., 151.],
       [ 14., 158., 120., 298.],
       [186., 162., 250., 297.]], dtype=float32)

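Next, normalize the box coordinates: divide the x coordinates by the input width (self.image_size[1]) and the y coordinates by the input height (self.image_size[0])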

boxes[:, 0] = boxes[:, 0] / self.image_size[1]
boxes[:, 1] = boxes[:, 1] / self.image_size[0]
boxes[:, 2] = boxes[:, 2] / self.image_size[1]
boxes[:, 3] = boxes[:, 3] / self.image_size[0]

In [55]: boxes

Out[55]:

array([[0.08666667, 0.06333333, 0.32      , 0.5       ],
       [0.56      , 0.        , 0.93      , 0.50333333],
       [0.04666667, 0.52666664, 0.4       , 0.99333334],
       [0.62      , 0.54      , 0.8333333 , 0.99      ]], dtype=float32)

Clip the box coordinates to the valid range [0, 1]

boxes = np.maximum(np.minimum(boxes, 1), 0)
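
The four per-column divisions and the clip can also be written as a single broadcast operation. This is a minimal sketch under the assumption that `image_size` is laid out as (height, width, channels), which is how the article indexes it; the function name `normalize_boxes` is hypothetical.

```python
import numpy as np

def normalize_boxes(boxes, image_size):
    """Scale pixel coordinates to [0, 1] and clip anything outside that range.

    `image_size` is assumed to be (height, width, ...) as in self.image_size.
    """
    h, w = image_size[0], image_size[1]
    # x columns are divided by the width, y columns by the height
    divisor = np.array([w, h, w, h], dtype=np.float32)
    return np.clip(boxes.astype(np.float32) / divisor, 0.0, 1.0)

# First two boxes from the article's example, with a 300x300 input
boxes = np.array([[26, 19, 96, 150],
                  [168, 0, 279, 151]], dtype=np.float32)
norm = normalize_boxes(boxes, (300, 300, 3))
print(norm)
```

`np.clip(x, 0, 1)` gives the same result as `np.maximum(np.minimum(x, 1), 0)`; which form to use is a matter of taste.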

Then concatenate the processed coordinate matrix with the class column to get the complete bbox information [including the class label]

y = np.concatenate([boxes, y[:, -1:]], axis=-1)

In [59]: y

Out[59]:

array([[0.08666667, 0.06333333, 0.31999999, 0.5       , 0.        ],
       [0.56      , 0.        , 0.93000001, 0.50333333, 0.        ],
       [0.04666667, 0.52666664, 0.40000001, 0.99333334, 0.        ],
       [0.62      , 0.54000002, 0.83333331, 0.99000001, 0.        ]])

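The y above is the final bbox array: the first 4 columns are the bounding box coordinates and the last column is the class

img = np.array(img, dtype=np.float32)  # convert the image to an array
tmp_inp = np.transpose(img - MEANS, (2, 0, 1))  # subtract the per-channel means, then reorder HWC to CHW; tmp_inp has shape (3, 300, 300)

tmp_targets = np.array(y, dtype=np.float32)  # convert the targets to an array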
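
The image step can be checked on a dummy array. The article does not show how `MEANS` is defined, so the values below are an assumption (per-channel means often used with SSD-style detectors); the helper name `to_chw` is also hypothetical.

```python
import numpy as np

# Assumed per-channel means; the article does not give the actual values.
MEANS = (104, 117, 123)

def to_chw(img, means=MEANS):
    """Subtract per-channel means and reorder (H, W, C) to (C, H, W)."""
    img = np.asarray(img, dtype=np.float32)             # shape (H, W, 3)
    centered = img - np.array(means, dtype=np.float32)  # broadcasts over the last axis
    return np.transpose(centered, (2, 0, 1))            # shape (3, H, W)

# A uniform gray 300x300 image, like the padding color used in get_data
img = np.full((300, 300, 3), 128, dtype=np.uint8)
tmp_inp = to_chw(img)
print(tmp_inp.shape)      # (3, 300, 300)
print(tmp_inp[:, 0, 0])   # one value per channel: 128 minus each mean
```

The transpose is needed because PyTorch convolution layers expect channels first, while PIL and NumPy images store channels last.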

--------------------------------------------------------------------------------------------------------------------------------

We now have the final preprocessed image tensor and the target array (bounding box coordinates plus class label)

Complete code:

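import numpy as np
from PIL import Image
from torch.utils.data import Dataset

# Per-channel means subtracted from the image. The article does not give the values;
# (104, 117, 123) is an assumption and should be replaced with the values your model expects.
MEANS = (104, 117, 123)


class MyDatasets(Dataset):
    def __init__(self, train_line, image_size, is_train):
        super(MyDatasets, self).__init__()
        self.train_line = train_line
        self.train_batches = len(train_line)
        self.image_size = image_size
        self.is_train = is_train

    def get_data(self, annotation_line, input_shape, random=True):
        line = annotation_line.split()
        image = Image.open(line[0])  # line[0] is the image path; line[1:] holds the boxes and labels
        iw, ih = image.size  # size of the original input image
        h, w = input_shape  # network input size
        # Convert each "x_min,y_min,x_max,y_max,class" entry into an integer array
        box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])
        if not random:
            # Resize the image while keeping its aspect ratio, then pad it to the input size
            scale = min(w / iw, h / ih)
            nw = int(iw * scale)
            nh = int(ih * scale)
            dx = (w - nw) // 2  # integer division; the leftover space becomes gray padding bars
            dy = (h - nh) // 2
            image = image.resize((nw, nh), Image.BICUBIC)  # resize with bicubic interpolation
            new_image = Image.new('RGB', (w, h), (128, 128, 128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image, np.float32)
            # Process the ground-truth boxes
            box_data = np.zeros((len(box), 5))
            if len(box) > 0:
                np.random.shuffle(box)
                box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx  # scale the x coordinates
                box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy  # scale the y coordinates
                # Clamp the top-left coordinates to prevent negative values
                box[:, 0:2][box[:, 0:2] < 0] = 0
                # Clamp the bottom-right coordinates so they do not exceed the input boundary
                box[:, 2][box[:, 2] > w] = w
                box[:, 3][box[:, 3] > h] = h
                # Compute the box sizes after scaling and keep only the valid boxes
                box_w = box[:, 2] - box[:, 0]
                box_h = box[:, 3] - box[:, 1]
                box = box[np.logical_and(box_w > 1, box_h > 1)]
                box_data = np.zeros((len(box), 5))
                box_data[:len(box)] = box
            return image_data, box_data
        # The random=True branch is not covered in this article
        raise NotImplementedError("random augmentation is not implemented")

    def __len__(self):  # return the length of the dataset
        return self.train_batches

    def __getitem__(self, index):  # return one image and its targets
        lines = self.train_line
        # Both branches use random=False because random augmentation is not implemented here
        if self.is_train:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)
        else:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)
        # Normalize the coordinates to [0, 1] and re-attach the class column
        boxes = np.array(y[:, :4], dtype=np.float32)
        boxes[:, 0] = boxes[:, 0] / self.image_size[1]
        boxes[:, 1] = boxes[:, 1] / self.image_size[0]
        boxes[:, 2] = boxes[:, 2] / self.image_size[1]
        boxes[:, 3] = boxes[:, 3] / self.image_size[0]
        boxes = np.maximum(np.minimum(boxes, 1), 0)
        y = np.concatenate([boxes, y[:, -1:]], axis=-1)
        # Subtract the channel means and reorder HWC to CHW
        img = np.array(img, dtype=np.float32)
        tmp_inp = np.transpose(img - MEANS, (2, 0, 1))
        tmp_targets = np.array(y, dtype=np.float32)
        return tmp_inp, tmp_targets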
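
Each sample returned by __getitem__ can carry a different number of boxes, so the target arrays of a batch generally cannot be stacked into one array. One common way to handle this is a custom collate function passed to `torch.utils.data.DataLoader` through its `collate_fn` argument. The sketch below is an assumption about how that could look, not code from the article; it uses only NumPy so it can be tried without a dataset, and the name `detection_collate` is hypothetical.

```python
import numpy as np

def detection_collate(batch):
    """Stack the images of a batch; keep the targets as a list.

    `batch` is a list of (image, targets) pairs as returned by __getitem__.
    Images share one shape, so they stack into (N, 3, H, W). Targets have
    one row per box, so their row counts differ between samples.
    """
    images = np.stack([img for img, _ in batch])
    targets = [tgt for _, tgt in batch]
    return images, targets

# Two fake samples: one with 4 boxes, one with a single box
batch = [
    (np.zeros((3, 300, 300), np.float32), np.zeros((4, 5), np.float32)),
    (np.zeros((3, 300, 300), np.float32), np.zeros((1, 5), np.float32)),
]
images, targets = detection_collate(batch)
print(images.shape)                  # (2, 3, 300, 300)
print([t.shape for t in targets])    # [(4, 5), (1, 5)]
```

With a real dataset this would be used roughly as `DataLoader(dataset, batch_size=..., collate_fn=detection_collate)`, with the arrays converted to tensors inside the training loop or in the collate function itself.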
