Container Number Detection and Recognition with PaddleOCR

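Summary: In 2021 the world's top 100 container ports handled a combined 676 million TEU. At that scale, the burden of recording container numbers grows sharply, and the traditional approach of having staff read and log each number by hand is costly, slow, and operationally outdated. Bringing artificial intelligence into port operations has become key for traditional ports that want to stay competitive. This article walks through the whole process, from environment setup to model training, to show how PaddleOCR can be used to detect and recognize container numbers.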


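Project Background

According to figures published in March this year by Alphaliner, a shipping consultancy and analysis firm, the Port of Shanghai topped the list of the 30 busiest container ports of 2021 with a throughput of 47.025 million TEU.

Year on year, Shanghai's container throughput grew by 8.1%.

That put it nearly 10 million TEU ahead of its closest competitor, Singapore.

The world's top 100 container ports handled a combined 676 million TEU in 2021. At that scale, the burden of recording container numbers grows sharply, and the traditional approach of having staff read and log each number by hand is costly, slow, and operationally outdated.

Bringing artificial intelligence into port operations has become key for traditional ports that want to stay competitive.

This article therefore walks through the whole process, from environment setup to model training, to show how PaddleOCR can be used to detect and recognize container numbers.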

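1. Project Overview: Container Number Detection and Recognition with a Small Dataset

A container number identifies the container used to ship export goods and is a required field on the shipping order. Standard container numbers follow ISO 6346 (1995) and consist of 11 characters. Taking CBHU 123456 7 as an example, the number has three parts:

Part 1 consists of four letters. The first three identify the owner or operator, and the fourth is the equipment category identifier (U denotes a freight container). CBHU indicates a standard container owned and operated by COSCO Container Lines.

Part 2 consists of six digits. This is the registration (serial) number that uniquely identifies the container.

Part 3 is the check digit, computed from the preceding four letters and six digits by a fixed rule. It is used to detect errors made when the number is recorded or transmitted.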
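The check digit rule is not spelled out above, so here is a minimal sketch of the ISO 6346 calculation as it is commonly described: letters map to values starting at 10 and skipping multiples of 11, each of the first ten characters is weighted by 2 raised to its position, and the sum is reduced modulo 11 (a remainder of 10 becomes 0). The function names are my own and are not part of PaddleOCR. Such a check could be used to screen recognition results, although the project itself does not do this.

```python
import re
import string


def _letter_values():
    """Map A-Z to ISO 6346 values: start at 10 and skip multiples of 11."""
    values, v = {}, 10
    for ch in string.ascii_uppercase:
        if v % 11 == 0:
            v += 1
        values[ch] = v
        v += 1
    return values


LETTER_VALUES = _letter_values()


def iso6346_check_digit(first_ten):
    """Compute the check digit from the 4 letters and 6 digits of a container number."""
    s = first_ten.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{4}[0-9]{6}", s):
        raise ValueError("expected 4 letters followed by 6 digits")
    total = 0
    for position, ch in enumerate(s):
        value = LETTER_VALUES[ch] if ch.isalpha() else int(ch)
        total += value * 2 ** position  # weights 1, 2, 4, ..., 512
    return total % 11 % 10  # a remainder of 10 maps to 0


def is_valid_container_number(number):
    """Return True if an 11-character number carries the right check digit."""
    s = number.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{4}[0-9]{7}", s):
        return False
    return iso6346_check_digit(s[:10]) == int(s[10])


print(iso6346_check_digit("CSQU305438"))         # 3
print(is_valid_container_number("CSQU3054383"))  # True
```

CSQU3054383 is a widely used worked example for this rule. For the layout example in the text, CBHU 123456, the same rule gives 1, so that sample is best read as an illustration of the three parts rather than as a real number.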

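Containers stacked at a port, waiting to be shipped

This project uses PaddleOCR for container number detection and recognition. A detection model and a recognition model are each trained on a small amount of data and then chained together to read container numbers end to end.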

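2. Environment Setup

First we need to install PaddlePaddle, which is straightforward: the commands below are all it takes. The official site also explains how to install the GPU build. I use version 2.3.2 here because it is fairly stable. Installation guide: https://www.paddlepaddle.org.cn/install/quick

CPU version

python -m pip install paddlepaddle==2.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

GPU version

Install with conda:

conda install paddlepaddle-gpu==2.3.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge 

The experiments are built on PaddleOCR, so we also need its source code, which is available on GitHub. I use version 2.6 here. Note that a plain clone checks out the default branch, so switch to the release/2.6 branch if you want to match. Clone the repository with git:

git clone https://github.com/PaddlePaddle/PaddleOCR.git

Alternatively, download the full PaddleOCR source from GitHub: https://github.com/PaddlePaddle/PaddleOCR

Finally, enter the PaddleOCR directory and install all the dependencies:

  • Enter the PaddleOCR directory
cd PaddleOCR
  • Install the PaddleOCR dependencies
!pip install -r requirements.txt  # install the packages PaddleOCR requires
  • Return to the parent directory when the installation finishes
cd ..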

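3. Dataset

The container number dataset used in this tutorial contains 3,003 container images at a resolution of 1920×1080.

1. Label format for PaddleOCR detection training. The two fields are separated by "\t":

" image file name                    image annotation encoded with json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]

Before json.dumps encoding, the annotation is a list of dictionaries. In each dictionary, points holds the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the top-left corner, and transcription holds the text inside that box. A transcription of "###" marks the box as invalid, and it is skipped during training.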
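As a small illustration of the format just described, the sketch below builds one detection label line with json.dumps and parses it back, dropping boxes marked "###". The two helper names are hypothetical and the second box is made up for the example. This is a sketch of the file format, not PaddleOCR's own loader.

```python
import json


def make_det_label_line(image_path, boxes):
    """boxes is a list of (transcription, four_points) pairs."""
    annos = [{"transcription": text, "points": points} for text, points in boxes]
    return image_path + "\t" + json.dumps(annos, ensure_ascii=False)


def parse_det_label_line(line):
    """Split a label line into the image path and its valid annotations."""
    image_path, payload = line.rstrip("\n").split("\t", 1)
    annos = json.loads(payload)
    # Boxes transcribed as "###" are treated as invalid and left out
    valid = [a for a in annos if a["transcription"] != "###"]
    return image_path, valid


line = make_det_label_line(
    "ch4_test_images/img_61.jpg",
    [
        ("MASA", [[310, 104], [416, 141], [418, 216], [312, 179]]),
        ("###", [[0, 0], [10, 0], [10, 10], [0, 10]]),  # made-up invalid box
    ],
)
path, annos = parse_det_label_line(line)
print(path, [a["transcription"] for a in annos])
```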

2. Label format for PaddleOCR recognition training. The two fields are separated by "\t":

" image file name                 image annotation "

train_data/rec/train/word_001.jpg   简单可依赖
train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单

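4. Data Preparation

4.1 Preparing data for the detection model

Split the 3,003 images into a training set and a validation set at a ratio of roughly 2:1: the first 2,000 lines of the label file go to training and the remaining 1,003 to validation. Run the following code: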

from tqdm import tqdm

filename = "all_label.txt"
train_size = 2000  # the first 2000 lines go to training, the rest to validation

with open(filename, encoding="utf-8") as f:
    lines = f.readlines()

with open('det_train_label.txt', 'w', encoding="utf-8") as t, \
        open('det_eval_label.txt', 'w', encoding="utf-8") as v:
    for count, line in enumerate(tqdm(lines)):
        if count < train_size:
            t.write(line)
        else:
            v.write(line)
100%|██████████| 3003/3003 [00:00<00:00, 56103.65it/s]
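The split above takes the label lines in file order. If the label file happens to be grouped, for example by camera or by capture time, a shuffled split may be more representative. The sketch below is one way to do that with a fixed seed so the split is reproducible. The function name, the seed value, and the synthetic label lines are assumptions for illustration, and the results reported later in this article come from the in-order split.

```python
import random


def split_lines(lines, train_size, seed=0):
    """Shuffle a copy of the lines with a fixed seed, then cut it in two."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


# Synthetic stand-in for the 3003 lines of all_label.txt
lines = [f"img_{i}.jpg\t[]\n" for i in range(3003)]
train, val = split_lines(lines, 2000)
print(len(train), len(val))  # 2000 1003
```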

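4.2 Preparing data for the recognition model

Using the detection annotations, we crop each image down to its annotated text boxes, so that the crops contain as little besides text as possible, and use the crops as recognition data. Run the following code: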

import json
import math
import os

from PIL import Image, ImageDraw
from tqdm import tqdm

# Image.Transpose was added in Pillow 9.1; fall back to the old constant on earlier versions
ROTATE_90 = getattr(Image, "Transpose", Image).ROTATE_90


class Rotate(object):
    """Mask out everything except the text box, level the box, and crop to it."""

    def __init__(self, image: Image.Image, coordinate):
        self.image = image.convert('RGB')
        self.coordinate = coordinate
        self.xy = [tuple(self.coordinate[k]) for k in ['left_top', 'right_top', 'right_bottom', 'left_bottom']]
        self._mask = None
        # Pixels outside the box become fully transparent
        self.image.putalpha(self.mask)

    @property
    def mask(self):
        if not self._mask:
            mask = Image.new('L', self.image.size, 0)
            draw = ImageDraw.Draw(mask, 'L')
            draw.polygon(self.xy, fill=255)
            self._mask = mask
        return self._mask

    def run(self):
        image = self.rotation_angle()
        # Bounding box of the non-zero (non-transparent) region
        box = image.getbbox()
        return image.crop(box)

    def rotation_angle(self):
        x1, y1 = self.xy[0]
        x2, y2 = self.xy[1]
        # Signed tilt of the top edge in degrees. Image y grows downward, so a
        # positive value means the edge slopes down to the right. Image.rotate()
        # turns counterclockwise for positive angles, which levels the edge
        # whichever way it slopes.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        return self.image.rotate(angle, expand=True)


def image_cut_save(path, bbox, save_path):
    """
    Crop one annotated text box out of an image and save it.

    :param path: path to the source image
    :param bbox: the four corner points [[x, y], ...], clockwise from the top-left corner
    :param save_path: where to write the cropped image
    """
    img = Image.open(path)
    coordinate = {'left_top': bbox[0], 'right_top': bbox[1], 'right_bottom': bbox[2], 'left_bottom': bbox[3]}
    rotate = Rotate(img, coordinate)

    left, upper = bbox[0]
    right, lower = bbox[2]
    crop = rotate.run().convert('RGB')
    # A box taller than it is wide is treated as vertical text and turned by 90 degrees
    if lower - upper > right - left:
        crop = crop.transpose(ROTATE_90)
    crop.save(save_path)
    return True


# Build the recognition dataset from the detection labels
files = ["det_train_label.txt", "det_eval_label.txt"]
filetypes = ["train", "eval"]
data_dirs = ["RecTrainData", "RecEvalData"]
for filename, filetype, data_dir in zip(files, filetypes, data_dirs):
    os.makedirs(data_dir, exist_ok=True)
    with open(filename) as f, open('rec_' + filetype + '_label.txt', 'w') as l:
        for line in tqdm(f.readlines()):
            image_name = line.split("\t")[0].split("/")[-1]
            annos = json.loads(line.split("\t")[-1])
            img_path = os.path.join("./dataset/images", image_name)
            for i, anno in enumerate(annos):
                # One crop per text box, named <box index>_<image name>
                crop_name = str(i) + "_" + image_name
                data_path = os.path.join(data_dir, crop_name)
                if image_cut_save(img_path, anno["points"], data_path):
                    l.write(crop_name + "\t" + anno["transcription"] + "\n")
100%|██████████| 2000/2000 [01:02<00:00, 32.15it/s]
100%|██████████| 1003/1003 [00:29<00:00, 33.76it/s]
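The cropping code rests on two small geometric tests: the tilt of a box's top edge, and whether the box counts as taller than wide, judged from its top-left and bottom-right corners. The sketch below isolates both so they can be checked without any images. The function names and the sample boxes are hypothetical, and the tilt here is the signed angle in image coordinates, where y grows downward.

```python
import math


def top_edge_tilt(points):
    """Signed angle of the top edge in degrees (positive = sloping down to the right)."""
    (x1, y1), (x2, y2) = points[0], points[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def is_taller_than_wide(points):
    """Compare height and width using the top-left and bottom-right corners."""
    left, upper = points[0]
    right, lower = points[2]
    return (lower - upper) > (right - left)


horizontal = [[0, 0], [100, 0], [100, 20], [0, 20]]
vertical = [[0, 0], [20, 0], [20, 100], [0, 100]]
tilted = [[0, 0], [100, 10], [98, 30], [-2, 20]]

for name, box in [("horizontal", horizontal), ("vertical", vertical), ("tilted", tilted)]:
    print(name, round(top_edge_tilt(box), 2), is_taller_than_wide(box))
```

One caveat of the corner-based comparison is that it can misjudge strongly tilted boxes. For the MASA box in the label format example, the corner differences are 112 pixels vertically and 108 horizontally, so it would be treated as vertical text even though its top edge is about 106 pixels long.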

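5. Experiments

Because the dataset is small, the PP-OCRv3 models in PaddleOCR are used for both detection and recognition so that training converges better and faster. Compared with PP-OCRv2, PP-OCRv3 improves the end-to-end Hmean by 5% in Chinese scenarios and by 11% for the English and digit model. See the PP-OCRv3 technical report for the details of these optimizations.

See https://github.com/PaddlePaddle/PaddleOCR/blob/v2.6.0/doc/doc_ch/models_list.md for the full list of models. All the pretrained models used below were downloaded from there.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # choose the GPU to run on; I use GPU 1 here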

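5.1 Detection model

5.1.1 Detection model configuration

PaddleOCR provides many detection models. The models and their configuration files are under PaddleOCR/configs/det. The model chosen here is ch_PP-OCRv3_det_student, whose configuration file is PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. Before use, set the necessary options such as the training parameters and dataset paths. The key settings are shown below: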

# Key training parameters
use_gpu: true # whether to train on the GPU
epoch_num: 50 # number of training epochs

save_model_dir: ./output/ch_PP-OCR_V3_det/ # where the model is saved
save_epoch_step: 100 # save a checkpoint every 100 epochs
eval_batch_step:
  - 0
  - 200 # starting from iteration 0, evaluate every 200 iterations
pretrained_model: ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams # path to the pretrained model
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/images # image directory
    label_file_list:
      - ./det_train_label.txt # label file
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 8 # batch size per GPU; reduce it if training runs out of GPU memory
# The validation set must be set as well
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/images
    label_file_list:
      - ./det_eval_label.txt
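Editing the YAML is the approach used in this article. PaddleOCR's tools/train.py also accepts -o followed by Key.sub=value pairs that override entries of the config at launch time, so the same scalar settings could be supplied on the command line instead. The sketch below only assembles such a command as a list of arguments. The helper name is hypothetical and the values are copied from the configuration shown above.

```python
def build_train_command(config_path, overrides):
    """Assemble a tools/train.py call with optional -o Key.sub=value overrides."""
    cmd = ["python", "PaddleOCR/tools/train.py", "-c", config_path]
    if overrides:
        cmd.append("-o")
        cmd.extend(f"{key}={value}" for key, value in overrides.items())
    return cmd


cmd = build_train_command(
    "PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml",
    {
        "Global.epoch_num": 50,
        "Global.save_model_dir": "./output/ch_PP-OCR_V3_det/",
        "Global.pretrained_model": "./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams",
        "Train.loader.batch_size_per_card": 8,
    },
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a notebook cell after a leading exclamation mark.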

5.1.2 Fine-tuning the model

Run the following command in the notebook to fine-tune the model. The -c option takes the path to the configured model file.

!python PaddleOCR/tools/train.py -c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml
[2022/11/22 13:06:21] ppocr INFO: Architecture : 
[2022/11/22 13:06:21] ppocr INFO:     Backbone : 
[2022/11/22 13:06:21] ppocr INFO:         disable_se : True
[2022/11/22 13:06:21] ppocr INFO:         model_name : large
[2022/11/22 13:06:21] ppocr INFO:         name : MobileNetV3
[2022/11/22 13:06:21] ppocr INFO:         scale : 0.5
[2022/11/22 13:06:21] ppocr INFO:     Head : 
[2022/11/22 13:06:21] ppocr INFO:         k : 50
[2022/11/22 13:06:21] ppocr INFO:         name : DBHead
[2022/11/22 13:06:21] ppocr INFO:     Neck : 
[2022/11/22 13:06:21] ppocr INFO:         name : RSEFPN
[2022/11/22 13:06:21] ppocr INFO:         out_channels : 96
[2022/11/22 13:06:21] ppocr INFO:         shortcut : True
[2022/11/22 13:06:21] ppocr INFO:     Transform : None
[2022/11/22 13:06:21] ppocr INFO:     algorithm : DB
[2022/11/22 13:06:21] ppocr INFO:     model_type : det
[2022/11/22 13:06:21] ppocr INFO: Eval : 
[2022/11/22 13:06:21] ppocr INFO:     dataset : 
[2022/11/22 13:06:21] ppocr INFO:         data_dir : ./dataset/images
[2022/11/22 13:06:21] ppocr INFO:         label_file_list : ['./det_eval_label.txt']
[2022/11/22 13:06:21] ppocr INFO:         name : SimpleDataSet
[2022/11/22 13:06:21] ppocr INFO:         transforms : 
[2022/11/22 13:06:21] ppocr INFO:             DecodeImage : 
[2022/11/22 13:06:21] ppocr INFO:                 channel_first : False
[2022/11/22 13:06:21] ppocr INFO:                 img_mode : BGR
[2022/11/22 13:06:21] ppocr INFO:             DetLabelEncode : None
[2022/11/22 13:06:21] ppocr INFO:             DetResizeForTest : None
[2022/11/22 13:06:21] ppocr INFO:             NormalizeImage : 
[2022/11/22 13:06:21] ppocr INFO:                 mean : [0.485, 0.456, 0.406]
[2022/11/22 13:06:21] ppocr INFO:                 order : hwc
[2022/11/22 13:06:21] ppocr INFO:                 scale : 1./255.
[2022/11/22 13:06:21] ppocr INFO:                 std : [0.229, 0.224, 0.225]
[2022/11/22 13:06:21] ppocr INFO:             ToCHWImage : None
[2022/11/22 13:06:21] ppocr INFO:             KeepKeys : 
[2022/11/22 13:06:21] ppocr INFO:                 keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/11/22 13:06:21] ppocr INFO:     loader : 
[2022/11/22 13:06:21] ppocr INFO:         batch_size_per_card : 1
[2022/11/22 13:06:21] ppocr INFO:         drop_last : False
[2022/11/22 13:06:21] ppocr INFO:         num_workers : 2
[2022/11/22 13:06:21] ppocr INFO:         shuffle : False
[2022/11/22 13:06:21] ppocr INFO: Global : 
[2022/11/22 13:06:21] ppocr INFO:     cal_metric_during_train : False
[2022/11/22 13:06:21] ppocr INFO:     checkpoints : None
[2022/11/22 13:06:21] ppocr INFO:     debug : False
[2022/11/22 13:06:21] ppocr INFO:     distributed : False
[2022/11/22 13:06:21] ppocr INFO:     epoch_num : 50
[2022/11/22 13:06:21] ppocr INFO:     eval_batch_step : [0, 200]
[2022/11/22 13:06:21] ppocr INFO:     infer_img : doc/imgs_en/img_10.jpg
[2022/11/22 13:06:21] ppocr INFO:     log_smooth_window : 20
[2022/11/22 13:06:21] ppocr INFO:     pretrained_model : ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams
[2022/11/22 13:06:21] ppocr INFO:     print_batch_step : 10
[2022/11/22 13:06:21] ppocr INFO:     save_epoch_step : 100
[2022/11/22 13:06:21] ppocr INFO:     save_inference_dir : None
[2022/11/22 13:06:21] ppocr INFO:     save_model_dir : ./output/ch_PP-OCR_V3_det/
[2022/11/22 13:06:21] ppocr INFO:     save_res_path : ./checkpoints/det_db/predicts_db.txt
[2022/11/22 13:06:21] ppocr INFO:     use_gpu : True
[2022/11/22 13:06:21] ppocr INFO:     use_visualdl : False
[2022/11/22 13:06:21] ppocr INFO: Loss : 
[2022/11/22 13:06:21] ppocr INFO:     alpha : 5
[2022/11/22 13:06:21] ppocr INFO:     balance_loss : True
[2022/11/22 13:06:21] ppocr INFO:     beta : 10
[2022/11/22 13:06:21] ppocr INFO:     main_loss_type : DiceLoss
[2022/11/22 13:06:21] ppocr INFO:     name : DBLoss
[2022/11/22 13:06:21] ppocr INFO:     ohem_ratio : 3
[2022/11/22 13:06:21] ppocr INFO: Metric : 
[2022/11/22 13:06:21] ppocr INFO:     main_indicator : hmean
[2022/11/22 13:06:21] ppocr INFO:     name : DetMetric
[2022/11/22 13:06:21] ppocr INFO: Optimizer : 
[2022/11/22 13:06:21] ppocr INFO:     beta1 : 0.9
[2022/11/22 13:06:21] ppocr INFO:     beta2 : 0.999
[2022/11/22 13:06:21] ppocr INFO:     lr : 
[2022/11/22 13:06:21] ppocr INFO:         learning_rate : 0.001
[2022/11/22 13:06:21] ppocr INFO:         name : Cosine
[2022/11/22 13:06:21] ppocr INFO:         warmup_epoch : 2
[2022/11/22 13:06:21] ppocr INFO:     name : Adam
[2022/11/22 13:06:21] ppocr INFO:     regularizer : 
[2022/11/22 13:06:21] ppocr INFO:         factor : 5e-05
[2022/11/22 13:06:21] ppocr INFO:         name : L2
[2022/11/22 13:06:21] ppocr INFO: PostProcess : 
[2022/11/22 13:06:21] ppocr INFO:     box_thresh : 0.6
[2022/11/22 13:06:21] ppocr INFO:     max_candidates : 1000
[2022/11/22 13:06:21] ppocr INFO:     name : DBPostProcess
[2022/11/22 13:06:21] ppocr INFO:     thresh : 0.3
[2022/11/22 13:06:21] ppocr INFO:     unclip_ratio : 1.5
[2022/11/22 13:06:21] ppocr INFO: Train : 
[2022/11/22 13:06:21] ppocr INFO:     dataset : 
[2022/11/22 13:06:21] ppocr INFO:         data_dir : ./dataset/images
[2022/11/22 13:06:21] ppocr INFO:         label_file_list : ['./det_train_label.txt']
[2022/11/22 13:06:21] ppocr INFO:         name : SimpleDataSet
[2022/11/22 13:06:21] ppocr INFO:         ratio_list : [1.0]
[2022/11/22 13:06:21] ppocr INFO:         transforms : 
[2022/11/22 13:06:21] ppocr INFO:             DecodeImage : 
[2022/11/22 13:06:21] ppocr INFO:                 channel_first : False
[2022/11/22 13:06:21] ppocr INFO:                 img_mode : BGR
[2022/11/22 13:06:21] ppocr INFO:             DetLabelEncode : None
[2022/11/22 13:06:21] ppocr INFO:             IaaAugment : 
[2022/11/22 13:06:21] ppocr INFO:                 augmenter_args : 
[2022/11/22 13:06:21] ppocr INFO:                     args : 
[2022/11/22 13:06:21] ppocr INFO:                         p : 0.5
[2022/11/22 13:06:21] ppocr INFO:                     type : Fliplr
[2022/11/22 13:06:21] ppocr INFO:                     args : 
[2022/11/22 13:06:21] ppocr INFO:                         rotate : [-10, 10]
[2022/11/22 13:06:21] ppocr INFO:                     type : Affine
[2022/11/22 13:06:21] ppocr INFO:                     args : 
[2022/11/22 13:06:21] ppocr INFO:                         size : [0.5, 3]
[2022/11/22 13:06:21] ppocr INFO:                     type : Resize
[2022/11/22 13:06:21] ppocr INFO:             EastRandomCropData : 
[2022/11/22 13:06:21] ppocr INFO:                 keep_ratio : True
[2022/11/22 13:06:21] ppocr INFO:                 max_tries : 50
[2022/11/22 13:06:21] ppocr INFO:                 size : [960, 960]
[2022/11/22 13:06:21] ppocr INFO:             MakeBorderMap : 
[2022/11/22 13:06:21] ppocr INFO:                 shrink_ratio : 0.4
[2022/11/22 13:06:21] ppocr INFO:                 thresh_max : 0.7
[2022/11/22 13:06:21] ppocr INFO:                 thresh_min : 0.3
[2022/11/22 13:06:21] ppocr INFO:             MakeShrinkMap : 
[2022/11/22 13:06:21] ppocr INFO:                 min_text_size : 8
[2022/11/22 13:06:21] ppocr INFO:                 shrink_ratio : 0.4
[2022/11/22 13:06:21] ppocr INFO:             NormalizeImage : 
[2022/11/22 13:06:21] ppocr INFO:                 mean : [0.485, 0.456, 0.406]
[2022/11/22 13:06:21] ppocr INFO:                 order : hwc
[2022/11/22 13:06:21] ppocr INFO:                 scale : 1./255.
[2022/11/22 13:06:21] ppocr INFO:                 std : [0.229, 0.224, 0.225]
[2022/11/22 13:06:21] ppocr INFO:             ToCHWImage : None
[2022/11/22 13:06:21] ppocr INFO:             KeepKeys : 
[2022/11/22 13:06:21] ppocr INFO:                 keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/11/22 13:06:21] ppocr INFO:     loader : 
[2022/11/22 13:06:21] ppocr INFO:         batch_size_per_card : 8
[2022/11/22 13:06:21] ppocr INFO:         drop_last : False
[2022/11/22 13:06:21] ppocr INFO:         num_workers : 4
[2022/11/22 13:06:21] ppocr INFO:         shuffle : True
[2022/11/22 13:06:21] ppocr INFO: profiler_options : None
[2022/11/22 13:06:21] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/22 13:06:21] ppocr INFO: Initialize indexs of datasets:['./det_train_label.txt']
[2022/11/22 13:06:21] ppocr INFO: Initialize indexs of datasets:['./det_eval_label.txt']
W1122 13:06:21.615907 1637263 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.8, Runtime API Version: 11.6
W1122 13:06:21.621130 1637263 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
[2022/11/22 13:06:25] ppocr INFO: train dataloader has 250 iters
[2022/11/22 13:06:25] ppocr INFO: valid dataloader has 1003 iters
[2022/11/22 13:06:28] ppocr INFO: load pretrain successful from ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student
[2022/11/22 13:06:28] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
[2022/11/22 13:06:41] ppocr INFO: epoch: [1/50], global_step: 10, lr: 0.000009, loss: 3.958436, loss_shrink_maps: 2.519069, loss_threshold_maps: 0.946237, loss_binary_maps: 0.504128, avg_reader_cost: 0.24535 s, avg_batch_cost: 1.18631 s, avg_samples: 8.0, ips: 6.74361 samples/s, eta: 4:06:56
[2022/11/22 13:06:45] ppocr INFO: epoch: [1/50], global_step: 20, lr: 0.000019, loss: 3.866610, loss_shrink_maps: 2.481548, loss_threshold_maps: 0.944892, loss_binary_maps: 0.496426, avg_reader_cost: 0.00237 s, avg_batch_cost: 0.36309 s, avg_samples: 8.0, ips: 22.03292 samples/s, eta: 2:41:08
[2022/11/22 13:06:49] ppocr INFO: epoch: [1/50], global_step: 30, lr: 0.000039, loss: 3.867103, loss_shrink_maps: 2.555314, loss_threshold_maps: 0.927763, loss_binary_maps: 0.511137, avg_reader_cost: 0.00391 s, avg_batch_cost: 0.36944 s, avg_samples: 8.0, ips: 21.65439 samples/s, eta: 2:12:55
[2022/11/22 13:06:53] ppocr INFO: epoch: [1/50], global_step: 40, lr: 0.000059, loss: 3.878752, loss_shrink_maps: 2.555314, loss_threshold_maps: 0.883135, loss_binary_maps: 0.511137, avg_reader_cost: 0.00372 s, avg_batch_cost: 0.36812 s, avg_samples: 8.0, ips: 21.73230 samples/s, eta: 1:58:43
[2022/11/22 13:06:58] ppocr INFO: epoch: [1/50], global_step: 50, lr: 0.000079, loss: 3.711426, loss_shrink_maps: 2.362094, loss_threshold_maps: 0.808442, loss_binary_maps: 0.472508, avg_reader_cost: 0.00139 s, avg_batch_cost: 0.36452 s, avg_samples: 8.0, ips: 21.94658 samples/s, eta: 1:50:02
[2022/11/22 13:07:02] ppocr INFO: epoch: [1/50], global_step: 60, lr: 0.000099, loss: 3.374191, loss_shrink_maps: 2.211356, loss_threshold_maps: 0.763958, loss_binary_maps: 0.442548, avg_reader_cost: 0.00234 s, avg_batch_cost: 0.36611 s, avg_samples: 8.0, ips: 21.85163 samples/s, eta: 1:44:16
[2022/11/22 13:07:06] ppocr INFO: epoch: [1/50], global_step: 70, lr: 0.000119, loss: 3.223969, loss_shrink_maps: 2.070736, loss_threshold_maps: 0.746714, loss_binary_maps: 0.414206, avg_reader_cost: 0.00026 s, avg_batch_cost: 0.36592 s, avg_samples: 8.0, ips: 21.86294 samples/s, eta: 1:40:08
[2022/11/22 13:07:10] ppocr INFO: epoch: [1/50], global_step: 80, lr: 0.000139, loss: 2.865101, loss_shrink_maps: 1.764307, loss_threshold_maps: 0.722126, loss_binary_maps: 0.353773, avg_reader_cost: 0.00557 s, avg_batch_cost: 0.37052 s, avg_samples: 8.0, ips: 21.59116 samples/s, eta: 1:37:08
[2022/11/22 13:07:15] ppocr INFO: epoch: [1/50], global_step: 90, lr: 0.000159, loss: 2.916367, loss_shrink_maps: 1.785145, loss_threshold_maps: 0.736615, loss_binary_maps: 0.357866, avg_reader_cost: 0.00193 s, avg_batch_cost: 0.36543 s, avg_samples: 8.0, ips: 21.89180 samples/s, eta: 1:34:40
[2022/11/22 13:07:19] ppocr INFO: epoch: [1/50], global_step: 100, lr: 0.000179, loss: 2.895378, loss_shrink_maps: 1.764575, loss_threshold_maps: 0.750305, loss_binary_maps: 0.352447, avg_reader_cost: 0.00201 s, avg_batch_cost: 0.36573 s, avg_samples: 8.0, ips: 21.87380 samples/s, eta: 1:32:41
[2022/11/22 13:07:23] ppocr INFO: epoch: [1/50], global_step: 110, lr: 0.000199, loss: 2.629834, loss_shrink_maps: 1.587836, loss_threshold_maps: 0.753817, loss_binary_maps: 0.316871, avg_reader_cost: 0.00125 s, avg_batch_cost: 0.36450 s, avg_samples: 8.0, ips: 21.94768 samples/s, eta: 1:31:02
[2022/11/22 13:07:27] ppocr INFO: epoch: [1/50], global_step: 120, lr: 0.000219, loss: 2.397547, loss_shrink_maps: 1.460247, loss_threshold_maps: 0.667723, loss_binary_maps: 0.291848, avg_reader_cost: 0.00466 s, avg_batch_cost: 0.37048 s, avg_samples: 8.0, ips: 21.59387 samples/s, eta: 1:29:45
[2022/11/22 13:07:32] ppocr INFO: epoch: [1/50], global_step: 130, lr: 0.000239, loss: 2.378679, loss_shrink_maps: 1.401813, loss_threshold_maps: 0.673884, loss_binary_maps: 0.280701, avg_reader_cost: 0.00088 s, avg_batch_cost: 0.36518 s, avg_samples: 8.0, ips: 21.90695 samples/s, eta: 1:28:34
[2022/11/22 13:07:36] ppocr INFO: epoch: [1/50], global_step: 140, lr: 0.000259, loss: 2.451726, loss_shrink_maps: 1.482388, loss_threshold_maps: 0.681260, loss_binary_maps: 0.296235, avg_reader_cost: 0.00300 s, avg_batch_cost: 0.38128 s, avg_samples: 8.0, ips: 20.98186 samples/s, eta: 1:27:47
[2022/11/22 13:07:41] ppocr INFO: epoch: [1/50], global_step: 150, lr: 0.000279, loss: 2.589176, loss_shrink_maps: 1.562890, loss_threshold_maps: 0.715202, loss_binary_maps: 0.312893, avg_reader_cost: 0.00400 s, avg_batch_cost: 0.37979 s, avg_samples: 8.0, ips: 21.06412 samples/s, eta: 1:27:05
[2022/11/22 13:07:45] ppocr INFO: epoch: [1/50], global_step: 160, lr: 0.000299, loss: 2.706166, loss_shrink_maps: 1.639270, loss_threshold_maps: 0.734711, loss_binary_maps: 0.328181, avg_reader_cost: 0.00109 s, avg_batch_cost: 0.37755 s, avg_samples: 8.0, ips: 21.18906 samples/s, eta: 1:26:25
[2022/11/22 13:07:49] ppocr INFO: epoch: [1/50], global_step: 170, lr: 0.000319, loss: 2.643976, loss_shrink_maps: 1.618946, loss_threshold_maps: 0.707388, loss_binary_maps: 0.324081, avg_reader_cost: 0.00578 s, avg_batch_cost: 0.38069 s, avg_samples: 8.0, ips: 21.01471 samples/s, eta: 1:25:52
[2022/11/22 13:07:54] ppocr INFO: epoch: [1/50], global_step: 180, lr: 0.000339, loss: 2.542865, loss_shrink_maps: 1.494720, loss_threshold_maps: 0.716266, loss_binary_maps: 0.299067, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.37948 s, avg_samples: 8.0, ips: 21.08149 samples/s, eta: 1:25:22
[2022/11/22 13:07:58] ppocr INFO: epoch: [1/50], global_step: 190, lr: 0.000359, loss: 2.484875, loss_shrink_maps: 1.468433, loss_threshold_maps: 0.721491, loss_binary_maps: 0.293729, avg_reader_cost: 0.00022 s, avg_batch_cost: 0.36306 s, avg_samples: 8.0, ips: 22.03508 samples/s, eta: 1:24:44
[2022/11/22 13:08:02] ppocr INFO: epoch: [1/50], global_step: 200, lr: 0.000379, loss: 2.391915, loss_shrink_maps: 1.404688, loss_threshold_maps: 0.692680, loss_binary_maps: 0.281057, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.37660 s, avg_samples: 8.0, ips: 21.24274 samples/s, eta: 1:24:17
eval model:: 100%|██████████████████████████| 1003/1003 [00:55<00:00, 18.21it/s]
[2022/11/22 13:08:57] ppocr INFO: cur metric, precision: 0.766156462585034, recall: 0.9745808545159546, hmean: 0.8578909783384908, fps: 24.47464613318005
[2022/11/22 13:08:57] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/11/22 13:08:57] ppocr INFO: best metric, hmean: 0.8578909783384908, is_float16: False, precision: 0.766156462585034, recall: 0.9745808545159546, fps: 24.47464613318005, best_epoch: 1
......
eval model:: 100%|██████████████████████████| 1003/1003 [01:00<00:00, 16.46it/s]
[2022/11/22 15:41:39] ppocr INFO: cur metric, precision: 0.9622942113648434, recall: 0.9799891833423472, hmean: 0.9710610932475884, fps: 22.98522684952537
[2022/11/22 15:41:39] ppocr INFO: best metric, hmean: 0.9742212674543502, is_float16: False, precision: 0.9674666666666667, recall: 0.9810708491076258, fps: 23.192479978844272, best_epoch: 31
[2022/11/22 15:41:43] ppocr INFO: epoch: [50/50], global_step: 12410, lr: 0.000006, loss: 1.340954, loss_shrink_maps: 0.700807, loss_threshold_maps: 0.478234, loss_binary_maps: 0.139876, avg_reader_cost: 0.00200 s, avg_batch_cost: 0.37641 s, avg_samples: 8.0, ips: 21.25350 samples/s, eta: 0:00:34
[2022/11/22 15:41:48] ppocr INFO: epoch: [50/50], global_step: 12420, lr: 0.000005, loss: 1.443306, loss_shrink_maps: 0.745190, loss_threshold_maps: 0.499261, loss_binary_maps: 0.149030, avg_reader_cost: 0.00094 s, avg_batch_cost: 0.39211 s, avg_samples: 8.0, ips: 20.40230 samples/s, eta: 0:00:30
[2022/11/22 15:41:52] ppocr INFO: epoch: [50/50], global_step: 12430, lr: 0.000005, loss: 1.321360, loss_shrink_maps: 0.683633, loss_threshold_maps: 0.489775, loss_binary_maps: 0.136621, avg_reader_cost: 0.00189 s, avg_batch_cost: 0.38061 s, avg_samples: 8.0, ips: 21.01903 samples/s, eta: 0:00:27
[2022/11/22 15:41:56] ppocr INFO: epoch: [50/50], global_step: 12440, lr: 0.000005, loss: 1.238384, loss_shrink_maps: 0.635735, loss_threshold_maps: 0.448261, loss_binary_maps: 0.126847, avg_reader_cost: 0.00201 s, avg_batch_cost: 0.37915 s, avg_samples: 8.0, ips: 21.09974 samples/s, eta: 0:00:23
[2022/11/22 15:42:01] ppocr INFO: epoch: [50/50], global_step: 12450, lr: 0.000005, loss: 1.191645, loss_shrink_maps: 0.622861, loss_threshold_maps: 0.437820, loss_binary_maps: 0.124672, avg_reader_cost: 0.00017 s, avg_batch_cost: 0.37562 s, avg_samples: 8.0, ips: 21.29834 samples/s, eta: 0:00:19
[2022/11/22 15:42:05] ppocr INFO: epoch: [50/50], global_step: 12460, lr: 0.000005, loss: 1.180676, loss_shrink_maps: 0.629529, loss_threshold_maps: 0.427876, loss_binary_maps: 0.126142, avg_reader_cost: 0.00199 s, avg_batch_cost: 0.37954 s, avg_samples: 8.0, ips: 21.07820 samples/s, eta: 0:00:15
[2022/11/22 15:42:10] ppocr INFO: epoch: [50/50], global_step: 12470, lr: 0.000005, loss: 1.238333, loss_shrink_maps: 0.670731, loss_threshold_maps: 0.432300, loss_binary_maps: 0.134142, avg_reader_cost: 0.00199 s, avg_batch_cost: 0.37980 s, avg_samples: 8.0, ips: 21.06347 samples/s, eta: 0:00:11
[2022/11/22 15:42:14] ppocr INFO: epoch: [50/50], global_step: 12480, lr: 0.000004, loss: 1.254117, loss_shrink_maps: 0.662911, loss_threshold_maps: 0.454790, loss_binary_maps: 0.132727, avg_reader_cost: 0.00181 s, avg_batch_cost: 0.37586 s, avg_samples: 8.0, ips: 21.28430 samples/s, eta: 0:00:07
[2022/11/22 15:42:18] ppocr INFO: epoch: [50/50], global_step: 12490, lr: 0.000004, loss: 1.324386, loss_shrink_maps: 0.701260, loss_threshold_maps: 0.501584, loss_binary_maps: 0.140036, avg_reader_cost: 0.00019 s, avg_batch_cost: 0.37545 s, avg_samples: 8.0, ips: 21.30779 samples/s, eta: 0:00:03
[2022/11/22 15:42:23] ppocr INFO: epoch: [50/50], global_step: 12500, lr: 0.000004, loss: 1.407378, loss_shrink_maps: 0.767860, loss_threshold_maps: 0.488598, loss_binary_maps: 0.153395, avg_reader_cost: 0.00346 s, avg_batch_cost: 0.37688 s, avg_samples: 8.0, ips: 21.22694 samples/s, eta: 0:00:00
[2022/11/22 15:42:23] ppocr INFO: save model in ./output/ch_PP-OCR_V3_det/latest
[2022/11/22 15:42:23] ppocr INFO: best metric, hmean: 0.9742212674543502, is_float16: False, precision: 0.9674666666666667, recall: 0.9810708491076258, fps: 23.192479978844272, best_epoch: 31

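With the default hyperparameters modified as above, the ch_PP-OCRv3_det_student model was trained on the training set for 50 epochs. Its best hmean on the validation set was 97.4%, reached at epoch 31, with no noticeable improvement in later epochs.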
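The hmean in the log is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the precision and recall values printed in the training log, for the first evaluation and for the best one. The function is a plain restatement of the formula, not PaddleOCR's metric code.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the detection training log
first_eval = hmean(0.766156462585034, 0.9745808545159546)
best_eval = hmean(0.9674666666666667, 0.9810708491076258)
print(round(first_eval, 4), round(best_eval, 4))  # 0.8579 0.9742
```

The first evaluation shows high recall but much lower precision, so most of the gain over training comes from removing false detections.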

If training runs out of GPU memory, reduce the batch size.
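Changing the batch size also changes how many iterations make up an epoch, which matters because eval_batch_step is counted in iterations. A minimal sketch of the arithmetic, assuming a single GPU as in the log above (the function name is my own):

```python
import math


def iters_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches the loader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_iters = iters_per_epoch(2000, 8)  # 2000 training images, batch size 8
total_steps = train_iters * 50          # 50 epochs
print(train_iters, total_steps)         # 250 12500

# Halving the batch size doubles the iterations per epoch
print(iters_per_epoch(2000, 4))         # 500
```

These numbers agree with the log, which reports 250 iterations for the train dataloader and a final global_step of 12500. With a batch size of 4, an evaluation every 200 iterations would fall every 0.4 epochs instead of every 0.8.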

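5.2 Recognition model

5.2.1 Recognition model configuration

PaddleOCR also provides many recognition models. The models and their configuration files are under PaddleOCR/configs/rec. The model chosen here is ch_PP-OCRv3_rec_distillation, whose configuration file is PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Before use, set the necessary options such as the training parameters and dataset paths, and download the pretrained weights as well. The key settings are shown below: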

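# Key training parameters
use_gpu: true # whether to train on the GPU
epoch_num: 50 # number of training epochs (the training log reports 50)
save_model_dir: ./output/rec_ppocr_v3_distillation # where the model is saved
save_epoch_step: 100 # save a checkpoint every 100 epochs
eval_batch_step: [0, 100] # starting from iteration 0, evaluate every 100 iterations
pretrained_model: ./PaddleOCR/pretrain_modeled/ch_PP-OCRv3_rec_train/best_accuracy.pdparams # path to the pretrained model
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecTrainData/ # image directory
    label_file_list:
      - ./rec_train_label.txt # label file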

5.2.2 Fine-tuning the model

Run the following command in the notebook to fine-tune the model. The -c option takes the path to the configured model file.

!python PaddleOCR/tools/train.py \
    -c PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
[2022/11/22 15:42:30] ppocr INFO: Architecture : 
[2022/11/22 15:42:30] ppocr INFO:     Models : 
[2022/11/22 15:42:30] ppocr INFO:         Student : 
[2022/11/22 15:42:30] ppocr INFO:             Backbone : 
[2022/11/22 15:42:30] ppocr INFO:                 last_conv_stride : [1, 2]
[2022/11/22 15:42:30] ppocr INFO:                 last_pool_type : avg
[2022/11/22 15:42:30] ppocr INFO:                 name : MobileNetV1Enhance
[2022/11/22 15:42:30] ppocr INFO:                 scale : 0.5
[2022/11/22 15:42:30] ppocr INFO:             Head : 
[2022/11/22 15:42:30] ppocr INFO:                 head_list : 
[2022/11/22 15:42:30] ppocr INFO:                     CTCHead : 
[2022/11/22 15:42:30] ppocr INFO:                         Head : 
[2022/11/22 15:42:30] ppocr INFO:                             fc_decay : 1e-05
[2022/11/22 15:42:30] ppocr INFO:                         Neck : 
[2022/11/22 15:42:30] ppocr INFO:                             depth : 2
[2022/11/22 15:42:30] ppocr INFO:                             dims : 64
[2022/11/22 15:42:30] ppocr INFO:                             hidden_dims : 120
[2022/11/22 15:42:30] ppocr INFO:                             name : svtr
[2022/11/22 15:42:30] ppocr INFO:                             use_guide : True
[2022/11/22 15:42:30] ppocr INFO:                     SARHead : 
[2022/11/22 15:42:30] ppocr INFO:                         enc_dim : 512
[2022/11/22 15:42:30] ppocr INFO:                         max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO:                 name : MultiHead
[2022/11/22 15:42:30] ppocr INFO:             Transform : None
[2022/11/22 15:42:30] ppocr INFO:             algorithm : SVTR
[2022/11/22 15:42:30] ppocr INFO:             freeze_params : False
[2022/11/22 15:42:30] ppocr INFO:             model_type : rec
[2022/11/22 15:42:30] ppocr INFO:             pretrained : None
[2022/11/22 15:42:30] ppocr INFO:             return_all_feats : True
[2022/11/22 15:42:30] ppocr INFO:         Teacher : 
[2022/11/22 15:42:30] ppocr INFO:             Backbone : 
[2022/11/22 15:42:30] ppocr INFO:                 last_conv_stride : [1, 2]
[2022/11/22 15:42:30] ppocr INFO:                 last_pool_type : avg
[2022/11/22 15:42:30] ppocr INFO:                 name : MobileNetV1Enhance
[2022/11/22 15:42:30] ppocr INFO:                 scale : 0.5
[2022/11/22 15:42:30] ppocr INFO:             Head : 
[2022/11/22 15:42:30] ppocr INFO:                 head_list : 
[2022/11/22 15:42:30] ppocr INFO:                     CTCHead : 
[2022/11/22 15:42:30] ppocr INFO:                         Head : 
[2022/11/22 15:42:30] ppocr INFO:                             fc_decay : 1e-05
[2022/11/22 15:42:30] ppocr INFO:                         Neck : 
[2022/11/22 15:42:30] ppocr INFO:                             depth : 2
[2022/11/22 15:42:30] ppocr INFO:                             dims : 64
[2022/11/22 15:42:30] ppocr INFO:                             hidden_dims : 120
[2022/11/22 15:42:30] ppocr INFO:                             name : svtr
[2022/11/22 15:42:30] ppocr INFO:                             use_guide : True
[2022/11/22 15:42:30] ppocr INFO:                     SARHead : 
[2022/11/22 15:42:30] ppocr INFO:                         enc_dim : 512
[2022/11/22 15:42:30] ppocr INFO:                         max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO:                 name : MultiHead
[2022/11/22 15:42:30] ppocr INFO:             Transform : None
[2022/11/22 15:42:30] ppocr INFO:             algorithm : SVTR
[2022/11/22 15:42:30] ppocr INFO:             freeze_params : False
[2022/11/22 15:42:30] ppocr INFO:             model_type : rec
[2022/11/22 15:42:30] ppocr INFO:             pretrained : None
[2022/11/22 15:42:30] ppocr INFO:             return_all_feats : True
[2022/11/22 15:42:30] ppocr INFO:     algorithm : Distillation
[2022/11/22 15:42:30] ppocr INFO:     model_type : rec
[2022/11/22 15:42:30] ppocr INFO:     name : DistillationModel
[2022/11/22 15:42:30] ppocr INFO: Eval : 
[2022/11/22 15:42:30] ppocr INFO:     dataset : 
[2022/11/22 15:42:30] ppocr INFO:         data_dir : ./RecEvalData/
[2022/11/22 15:42:30] ppocr INFO:         label_file_list : ['./rec_eval_label.txt']
[2022/11/22 15:42:30] ppocr INFO:         name : SimpleDataSet
[2022/11/22 15:42:30] ppocr INFO:         transforms : 
[2022/11/22 15:42:30] ppocr INFO:             DecodeImage : 
[2022/11/22 15:42:30] ppocr INFO:                 channel_first : False
[2022/11/22 15:42:30] ppocr INFO:                 img_mode : BGR
[2022/11/22 15:42:30] ppocr INFO:             MultiLabelEncode : None
[2022/11/22 15:42:30] ppocr INFO:             RecResizeImg : 
[2022/11/22 15:42:30] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/22 15:42:30] ppocr INFO:             KeepKeys : 
[2022/11/22 15:42:30] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/22 15:42:30] ppocr INFO:     loader : 
[2022/11/22 15:42:30] ppocr INFO:         batch_size_per_card : 128
[2022/11/22 15:42:30] ppocr INFO:         drop_last : False
[2022/11/22 15:42:30] ppocr INFO:         num_workers : 4
[2022/11/22 15:42:30] ppocr INFO:         shuffle : False
[2022/11/22 15:42:30] ppocr INFO: Global : 
[2022/11/22 15:42:30] ppocr INFO:     cal_metric_during_train : True
[2022/11/22 15:42:30] ppocr INFO:     character_dict_path : ./PaddleOCR/ppocr/utils/ppocr_keys_v1.txt
[2022/11/22 15:42:30] ppocr INFO:     checkpoints : None
[2022/11/22 15:42:30] ppocr INFO:     debug : False
[2022/11/22 15:42:30] ppocr INFO:     distributed : False
[2022/11/22 15:42:30] ppocr INFO:     epoch_num : 50
[2022/11/22 15:42:30] ppocr INFO:     eval_batch_step : [0, 100]
[2022/11/22 15:42:30] ppocr INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2022/11/22 15:42:30] ppocr INFO:     infer_mode : False
[2022/11/22 15:42:30] ppocr INFO:     log_smooth_window : 20
[2022/11/22 15:42:30] ppocr INFO:     max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO:     pretrained_model : ./PaddleOCR/pretrained_model/ch_PP-OCRv3_rec_train/best_accuracy.pdparams
[2022/11/22 15:42:30] ppocr INFO:     print_batch_step : 10
[2022/11/22 15:42:30] ppocr INFO:     save_epoch_step : 100
[2022/11/22 15:42:30] ppocr INFO:     save_inference_dir : None
[2022/11/22 15:42:30] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3_distillation
[2022/11/22 15:42:30] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3_distillation.txt
[2022/11/22 15:42:30] ppocr INFO:     use_gpu : True
[2022/11/22 15:42:30] ppocr INFO:     use_space_char : True
[2022/11/22 15:42:30] ppocr INFO:     use_visualdl : False
[2022/11/22 15:42:30] ppocr INFO: Loss : 
[2022/11/22 15:42:30] ppocr INFO:     loss_config_list : 
[2022/11/22 15:42:30] ppocr INFO:         DistillationDMLLoss : 
[2022/11/22 15:42:30] ppocr INFO:             act : softmax
[2022/11/22 15:42:30] ppocr INFO:             dis_head : ctc
[2022/11/22 15:42:30] ppocr INFO:             key : head_out
[2022/11/22 15:42:30] ppocr INFO:             model_name_pairs : [['Student', 'Teacher']]
[2022/11/22 15:42:30] ppocr INFO:             multi_head : True
[2022/11/22 15:42:30] ppocr INFO:             name : dml_ctc
[2022/11/22 15:42:30] ppocr INFO:             use_log : True
[2022/11/22 15:42:30] ppocr INFO:             weight : 1.0
[2022/11/22 15:42:30] ppocr INFO:         DistillationDMLLoss : 
[2022/11/22 15:42:30] ppocr INFO:             act : softmax
[2022/11/22 15:42:30] ppocr INFO:             dis_head : sar
[2022/11/22 15:42:30] ppocr INFO:             key : head_out
[2022/11/22 15:42:30] ppocr INFO:             model_name_pairs : [['Student', 'Teacher']]
[2022/11/22 15:42:30] ppocr INFO:             multi_head : True
[2022/11/22 15:42:30] ppocr INFO:             name : dml_sar
[2022/11/22 15:42:30] ppocr INFO:             use_log : True
[2022/11/22 15:42:30] ppocr INFO:             weight : 0.5
[2022/11/22 15:42:30] ppocr INFO:         DistillationDistanceLoss : 
[2022/11/22 15:42:30] ppocr INFO:             key : backbone_out
[2022/11/22 15:42:30] ppocr INFO:             mode : l2
[2022/11/22 15:42:30] ppocr INFO:             model_name_pairs : [['Student', 'Teacher']]
[2022/11/22 15:42:30] ppocr INFO:             weight : 1.0
[2022/11/22 15:42:30] ppocr INFO:         DistillationCTCLoss : 
[2022/11/22 15:42:30] ppocr INFO:             key : head_out
[2022/11/22 15:42:30] ppocr INFO:             model_name_list : ['Student', 'Teacher']
[2022/11/22 15:42:30] ppocr INFO:             multi_head : True
[2022/11/22 15:42:30] ppocr INFO:             weight : 1.0
[2022/11/22 15:42:30] ppocr INFO:         DistillationSARLoss : 
[2022/11/22 15:42:30] ppocr INFO:             key : head_out
[2022/11/22 15:42:30] ppocr INFO:             model_name_list : ['Student', 'Teacher']
[2022/11/22 15:42:30] ppocr INFO:             multi_head : True
[2022/11/22 15:42:30] ppocr INFO:             weight : 1.0
[2022/11/22 15:42:30] ppocr INFO:     name : CombinedLoss
[2022/11/22 15:42:30] ppocr INFO: Metric : 
[2022/11/22 15:42:30] ppocr INFO:     base_metric_name : RecMetric
[2022/11/22 15:42:30] ppocr INFO:     ignore_space : False
[2022/11/22 15:42:30] ppocr INFO:     key : Student
[2022/11/22 15:42:30] ppocr INFO:     main_indicator : acc
[2022/11/22 15:42:30] ppocr INFO:     name : DistillationMetric
[2022/11/22 15:42:30] ppocr INFO: Optimizer : 
[2022/11/22 15:42:30] ppocr INFO:     beta1 : 0.9
[2022/11/22 15:42:30] ppocr INFO:     beta2 : 0.999
[2022/11/22 15:42:30] ppocr INFO:     lr : 
[2022/11/22 15:42:30] ppocr INFO:         decay_epochs : [700, 800]
[2022/11/22 15:42:30] ppocr INFO:         name : Piecewise
[2022/11/22 15:42:30] ppocr INFO:         values : [0.0005, 5e-05]
[2022/11/22 15:42:30] ppocr INFO:         warmup_epoch : 5
[2022/11/22 15:42:30] ppocr INFO:     name : Adam
[2022/11/22 15:42:30] ppocr INFO:     regularizer : 
[2022/11/22 15:42:30] ppocr INFO:         factor : 3e-05
[2022/11/22 15:42:30] ppocr INFO:         name : L2
[2022/11/22 15:42:30] ppocr INFO: PostProcess : 
[2022/11/22 15:42:30] ppocr INFO:     key : head_out
[2022/11/22 15:42:30] ppocr INFO:     model_name : ['Student', 'Teacher']
[2022/11/22 15:42:30] ppocr INFO:     multi_head : True
[2022/11/22 15:42:30] ppocr INFO:     name : DistillationCTCLabelDecode
[2022/11/22 15:42:30] ppocr INFO: Train : 
[2022/11/22 15:42:30] ppocr INFO:     dataset : 
[2022/11/22 15:42:30] ppocr INFO:         data_dir : ./RecTrainData/
[2022/11/22 15:42:30] ppocr INFO:         ext_op_transform_idx : 1
[2022/11/22 15:42:30] ppocr INFO:         label_file_list : ['./rec_train_label.txt']
[2022/11/22 15:42:30] ppocr INFO:         name : SimpleDataSet
[2022/11/22 15:42:30] ppocr INFO:         transforms : 
[2022/11/22 15:42:30] ppocr INFO:             DecodeImage : 
[2022/11/22 15:42:30] ppocr INFO:                 channel_first : False
[2022/11/22 15:42:30] ppocr INFO:                 img_mode : BGR
[2022/11/22 15:42:30] ppocr INFO:             RecConAug : 
[2022/11/22 15:42:30] ppocr INFO:                 ext_data_num : 2
[2022/11/22 15:42:30] ppocr INFO:                 image_shape : [48, 320, 3]
[2022/11/22 15:42:30] ppocr INFO:                 max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO:                 prob : 0.5
[2022/11/22 15:42:30] ppocr INFO:             RecAug : None
[2022/11/22 15:42:30] ppocr INFO:             MultiLabelEncode : None
[2022/11/22 15:42:30] ppocr INFO:             RecResizeImg : 
[2022/11/22 15:42:30] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/22 15:42:30] ppocr INFO:             KeepKeys : 
[2022/11/22 15:42:30] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/22 15:42:30] ppocr INFO:     loader : 
[2022/11/22 15:42:30] ppocr INFO:         batch_size_per_card : 128
[2022/11/22 15:42:30] ppocr INFO:         drop_last : True
[2022/11/22 15:42:30] ppocr INFO:         num_workers : 4
[2022/11/22 15:42:30] ppocr INFO:         shuffle : True
[2022/11/22 15:42:30] ppocr INFO: profiler_options : None
[2022/11/22 15:42:30] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/22 15:42:30] ppocr INFO: Initialize indexs of datasets:['./rec_train_label.txt']
[2022/11/22 15:42:30] ppocr INFO: Initialize indexs of datasets:['./rec_eval_label.txt']
W1122 15:42:30.640520 1858440 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.8, Runtime API Version: 11.6
W1122 15:42:30.645676 1858440 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
[2022/11/22 15:42:36] ppocr INFO: train dataloader has 28 iters
[2022/11/22 15:42:36] ppocr INFO: valid dataloader has 15 iters
[2022/11/22 15:42:40] ppocr INFO: load pretrain successful from ./PaddleOCR/pretrained_model/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/22 15:42:40] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 100 iterations
[2022/11/22 15:43:51] ppocr INFO: epoch: [1/50], global_step: 10, lr: 0.000016, acc: 0.121094, norm_edit_dis: 0.606677, Teacher_acc: 0.167969, Teacher_norm_edit_dis: 0.677345, dml_ctc_0: 3.437894, loss: 35.609940, dml_sar_0: 3.548777, loss_distance_l2_Student_Teacher_0: 0.052442, loss_ctc_Student_0: 13.116257, loss_ctc_Teacher_1: 11.995207, loss_sar_Student_0: 1.639074, loss_sar_Teacher_1: 1.661752, avg_reader_cost: 0.34177 s, avg_batch_cost: 7.12488 s, avg_samples: 128.0, ips: 17.96523 samples/s, eta: 2:45:03
[2022/11/22 15:44:51] ppocr INFO: epoch: [1/50], global_step: 20, lr: 0.000034, acc: 0.167969, norm_edit_dis: 0.688300, Teacher_acc: 0.234375, Teacher_norm_edit_dis: 0.717374, dml_ctc_0: 2.784870, loss: 26.732544, dml_sar_0: 3.463835, loss_distance_l2_Student_Teacher_0: 0.033331, loss_ctc_Student_0: 8.958948, loss_ctc_Teacher_1: 8.770853, loss_sar_Student_0: 1.382131, loss_sar_Teacher_1: 1.443961, avg_reader_cost: 0.00024 s, avg_batch_cost: 5.97099 s, avg_samples: 128.0, ips: 21.43698 samples/s, eta: 2:30:36
[2022/11/22 15:45:38] ppocr INFO: epoch: [1/50], global_step: 28, lr: 0.000062, acc: 0.406250, norm_edit_dis: 0.845976, Teacher_acc: 0.375000, Teacher_norm_edit_dis: 0.832357, dml_ctc_0: 2.184122, loss: 18.748760, dml_sar_0: 3.200224, loss_distance_l2_Student_Teacher_0: 0.030766, loss_ctc_Student_0: 5.591382, loss_ctc_Teacher_1: 5.406458, loss_sar_Student_0: 1.356752, loss_sar_Teacher_1: 1.350660, avg_reader_cost: 0.00014 s, avg_batch_cost: 4.78187 s, avg_samples: 102.4, ips: 21.41421 samples/s, eta: 2:26:00
......
[2022/11/22 17:58:28] ppocr INFO: epoch: [48/50], global_step: 1330, lr: 0.000500, acc: 0.941406, norm_edit_dis: 0.990900, Teacher_acc: 0.941406, Teacher_norm_edit_dis: 0.991584, dml_ctc_0: 0.437659, loss: 5.970131, dml_sar_0: 1.605164, loss_distance_l2_Student_Teacher_0: 0.009221, loss_ctc_Student_0: 0.399064, loss_ctc_Teacher_1: 0.373846, loss_sar_Student_0: 1.546151, loss_sar_Teacher_1: 1.543791, avg_reader_cost: 0.00022 s, avg_batch_cost: 5.92309 s, avg_samples: 128.0, ips: 21.61035 samples/s, eta: 0:07:05
[2022/11/22 17:59:27] ppocr INFO: epoch: [48/50], global_step: 1340, lr: 0.000500, acc: 0.945312, norm_edit_dis: 0.991877, Teacher_acc: 0.945312, Teacher_norm_edit_dis: 0.991806, dml_ctc_0: 0.412004, loss: 5.852842, dml_sar_0: 1.617879, loss_distance_l2_Student_Teacher_0: 0.008481, loss_ctc_Student_0: 0.323649, loss_ctc_Teacher_1: 0.335442, loss_sar_Student_0: 1.572509, loss_sar_Teacher_1: 1.564403, avg_reader_cost: 0.00020 s, avg_batch_cost: 5.92285 s, avg_samples: 128.0, ips: 21.61121 samples/s, eta: 0:06:05
[2022/11/22 17:59:50] ppocr INFO: epoch: [48/50], global_step: 1344, lr: 0.000500, acc: 0.949219, norm_edit_dis: 0.992547, Teacher_acc: 0.945312, Teacher_norm_edit_dis: 0.992294, dml_ctc_0: 0.382377, loss: 5.763916, dml_sar_0: 1.625467, loss_distance_l2_Student_Teacher_0: 0.008044, loss_ctc_Student_0: 0.295685, loss_ctc_Teacher_1: 0.305044, loss_sar_Student_0: 1.593660, loss_sar_Teacher_1: 1.602923, avg_reader_cost: 0.00006 s, avg_batch_cost: 2.36978 s, avg_samples: 51.2, ips: 21.60541 samples/s, eta: 0:05:40
[2022/11/22 17:59:52] ppocr INFO: save model in ./output/rec_ppocr_v3_distillation/latest
[2022/11/22 18:00:30] ppocr INFO: epoch: [49/50], global_step: 1350, lr: 0.000500, acc: 0.953125, norm_edit_dis: 0.992251, Teacher_acc: 0.949219, Teacher_norm_edit_dis: 0.992294, dml_ctc_0: 0.380184, loss: 5.734705, dml_sar_0: 1.613987, loss_distance_l2_Student_Teacher_0: 0.008281, loss_ctc_Student_0: 0.286936, loss_ctc_Teacher_1: 0.283989, loss_sar_Student_0: 1.616180, loss_sar_Teacher_1: 1.584866, avg_reader_cost: 0.43557 s, avg_batch_cost: 4.00024 s, avg_samples: 76.8, ips: 19.19884 samples/s, eta: 0:05:04
[2022/11/22 18:01:30] ppocr INFO: epoch: [49/50], global_step: 1360, lr: 0.000500, acc: 0.933594, norm_edit_dis: 0.990372, Teacher_acc: 0.949219, Teacher_norm_edit_dis: 0.991896, dml_ctc_0: 0.517823, loss: 6.069489, dml_sar_0: 1.640750, loss_distance_l2_Student_Teacher_0: 0.012120, loss_ctc_Student_0: 0.346327, loss_ctc_Teacher_1: 0.297514, loss_sar_Student_0: 1.613212, loss_sar_Teacher_1: 1.597852, avg_reader_cost: 0.00020 s, avg_batch_cost: 5.92192 s, avg_samples: 128.0, ips: 21.61461 samples/s, eta: 0:04:03
[2022/11/22 18:02:29] ppocr INFO: epoch: [49/50], global_step: 1370, lr: 0.000500, acc: 0.929687, norm_edit_dis: 0.988311, Teacher_acc: 0.945312, Teacher_norm_edit_dis: 0.989802, dml_ctc_0: 0.608851, loss: 6.130374, dml_sar_0: 1.658343, loss_distance_l2_Student_Teacher_0: 0.013111, loss_ctc_Student_0: 0.383777, loss_ctc_Teacher_1: 0.333669, loss_sar_Student_0: 1.607618, loss_sar_Teacher_1: 1.596914, avg_reader_cost: 0.00018 s, avg_batch_cost: 5.92189 s, avg_samples: 128.0, ips: 21.61470 samples/s, eta: 0:03:02
[2022/11/22 18:02:41] ppocr INFO: epoch: [49/50], global_step: 1372, lr: 0.000500, acc: 0.929687, norm_edit_dis: 0.989469, Teacher_acc: 0.949219, Teacher_norm_edit_dis: 0.990958, dml_ctc_0: 0.608851, loss: 6.100844, dml_sar_0: 1.638648, loss_distance_l2_Student_Teacher_0: 0.013032, loss_ctc_Student_0: 0.365399, loss_ctc_Teacher_1: 0.333669, loss_sar_Student_0: 1.607618, loss_sar_Teacher_1: 1.596914, avg_reader_cost: 0.00003 s, avg_batch_cost: 1.18402 s, avg_samples: 25.6, ips: 21.62132 samples/s, eta: 0:02:50
[2022/11/22 18:02:42] ppocr INFO: save model in ./output/rec_ppocr_v3_distillation/latest
[2022/11/22 18:03:32] ppocr INFO: epoch: [50/50], global_step: 1380, lr: 0.000500, acc: 0.937500, norm_edit_dis: 0.987867, Teacher_acc: 0.953125, Teacher_norm_edit_dis: 0.990702, dml_ctc_0: 0.586539, loss: 6.132624, dml_sar_0: 1.632728, loss_distance_l2_Student_Teacher_0: 0.012799, loss_ctc_Student_0: 0.408299, loss_ctc_Teacher_1: 0.324945, loss_sar_Student_0: 1.611600, loss_sar_Teacher_1: 1.575031, avg_reader_cost: 0.39964 s, avg_batch_cost: 5.15052 s, avg_samples: 102.4, ips: 19.88150 samples/s, eta: 0:02:01
[2022/11/22 18:04:31] ppocr INFO: epoch: [50/50], global_step: 1390, lr: 0.000500, acc: 0.941406, norm_edit_dis: 0.990503, Teacher_acc: 0.945312, Teacher_norm_edit_dis: 0.991241, dml_ctc_0: 0.504434, loss: 5.998446, dml_sar_0: 1.597987, loss_distance_l2_Student_Teacher_0: 0.012755, loss_ctc_Student_0: 0.347707, loss_ctc_Teacher_1: 0.278820, loss_sar_Student_0: 1.595945, loss_sar_Teacher_1: 1.588920, avg_reader_cost: 0.00023 s, avg_batch_cost: 5.92184 s, avg_samples: 128.0, ips: 21.61492 samples/s, eta: 0:01:00
[2022/11/22 18:05:31] ppocr INFO: epoch: [50/50], global_step: 1400, lr: 0.000500, acc: 0.941406, norm_edit_dis: 0.988858, Teacher_acc: 0.945312, Teacher_norm_edit_dis: 0.990264, dml_ctc_0: 0.435958, loss: 5.922035, dml_sar_0: 1.579629, loss_distance_l2_Student_Teacher_0: 0.009988, loss_ctc_Student_0: 0.420990, loss_ctc_Teacher_1: 0.376272, loss_sar_Student_0: 1.586890, loss_sar_Teacher_1: 1.588358, avg_reader_cost: 0.00019 s, avg_batch_cost: 5.92275 s, avg_samples: 128.0, ips: 21.61156 samples/s, eta: 0:00:00
eval model:: 100%|██████████████████████████████| 15/15 [00:03<00:00,  4.31it/s]
[2022/11/22 18:05:34] ppocr INFO: cur metric, acc: 0.9648458574102442, norm_edit_dis: 0.9943541495766799, Teacher_acc: 0.9616008601319586, Teacher_norm_edit_dis: 0.9932329163901207, fps: 1027.3727543805567
[2022/11/22 18:05:34] ppocr INFO: best metric, acc: 0.9670091889291013, is_float16: False, norm_edit_dis: 0.9949488316175777, Teacher_acc: 0.9643050245305299, Teacher_norm_edit_dis: 0.9935388030383228, fps: 1018.9137081527358, best_epoch: 40
[2022/11/22 18:05:36] ppocr INFO: save model in ./output/rec_ppocr_v3_distillation/latest
[2022/11/22 18:05:36] ppocr INFO: best metric, acc: 0.9670091889291013, is_float16: False, norm_edit_dis: 0.9949488316175777, Teacher_acc: 0.9643050245305299, Teacher_norm_edit_dis: 0.9935388030383228, fps: 1018.9137081527358, best_epoch: 40

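With the default hyperparameters, the ch_PP-OCRv3_rec_distillation model was trained for 50 epochs on the training set. Its best accuracy on the validation set was 96.7%, reached at epoch 40 according to the log above (best_epoch: 40), and later epochs brought no noticeable improvement.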
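To see how accuracy evolved over the run, the per-step values can be pulled out of the training log. The following is a minimal sketch, assuming the log lines keep the same shape as the ones printed above; `parse_train_log` and the regular expression are illustrative helpers written for this article, not part of PaddleOCR.

```python
import re

# Matches training log lines of the shape shown above, e.g.
# "... epoch: [1/50], global_step: 10, lr: ..., acc: 0.121094, ..., Teacher_acc: 0.167969, ..."
# \bacc: picks the Student accuracy and skips "Teacher_acc" (no word boundary after "_").
LINE_RE = re.compile(
    r"epoch: \[(?P<epoch>\d+)/(?P<total>\d+)\], "
    r"global_step: (?P<step>\d+), .*?"
    r"\bacc: (?P<acc>[0-9.]+), .*?"
    r"Teacher_acc: (?P<teacher_acc>[0-9.]+)"
)


def parse_train_log(lines):
    """Return one dict per training log line; other lines are ignored."""
    records = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # config dumps, eval lines, warnings, etc.
        records.append(
            {
                "epoch": int(match["epoch"]),
                "step": int(match["step"]),
                "acc": float(match["acc"]),
                "teacher_acc": float(match["teacher_acc"]),
            }
        )
    return records


if __name__ == "__main__":
    # Shortened copy of the first training line from the log above.
    sample = [
        "[2022/11/22 15:43:51] ppocr INFO: epoch: [1/50], global_step: 10, "
        "lr: 0.000016, acc: 0.121094, norm_edit_dis: 0.606677, "
        "Teacher_acc: 0.167969, Teacher_norm_edit_dis: 0.677345",
    ]
    for record in parse_train_log(sample):
        print(record)
```

In practice the console output would first be saved to a file (the file name is up to you) and passed in as `parse_train_log(open(path, encoding="utf-8"))`. Note that these are batch-level training accuracies; the validation accuracy quoted above comes from the separate "cur metric" and "best metric" lines.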

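6. Results

6.1 Detection model inference

Run the following command in the notebook to detect text in the test images with the fine-tuned detection model, where:

  • Global.infer_img is the path to an image or to a folder of images,
  • Global.pretrained_model is the fine-tuned model,
  • Global.save_res_path is the path where the inference results are saved.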
!python PaddleOCR/tools/infer_det.py \
    -c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
    -o Global.infer_img="./dataset/images" Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_res_path="./output/det_infer_res/predicts.txt"
[2022/11/22 19:26:36] ppocr INFO: Architecture : 
[2022/11/22 19:26:36] ppocr INFO:     Backbone : 
[2022/11/22 19:26:36] ppocr INFO:         disable_se : True
[2022/11/22 19:26:36] ppocr INFO:         model_name : large
[2022/11/22 19:26:36] ppocr INFO:         name : MobileNetV3
[2022/11/22 19:26:36] ppocr INFO:         scale : 0.5
[2022/11/22 19:26:36] ppocr INFO:     Head : 
[2022/11/22 19:26:36] ppocr INFO:         k : 50
[2022/11/22 19:26:36] ppocr INFO:         name : DBHead
[2022/11/22 19:26:36] ppocr INFO:     Neck : 
[2022/11/22 19:26:36] ppocr INFO:         name : RSEFPN
[2022/11/22 19:26:36] ppocr INFO:         out_channels : 96
[2022/11/22 19:26:36] ppocr INFO:         shortcut : True
[2022/11/22 19:26:36] ppocr INFO:     Transform : None
[2022/11/22 19:26:36] ppocr INFO:     algorithm : DB
[2022/11/22 19:26:36] ppocr INFO:     model_type : det
[2022/11/22 19:26:36] ppocr INFO: Eval : 
[2022/11/22 19:26:36] ppocr INFO:     dataset : 
[2022/11/22 19:26:36] ppocr INFO:         data_dir : ./dataset/images
[2022/11/22 19:26:36] ppocr INFO:         label_file_list : ['./det_eval_label.txt']
[2022/11/22 19:26:36] ppocr INFO:         name : SimpleDataSet
[2022/11/22 19:26:36] ppocr INFO:         transforms : 
[2022/11/22 19:26:36] ppocr INFO:             DecodeImage : 
[2022/11/22 19:26:36] ppocr INFO:                 channel_first : False
[2022/11/22 19:26:36] ppocr INFO:                 img_mode : BGR
[2022/11/22 19:26:36] ppocr INFO:             DetLabelEncode : None
[2022/11/22 19:26:36] ppocr INFO:             DetResizeForTest : None
[2022/11/22 19:26:36] ppocr INFO:             NormalizeImage : 
[2022/11/22 19:26:36] ppocr INFO:                 mean : [0.485, 0.456, 0.406]
[2022/11/22 19:26:36] ppocr INFO:                 order : hwc
[2022/11/22 19:26:36] ppocr INFO:                 scale : 1./255.
[2022/11/22 19:26:36] ppocr INFO:                 std : [0.229, 0.224, 0.225]
[2022/11/22 19:26:36] ppocr INFO:             ToCHWImage : None
[2022/11/22 19:26:36] ppocr INFO:             KeepKeys : 
[2022/11/22 19:26:36] ppocr INFO:                 keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/11/22 19:26:36] ppocr INFO:     loader : 
[2022/11/22 19:26:36] ppocr INFO:         batch_size_per_card : 1
[2022/11/22 19:26:36] ppocr INFO:         drop_last : False
[2022/11/22 19:26:36] ppocr INFO:         num_workers : 2
[2022/11/22 19:26:36] ppocr INFO:         shuffle : False
[2022/11/22 19:26:36] ppocr INFO: Global : 
[2022/11/22 19:26:36] ppocr INFO:     cal_metric_during_train : False
[2022/11/22 19:26:36] ppocr INFO:     checkpoints : None
[2022/11/22 19:26:36] ppocr INFO:     debug : False
[2022/11/22 19:26:36] ppocr INFO:     distributed : False
[2022/11/22 19:26:36] ppocr INFO:     epoch_num : 50
[2022/11/22 19:26:36] ppocr INFO:     eval_batch_step : [0, 200]
[2022/11/22 19:26:36] ppocr INFO:     infer_img : ./dataset/images
[2022/11/22 19:26:36] ppocr INFO:     log_smooth_window : 20
[2022/11/22 19:26:36] ppocr INFO:     pretrained_model : ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/11/22 19:26:36] ppocr INFO:     print_batch_step : 10
[2022/11/22 19:26:36] ppocr INFO:     save_epoch_step : 100
[2022/11/22 19:26:36] ppocr INFO:     save_inference_dir : None
[2022/11/22 19:26:36] ppocr INFO:     save_model_dir : ./output/ch_PP-OCR_V3_det/
[2022/11/22 19:26:36] ppocr INFO:     save_res_path : ./output/det_infer_res/predicts.txt
[2022/11/22 19:26:36] ppocr INFO:     use_gpu : True
[2022/11/22 19:26:36] ppocr INFO:     use_visualdl : False
[2022/11/22 19:26:36] ppocr INFO: Loss : 
[2022/11/22 19:26:36] ppocr INFO:     alpha : 5
[2022/11/22 19:26:36] ppocr INFO:     balance_loss : True
[2022/11/22 19:26:36] ppocr INFO:     beta : 10
[2022/11/22 19:26:36] ppocr INFO:     main_loss_type : DiceLoss
[2022/11/22 19:26:36] ppocr INFO:     name : DBLoss
[2022/11/22 19:26:36] ppocr INFO:     ohem_ratio : 3
[2022/11/22 19:26:36] ppocr INFO: Metric : 
[2022/11/22 19:26:36] ppocr INFO:     main_indicator : hmean
[2022/11/22 19:26:36] ppocr INFO:     name : DetMetric
[2022/11/22 19:26:36] ppocr INFO: Optimizer : 
[2022/11/22 19:26:36] ppocr INFO:     beta1 : 0.9
[2022/11/22 19:26:36] ppocr INFO:     beta2 : 0.999
[2022/11/22 19:26:36] ppocr INFO:     lr : 
[2022/11/22 19:26:36] ppocr INFO:         learning_rate : 0.001
[2022/11/22 19:26:36] ppocr INFO:         name : Cosine
[2022/11/22 19:26:36] ppocr INFO:         warmup_epoch : 2
[2022/11/22 19:26:36] ppocr INFO:     name : Adam
[2022/11/22 19:26:36] ppocr INFO:     regularizer : 
[2022/11/22 19:26:36] ppocr INFO:         factor : 5e-05
[2022/11/22 19:26:36] ppocr INFO:         name : L2
[2022/11/22 19:26:36] ppocr INFO: PostProcess : 
[2022/11/22 19:26:36] ppocr INFO:     box_thresh : 0.6
[2022/11/22 19:26:36] ppocr INFO:     max_candidates : 1000
[2022/11/22 19:26:36] ppocr INFO:     name : DBPostProcess
[2022/11/22 19:26:36] ppocr INFO:     thresh : 0.3
[2022/11/22 19:26:36] ppocr INFO:     unclip_ratio : 1.5
[2022/11/22 19:26:36] ppocr INFO: Train : 
[2022/11/22 19:26:36] ppocr INFO:     dataset : 
[2022/11/22 19:26:36] ppocr INFO:         data_dir : ./dataset/images
[2022/11/22 19:26:36] ppocr INFO:         label_file_list : ['./det_train_label.txt']
[2022/11/22 19:26:36] ppocr INFO:         name : SimpleDataSet
[2022/11/22 19:26:36] ppocr INFO:         ratio_list : [1.0]
[2022/11/22 19:26:36] ppocr INFO:         transforms : 
[2022/11/22 19:26:36] ppocr INFO:             DecodeImage : 
[2022/11/22 19:26:36] ppocr INFO:                 channel_first : False
[2022/11/22 19:26:36] ppocr INFO:                 img_mode : BGR
[2022/11/22 19:26:36] ppocr INFO:             DetLabelEncode : None
[2022/11/22 19:26:36] ppocr INFO:             IaaAugment : 
[2022/11/22 19:26:36] ppocr INFO:                 augmenter_args : 
[2022/11/22 19:26:36] ppocr INFO:                     args : 
[2022/11/22 19:26:36] ppocr INFO:                         p : 0.5
[2022/11/22 19:26:36] ppocr INFO:                     type : Fliplr
[2022/11/22 19:26:36] ppocr INFO:                     args : 
[2022/11/22 19:26:36] ppocr INFO:                         rotate : [-10, 10]
[2022/11/22 19:26:36] ppocr INFO:                     type : Affine
[2022/11/22 19:26:36] ppocr INFO:                     args : 
[2022/11/22 19:26:36] ppocr INFO:                         size : [0.5, 3]
[2022/11/22 19:26:36] ppocr INFO:                     type : Resize
[2022/11/22 19:26:36] ppocr INFO:             EastRandomCropData : 
[2022/11/22 19:26:36] ppocr INFO:                 keep_ratio : True
[2022/11/22 19:26:36] ppocr INFO:                 max_tries : 50
[2022/11/22 19:26:36] ppocr INFO:                 size : [960, 960]
[2022/11/22 19:26:36] ppocr INFO:             MakeBorderMap : 
[2022/11/22 19:26:36] ppocr INFO:                 shrink_ratio : 0.4
[2022/11/22 19:26:36] ppocr INFO:                 thresh_max : 0.7
[2022/11/22 19:26:36] ppocr INFO:                 thresh_min : 0.3
[2022/11/22 19:26:36] ppocr INFO:             MakeShrinkMap : 
[2022/11/22 19:26:36] ppocr INFO:                 min_text_size : 8
[2022/11/22 19:26:36] ppocr INFO:                 shrink_ratio : 0.4
[2022/11/22 19:26:36] ppocr INFO:             NormalizeImage : 
[2022/11/22 19:26:36] ppocr INFO:                 mean : [0.485, 0.456, 0.406]
[2022/11/22 19:26:36] ppocr INFO:                 order : hwc
[2022/11/22 19:26:36] ppocr INFO:                 scale : 1./255.
[2022/11/22 19:26:36] ppocr INFO:                 std : [0.229, 0.224, 0.225]
[2022/11/22 19:26:36] ppocr INFO:             ToCHWImage : None
[2022/11/22 19:26:36] ppocr INFO:             KeepKeys : 
[2022/11/22 19:26:36] ppocr INFO:                 keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/11/22 19:26:36] ppocr INFO:     loader : 
[2022/11/22 19:26:36] ppocr INFO:         batch_size_per_card : 8
[2022/11/22 19:26:36] ppocr INFO:         drop_last : False
[2022/11/22 19:26:36] ppocr INFO:         num_workers : 4
[2022/11/22 19:26:36] ppocr INFO:         shuffle : True
[2022/11/22 19:26:36] ppocr INFO: profiler_options : None
[2022/11/22 19:26:36] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
W1122 19:26:36.014780 2248672 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.8, Runtime API Version: 11.6
W1122 19:26:36.019917 2248672 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
[2022/11/22 19:26:43] ppocr INFO: load pretrain successful from ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/11/22 19:26:43] ppocr INFO: infer_img: ./dataset/images/1-122700001-OCR-LF-C01.jpg
[2022/11/22 19:26:50] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122700001-OCR-LF-C01.jpg
[2022/11/22 19:26:50] ppocr INFO: infer_img: ./dataset/images/1-122720001-OCR-AH-A01.jpg
[2022/11/22 19:26:50] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122720001-OCR-AH-A01.jpg
[2022/11/22 19:26:50] ppocr INFO: infer_img: ./dataset/images/1-122720001-OCR-AS-B01.jpg
[2022/11/22 19:26:51] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122720001-OCR-AS-B01.jpg
[2022/11/22 19:26:51] ppocr INFO: infer_img: ./dataset/images/1-122720001-OCR-LB-C02.jpg
[2022/11/22 19:26:51] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122720001-OCR-LB-C02.jpg
[2022/11/22 19:26:51] ppocr INFO: infer_img: ./dataset/images/1-122720001-OCR-RF-D01.jpg
[2022/11/22 19:26:51] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122720001-OCR-RF-D01.jpg
[2022/11/22 19:26:51] ppocr INFO: infer_img: ./dataset/images/1-122728001-OCR-AH-A01.jpg
[2022/11/22 19:26:51] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122728001-OCR-AH-A01.jpg
[2022/11/22 19:26:51] ppocr INFO: infer_img: ./dataset/images/1-122728001-OCR-RF-D01.jpg
[2022/11/22 19:26:51] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122728001-OCR-RF-D01.jpg
[2022/11/22 19:26:51] ppocr INFO: infer_img: ./dataset/images/1-122738001-OCR-AH-A01.jpg
[2022/11/22 19:26:51] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122738001-OCR-AH-A01.jpg
[2022/11/22 19:26:51] ppocr INFO: infer_img: ./dataset/images/1-122738001-OCR-AS-B01.jpg
[2022/11/22 19:26:52] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122738001-OCR-AS-B01.jpg
[2022/11/22 19:26:52] ppocr INFO: infer_img: ./dataset/images/1-122738001-OCR-LF-C01.jpg
[2022/11/22 19:26:52] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122738001-OCR-LF-C01.jpg
[2022/11/22 19:26:52] ppocr INFO: infer_img: ./dataset/images/1-122740001-OCR-AH-A01.jpg
[2022/11/22 19:26:52] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122740001-OCR-AH-A01.jpg
[2022/11/22 19:26:52] ppocr INFO: infer_img: ./dataset/images/1-122740001-OCR-LB-C02.jpg
[2022/11/22 19:26:52] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-122740001-OCR-LB-C02.jpg
......
[2022/11/22 19:35:12] ppocr INFO: infer_img: ./dataset/images/1-155800001-OCR-AH-A01.jpg
[2022/11/22 19:35:12] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-155800001-OCR-AH-A01.jpg
[2022/11/22 19:35:12] ppocr INFO: infer_img: ./dataset/images/1-155800001-OCR-AS-B01.jpg
[2022/11/22 19:35:12] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-155800001-OCR-AS-B01.jpg
[2022/11/22 19:35:12] ppocr INFO: infer_img: ./dataset/images/1-155800001-OCR-LB-C02.jpg
[2022/11/22 19:35:12] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-155800001-OCR-LB-C02.jpg
[2022/11/22 19:35:12] ppocr INFO: infer_img: ./dataset/images/1-155800001-OCR-LF-C01.jpg
[2022/11/22 19:35:12] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-155800001-OCR-LF-C01.jpg
[2022/11/22 19:35:12] ppocr INFO: infer_img: ./dataset/images/1-155800001-OCR-RF-D01.jpg
[2022/11/22 19:35:12] ppocr INFO: The detected Image saved in ./output/det_infer_res/det_results/1-155800001-OCR-RF-D01.jpg
[2022/11/22 19:35:12] ppocr INFO: infer_img: ./dataset/images/1.png
[2022/11/22 19:35:12] ppocr INFO: success!
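
The detected boxes written to Global.save_res_path can then be loaded for the recognition step. The sketch below is a rough illustration under one assumption: that each line of predicts.txt follows the same layout as the detection labels described earlier, that is, the image path, a tab, then a JSON list of objects with a "points" field. Check this against your own output file before relying on it. `parse_det_lines` and `bounding_rect` are hypothetical helper names, not PaddleOCR functions.

```python
import json


def parse_det_lines(lines):
    """Map each image path to its list of quadrilateral boxes.

    Assumed line layout (an assumption, verify against your file):
        <image path>\t[{"transcription": "...", "points": [[x, y], ...]}, ...]
    """
    results = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        image_path, boxes_json = line.split("\t", 1)
        boxes = json.loads(boxes_json)
        results[image_path] = [box["points"] for box in boxes]
    return results


def bounding_rect(points):
    """Axis-aligned rectangle (x_min, y_min, x_max, y_max) enclosing a box."""
    xs = [point[0] for point in points]
    ys = [point[1] for point in points]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Made-up example line reusing the coordinates from the label format example.
    demo = [
        'demo.jpg\t[{"transcription": "", "points": '
        "[[310, 104], [416, 141], [418, 216], [312, 179]]}]\n"
    ]
    for image_path, quads in parse_det_lines(demo).items():
        for quad in quads:
            print(image_path, bounding_rect(quad))
```

For the real file, pass an open file handle, for example `parse_det_lines(open("./output/det_infer_res/predicts.txt", encoding="utf-8"))`. The axis-aligned rectangle is only a simple way to crop a region; for tilted container numbers a perspective crop of the four points would preserve more of the text.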

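6.2 Recognition model inference

Run the following command in the notebook to recognize the text in the test images with the fine-tuned recognition model, where:

  • Global.infer_img is the path to an image or to a folder of images,
  • Global.pretrained_model is the fine-tuned model,
  • Global.save_res_path is the path where the inference results are saved.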
%run PaddleOCR/tools/infer_rec.py \
    -c PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
    -o Global.infer_img="./RecEvalData/" Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_res_path="./output/rec_infer_res/predicts.txt"
[2022/11/22 19:35:16] ppocr INFO: Architecture : 
[2022/11/22 19:35:16] ppocr INFO:     Models : 
[2022/11/22 19:35:16] ppocr INFO:         Student : 
[2022/11/22 19:35:16] ppocr INFO:             Backbone : 
[2022/11/22 19:35:16] ppocr INFO:                 last_conv_stride : [1, 2]
[2022/11/22 19:35:16] ppocr INFO:                 last_pool_type : avg
[2022/11/22 19:35:16] ppocr INFO:                 name : MobileNetV1Enhance
[2022/11/22 19:35:16] ppocr INFO:                 scale : 0.5
[2022/11/22 19:35:16] ppocr INFO:             Head : 
[2022/11/22 19:35:16] ppocr INFO:                 head_list : 
[2022/11/22 19:35:16] ppocr INFO:                     CTCHead : 
[2022/11/22 19:35:16] ppocr INFO:                         Head : 
[2022/11/22 19:35:16] ppocr INFO:                             fc_decay : 1e-05
[2022/11/22 19:35:16] ppocr INFO:                         Neck : 
[2022/11/22 19:35:16] ppocr INFO:                             depth : 2
[2022/11/22 19:35:16] ppocr INFO:                             dims : 64
[2022/11/22 19:35:16] ppocr INFO:                             hidden_dims : 120
[2022/11/22 19:35:16] ppocr INFO:                             name : svtr
[2022/11/22 19:35:16] ppocr INFO:                             use_guide : True
[2022/11/22 19:35:16] ppocr INFO:                     SARHead : 
[2022/11/22 19:35:16] ppocr INFO:                         enc_dim : 512
[2022/11/22 19:35:16] ppocr INFO:                         max_text_length : 25
[2022/11/22 19:35:16] ppocr INFO:                 name : MultiHead
[2022/11/22 19:35:16] ppocr INFO:             Transform : None
[2022/11/22 19:35:16] ppocr INFO:             algorithm : SVTR
[2022/11/22 19:35:16] ppocr INFO:             freeze_params : False
[2022/11/22 19:35:16] ppocr INFO:             model_type : rec
[2022/11/22 19:35:16] ppocr INFO:             pretrained : None
[2022/11/22 19:35:16] ppocr INFO:             return_all_feats : True
[2022/11/22 19:35:16] ppocr INFO:         Teacher : 
[2022/11/22 19:35:16] ppocr INFO:             Backbone : 
[2022/11/22 19:35:16] ppocr INFO:                 last_conv_stride : [1, 2]
[2022/11/22 19:35:16] ppocr INFO:                 last_pool_type : avg
[2022/11/22 19:35:16] ppocr INFO:                 name : MobileNetV1Enhance
[2022/11/22 19:35:16] ppocr INFO:                 scale : 0.5
[2022/11/22 19:35:16] ppocr INFO:             Head : 
[2022/11/22 19:35:16] ppocr INFO:                 head_list : 
[2022/11/22 19:35:16] ppocr INFO:                     CTCHead : 
[2022/11/22 19:35:16] ppocr INFO:                         Head : 
[2022/11/22 19:35:16] ppocr INFO:                             fc_decay : 1e-05
[2022/11/22 19:35:16] ppocr INFO:                         Neck : 
[2022/11/22 19:35:16] ppocr INFO:                             depth : 2
[2022/11/22 19:35:16] ppocr INFO:                             dims : 64
[2022/11/22 19:35:16] ppocr INFO:                             hidden_dims : 120
[2022/11/22 19:35:16] ppocr INFO:                             name : svtr
[2022/11/22 19:35:16] ppocr INFO:                             use_guide : True
[2022/11/22 19:35:16] ppocr INFO:                     SARHead : 
[2022/11/22 19:35:16] ppocr INFO:                         enc_dim : 512
[2022/11/22 19:35:16] ppocr INFO:                         max_text_length : 25
[2022/11/22 19:35:16] ppocr INFO:                 name : MultiHead
[2022/11/22 19:35:16] ppocr INFO:             Transform : None
[2022/11/22 19:35:16] ppocr INFO:             algorithm : SVTR
[2022/11/22 19:35:16] ppocr INFO:             freeze_params : False
[2022/11/22 19:35:16] ppocr INFO:             model_type : rec
[2022/11/22 19:35:16] ppocr INFO:             pretrained : None
[2022/11/22 19:35:16] ppocr INFO:             return_all_feats : True
[2022/11/22 19:35:16] ppocr INFO:     algorithm : Distillation
[2022/11/22 19:35:16] ppocr INFO:     model_type : rec
[2022/11/22 19:35:16] ppocr INFO:     name : DistillationModel
[2022/11/22 19:35:16] ppocr INFO: Eval : 
[2022/11/22 19:35:16] ppocr INFO:     dataset : 
[2022/11/22 19:35:16] ppocr INFO:         data_dir : ./RecEvalData/
[2022/11/22 19:35:16] ppocr INFO:         label_file_list : ['./rec_eval_label.txt']
[2022/11/22 19:35:16] ppocr INFO:         name : SimpleDataSet
[2022/11/22 19:35:16] ppocr INFO:         transforms : 
[2022/11/22 19:35:16] ppocr INFO:             DecodeImage : 
[2022/11/22 19:35:16] ppocr INFO:                 channel_first : False
[2022/11/22 19:35:16] ppocr INFO:                 img_mode : BGR
[2022/11/22 19:35:16] ppocr INFO:             MultiLabelEncode : None
[2022/11/22 19:35:16] ppocr INFO:             RecResizeImg : 
[2022/11/22 19:35:16] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/22 19:35:16] ppocr INFO:             KeepKeys : 
[2022/11/22 19:35:16] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/22 19:35:16] ppocr INFO:     loader : 
[2022/11/22 19:35:16] ppocr INFO:         batch_size_per_card : 128
[2022/11/22 19:35:16] ppocr INFO:         drop_last : False
[2022/11/22 19:35:16] ppocr INFO:         num_workers : 4
[2022/11/22 19:35:16] ppocr INFO:         shuffle : False
[2022/11/22 19:35:16] ppocr INFO: Global : 
[2022/11/22 19:35:16] ppocr INFO:     cal_metric_during_train : True
[2022/11/22 19:35:16] ppocr INFO:     character_dict_path : ./PaddleOCR/ppocr/utils/ppocr_keys_v1.txt
[2022/11/22 19:35:16] ppocr INFO:     checkpoints : None
[2022/11/22 19:35:16] ppocr INFO:     debug : False
[2022/11/22 19:35:16] ppocr INFO:     distributed : False
[2022/11/22 19:35:16] ppocr INFO:     epoch_num : 50
[2022/11/22 19:35:16] ppocr INFO:     eval_batch_step : [0, 100]
[2022/11/22 19:35:16] ppocr INFO:     infer_img : ./RecEvalData/
[2022/11/22 19:35:16] ppocr INFO:     infer_mode : False
[2022/11/22 19:35:16] ppocr INFO:     log_smooth_window : 20
[2022/11/22 19:35:16] ppocr INFO:     max_text_length : 25
[2022/11/22 19:35:16] ppocr INFO:     pretrained_model : ./output/rec_ppocr_v3_distillation/best_accuracy
[2022/11/22 19:35:16] ppocr INFO:     print_batch_step : 10
[2022/11/22 19:35:16] ppocr INFO:     save_epoch_step : 100
[2022/11/22 19:35:16] ppocr INFO:     save_inference_dir : None
[2022/11/22 19:35:16] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3_distillation
[2022/11/22 19:35:16] ppocr INFO:     save_res_path : ./output/rec_infer_res/predicts.txt
[2022/11/22 19:35:16] ppocr INFO:     use_gpu : True
[2022/11/22 19:35:16] ppocr INFO:     use_space_char : True
[2022/11/22 19:35:16] ppocr INFO:     use_visualdl : False
[2022/11/22 19:35:16] ppocr INFO: Loss : 
[2022/11/22 19:35:16] ppocr INFO:     loss_config_list : 
[2022/11/22 19:35:16] ppocr INFO:         DistillationDMLLoss : 
[2022/11/22 19:35:16] ppocr INFO:             act : softmax
[2022/11/22 19:35:16] ppocr INFO:             dis_head : ctc
[2022/11/22 19:35:16] ppocr INFO:             key : head_out
[2022/11/22 19:35:16] ppocr INFO:             model_name_pairs : [['Student', 'Teacher']]
[2022/11/22 19:35:16] ppocr INFO:             multi_head : True
[2022/11/22 19:35:16] ppocr INFO:             name : dml_ctc
[2022/11/22 19:35:16] ppocr INFO:             use_log : True
[2022/11/22 19:35:16] ppocr INFO:             weight : 1.0
[2022/11/22 19:35:16] ppocr INFO:         DistillationDMLLoss : 
[2022/11/22 19:35:16] ppocr INFO:             act : softmax
[2022/11/22 19:35:16] ppocr INFO:             dis_head : sar
[2022/11/22 19:35:16] ppocr INFO:             key : head_out
[2022/11/22 19:35:16] ppocr INFO:             model_name_pairs : [['Student', 'Teacher']]
[2022/11/22 19:35:16] ppocr INFO:             multi_head : True
[2022/11/22 19:35:16] ppocr INFO:             name : dml_sar
[2022/11/22 19:35:16] ppocr INFO:             use_log : True
[2022/11/22 19:35:16] ppocr INFO:             weight : 0.5
[2022/11/22 19:35:16] ppocr INFO:         DistillationDistanceLoss : 
[2022/11/22 19:35:16] ppocr INFO:             key : backbone_out
[2022/11/22 19:35:16] ppocr INFO:             mode : l2
[2022/11/22 19:35:16] ppocr INFO:             model_name_pairs : [['Student', 'Teacher']]
[2022/11/22 19:35:16] ppocr INFO:             weight : 1.0
[2022/11/22 19:35:16] ppocr INFO:         DistillationCTCLoss : 
[2022/11/22 19:35:16] ppocr INFO:             key : head_out
[2022/11/22 19:35:16] ppocr INFO:             model_name_list : ['Student', 'Teacher']
[2022/11/22 19:35:16] ppocr INFO:             multi_head : True
[2022/11/22 19:35:16] ppocr INFO:             weight : 1.0
[2022/11/22 19:35:16] ppocr INFO:         DistillationSARLoss : 
[2022/11/22 19:35:16] ppocr INFO:             key : head_out
[2022/11/22 19:35:16] ppocr INFO:             model_name_list : ['Student', 'Teacher']
[2022/11/22 19:35:16] ppocr INFO:             multi_head : True
[2022/11/22 19:35:16] ppocr INFO:             weight : 1.0
[2022/11/22 19:35:16] ppocr INFO:     name : CombinedLoss
[2022/11/22 19:35:16] ppocr INFO: Metric : 
[2022/11/22 19:35:16] ppocr INFO:     base_metric_name : RecMetric
[2022/11/22 19:35:16] ppocr INFO:     ignore_space : False
[2022/11/22 19:35:16] ppocr INFO:     key : Student
[2022/11/22 19:35:16] ppocr INFO:     main_indicator : acc
[2022/11/22 19:35:16] ppocr INFO:     name : DistillationMetric
[2022/11/22 19:35:16] ppocr INFO: Optimizer : 
[2022/11/22 19:35:16] ppocr INFO:     beta1 : 0.9
[2022/11/22 19:35:16] ppocr INFO:     beta2 : 0.999
[2022/11/22 19:35:16] ppocr INFO:     lr : 
[2022/11/22 19:35:16] ppocr INFO:         decay_epochs : [700, 800]
[2022/11/22 19:35:16] ppocr INFO:         name : Piecewise
[2022/11/22 19:35:16] ppocr INFO:         values : [0.0005, 5e-05]
[2022/11/22 19:35:16] ppocr INFO:         warmup_epoch : 5
[2022/11/22 19:35:16] ppocr INFO:     name : Adam
[2022/11/22 19:35:16] ppocr INFO:     regularizer : 
[2022/11/22 19:35:16] ppocr INFO:         factor : 3e-05
[2022/11/22 19:35:16] ppocr INFO:         name : L2
[2022/11/22 19:35:16] ppocr INFO: PostProcess : 
[2022/11/22 19:35:16] ppocr INFO:     key : head_out
[2022/11/22 19:35:16] ppocr INFO:     model_name : ['Student', 'Teacher']
[2022/11/22 19:35:16] ppocr INFO:     multi_head : True
[2022/11/22 19:35:16] ppocr INFO:     name : DistillationCTCLabelDecode
[2022/11/22 19:35:16] ppocr INFO: Train : 
[2022/11/22 19:35:16] ppocr INFO:     dataset : 
[2022/11/22 19:35:16] ppocr INFO:         data_dir : ./RecTrainData/
[2022/11/22 19:35:16] ppocr INFO:         ext_op_transform_idx : 1
[2022/11/22 19:35:16] ppocr INFO:         label_file_list : ['./rec_train_label.txt']
[2022/11/22 19:35:16] ppocr INFO:         name : SimpleDataSet
[2022/11/22 19:35:16] ppocr INFO:         transforms : 
[2022/11/22 19:35:16] ppocr INFO:             DecodeImage : 
[2022/11/22 19:35:16] ppocr INFO:                 channel_first : False
[2022/11/22 19:35:16] ppocr INFO:                 img_mode : BGR
[2022/11/22 19:35:16] ppocr INFO:             RecConAug : 
[2022/11/22 19:35:16] ppocr INFO:                 ext_data_num : 2
[2022/11/22 19:35:16] ppocr INFO:                 image_shape : [48, 320, 3]
[2022/11/22 19:35:16] ppocr INFO:                 max_text_length : 25
[2022/11/22 19:35:16] ppocr INFO:                 prob : 0.5
[2022/11/22 19:35:16] ppocr INFO:             RecAug : None
[2022/11/22 19:35:16] ppocr INFO:             MultiLabelEncode : None
[2022/11/22 19:35:16] ppocr INFO:             RecResizeImg : 
[2022/11/22 19:35:16] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/22 19:35:16] ppocr INFO:             KeepKeys : 
[2022/11/22 19:35:16] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/22 19:35:16] ppocr INFO:     loader : 
[2022/11/22 19:35:16] ppocr INFO:         batch_size_per_card : 128
[2022/11/22 19:35:16] ppocr INFO:         drop_last : True
[2022/11/22 19:35:16] ppocr INFO:         num_workers : 4
[2022/11/22 19:35:16] ppocr INFO:         shuffle : True
[2022/11/22 19:35:16] ppocr INFO: profiler_options : None
[2022/11/22 19:35:16] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/22 19:35:21] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3_distillation/best_accuracy
[2022/11/22 19:35:21] ppocr INFO: infer_img: ./RecEvalData/0_1-122720001-OCR-AS-B01.jpg
[2022/11/22 19:35:27] ppocr INFO:      result: {"Student": {"label": "EITU1786393", "score": 0.9824612140655518}, "Teacher": {"label": "EITU1786393", "score": 0.9511095285415649}}
[2022/11/22 19:35:27] ppocr INFO: infer_img: ./RecEvalData/0_1-122720001-OCR-LB-C02.jpg
[2022/11/22 19:35:27] ppocr INFO:      result: {"Student": {"label": "EITU1786393", "score": 0.9859008193016052}, "Teacher": {"label": "EITU1786393", "score": 0.9860246777534485}}
[2022/11/22 19:35:27] ppocr INFO: infer_img: ./RecEvalData/0_1-122720001-OCR-RF-D01.jpg
[2022/11/22 19:35:27] ppocr INFO:      result: {"Student": {"label": "EITU1786393", "score": 0.9981781244277954}, "Teacher": {"label": "EITU1786393", "score": 0.9979627728462219}}
[2022/11/22 19:35:27] ppocr INFO: infer_img: ./RecEvalData/0_1-122728001-OCR-RF-D01.jpg
[2022/11/22 19:35:27] ppocr INFO:      result: {"Student": {"label": "DFSU4119250", "score": 0.9132381677627563}, "Teacher": {"label": "DFSU4119250", "score": 0.9551454782485962}}
......
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/2_1-154748001-OCR-AS-B01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "22G1", "score": 0.9983863234519958}, "Teacher": {"label": "22G1", "score": 0.9987906813621521}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/2_1-155210001-OCR-RF-D01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "22G1", "score": 0.9938058257102966}, "Teacher": {"label": "22G1", "score": 0.9972647428512573}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/2_1-155227001-OCR-RF-D01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "L5G1", "score": 0.9624487161636353}, "Teacher": {"label": "L5G1", "score": 0.946576714515686}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/2_1-155800001-OCR-LB-C02.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "22G1", "score": 0.9714831113815308}, "Teacher": {"label": "22G1", "score": 0.9647958874702454}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/3_1-124902001-OCR-RF-D01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "1", "score": 0.9570513963699341}, "Teacher": {"label": "1", "score": 0.9826142191886902}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/3_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "U", "score": 0.6551735997200012}, "Teacher": {"label": "U", "score": 0.6450684666633606}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/4_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "2", "score": 0.9352047443389893}, "Teacher": {"label": "2", "score": 0.8710258603096008}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/5_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "9", "score": 0.9969731569290161}, "Teacher": {"label": "9", "score": 0.9953775405883789}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/6_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "0", "score": 0.9937646389007568}, "Teacher": {"label": "0", "score": 0.9952432513237}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/7_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "1", "score": 0.6171978116035461}, "Teacher": {"label": "1", "score": 0.9941883087158203}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/8_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "0", "score": 0.6773194074630737}, "Teacher": {"label": "0", "score": 0.7497578263282776}}
[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/9_1-133046001-OCR-AH-A01.jpg
[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "2", "score": 0.9989407658576965}, "Teacher": {"label": "2", "score": 0.9984425902366638}}
[2022/11/22 19:36:37] ppocr INFO: success!
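
In the log above, most full container numbers score above 0.9, while some single-character crops (for example "U" at about 0.66 and "1" at about 0.62 for the Student model) score noticeably lower. The following is a minimal sketch for pulling such cases out of the log. It assumes the console output has been saved as plain text in exactly the `infer_img:` / `result:` format shown above. The function names and the 0.7 threshold are illustrative assumptions, not part of PaddleOCR.

```python
import json
import re

# Matches the per-image lines ("infer_img: <path>"), not the config line "infer_img : ..."
IMG_RE = re.compile(r"infer_img:\s*(\S+)")
# Captures the JSON payload that follows "result:"
RESULT_RE = re.compile(r"result:\s*(\{.*\})\s*$")


def parse_log(lines):
    """Pair each infer_img line with the result line that follows it."""
    records = []
    current_img = None
    for line in lines:
        img = IMG_RE.search(line)
        if img:
            current_img = img.group(1)
            continue
        res = RESULT_RE.search(line)
        if res and current_img is not None:
            records.append((current_img, json.loads(res.group(1))))
            current_img = None
    return records


def low_confidence(records, threshold=0.7, model="Student"):
    """Return (image, label, score) for results scoring below the threshold."""
    return [
        (img, result[model]["label"], result[model]["score"])
        for img, result in records
        if result[model]["score"] < threshold
    ]


if __name__ == "__main__":
    sample = [
        '[2022/11/22 19:36:37] ppocr INFO: infer_img: ./RecEvalData/3_1-133046001-OCR-AH-A01.jpg',
        '[2022/11/22 19:36:37] ppocr INFO:      result: {"Student": {"label": "U", "score": 0.6551735997200012}, "Teacher": {"label": "U", "score": 0.6450684666633606}}',
    ]
    for item in low_confidence(parse_log(sample)):
        print(item)
```

A list like this is one way to decide which crops to inspect by hand or to add to the training set.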

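6.3 Cascaded inference with the detection and recognition models

6.3.1 Model conversion

Before cascaded inference, the models saved during training must first be converted into inference models. Run the two export commands below, one for the detection model and one for the recognition model. In these commands:

  • -c specifies the path to the configuration file of the model to be converted;
  • -o Global.pretrained_model specifies the trained model file to be converted;
  • Global.save_inference_dir specifies the directory where the exported inference model is saved.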
# Convert the detection model
%run PaddleOCR/tools/export_model.py \
-c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_inference_dir="./output/det_inference/"
[2022/11/22 19:53:25] ppocr INFO: load pretrain successful from ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/11/22 19:53:28] ppocr INFO: inference model is saved to ./output/det_inference/inference
# Convert the recognition model
%run PaddleOCR/tools/export_model.py \
-c PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
-o Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_inference_dir="./output/rec_inference/"
[2022/11/22 19:53:49] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3_distillation/best_accuracy
[2022/11/22 19:53:50] ppocr INFO: inference model is saved to ./output/rec_inference/Teacher/inference
[2022/11/22 19:53:51] ppocr INFO: inference model is saved to ./output/rec_inference/Student/inference
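
Before moving on, it can help to confirm that the export actually produced model files in each directory. The sketch below assumes that the export writes `inference.pdmodel` and `inference.pdiparams` under each save path, which is what Paddle 2.x static-graph export typically produces and matches the `inference` prefix in the log above. Treat the file names as an assumption and adjust them if your version differs. The helper name is hypothetical.

```python
from pathlib import Path

# Assumed file suffixes written by the export step (Paddle 2.x convention)
EXPECTED_SUFFIXES = (".pdmodel", ".pdiparams")


def missing_inference_files(model_dir, prefix="inference"):
    """Return the expected inference files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [
        prefix + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (model_dir / (prefix + suffix)).is_file()
    ]


if __name__ == "__main__":
    # Directories taken from the export commands above
    for directory in (
        "./output/det_inference/",
        "./output/rec_inference/Student/",
        "./output/rec_inference/Teacher/",
    ):
        missing = missing_inference_files(directory)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(directory, "->", status)
```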

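6.3.2 Cascaded model inference

Once conversion is complete, the two models can be chained. PaddleOCR provides a tool for cascading detection and recognition models, which can combine any trained detection model with any trained recognition model into a two-stage text recognition system.

An input image passes through four main stages (text detection, detection box rectification, text recognition, and score filtering), after which the text positions and recognition results are output.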
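
The last of these stages, score filtering, is the simplest to illustrate. The sketch below is not PaddleOCR's own code. It only shows the idea under the assumption that each recognition result is a `(text, score)` pair aligned with one detection box, and that results below a threshold are dropped together with their boxes. The function name is hypothetical, and the default of 0.5 mirrors what I understand to be the default of the `--drop_score` option in PaddleOCR's inference scripts, so verify it against your version.

```python
def filter_by_score(boxes, rec_results, drop_score=0.5):
    """Keep only the boxes whose recognition score reaches drop_score.

    boxes       : list of detection boxes (any representation)
    rec_results : list of (text, score) pairs, one per box
    """
    kept_boxes, kept_results = [], []
    for box, (text, score) in zip(boxes, rec_results):
        if score >= drop_score:
            kept_boxes.append(box)
            kept_results.append((text, score))
    return kept_boxes, kept_results


if __name__ == "__main__":
    # Placeholder boxes; the second result is an invented low-score example
    boxes = ["box_0", "box_1"]
    results = [("TEMU3108252", 0.907), ("noise", 0.31)]
    print(filter_by_score(boxes, results))
```

Raising the threshold trades recall for precision: fewer spurious strings survive, but faint or partially occluded container numbers are more likely to be dropped.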

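The command is shown below, where:

  • image_dir is the path to a single image or a directory of images;
  • det_model_dir is the path to the detection inference model;
  • rec_model_dir is the path to the recognition inference model.

By default, the visualized recognition results are saved to the ./inference_results folder.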

%run PaddleOCR/tools/infer/predict_system.py \
--image_dir="dataset/OCRTest" \
--det_model_dir="./output/det_inference/" \
--rec_model_dir="./output/rec_inference/Student/"
[2022/11/22 20:14:49] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/11/22 20:14:52] ppocr DEBUG: dt_boxes num : 2, elapse : 2.663893222808838
[2022/11/22 20:14:52] ppocr DEBUG: rec_res num  : 2, elapse : 0.01429438591003418
[2022/11/22 20:14:52] ppocr DEBUG: 0  Predict time of dataset/OCRTest/1-122700001-OCR-RF-D01.jpg: 2.687s
[2022/11/22 20:14:52] ppocr DEBUG: 22G1, 0.814
[2022/11/22 20:14:52] ppocr DEBUG: TEMU3108252, 0.907
[2022/11/22 20:14:52] ppocr DEBUG: The visualized image saved in ./inference_results/1-122700001-OCR-RF-D01.jpg
[2022/11/22 20:14:52] ppocr DEBUG: dt_boxes num : 2, elapse : 0.03417468070983887
[2022/11/22 20:14:52] ppocr DEBUG: rec_res num  : 2, elapse : 0.010810136795043945
[2022/11/22 20:14:52] ppocr DEBUG: 1  Predict time of dataset/OCRTest/1-122728001-OCR-LB-C02.jpg: 0.054s
[2022/11/22 20:14:52] ppocr DEBUG: DFSU4119250, 0.953
[2022/11/22 20:14:52] ppocr DEBUG: 42G1, 0.971
[2022/11/22 20:14:52] ppocr DEBUG: The visualized image saved in ./inference_results/1-122728001-OCR-LB-C02.jpg
[2022/11/22 20:14:52] ppocr DEBUG: dt_boxes num : 2, elapse : 0.04606175422668457
[2022/11/22 20:14:52] ppocr DEBUG: rec_res num  : 2, elapse : 0.01572728157043457
[2022/11/22 20:14:52] ppocr DEBUG: 2  Predict time of dataset/OCRTest/1-122738001-OCR-RF-D01.jpg: 0.067s
[2022/11/22 20:14:52] ppocr DEBUG: IRSU2657681, 0.961
[2022/11/22 20:14:52] ppocr DEBUG: 22G1, 0.990
[2022/11/22 20:14:53] ppocr DEBUG: The visualized image saved in ./inference_results/1-122738001-OCR-RF-D01.jpg
[2022/11/22 20:14:53] ppocr DEBUG: dt_boxes num : 2, elapse : 0.032082319259643555
[2022/11/22 20:14:53] ppocr DEBUG: rec_res num  : 2, elapse : 0.009961605072021484
[2022/11/22 20:14:53] ppocr DEBUG: 3  Predict time of dataset/OCRTest/1-122740001-OCR-AS-B01.jpg: 0.048s
[2022/11/22 20:14:53] ppocr DEBUG: MRKU4306585, 0.977
[2022/11/22 20:14:53] ppocr DEBUG: 45G1, 0.952
[2022/11/22 20:14:53] ppocr DEBUG: The visualized image saved in ./inference_results/1-122740001-OCR-AS-B01.jpg
[2022/11/22 20:14:53] ppocr DEBUG: dt_boxes num : 2, elapse : 0.03204154968261719
[2022/11/22 20:14:53] ppocr DEBUG: rec_res num  : 2, elapse : 0.01087331771850586
[2022/11/22 20:14:53] ppocr DEBUG: 4  Predict time of dataset/OCRTest/1-122749001-OCR-AS-B01.jpg: 0.048s
[2022/11/22 20:14:53] ppocr DEBUG: FCGU4996010, 0.926
[2022/11/22 20:14:53] ppocr DEBUG: 42G1, 0.861
[2022/11/22 20:14:53] ppocr DEBUG: The visualized image saved in ./inference_results/1-122749001-OCR-AS-B01.jpg
[2022/11/22 20:14:53] ppocr DEBUG: dt_boxes num : 1, elapse : 0.03950977325439453
[2022/11/22 20:14:53] ppocr DEBUG: rec_res num  : 1, elapse : 0.010022163391113281
[2022/11/22 20:14:53] ppocr DEBUG: 5  Predict time of dataset/OCRTest/1-122749001-OCR-RF-D01.jpg: 0.056s
[2022/11/22 20:14:53] ppocr DEBUG: FCGU4996010, 0.936
[2022/11/22 20:14:53] ppocr DEBUG: The visualized image saved in ./inference_results/1-122749001-OCR-RF-D01.jpg
[2022/11/22 20:14:53] ppocr DEBUG: dt_boxes num : 2, elapse : 0.03342747688293457
[2022/11/22 20:14:53] ppocr DEBUG: rec_res num  : 2, elapse : 0.012051820755004883
[2022/11/22 20:14:53] ppocr DEBUG: 6  Predict time of dataset/OCRTest/1-122830001-OCR-LB-C02.jpg: 0.055s
[2022/11/22 20:14:53] ppocr DEBUG: 42G1, 0.994
[2022/11/22 20:14:53] ppocr DEBUG: MEDU4024195, 0.911
[2022/11/22 20:14:54] ppocr DEBUG: The visualized image saved in ./inference_results/1-122830001-OCR-LB-C02.jpg
[2022/11/22 20:14:54] ppocr DEBUG: dt_boxes num : 2, elapse : 0.032006263732910156
[2022/11/22 20:14:54] ppocr DEBUG: rec_res num  : 2, elapse : 0.010721683502197266
[2022/11/22 20:14:54] ppocr DEBUG: 7  Predict time of dataset/OCRTest/1-122856001-OCR-LB-C02.jpg: 0.049s
[2022/11/22 20:14:54] ppocr DEBUG: FCIU5949601, 0.951
[2022/11/22 20:14:54] ppocr DEBUG: 22G1, 0.915
[2022/11/22 20:14:54] ppocr DEBUG: The visualized image saved in ./inference_results/1-122856001-OCR-LB-C02.jpg
[2022/11/22 20:14:54] ppocr DEBUG: dt_boxes num : 2, elapse : 0.04093623161315918
[2022/11/22 20:14:54] ppocr DEBUG: rec_res num  : 2, elapse : 0.01970195770263672
......
[2022/11/22 20:17:22] ppocr DEBUG: The visualized image saved in ./inference_results/1-155405001-OCR-LB-C02.jpg
[2022/11/22 20:17:22] ppocr DEBUG: dt_boxes num : 1, elapse : 0.030061006546020508
[2022/11/22 20:17:22] ppocr DEBUG: rec_res num  : 1, elapse : 0.008541584014892578
[2022/11/22 20:17:22] ppocr DEBUG: 687  Predict time of dataset/OCRTest/1-155428001-OCR-AH-A01.jpg: 0.044s
[2022/11/22 20:17:22] ppocr DEBUG: TRHU259949, 0.951
[2022/11/22 20:17:22] ppocr DEBUG: The visualized image saved in ./inference_results/1-155428001-OCR-AH-A01.jpg
[2022/11/22 20:17:22] ppocr DEBUG: dt_boxes num : 2, elapse : 0.030470848083496094
[2022/11/22 20:17:22] ppocr DEBUG: rec_res num  : 2, elapse : 0.010222911834716797
[2022/11/22 20:17:22] ppocr DEBUG: 688  Predict time of dataset/OCRTest/1-155430001-OCR-LB-C02.jpg: 0.046s
[2022/11/22 20:17:22] ppocr DEBUG: TRLU8274540, 0.946
[2022/11/22 20:17:22] ppocr DEBUG: 15G1, 0.835
[2022/11/22 20:17:22] ppocr DEBUG: The visualized image saved in ./inference_results/1-155430001-OCR-LB-C02.jpg
[2022/11/22 20:17:22] ppocr DEBUG: dt_boxes num : 1, elapse : 0.02996039390563965
[2022/11/22 20:17:22] ppocr DEBUG: rec_res num  : 1, elapse : 0.008486032485961914
[2022/11/22 20:17:22] ppocr DEBUG: 689  Predict time of dataset/OCRTest/1-155431001-OCR-RF-D01.jpg: 0.043s
[2022/11/22 20:17:22] ppocr DEBUG: TEMU0723767, 0.970
[2022/11/22 20:17:22] ppocr DEBUG: The visualized image saved in ./inference_results/1-155431001-OCR-RF-D01.jpg
[2022/11/22 20:17:22] ppocr DEBUG: dt_boxes num : 1, elapse : 0.02987504005432129
[2022/11/22 20:17:22] ppocr DEBUG: rec_res num  : 1, elapse : 0.008971691131591797
[2022/11/22 20:17:22] ppocr DEBUG: 690  Predict time of dataset/OCRTest/1-155528001-OCR-AH-A01.jpg: 0.044s
[2022/11/22 20:17:22] ppocr DEBUG: UESU5131167, 0.992
[2022/11/22 20:17:22] ppocr DEBUG: The visualized image saved in ./inference_results/1-155528001-OCR-AH-A01.jpg
[2022/11/22 20:17:22] ppocr DEBUG: dt_boxes num : 1, elapse : 0.03020930290222168
[2022/11/22 20:17:22] ppocr DEBUG: rec_res num  : 1, elapse : 0.00933837890625
[2022/11/22 20:17:22] ppocr DEBUG: 691  Predict time of dataset/OCRTest/1-155554001-OCR-RF-D01.jpg: 0.043s
[2022/11/22 20:17:22] ppocr DEBUG: LGU6263148, 0.950
[2022/11/22 20:17:23] ppocr DEBUG: The visualized image saved in ./inference_results/1-155554001-OCR-RF-D01.jpg
[2022/11/22 20:17:23] ppocr DEBUG: dt_boxes num : 1, elapse : 0.029514074325561523
[2022/11/22 20:17:23] ppocr DEBUG: rec_res num  : 1, elapse : 0.00953054428100586
[2022/11/22 20:17:23] ppocr DEBUG: 692  Predict time of dataset/OCRTest/1-155612001-OCR-AH-A01.jpg: 0.045s
[2022/11/22 20:17:23] ppocr DEBUG: MSKU0539301, 0.973
[2022/11/22 20:17:23] ppocr DEBUG: The visualized image saved in ./inference_results/1-155612001-OCR-AH-A01.jpg
[2022/11/22 20:17:23] ppocr DEBUG: dt_boxes num : 2, elapse : 0.04407095909118652
[2022/11/22 20:17:23] ppocr DEBUG: rec_res num  : 2, elapse : 0.016852855682373047
[2022/11/22 20:17:23] ppocr DEBUG: 693  Predict time of dataset/OCRTest/1-155745001-OCR-RF-D01.jpg: 0.067s
[2022/11/22 20:17:23] ppocr DEBUG: FCIU5288353, 0.968
[2022/11/22 20:17:23] ppocr DEBUG: 22G1, 0.967
[2022/11/22 20:17:23] ppocr DEBUG: The visualized image saved in ./inference_results/1-155745001-OCR-RF-D01.jpg
[2022/11/22 20:17:23] ppocr DEBUG: dt_boxes num : 2, elapse : 0.04364299774169922
[2022/11/22 20:17:23] ppocr DEBUG: rec_res num  : 2, elapse : 0.011179685592651367
[2022/11/22 20:17:23] ppocr DEBUG: 694  Predict time of dataset/OCRTest/1-155758001-OCR-LB-C02.jpg: 0.061s
[2022/11/22 20:17:23] ppocr DEBUG: CMAU4966960, 0.902
[2022/11/22 20:17:23] ppocr DEBUG: 45G1, 0.872
[2022/11/22 20:17:23] ppocr DEBUG: The visualized image saved in ./inference_results/1-155758001-OCR-LB-C02.jpg
[2022/11/22 20:17:24] ppocr DEBUG: dt_boxes num : 1, elapse : 0.19166111946105957
[2022/11/22 20:17:24] ppocr DEBUG: rec_res num  : 1, elapse : 0.011732816696166992
[2022/11/22 20:17:24] ppocr DEBUG: 695  Predict time of dataset/OCRTest/1-155801001-OCR-AH-A01.jpg: 0.210s
[2022/11/22 20:17:24] ppocr DEBUG: CAIU9857547, 0.996
[2022/11/22 20:17:24] ppocr DEBUG: The visualized image saved in ./inference_results/1-155801001-OCR-AH-A01.jpg
[2022/11/22 20:17:24] ppocr INFO: The predict total time is 154.57923460006714
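
A few labels in the log above, such as TRHU259949 and LGU6263148, contain only 10 characters, so they cannot be complete 11-character container numbers. Because the last character of a container number is a check digit derived from the first 10 characters, a small post-processing step can flag such results. The following is a minimal sketch of the ISO 6346 check-digit rule as it is commonly described: letters take values starting at 10 and skipping multiples of 11, the character at position i (counting from 0) is weighted by 2^i, and the weighted sum is reduced modulo 11 and then modulo 10. It is not part of PaddleOCR, and the function names are hypothetical.

```python
import re
import string


def _build_char_values():
    """Map digits to themselves and letters to 10, 12, 13, ... (skipping multiples of 11)."""
    values = {str(d): d for d in range(10)}
    value = 10
    for letter in string.ascii_uppercase:
        if value % 11 == 0:
            value += 1
        values[letter] = value
        value += 1
    return values


CHAR_VALUES = _build_char_values()
CONTAINER_RE = re.compile(r"[A-Z]{4}[0-9]{7}")


def check_digit(first_ten):
    """Compute the check digit from the 4 letters and 6 digits."""
    total = sum(CHAR_VALUES[ch] * (2 ** i) for i, ch in enumerate(first_ten))
    return total % 11 % 10


def is_valid_container_number(text):
    """True if text has the 4-letter + 7-digit shape and its check digit matches."""
    text = text.strip().upper()
    if not CONTAINER_RE.fullmatch(text):
        return False
    return check_digit(text[:10]) == int(text[10])


if __name__ == "__main__":
    for label in ("EITU1786393", "TRHU259949"):
        print(label, is_valid_container_number(label))
    # EITU1786393 True
    # TRHU259949 False
```

Results that fail this check could be routed to manual review or re-captured from another camera view instead of being recorded directly.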

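7. Visualized results

Image

Left: the container to be recognized. Right: the container number recognition result.

PaddleOCR

PaddleOCR is an OCR toolkit built on Baidu PaddlePaddle. It includes an ultra-lightweight Chinese OCR system whose models total only 8.6M, and it supports a variety of training algorithms for text detection and text recognition, as well as server-side and on-device deployment.

For more details, see: https://github.com/PaddlePaddle/PaddleOCR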
