[DSW Gallery] Text Recognition Example Based on EasyCV

2023-03-16

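Summary: EasyCV is a PyTorch-based, all-in-one computer vision modeling toolkit centered on self-supervised learning and Transformer techniques. It includes SOTA algorithms for vision tasks such as image classification, metric learning, object detection, and pose estimation. Using text recognition as the example, this article shows how to use EasyCV in PAI-DSW.

Quick start

Open the "Text Recognition Example Based on EasyCV" notebook and click "Open in DSW" in the upper-right corner.

EasyCV text recognition

OCR (Optical Character Recognition) is the technique of automatically extracting text from images, and it has many mature industrial applications. A typical OCR algorithm consists of two parts: text detection and text recognition.

This article describes how to train text detection and text recognition models with EasyCV in PAI-DSW and how to run end-to-end prediction.

Runtime requirements

PAI-Pytorch 1.7/1.8 image, a GPU instance with P100 or V100, and 32 GB of memory

Install dependencies

Note: the PAI-DSW docker image already includes these dependencies, so steps 1 and 2 can be skipped there. In a local notebook environment, run steps 1 and 2 to set up the environment.

1. Get the torch and CUDA versions, adjust the mmcv install command to match them, and install the corresponding versions of mmcv and nvidia-dali.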

import torch
import os
os.environ['CUDA']='cu' + torch.version.cuda.replace('.', '')
os.environ['Torch']='torch'+torch.version.__version__.replace('+PAI', '')
!echo $CUDA
!echo $Torch

/opt/conda/lib/python3.7/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

cu113
torch1.11.0+cu113

# install some python deps
! pip install --upgrade tqdm
# Replace cu101 and torch1.8.0 in the index URL below with the CUDA and Torch values printed in step 1
! pip install mmcv-full==1.4.4 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html
! pip install http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/third_party/nvidia_dali_cuda100-0.25.0-1535750-py3-none-manylinux2014_x86_64.whl
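
The pip command above hardcodes cu101 and torch1.8.0 in the mmcv-full index URL, while step 1 asks you to adapt it to the local versions. A minimal sketch of building that URL from version strings, assuming the index follows the dist/<cuda>/<torch>/index.html layout seen in the command. `mmcv_index_url` is a hypothetical helper, not part of mmcv or EasyCV, and a given CUDA/torch combination may not have a published index.

```python
def mmcv_index_url(torch_version: str, cuda_version: str) -> str:
    """Build an mmcv-full wheel index URL from version strings.

    Hypothetical helper: it assumes the URL layout used in the pip command
    above and does not check that the index actually exists.
    """
    # Drop local build suffixes such as "+PAI" or "+cu113": "1.11.0+cu113" -> "1.11.0".
    torch_tag = "torch" + torch_version.split("+")[0]
    # Turn a dotted CUDA version into the tag form: "11.3" -> "cu113".
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html"


if __name__ == "__main__":
    # In the notebook these strings would come from torch.__version__ and torch.version.cuda.
    print(mmcv_index_url("1.8.0+PAI", "10.1"))
```

Stripping everything after "+" also handles version strings like the torch1.11.0+cu113 printed in step 1, which the original snippet leaves untouched.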

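2. Install the EasyCV package

Note: the pai-easycv library is preinstalled in the PAI-DSW docker image, so this step can be skipped. If errors occur during training or testing, try updating EasyCV with the commands below.

# Remove any existing copy first, then install the latest pai-easycv
! echo y | pip uninstall pai-easycv easycv
!pip install pai-easycv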

from easycv.apis import *

Text detection data:
ocr/det/
├── ch4_test_images
│   ├── 0001.jpg
│   ├── 0002.jpg
│   ├── 0003.jpg
│   └── ...
├── icdar_c4_train_imgs
│   ├── 0001.jpg
│   ├── 0002.jpg
│   ├── 0003.jpg
│   └── ...
├── test_icdar2015_label.txt
└── train_icdar2015_label.txt

Text recognition data:
ocr/rec/data_lmdb_release/
└── validation
    ├── data.mdb
    └── lock.mdb

Run the following command to download and extract the data:

! wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/configs/ocr/ocr.tar.gz && tar -xpf ocr.tar.gz

--2023-02-06 10:53:16--  https://pai-vision-exp.oss-cn-zhangjiakou.aliyuncs.com/gl_pp/dsw_example_data.tar.gz
Resolving pai-vision-exp.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-exp.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.20.19
Connecting to pai-vision-exp.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-exp.oss-cn-zhangjiakou.aliyuncs.com)|39.98.20.19|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 441050586 (421M) [application/gzip]
Saving to: 'dsw_example_data.tar.gz'
dsw_example_data.ta 100%[===================>] 420.62M  12.8MB/s    in 36s
2023-02-06 10:53:53 (11.7 MB/s) - 'dsw_example_data.tar.gz' saved [441050586/441050586]
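
Before training, it can help to confirm that the extracted archive matches the directory layout listed earlier. A minimal sketch, assuming the archive was extracted into the current working directory. `EXPECTED_PATHS` and `missing_paths` are illustrative names, and the path list is copied from the listing rather than from any EasyCV API.

```python
from pathlib import Path

# Paths taken from the directory listing in this article,
# relative to the directory where the archive was extracted.
EXPECTED_PATHS = [
    "ocr/det/ch4_test_images",
    "ocr/det/icdar_c4_train_imgs",
    "ocr/det/test_icdar2015_label.txt",
    "ocr/det/train_icdar2015_label.txt",
    "ocr/rec/data_lmdb_release/validation/data.mdb",
    "ocr/rec/data_lmdb_release/validation/lock.mdb",
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing after extraction:")
        for p in missing:
            print("  " + p)
    else:
        print("Dataset layout looks complete.")
```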

Model training

Text detection

# Show where easycv is installed
import easycv
print(easycv.__file__)

# Download the config file for text detection
! wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/configs/ocr/det_model_en.py

/opt/conda/lib/python3.7/site-packages/easycv/__init__.py

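To shorten training time, open the config file det_model_en.py, set the total_epochs parameter to 10, and set the log interval to 1 so that a log line is printed every iteration.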

# runtime settings
total_epochs = 10
# log config
log_config=dict(interval=1)
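
The manual edit to total_epochs can also be scripted. A minimal sketch, assuming the config file contains a `total_epochs = <integer>` assignment on its own line, as in the snippet above. `set_total_epochs` is a hypothetical helper, not an EasyCV API. It deliberately leaves log_config alone, because the full layout of that dict in the downloaded config is not shown in this article.

```python
import re
from pathlib import Path


def set_total_epochs(config_text: str, epochs: int) -> str:
    """Return config_text with every `total_epochs = <int>` line set to epochs."""
    pattern = re.compile(r"^(\s*total_epochs\s*=\s*)\d+", flags=re.MULTILINE)
    # A function replacement avoids ambiguity between the group reference and the digits.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{epochs}", config_text)
    if count == 0:
        raise ValueError("no 'total_epochs = <int>' assignment found")
    return new_text


if __name__ == "__main__":
    cfg = Path("det_model_en.py")
    if cfg.exists():
        text = cfg.read_text(encoding="utf-8")
        cfg.write_text(set_total_epochs(text, 10), encoding="utf-8")
```

The same helper would apply to rec_model_en.py under the same assumption about its contents.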

The command below shows single-GPU training only.

!python -m easycv.tools.train  det_model_en.py --work_dir work_dir/ocr/det/dbnet

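Text recognition

# Download the config file for text recognition
! wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/configs/ocr/rec_model_en.py

To shorten training time, open the config file rec_model_en.py, set the total_epochs parameter to 10, and set the log interval to 1 so that a log line is printed every iteration.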

# runtime settings
total_epochs = 10
# log config
log_config=dict(interval=1)

The command below shows single-GPU training only.

!python -m easycv.tools.train  rec_model_en.py --work_dir work_dir/ocr/rec/

Model export

# List the .pth checkpoint files produced by training
! ls  work_dir/ocr/det/dbnet/*.pth && ls work_dir/ocr/rec/*.pth

work_dir/ocr/det/dbnet/epoch_10.pth  work_dir/ocr/det/dbnet/epoch_6.pth
work_dir/ocr/det/dbnet/epoch_1.pth   work_dir/ocr/det/dbnet/epoch_7.pth
work_dir/ocr/det/dbnet/epoch_2.pth   work_dir/ocr/det/dbnet/epoch_8.pth
work_dir/ocr/det/dbnet/epoch_3.pth   work_dir/ocr/det/dbnet/epoch_9.pth
work_dir/ocr/det/dbnet/epoch_4.pth   work_dir/ocr/det/dbnet/export_best.pth
work_dir/ocr/det/dbnet/epoch_5.pth
work_dir/ocr/rec/epoch_10.pth  work_dir/ocr/rec/OCRRecEvaluator_acc_best.pth
work_dir/ocr/rec/epoch_5.pth
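
Note that ls sorts names lexicographically, so epoch_10.pth is listed before epoch_2.pth. The detection export command uses epoch_10.pth, the last epoch. A minimal sketch of picking the highest-numbered epoch checkpoint programmatically, assuming checkpoints follow the epoch_<N>.pth naming shown in the listing. `latest_epoch_checkpoint` is a hypothetical helper, not part of EasyCV.

```python
import re
from pathlib import Path


def latest_epoch_checkpoint(work_dir):
    """Return the epoch_<N>.pth path with the largest N, or None if there is none."""
    best = None  # (epoch number, path)
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = re.fullmatch(r"epoch_(\d+)\.pth", path.name)
        if match is None:
            continue  # skip names that do not end in a plain integer
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best[1] if best else None


if __name__ == "__main__":
    print(latest_epoch_checkpoint("work_dir/ocr/det/dbnet"))
```

This selects by epoch number only. It does not replace an evaluator-selected checkpoint such as OCRRecEvaluator_acc_best.pth, which the recognition export uses.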

! python -m easycv.tools.export  det_model_en.py work_dir/ocr/det/dbnet/epoch_10.pth work_dir/ocr/det/dbnet/export_best.pth

ocr/det/det_model_en.py
WARNING:root:Export needs to set model.pretrained to false to avoid hanging during distributed training
load checkpoint from local path: work_dir/ocr/det/dbnet/epoch_10.pth

! python -m easycv.tools.export  rec_model_en.py work_dir/ocr/rec/OCRRecEvaluator_acc_best.pth work_dir/ocr/rec/export_best.pth

ocr/rec/rec_model_en.py
WARNING:root:Export needs to set model.pretrained to false to avoid hanging during distributed training
load checkpoint from local path: work_dir/ocr/rec/OCRRecEvaluator_acc_best.pth

Prediction

If the following error occurs during prediction, uninstall mmdet manually by running pip uninstall mmdet in a terminal.

KeyError: 'YOLOXLrUpdaterHook is already registered in hook'

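from easycv.predictors.ocr import OCRPredictor
import cv2
# Exported text detection model
detection_model_path = 'work_dir/ocr/det/dbnet/export_best.pth'
# Exported text recognition model
rec_model_path = 'work_dir/ocr/rec/export_best.pth'
predictor = OCRPredictor(
            det_model_path=detection_model_path,
            rec_model_path=rec_model_path,
            use_angle_cls=False)
# Test image
input_img = 'ocr/det/ch4_test_images/img_103.jpg'
img = cv2.imread(input_img)
# cv2.imread returns None instead of raising when the file cannot be read
assert img is not None, f'failed to read {input_img}'
# The predictor takes a list of images; take the result for the first one
res = predictor([img])[0]
print(res['boxes'])
print(res['rec_res'])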

load checkpoint from local path: work_dir/ocr/det/dbnet/export_best.pth
load checkpoint from local path: work_dir/ocr/rec/export_best.pth
[array([[751., 200.],
       [799., 200.],
       [799., 224.],
       [751., 224.]], dtype=float32), array([[1020.,  221.],
       [1107.,  194.],
       [1116.,  224.],
       [1029.,  251.]], dtype=float32), array([[1038.,  299.],
       [1125.,  286.],
       [1129.,  319.],
       [1042.,  332.]], dtype=float32), array([[ 966.,  364.],
       [1047.,  361.],
       [1048.,  383.],
       [ 967.,  387.]], dtype=float32), array([[ 944.,  454.],
       [1051.,  471.],
       [1045.,  509.],
       [ 938.,  492.]], dtype=float32)]
[('h', 0.7921210527420044), ('c', 0.5578954219818115), ('u', 0.8516407608985901), ('phr', 0.7313658595085144), ('ni', 0.5367882251739502)]
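
With only 10 training epochs, several of the recognized strings above have low confidence. A minimal sketch of pairing each detected box with its recognized text and dropping low-confidence entries, assuming `res['boxes']` and `res['rec_res']` have the same length and order, as the output above suggests. `confident_results` and the 0.7 threshold are illustrative choices, not part of EasyCV.

```python
def confident_results(boxes, rec_res, min_score=0.7):
    """Pair each box with its (text, score) and keep entries with score >= min_score."""
    if len(boxes) != len(rec_res):
        raise ValueError("boxes and rec_res must have the same length")
    return [
        {"box": box, "text": text, "score": score}
        for box, (text, score) in zip(boxes, rec_res)
        if score >= min_score
    ]


if __name__ == "__main__":
    # Recognition results copied from the output above; boxes are shortened
    # to plain placeholder lists for brevity.
    rec_res = [("h", 0.7921210527420044), ("c", 0.5578954219818115),
               ("u", 0.8516407608985901), ("phr", 0.7313658595085144),
               ("ni", 0.5367882251739502)]
    boxes = [[[0, 0]] for _ in rec_res]
    kept = confident_results(boxes, rec_res, min_score=0.7)
    print([item["text"] for item in kept])  # ['h', 'u', 'phr']
```

In the notebook, the call would be `confident_results(res['boxes'], res['rec_res'])`.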
