Quick start
Open the EasyCV-based text recognition example and click "Open in DSW" in the upper-right corner.
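EasyCV text recognition
OCR (Optical Character Recognition) is the technique of automatically extracting text from images, and it has many mature industrial applications. A typical OCR pipeline consists of two parts: text detection and text recognition.
This article describes how to use EasyCV on PAI-DSW to train a text detection model and a text recognition model, and how to run end-to-end prediction with them.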
Runtime environment requirements
PAI-Pytorch 1.7/1.8 image, GPU instance type P100 or V100, 32 GB of memory
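Install dependencies
Note: the PAI-DSW docker image already contains the required dependencies, so steps 1 and 2 below can be skipped there. In a local notebook environment, run steps 1 and 2 to set up the environment.
1. Get the torch and CUDA versions, adjust the mmcv install command to match them, and install the corresponding versions of mmcv and nvidia-dali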
import torch
import os
os.environ['CUDA'] = 'cu' + torch.version.cuda.replace('.', '')
os.environ['Torch'] = 'torch' + torch.version.__version__.replace('+PAI', '')
!echo $CUDA
!echo $Torch
cu113
torch1.11.0+cu113
# install some python deps
! pip install --upgrade tqdm
! pip install mmcv-full==1.4.4 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html
! pip install http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/third_party/nvidia_dali_cuda100-0.25.0-1535750-py3-none-manylinux2014_x86_64.whl
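The mmcv command above hard-codes cu101 and torch1.8.0, while step 1 asks you to adapt it to the versions you detected. The following is a minimal sketch of how that index URL could be assembled from the version strings. The helper name mmcv_index_url is hypothetical, and the sketch assumes the index follows the same URL pattern as the command above; not every CUDA/torch combination necessarily has a published index.

```python
def mmcv_index_url(torch_version: str, cuda_version: str) -> str:
    """Build an mmcv wheel index URL following the pattern used in the install command.

    Assumption: the index lives at .../mmcv/dist/<cuda tag>/<torch tag>/index.html.
    """
    # Drop any local build suffix such as '+PAI' or '+cu113'.
    torch_tag = 'torch' + torch_version.split('+')[0]
    # '10.1' -> 'cu101'
    cuda_tag = 'cu' + cuda_version.replace('.', '')
    return f'https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html'


print(mmcv_index_url('1.8.0+PAI', '10.1'))
# https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html
```

In a notebook, the two arguments could come from torch.__version__ and torch.version.cuda, and the result can be passed to pip install with the -f option.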
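2. Install the EasyCV package
Note: the PAI-DSW docker image comes with the pai-easycv library preinstalled, so this step can be skipped. If errors occur during training or testing, try updating easycv with the commands below.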
#pip install pai-easycv
! echo y | pip uninstall pai-easycv easycv
!pip install pai-easycv
from easycv.apis import *
Text detection
ocr/det/
├── ch4_test_images
│   ├── 0001.jpg
│   ├── 0002.jpg
│   ├── 0003.jpg
│   |...
├── icdar_c4_train_imgs
│   ├── 0001.jpg
│   ├── 0002.jpg
│   ├── 0003.jpg
│   |...
├── test_icdar2015_label.txt
└── train_icdar2015_label.txt

Text recognition
ocr/rec/data_lmdb_release/
├── validation
│   ├── data.mdb
│   ├── lock.mdb
Run the following command to download and extract the data.
! wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/configs/ocr/ocr.tar.gz && tar -xpf ocr.tar.gz
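After extraction it can be useful to confirm that the layout matches the directory listing above before starting a training run. The following is a small sketch, not part of EasyCV: missing_paths is a hypothetical helper, and it assumes the archive was extracted into the current working directory.

```python
from pathlib import Path
from typing import List

# Paths taken from the directory listing, relative to the extraction directory.
EXPECTED_PATHS = [
    'ocr/det/ch4_test_images',
    'ocr/det/icdar_c4_train_imgs',
    'ocr/det/test_icdar2015_label.txt',
    'ocr/det/train_icdar2015_label.txt',
    'ocr/rec/data_lmdb_release/validation/data.mdb',
]


def missing_paths(root: str, expected: List[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


# An empty list means every expected file and directory was found.
print(missing_paths('.'))
```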
Train the models
Text detection
# Show where easycv is installed
import easycv
print(easycv.__file__)
/opt/conda/lib/python3.7/site-packages/easycv/__init__.py
# Download the config file for text detection
! wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/configs/ocr/det_model_en.py
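To shorten the training time, open the config file det_model_en.py, set the total_epochs parameter to 10, and set the log interval to 1 so that a log line is printed after every iteration.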
# runtime settings
total_epochs = 10
# log config
log_config=dict(interval=1)
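If you prefer not to edit the file by hand, the epoch count can also be patched with a text substitution. This is a rough sketch only: set_total_epochs is a hypothetical helper, and it assumes the config contains a top-level line of the form total_epochs = <integer>, as in the excerpt above. It does not touch log_config.

```python
import re


def set_total_epochs(config_text: str, epochs: int) -> str:
    """Return config_text with the top-level total_epochs assignment set to epochs."""
    pattern = re.compile(r'^total_epochs\s*=\s*\d+', flags=re.MULTILINE)
    new_text, count = pattern.subn(f'total_epochs = {epochs}', config_text)
    if count == 0:
        # Fail loudly instead of silently training with the default value.
        raise ValueError('no top-level total_epochs assignment found')
    return new_text


# Demonstration on an in-memory sample rather than the real config file.
sample = '# runtime settings\ntotal_epochs = 100\n'
print(set_total_epochs(sample, 10))
```

To apply it to a real file, read det_model_en.py into a string, pass it through the function, and write the result back.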
Only the single-GPU training command is shown here.
!python -m easycv.tools.train det_model_en.py --work_dir work_dir/ocr/det/dbnet
Text recognition
# Download the config file for text recognition
! wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/configs/ocr/rec_model_en.py
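To shorten the training time, open the config file rec_model_en.py, set the total_epochs parameter to 10, and set the log interval to 1 so that a log line is printed after every iteration.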
# runtime settings
total_epochs = 10
# log config
log_config=dict(interval=1)
Only the single-GPU training command is shown here.
!python -m easycv.tools.train rec_model_en.py --work_dir work_dir/ocr/rec/
Model export
# List the .pth files produced by training
! ls work_dir/ocr/det/dbnet/*.pth && ls work_dir/ocr/rec/*.pth
work_dir/ocr/det/dbnet/epoch_10.pth  work_dir/ocr/det/dbnet/epoch_6.pth
work_dir/ocr/det/dbnet/epoch_1.pth   work_dir/ocr/det/dbnet/epoch_7.pth
work_dir/ocr/det/dbnet/epoch_2.pth   work_dir/ocr/det/dbnet/epoch_8.pth
work_dir/ocr/det/dbnet/epoch_3.pth   work_dir/ocr/det/dbnet/epoch_9.pth
work_dir/ocr/det/dbnet/epoch_4.pth   work_dir/ocr/det/dbnet/export_best.pth
work_dir/ocr/det/dbnet/epoch_5.pth
work_dir/ocr/rec/epoch_10.pth
work_dir/ocr/rec/OCRRecEvaluator_acc_best.pth
work_dir/ocr/rec/epoch_5.pth
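When choosing which checkpoint to export, note that a plain alphabetical sort places epoch_10.pth before epoch_2.pth. The sketch below picks the checkpoint with the highest epoch number instead. The helper latest_epoch_checkpoint is hypothetical and assumes the epoch_<N>.pth naming seen in the listing above.

```python
import re
from typing import Iterable, Optional


def latest_epoch_checkpoint(paths: Iterable[str]) -> Optional[str]:
    """Return the path whose file name is epoch_<N>.pth with the largest N."""
    best_epoch, best_path = -1, None
    for path in paths:
        # Match only the file name part, e.g. '.../epoch_10.pth'.
        match = re.search(r'(?:^|/)epoch_(\d+)\.pth$', path)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


checkpoints = [
    'work_dir/ocr/det/dbnet/epoch_10.pth',
    'work_dir/ocr/det/dbnet/epoch_9.pth',
    'work_dir/ocr/det/dbnet/export_best.pth',
]
print(latest_epoch_checkpoint(checkpoints))
# work_dir/ocr/det/dbnet/epoch_10.pth
```

In practice the list of paths could come from glob.glob('work_dir/ocr/det/dbnet/*.pth').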
! python -m easycv.tools.export det_model_en.py work_dir/ocr/det/dbnet/epoch_10.pth work_dir/ocr/det/dbnet/export_best.pth
ocr/det/det_model_en.py
WARNING:root:Export needs to set model.pretrained to false to avoid hanging during distributed training
load checkpoint from local path: work_dir/ocr/det/dbnet/epoch_10.pth
! python -m easycv.tools.export rec_model_en.py work_dir/ocr/rec/OCRRecEvaluator_acc_best.pth work_dir/ocr/rec/export_best.pth
ocr/rec/rec_model_en.py
WARNING:root:Export needs to set model.pretrained to false to avoid hanging during distributed training
load checkpoint from local path: work_dir/ocr/rec/OCRRecEvaluator_acc_best.pth
Prediction
If the following error appears during prediction, uninstall mmdet manually by running pip uninstall mmdet in a terminal.
KeyError: 'YOLOXLrUpdaterHook is already registered in hook'
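To find out ahead of time whether mmdet is present in the environment, you can ask Python's import machinery without actually importing the package. This is a minimal sketch using the standard library; is_installed is a hypothetical helper name, and whether the installed mmdet actually triggers the error above still depends on your environment.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(package) is not None


if is_installed('mmdet'):
    print('mmdet is installed; uninstall it if the registry KeyError appears.')
else:
    print('mmdet is not installed.')
```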
from easycv.predictors.ocr import OCRPredictor
import cv2

# Text detection model
detection_model_path = 'work_dir/ocr/det/dbnet/export_best.pth'
# Text recognition model
rec_model_path = 'work_dir/ocr/rec/export_best.pth'

predictor = OCRPredictor(
    det_model_path=detection_model_path,
    rec_model_path=rec_model_path,
    use_angle_cls=False)

# Test image
input_img = 'ocr/det/ch4_test_images/img_103.jpg'
img = cv2.imread(input_img)
res = predictor([img])[0]
print(res['boxes'])
print(res['rec_res'])
load checkpoint from local path: work_dir/ocr/det/dbnet/export_best.pth
load checkpoint from local path: work_dir/ocr/rec/export_best.pth
[array([[751., 200.], [799., 200.], [799., 224.], [751., 224.]], dtype=float32),
 array([[1020., 221.], [1107., 194.], [1116., 224.], [1029., 251.]], dtype=float32),
 array([[1038., 299.], [1125., 286.], [1129., 319.], [1042., 332.]], dtype=float32),
 array([[ 966., 364.], [1047., 361.], [1048., 383.], [ 967., 387.]], dtype=float32),
 array([[ 944., 454.], [1051., 471.], [1045., 509.], [ 938., 492.]], dtype=float32)]
[('h', 0.7921210527420044), ('c', 0.5578954219818115), ('u', 0.8516407608985901), ('phr', 0.7313658595085144), ('ni', 0.5367882251739502)]
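The printed result holds one box per detected text region in res['boxes'] and a matching (text, score) pair in res['rec_res']. The sketch below shows one possible way to pair them up and keep only confident recognitions. The helper confident_results and the 0.7 threshold are illustrative assumptions, not part of EasyCV; the sample data is copied from the first three entries of the output above, with boxes written as plain lists.

```python
from typing import List, Sequence, Tuple


def confident_results(boxes: Sequence, rec_res: Sequence[Tuple[str, float]],
                      min_score: float = 0.7) -> List[dict]:
    """Pair each box with its recognized text and drop low-score entries."""
    if len(boxes) != len(rec_res):
        raise ValueError('boxes and rec_res must have the same length')
    return [
        {'box': box, 'text': text, 'score': score}
        for box, (text, score) in zip(boxes, rec_res)
        if score >= min_score
    ]


boxes = [
    [[751., 200.], [799., 200.], [799., 224.], [751., 224.]],
    [[1020., 221.], [1107., 194.], [1116., 224.], [1029., 251.]],
    [[1038., 299.], [1125., 286.], [1129., 319.], [1042., 332.]],
]
rec_res = [('h', 0.7921210527420044), ('c', 0.5578954219818115), ('u', 0.8516407608985901)]

for item in confident_results(boxes, rec_res):
    print(item['text'], round(item['score'], 3))
# h 0.792
# u 0.852
```

With the predictor output, the call would be confident_results(res['boxes'], res['rec_res']). The low scores and short strings in the sample output are consistent with a model trained for only 10 epochs.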