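1 OCR
OCR (Optical Character Recognition) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate those shapes into computer text. For printed characters, the paper document is first converted optically into a black-and-white bitmap image file, and recognition software then extracts the text and other information from that image.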
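The "dark and light pattern" step can be illustrated with a minimal sketch. It assumes grayscale pixels in the range 0 to 255 and a fixed threshold of 128; the names binarize, render and SCAN are hypothetical and the pixel values are made up. Real OCR software uses far more elaborate preprocessing, so treat this only as a picture of the idea.

```python
def binarize(gray_rows, threshold=128):
    """Turn grayscale pixels (0 = black, 255 = white) into a 0/1 bitmap.

    1 marks a dark (ink) pixel, 0 marks a light (paper) pixel.
    The threshold of 128 is an assumption chosen for this example.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


def render(bitmap):
    """Draw the bitmap as text so the dark/light pattern is visible."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in bitmap)


# A tiny 5x5 grayscale "scan" of the letter T (values are made up).
SCAN = [
    [20, 30, 25, 28, 22],
    [240, 235, 30, 238, 245],
    [250, 240, 18, 236, 241],
    [244, 239, 27, 242, 250],
    [247, 233, 35, 240, 238],
]

if __name__ == "__main__":
    print(render(binarize(SCAN)))
```

Printing the rendered bitmap shows the shape that a recognition step would then have to match against known characters.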
2 ddddocr
ddddocr is a simple, practical library for recognizing CAPTCHAs.
Installation:
Install from a mirror: pip install ddddocr -i https://pypi.tuna.tsinghua.edu.cn/simple
or pip install ddddocr -i https://mirrors.aliyun.com/pypi/simple/
Some environments may also need numpy upgraded:
pip install --upgrade numpy -i https://mirrors.aliyun.com/pypi/simple/
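After installing, it can be handy to confirm which versions actually ended up in the environment. The following is a small sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages this article depends on.
    for name in ("ddddocr", "numpy"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

If numpy shows up but ddddocr still fails to import, upgrading numpy as described above is the first thing to try.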
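3 Usage example
import ddddocr

# Create the recognizer
ocr = ddddocr.DdddOcr()

# Read the CAPTCHA image as raw bytes
with open('code.png', 'rb') as f:
    img_bytes = f.read()

# Recognize the characters in the image
res = ocr.classification(img_bytes)
print('The characters are: ' + res)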
4 Common problems
Error: Microsoft Visual C++ Redistributable for Visual Studio 2019 not installed on the machine
Solution: install Microsoft Visual C++ Redistributable 2019
Microsoft Visual C++ Redistributable 2019 x86
Microsoft Visual C++ Redistributable 2019 x64
Full code
from selenium import webdriver
from selenium.webdriver.common.by import By
import ddddocr

driver = webdriver.Chrome()
driver.get('URL')  # replace with the address of the login page
ocr = ddddocr.DdddOcr()  # create the recognizer once instead of on every retry

logged_in = False
while not logged_in:
    # Cookies before the attempt, used to detect whether the login succeeded
    cookies_before = driver.get_cookies()

    # Screenshot the CAPTCHA image and recognize it
    img = driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div/form/div[3]/div[2]/div/img')
    img.screenshot('test.png')
    with open('test.png', 'rb') as f:
        image = f.read()
    res = ocr.classification(image)

    # Fill in the account, password and CAPTCHA, then submit the form
    driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div/form/div[1]/input').send_keys('account')
    driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div/form/div[2]/input').send_keys('password')
    driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div/form/div[3]/div[1]/div/input').send_keys(res)
    driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div/form/div[4]/p[2]/button').click()

    # Changed cookies are treated as a successful login; otherwise reload and retry
    cookies_after = driver.get_cookies()
    if cookies_before != cookies_after:
        logged_in = True
    else:
        driver.refresh()
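The loop above keeps retrying until the cookies change, so it never stops if the account or password is wrong or the site is unreachable. One possible way to bound it is sketched below. The names login_with_retries, attempt_login and max_attempts are hypothetical, and fake_attempt is only a stand-in for one "recognize, submit, compare cookies" round so that the sketch runs without a browser.

```python
def login_with_retries(attempt_login, max_attempts=5):
    """Call attempt_login() until it returns True or the attempts run out.

    Returns the number of the successful attempt (1-based), or None on failure.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_login():
            return attempt
    return None


# Stand-in for one login round: it fails twice, then succeeds.
outcomes = iter([False, False, True])


def fake_attempt():
    return next(outcomes)


result = login_with_retries(fake_attempt, max_attempts=5)
print(result)  # 3
```

In a real script, attempt_login would wrap the body of the while loop and return whether the cookies changed.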
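How to get the XPath
1. Open the login page, right-click the target element and choose Inspect.
2. Right-click the highlighted node in the developer tools and choose Copy > Copy XPath.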
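To see how a copied XPath walks the page, here is a small sketch using the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The PAGE markup is invented for illustration and is much simpler than a real login page. Each step such as div[2] means "the second div child at this level", which is why a copied absolute path breaks as soon as the page layout changes.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed stand-in for a login page.
PAGE = """
<html>
  <body>
    <div>banner</div>
    <div>
      <form>
        <div><input name="user"/></div>
        <div><input name="pwd"/></div>
        <div><input name="code"/><img src="captcha.png"/></div>
      </form>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Equivalent of the absolute XPath /html/body/div[2]/form/div[3]/img,
# written relative to <html> because ElementTree searches from an element.
img = root.find("./body/div[2]/form/div[3]/img")
print(img.get("src"))  # captcha.png
```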