CAPTCHA Recognition

OCR Recognition

brew install imagemagick
brew install tesseract-lang
pip3 install tesserocr pillow

# Recognition in the plain case, with no interference in the image
import tesserocr

print(tesserocr.file_to_text('code.jpg'))
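
The raw OCR string often carries stray whitespace or a trailing newline, so it usually needs tidying before it is compared with or submitted as the CAPTCHA answer. A minimal sketch, assuming the CAPTCHA contains only ASCII letters and digits; the helper name clean_ocr_text is hypothetical and not part of tesserocr:

```python
import re


def clean_ocr_text(raw):
    """Drop everything except ASCII letters and digits from raw OCR output."""
    # Assumption: the CAPTCHA alphabet is limited to 0-9, A-Z and a-z.
    return re.sub(r'[^0-9A-Za-z]', '', raw)


# Example with a made-up OCR result containing spaces and a newline.
print(clean_ocr_text(' 3A b7\n'))
```

In the script above, this would wrap the call as clean_ocr_text(tesserocr.file_to_text('code.jpg')).
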
# Recognition with stray noise dots (the dots are lighter than the text, e.g. black text with colored dots)
import tesserocr
from PIL import Image
import numpy as np

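image = Image.open('captcha2.png')  # Figure 1
# Convert the image from color to grayscale
image = image.convert('L')  # Figure 2
# Grayscale threshold; tune it to the actual image so that the noise is removed
threshold = 50
# Convert the image to a NumPy array
array = np.array(image)
# Filter the array with np.where:
# pixels brighter than the threshold become 255 (white), all others become 0 (black), which removes the noise
array = np.where(array > threshold, 255, 0)
image = Image.fromarray(array.astype('uint8'))  # Figure 3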
print(tesserocr.image_to_text(image))
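
To see what the threshold step does without an image file or Tesseract installed, here is a minimal sketch on a small hand-made grayscale array. The helper name binarize and the sample pixel values are illustrative assumptions, not taken from a real CAPTCHA:

```python
import numpy as np


def binarize(gray, threshold=50):
    """Return a uint8 array: pixels brighter than threshold become 255, the rest 0."""
    gray = np.asarray(gray)
    return np.where(gray > threshold, 255, 0).astype('uint8')


# Hypothetical 3x4 grayscale patch:
# dark text pixels (10-30), lighter noise dots (120-200), white background (255).
sample = np.array([
    [255,  10, 255, 180],
    [120,  20,  30, 255],
    [255, 255,  15, 200],
], dtype='uint8')

cleaned = binarize(sample, threshold=50)
print(cleaned)
```

Only the dark text pixels stay black; the lighter noise dots are pushed to white along with the background. If the threshold is set too high, noise survives, and if it is too low, parts of the text are erased, so it is worth trying a few values per CAPTCHA style.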

Some approaches rely on deep learning, which I do not understand yet; I will come back and fill this in once I have studied it.