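To resolve the need to switch to a Py37 environment when launching Jupyter Notebook and Streamlit, as well as PaddleOCR compatibility problems in Streamlit, follow the steps below:
Check for and switch to the Py37 environment:
conda create -n py37 python=3.7 -y
conda activate py37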
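Once the environment is active, it can help to confirm from inside Jupyter or Streamlit that the running interpreter really is Python 3.7, since both tools can be launched from a different environment by accident. The following is a minimal sketch; the helper name `is_py37` is hypothetical and not part of any library.

```python
import sys


def is_py37(version_info=None):
    """Return True if the given (or the current) interpreter is Python 3.7.x.

    `is_py37` is an illustrative helper, not a library function.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor; the patch level does not matter here.
    return tuple(version_info[:2]) == (3, 7)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Running on Python 3.7:", is_py37())
```

Running this in a notebook cell or at the top of a Streamlit script shows which interpreter is actually in use.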
Install the required dependencies:
pip install paddleocr
pip install opencv-python
pip install numpy
To pin a specific PaddleOCR version instead:
pip install paddleocr==1.1.0
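After installing, a quick way to see whether the packages are visible to the current interpreter is to look them up without importing them. This is a sketch under the assumption that the import names are `paddleocr`, `cv2` (for opencv-python), and `numpy`; the helper `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, which can differ from pip package names
    # (opencv-python is imported as cv2).
    required = ["paddleocr", "cv2", "numpy"]
    missing = missing_modules(required)
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If a module is reported as missing even though pip installed it, pip and the running interpreter probably belong to different environments.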
Adjust the module import path:
from paddleocr import PaddleOCR
ocr = PaddleOCR()
Clear the Streamlit cache:
streamlit cache clear
Check the PaddleOCR version:
import paddleocr
print(paddleocr.__version__)
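To compare the reported version against a pinned one programmatically, the version string can be turned into a tuple of integers. This is a minimal sketch that handles simple dotted versions only; `parse_version` and `matches_pin` are hypothetical helpers, and the version strings in the example are illustrative.

```python
def parse_version(version):
    """Convert a dotted version string such as '1.1.0' into a tuple of ints.

    Any non-digit suffix on a component (for example 'rc1') is dropped.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def matches_pin(installed, pinned):
    """Return True if the installed version equals the pinned version."""
    return parse_version(installed) == parse_version(pinned)


if __name__ == "__main__":
    # In real use, pass paddleocr.__version__ as the first argument.
    print(matches_pin("1.1.0", "1.1.0"))
    print(matches_pin("2.0.1", "1.1.0"))
```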
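Optimize the preprocessing steps:
import cv2
# Load the image and convert it to grayscale.
image = cv2.imread("test.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Smooth with a 5x5 Gaussian kernel to reduce noise before edge detection.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# 50 and 200 are the lower and upper hysteresis thresholds. The stray fourth
# argument (255) was dropped: that position is the output array, not a threshold.
edged = cv2.Canny(blurred, 50, 200)
# White edges on a black background, as a 3-channel image.
color = cv2.cvtColor(edged, cv2.COLOR_GRAY2BGR)
# Invert to black edges on a white background; this replaces the previous value of color.
reverse = cv2.bitwise_not(edged)
color = cv2.cvtColor(reverse, cv2.COLOR_GRAY2BGR)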
Export an inference model and use it for prediction:
python3 tools/export_model.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=output/rec_CRNN/best_accuracy Global.save_inference_dir=./inference/rec_crnn/
import paddleocr
ocr = paddleocr.PaddleOCR()
# rec=False runs text detection only and skips recognition.
result = ocr.ocr("test.jpg", rec=False)
for line in result:
    print(line)
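Handle decimal points and non-digit characters:
import numpy as np
def findMaxBox(rsList, num=5):
    # Height of each box: y of the third corner minus y of the first corner.
    height = []
    for i in rsList:
        oneHeight = i[2][1] - i[0][1]
        height.append(oneHeight)
    nplist = np.array(height)
    # Indices of the num tallest boxes, in no particular order.
    # Requires len(rsList) >= num.
    ind = np.argpartition(nplist, num*(-1))[num*(-1):]
    # Index element by element so this also works when rsList is a plain list.
    return [rsList[i] for i in ind]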
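The same idea, keeping only the tallest detection boxes, can be tried on synthetic data without NumPy. The sketch below swaps in `heapq.nlargest` from the standard library in place of `np.argpartition`; it assumes each box is four `[x, y]` corners with the top-left corner first and the bottom-right corner third, and that small marks such as a decimal point produce short boxes. The names `box_height` and `tallest_boxes` and the sample coordinates are made up for illustration.

```python
import heapq


def box_height(box):
    """Height of a box given as four [x, y] corners (assumed corner order:
    top-left, top-right, bottom-right, bottom-left)."""
    return box[2][1] - box[0][1]


def tallest_boxes(boxes, num=5):
    """Return up to `num` boxes with the largest height, tallest first."""
    return heapq.nlargest(num, boxes, key=box_height)


# Synthetic boxes for illustration only.
boxes = [
    [[0, 0], [10, 0], [10, 30], [0, 30]],      # height 30, e.g. a digit
    [[12, 25], [14, 25], [14, 30], [12, 30]],  # height 5, e.g. a decimal point
    [[16, 0], [26, 0], [26, 28], [16, 28]],    # height 28, e.g. a digit
]

if __name__ == "__main__":
    for box in tallest_boxes(boxes, num=2):
        print(box_height(box), box)
```

Unlike the `np.argpartition` version, this variant returns the boxes sorted by height and does not fail when fewer than `num` boxes are available.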
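Tune the model or switch to a different one:
from paddleocr import PaddleOCR, draw_ocr
# NOTE: `use_model` does not appear among PaddleOCR's documented constructor
# options; verify the option name and the supported algorithms against the
# documentation for your installed version.
ocr = PaddleOCR(use_model='CTPN')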
Improve log management:
import subprocess
# Run the recognition script and capture its stdout and stderr as text.
result = subprocess.run(
    "python3 ./PaddleOCR/tools/infer/predict_rec.py"
    " --image_dir=\"test.jpg\""
    " --rec_model_dir=./inference/rec_crnn/"
    " --rec_image_shape=\"3, 32, 200\""
    " --rec_char_type=\"ch\""
    " --rec_char_dict_path=./PaddleOCR/ppocr/utils/num_dict.txt",
    shell=True, capture_output=True, text=True)
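Capturing the output is only half of log management; the captured text still has to go somewhere. One possible approach, sketched below, routes it through Python's standard `logging` module. The helper `run_and_log` and the logger name `ocr_runner` are hypothetical, and a harmless stand-in command is used here in place of the PaddleOCR script.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("ocr_runner")  # hypothetical logger name


def run_and_log(args):
    """Run a command given as an argument list and log what it printed."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.stdout:
        logger.info("stdout:\n%s", result.stdout.rstrip())
    if result.stderr:
        logger.warning("stderr:\n%s", result.stderr.rstrip())
    if result.returncode != 0:
        logger.error("command exited with code %d", result.returncode)
    return result


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    # Stand-in command; replace with the predict_rec.py invocation as a list.
    run_and_log([sys.executable, "-c", "print('hello')"])
```

Passing the command as a list avoids `shell=True` and the quoting issues that come with it, which matters for arguments containing spaces such as the image shape.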
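With the steps above, the common problems that arise when using PaddleOCR and Streamlit in a Py37 environment can be resolved, keeping development and deployment on track.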
Reposted from: http://knrfk.baihongyu.com/