update from original repo

revert-162-develop
Khanh Tran 5 years ago
commit 406463efb6

@@ -2,9 +2,12 @@
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.
**Recent updates**
- 2020.5.30 Model prediction and training support Windows systems, and the display of recognition results is optimized
- 2020.5.30 Open source general Chinese OCR model
- 2020.5.30 Provide ultra-lightweight Chinese OCR model inference
- 2020.6.8 Add [dataset](./doc/datasets.md) and keep updating
- 2020.6.5 Add `attention` model in `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows systems
- [more](./doc/update.md)
## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M
@@ -38,6 +41,8 @@ Please see [Quick installation](./doc/installation.md)
#### 2. Download inference models
#### (1) Download Ultra-lightweight Chinese OCR models
*If wget is not installed on Windows, you can copy the link into a browser to download the model. After the model is downloaded, unzip it and place it in the corresponding directory.*
```
mkdir inference && cd inference
# Download the detection part of the Ultra-lightweight Chinese OCR and decompress it
@@ -64,6 +69,9 @@ The following code runs text detection and recognition inference in tandem.
# Set PYTHONPATH environment variable
export PYTHONPATH=.
# Set the PYTHONPATH environment variable on Windows
SET PYTHONPATH=.
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
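The update log above also notes support for separate prediction and recognition with result scores. Below is a minimal sketch of that workflow, assuming the tools/infer/predict_det.py and tools/infer/predict_rec.py entry points that sit alongside predict_system.py, and the sample images shipped under ./doc:
```
# Sketch: detection only, outputs detected text boxes for the input image
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/"

# Sketch: recognition only on a pre-cropped text-line image, prints text and score
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
```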
@@ -87,6 +95,7 @@ For more text detection and recognition models, please refer to the document [In
- [Text detection model training/evaluation/prediction](./doc/detection.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
- [Inference](./doc/inference.md)
- [Dataset](./doc/datasets.md)
## Text detection algorithm
@@ -104,6 +113,12 @@ On the ICDAR2015 dataset, the text detection result is as follows:
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
For the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/datasets.md#1icdar2019-lsvt) street view dataset, which provides 30k training images in total, the related configuration and pre-trained models for the Chinese detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* Note: For training and evaluation of the above DB models, the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. When training with different datasets or models, these two parameters can be adjusted for better results.
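As an illustration of the note above, here is a minimal evaluation sketch with both post-processing parameters overridden from the command line, assuming the -o key=value override mechanism of tools/eval.py and a placeholder checkpoint path:
```
# Sketch: evaluate the MobileNetV3 DB model with the recommended
# post-processing values; the checkpoint path is a placeholder
python3 tools/eval.py -c configs/det/det_mv3_db.yml \
    -o Global.checkpoints="./models/det_mv3_db/best_accuracy" \
       PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```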
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md)
@@ -130,6 +145,12 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/datasets.md#1icdar2019-lsvt) dataset and crop out 300k training images from the original photos using the position ground truth, with some needed position calibration applied. In addition, 5 million synthetic images are generated based on the LSVT corpus to train the Chinese model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
For the training guide and use of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/recognition.md)
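For example, a minimal training sketch using the ultra-lightweight configuration listed above, assuming the configs/rec/ directory layout implied by the reader_yml paths in the configuration files further down this diff:
```
# Sketch: train the ultra-lightweight Chinese recognition model
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```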
## End-to-end OCR algorithm
@@ -173,6 +194,8 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
Baidu self-developed algorithms such as SAST, SRN, and End2End-PSL will be released in June or July. Please stay tuned.
[more](./doc/FAQ.md)
## Welcome to the PaddleOCR technical exchange group
Add WeChat ID paddlehelp with the note "OCR", and an assistant will add you to the group.

@@ -10,4 +10,3 @@ EvalReader:
TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,LMDBReader
  lmdb_sets_dir: ./train_data/data_lmdb_release/evaluation/
  infer_img: ./infer_img

@@ -0,0 +1,43 @@
Global:
  algorithm: CRNN
  use_gpu: true
  epoch_num: 3000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_CRNN
  save_epoch_step: 3
  eval_batch_step: 2000
  train_batch_size_per_card: 128
  test_batch_size_per_card: 128
  image_shape: [3, 32, 320]
  max_text_length: 25
  character_type: ch
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  loss_type: ctc
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:

Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

Backbone:
  function: ppocr.modeling.backbones.rec_resnet_vd,ResNet
  layers: 34

Head:
  function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
  encoder_type: rnn
  SeqRNN:
    hidden_size: 256

Loss:
  function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss

Optimizer:
  function: ppocr.optimizer,AdamDecay
  base_lr: 0.0005
  beta1: 0.9
  beta2: 0.999
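A usage sketch for the configuration above. The file path is an assumption: given the ResNet-34 backbone, ch character type, and CTC loss, this appears to be rec_chinese_common_train.yml from the README table, and best_accuracy is the conventional name PaddleOCR uses when saving the best checkpoint under save_model_dir:
```
# Sketch: train with this config, then evaluate the best checkpoint
python3 tools/train.py -c configs/rec/rec_chinese_common_train.yml
python3 tools/eval.py -c configs/rec/rec_chinese_common_train.yml \
    -o Global.checkpoints="./output/rec_CRNN/best_accuracy"
```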

@@ -18,6 +18,8 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -11,4 +11,3 @@ EvalReader:
TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  infer_img: ./infer_img

@@ -11,4 +11,3 @@ EvalReader:
TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  infer_img: ./infer_img

@@ -17,6 +17,8 @@ Global:
  pretrain_weights: ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -17,6 +17,7 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -17,6 +17,7 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,13 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: attention
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,12 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: ctc
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:

@@ -17,6 +17,8 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -17,6 +17,7 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,13 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: attention
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,13 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: ctc
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel
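The tps: true lines added across these benchmark configs appear to enable the TPS (thin-plate spline) rectification module ahead of the backbone. As a minimal sketch of training with one of them (the .yml file name is an assumption inferred from the model names rec_mv3_tps_bilstm_attn / rec_r34_vd_tps_bilstm_attn in the README table):
```
# Sketch: train a TPS-based benchmark recognition model (RARE-style);
# the config path is inferred, not confirmed by this diff
python3 tools/train.py -c configs/rec/rec_mv3_tps_bilstm_attn.yml
```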

@@ -0,0 +1,43 @@
## FAQ
1. **Prediction fails with the error: got an unexpected keyword argument 'gradient_clip'**
   The installed paddle version is wrong. This project currently only supports paddle 1.7; adaptation to 1.8 is coming soon.
2. **Converting the attention recognition model fails with: KeyError: 'predict'**
   Inference for recognition models based on the attention loss is still being debugged. For Chinese text recognition, we recommend choosing recognition models based on the CTC loss first; in practice we also find that models based on the attention loss perform worse than those based on the CTC loss.
3. **About inference speed**
   When an image contains a lot of text, prediction takes longer. You can use --rec_batch_num to set a smaller prediction batch size; the default value is 30, which can be lowered to 10 or another value (see the sketch after this list).
4. **Serving deployment and mobile deployment**
   A Serving-based deployment solution and a Paddle Lite-based mobile deployment solution are expected to be released in mid-to-late June. Stay tuned.
5. **Release schedule for self-developed algorithms**
   The self-developed algorithms SAST, SRN, and End2End-PSL will all be released in June and July. Stay tuned.
6. **How to run on Windows or Mac**
   PaddleOCR has been adapted to both Windows and Mac. Two points to note: 1. In [Quick installation](installation.md), if you do not want to install docker, you can skip the first step and start directly from the second step (installing paddle). 2. When downloading the inference models, if wget is not installed, you can click the model link directly or copy the link into a browser to download, then unzip and place the model in the corresponding directory.
7. **Difference between the ultra-lightweight model and the general OCR model**
   PaddleOCR currently open-sources two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. They compare as follows:
   - Similarities: both use the same **algorithms** and **training data**.
   - Differences: they differ in the **backbone networks** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as the backbone, while the general model uses Resnet50_vd as the detection backbone and Resnet34_vd as the recognition backbone. The detailed parameter differences can be seen by comparing the two models' training configuration files:

   |Model|Backbones|Detection config|Recognition config|
   |-|-|-|-|
   |8.6M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
   |General Chinese OCR model|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|
8. **Are there plans to open-source models that recognize only digits, or only English and digits?**
   There are no plans for now to open-source digits-only, digits+English-only, or other small vertical-domain models. PaddleOCR open-sources a variety of detection and recognition algorithms for users to train their own models, and the two Chinese models were likewise produced with this open-source algorithm library. If you need a small vertical-domain model, prepare your data following the tutorials, choose a suitable configuration file, and train it yourself; good results can be expected. For any training questions, feel free to open an issue or ask in the discussion group, and we will answer promptly.
9. **What training data do the open-source models use, and will it be open-sourced?**
   The datasets and scales used by the currently open-source models are:
   - Detection:
     English dataset: ICDAR2015
     Chinese dataset: LSVT street view dataset, 30k training images
   - Recognition:
     English dataset: MJSynth and SynthText synthetic data, tens of millions of samples.
     Chinese dataset: LSVT street view dataset, with images cropped by the ground truth and position-calibrated, 300k images in total. In addition, 5 million synthetic images were generated based on the LSVT corpus.
   The public datasets are all open; you can search for and download them yourself, or refer to [Chinese datasets](datasets.md). The synthetic data will not be open-sourced for now; you can synthesize your own with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
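As a concrete illustration of FAQ item 3 (reusing the model directories from the quick-start section of the README above), the recognition batch size can be lowered like this:
```
# Sketch: reduce the recognition batch size from the default of 30 to 10
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" \
    --det_model_dir="./inference/ch_det_mv3_db/" \
    --rec_model_dir="./inference/ch_rec_mv3_crnn/" \
    --rec_batch_num=10
```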

(Binary image file changed, not shown; previous size: 194 KiB)

@@ -0,0 +1,58 @@
## Datasets
This page collects commonly used Chinese datasets and is continuously updated; dataset contributions are welcome.
- [ICDAR2019-LSVT](#ICDAR2019-LSVT)
- [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17)
- [Chinese Street View Text Recognition](#中文街景文字识别)
- [Chinese Document Text Recognition](#中文文档文字识别)
- [ICDAR2019-ArT](#ICDAR2019-ArT)
Besides these public datasets, users can also synthesize data themselves with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
<a name="ICDAR2019-LSVT"></a>
#### 1. ICDAR2019-LSVT
- **Source**: https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Description**: 450k Chinese street view images in total, of which 50k are fully annotated (text coordinates + text content; 20k test + 30k training) and 400k are weakly annotated (text content only), as shown below:
![](datasets/LSVT_1.jpg)
(a) Fully annotated data
![](datasets/LSVT_2.jpg)
(b) Weakly annotated data
- **Download**: https://ai.baidu.com/broad/download?dataset=lsvt
<a name="ICDAR2017-RCTW-17"></a>
#### 2. ICDAR2017-RCTW-17
- **Source**: https://rctw.vlrlab.net/
- **Description**: 12,000+ images in total, mostly captured in the wild with phone cameras, with some screenshots. The images cover a wide variety of scenes, including street views, posters, menus, indoor scenes, and mobile-app screenshots.
![](datasets/rctw.jpg)
- **Download**: https://rctw.vlrlab.net/dataset/
<a name="中文街景文字识别"></a>
#### 3. Chinese Street View Text Recognition
- **Source**: https://aistudio.baidu.com/aistudio/competition/detail/8
- **Description**: 290k images in total, of which 210k are training images (with labels) and 80k are test images (without labels). The data comes from Chinese street views, cropped from text-line regions (shop signs, landmarks, etc.) in street view photos. All images are preprocessed: the text region is proportionally mapped, via affine transformation, to an image 48 pixels high, as shown:
![](datasets/ch_street_rec_1.png)
(a) Label: 魅派集成吊顶
![](datasets/ch_street_rec_2.png)
(b) Label: 母婴用品连锁
- **Download**:
https://aistudio.baidu.com/aistudio/datasetdetail/8429
<a name="中文文档文字识别"></a>
#### 4. Chinese Document Text Recognition
- **Source**: https://github.com/YCG09/chinese_ocr
- **Description**:
  - About 3.64 million images in total, split into a training set and a validation set at a 99:1 ratio.
  - Generated randomly from a Chinese corpus (news + classical Chinese) by varying font, size, grayscale, blur, perspective, stretching, and so on
  - 5,990 characters in total, including Chinese characters, English letters, digits, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt)
  - Each sample is fixed at 10 characters, randomly cropped from sentences in the corpus
  - Image resolution is uniformly 280x32
![](datasets/ch_doc1.jpg)
![](datasets/ch_doc2.jpg)
![](datasets/ch_doc3.jpg)
- **Download**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)
<a name="ICDAR2019-ArT"></a>
#### 5. ICDAR2019-ArT
- **Source**: https://ai.baidu.com/broad/introduction?dataset=art
- **Description**: 10,166 images in total (5,603 training, 4,563 test). Composed of Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, covering horizontal, multi-oriented, and curved text.
![](datasets/ArT.jpg)
- **Download**: https://ai.baidu.com/broad/download?dataset=art

(Seven binary image files added, not shown; sizes: 3.1 MiB, 123 KiB, 94 KiB, 2.2 KiB, 2.4 KiB, 2.1 KiB, 100 KiB)

Some files were not shown because too many files have changed in this diff.
