update from original repo

revert-162-develop
Khanh Tran 5 years ago
commit 406463efb6

@@ -2,9 +2,12 @@
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.
**Recent updates**
- 2020.5.30 Model prediction and training support Windows systems, and the display of recognition results is optimized
- 2020.5.30 Open source general Chinese OCR model
- 2020.5.30 Provide ultra-lightweight Chinese OCR model inference
- 2020.6.8 Add [dataset](./doc/datasets.md) and keep updating
- 2020.6.5 Add `attention` model in `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows systems
- [more](./doc/update.md)
## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M
@@ -38,6 +41,8 @@ Please see [Quick installation](./doc/installation.md)
#### 2. Download inference models
#### (1) Download Ultra-lightweight Chinese OCR models
*If wget is not installed on Windows, you can copy the link into a browser to download the model. After the model is downloaded, unzip it and place it in the corresponding directory.*
```
mkdir inference && cd inference
# Download the detection part of the Ultra-lightweight Chinese OCR and decompress it
@@ -64,6 +69,9 @@ The following code runs text detection and recognition inference in tandem.
# Set PYTHONPATH environment variable
export PYTHONPATH=.
# Set the PYTHONPATH environment variable on Windows
SET PYTHONPATH=.
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
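The update log above also notes support for separate prediction and recognition with result scores. Below is a minimal sketch of that workflow, assuming the tools/infer/predict_det.py and tools/infer/predict_rec.py entry points that sit alongside predict_system.py, and the sample images shipped under ./doc:
```
# Sketch: detection only, outputs detected text boxes for the input image
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/"

# Sketch: recognition only on a pre-cropped text-line image, prints text and score
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
```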
@@ -87,6 +95,7 @@ For more text detection and recognition models, please refer to the document [In
- [Text detection model training/evaluation/prediction](./doc/detection.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
- [Inference](./doc/inference.md)
- [Dataset](./doc/datasets.md)
## Text detection algorithm
@@ -104,6 +113,12 @@ On the ICDAR2015 dataset, the text detection result is as follows:
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
For the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/datasets.md#1icdar2019-lsvt) street view dataset, which provides 30k training images in total, the related configuration and pre-trained models for the Chinese detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* Note: For training and evaluation of the above DB models, the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. When training with different datasets or models, these two parameters can be adjusted for better results.
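As an illustration of the note above, here is a minimal evaluation sketch with both post-processing parameters overridden from the command line, assuming the -o key=value override mechanism of tools/eval.py and a placeholder checkpoint path:
```
# Sketch: evaluate the MobileNetV3 DB model with the recommended
# post-processing values; the checkpoint path is a placeholder
python3 tools/eval.py -c configs/det/det_mv3_db.yml \
    -o Global.checkpoints="./models/det_mv3_db/best_accuracy" \
       PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```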
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md)
@@ -130,6 +145,12 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/datasets.md#1icdar2019-lsvt) dataset and crop out 300k training images from the original photos using the position ground truth, with some needed position calibration applied. In addition, 5 million synthetic images are generated based on the LSVT corpus to train the Chinese model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
For the training guide and use of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/recognition.md)
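For example, a minimal training sketch using the ultra-lightweight configuration listed above, assuming the configs/rec/ directory layout implied by the reader_yml paths in the configuration files further down this diff:
```
# Sketch: train the ultra-lightweight Chinese recognition model
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```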
## End-to-end OCR algorithm
@@ -173,6 +194,8 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
Baidu self-developed algorithms such as SAST, SRN, and End2End-PSL will be released in June or July. Please stay tuned.
[more](./doc/FAQ.md)
## Welcome to the PaddleOCR technical exchange group
Add WeChat ID paddlehelp with the note "OCR", and an assistant will add you to the group.

@@ -10,4 +10,3 @@ EvalReader:
TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,LMDBReader
  lmdb_sets_dir: ./train_data/data_lmdb_release/evaluation/
  infer_img: ./infer_img

@@ -0,0 +1,43 @@
Global:
  algorithm: CRNN
  use_gpu: true
  epoch_num: 3000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_CRNN
  save_epoch_step: 3
  eval_batch_step: 2000
  train_batch_size_per_card: 128
  test_batch_size_per_card: 128
  image_shape: [3, 32, 320]
  max_text_length: 25
  character_type: ch
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  loss_type: ctc
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:

Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

Backbone:
  function: ppocr.modeling.backbones.rec_resnet_vd,ResNet
  layers: 34

Head:
  function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
  encoder_type: rnn
  SeqRNN:
    hidden_size: 256

Loss:
  function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss

Optimizer:
  function: ppocr.optimizer,AdamDecay
  base_lr: 0.0005
  beta1: 0.9
  beta2: 0.999
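A usage sketch for the configuration above. The file path is an assumption: given the ResNet-34 backbone, ch character type, and CTC loss, this appears to be rec_chinese_common_train.yml from the README table, and best_accuracy is the conventional name PaddleOCR uses when saving the best checkpoint under save_model_dir:
```
# Sketch: train with this config, then evaluate the best checkpoint
python3 tools/train.py -c configs/rec/rec_chinese_common_train.yml
python3 tools/eval.py -c configs/rec/rec_chinese_common_train.yml \
    -o Global.checkpoints="./output/rec_CRNN/best_accuracy"
```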

@@ -18,6 +18,8 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -11,4 +11,3 @@ EvalReader:
TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  infer_img: ./infer_img

@@ -11,4 +11,3 @@ EvalReader:
TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  infer_img: ./infer_img

@@ -17,6 +17,8 @@ Global:
  pretrain_weights: ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -17,6 +17,7 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -17,6 +17,7 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,13 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: attention
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,12 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: ctc
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:

@@ -17,6 +17,8 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -17,6 +17,7 @@ Global:
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,13 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: attention
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

@@ -13,10 +13,13 @@ Global:
  max_text_length: 25
  character_type: en
  loss_type: ctc
  tps: true
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:
Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel
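The tps: true lines added across these benchmark configs appear to enable the TPS (thin-plate spline) rectification module ahead of the backbone. As a minimal sketch of training with one of them (the .yml file name is an assumption inferred from the model names rec_mv3_tps_bilstm_attn / rec_r34_vd_tps_bilstm_attn in the README table):
```
# Sketch: train a TPS-based benchmark recognition model (RARE-style);
# the config path is inferred, not confirmed by this diff
python3 tools/train.py -c configs/rec/rec_mv3_tps_bilstm_attn.yml
```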

@@ -0,0 +1,43 @@
## FAQ
1. **Prediction fails with the error: got an unexpected keyword argument 'gradient_clip'**
   The installed paddle version is wrong. This project currently only supports paddle 1.7; adaptation to 1.8 is coming soon.
2. **Converting the attention recognition model fails with: KeyError: 'predict'**
   Inference for recognition models based on the attention loss is still being debugged. For Chinese text recognition, we recommend choosing recognition models based on the CTC loss first; in practice we also find that models based on the attention loss perform worse than those based on the CTC loss.
3. **About inference speed**
   When an image contains a lot of text, prediction takes longer. You can use --rec_batch_num to set a smaller prediction batch size; the default value is 30, which can be lowered to 10 or another value (see the sketch after this list).
4. **Serving deployment and mobile deployment**
   A Serving-based deployment solution and a Paddle Lite-based mobile deployment solution are expected to be released in mid-to-late June. Stay tuned.
5. **Release schedule for self-developed algorithms**
   The self-developed algorithms SAST, SRN, and End2End-PSL will all be released in June and July. Stay tuned.
6. **How to run on Windows or Mac**
   PaddleOCR has been adapted to both Windows and Mac. Two points to note: 1. In [Quick installation](installation.md), if you do not want to install docker, you can skip the first step and start directly from the second step (installing paddle). 2. When downloading the inference models, if wget is not installed, you can click the model link directly or copy the link into a browser to download, then unzip and place the model in the corresponding directory.
7. **Difference between the ultra-lightweight model and the general OCR model**
   PaddleOCR currently open-sources two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. They compare as follows:
   - Similarities: both use the same **algorithms** and **training data**.
   - Differences: they differ in the **backbone networks** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as the backbone, while the general model uses Resnet50_vd as the detection backbone and Resnet34_vd as the recognition backbone. The detailed parameter differences can be seen by comparing the two models' training configuration files:

   |Model|Backbones|Detection config|Recognition config|
   |-|-|-|-|
   |8.6M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
   |General Chinese OCR model|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|
8. **Are there plans to open-source models that recognize only digits, or only English and digits?**
   There are no plans for now to open-source digits-only, digits+English-only, or other small vertical-domain models. PaddleOCR open-sources a variety of detection and recognition algorithms for users to train their own models, and the two Chinese models were likewise produced with this open-source algorithm library. If you need a small vertical-domain model, prepare your data following the tutorials, choose a suitable configuration file, and train it yourself; good results can be expected. For any training questions, feel free to open an issue or ask in the discussion group, and we will answer promptly.
9. **What training data do the open-source models use, and will it be open-sourced?**
   The datasets and scales used by the currently open-source models are:
   - Detection:
     English dataset: ICDAR2015
     Chinese dataset: LSVT street view dataset, 30k training images
   - Recognition:
     English dataset: MJSynth and SynthText synthetic data, tens of millions of samples.
     Chinese dataset: LSVT street view dataset, with images cropped by the ground truth and position-calibrated, 300k images in total. In addition, 5 million synthetic images were generated based on the LSVT corpus.
   The public datasets are all open; you can search for and download them yourself, or refer to [Chinese datasets](datasets.md). The synthetic data will not be open-sourced for now; you can synthesize your own with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
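As a concrete illustration of FAQ item 3 (reusing the model directories from the quick-start section of the README above), the recognition batch size can be lowered like this:
```
# Sketch: reduce the recognition batch size from the default of 30 to 10
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" \
    --det_model_dir="./inference/ch_det_mv3_db/" \
    --rec_model_dir="./inference/ch_rec_mv3_crnn/" \
    --rec_batch_num=10
```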

(Binary image file changed, not shown; previous size: 194 KiB)

@@ -0,0 +1,58 @@
## Datasets
This page collects commonly used Chinese datasets and is continuously updated; dataset contributions are welcome.
- [ICDAR2019-LSVT](#ICDAR2019-LSVT)
- [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17)
- [Chinese Street View Text Recognition](#中文街景文字识别)
- [Chinese Document Text Recognition](#中文文档文字识别)
- [ICDAR2019-ArT](#ICDAR2019-ArT)
Besides these public datasets, users can also synthesize data themselves with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
<a name="ICDAR2019-LSVT"></a>
#### 1. ICDAR2019-LSVT
- **Source**: https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Description**: 450k Chinese street view images in total, of which 50k are fully annotated (text coordinates + text content; 20k test + 30k training) and 400k are weakly annotated (text content only), as shown below:
![](datasets/LSVT_1.jpg)
(a) Fully annotated data
![](datasets/LSVT_2.jpg)
(b) Weakly annotated data
- **Download**: https://ai.baidu.com/broad/download?dataset=lsvt
<a name="ICDAR2017-RCTW-17"></a>
#### 2. ICDAR2017-RCTW-17
- **Source**: https://rctw.vlrlab.net/
- **Description**: 12,000+ images in total, mostly captured in the wild with phone cameras, with some screenshots. The images cover a wide variety of scenes, including street views, posters, menus, indoor scenes, and mobile-app screenshots.
![](datasets/rctw.jpg)
- **Download**: https://rctw.vlrlab.net/dataset/
<a name="中文街景文字识别"></a>
#### 3. Chinese Street View Text Recognition
- **Source**: https://aistudio.baidu.com/aistudio/competition/detail/8
- **Description**: 290k images in total, of which 210k are training images (with labels) and 80k are test images (without labels). The data comes from Chinese street views, cropped from text-line regions (shop signs, landmarks, etc.) in street view photos. All images are preprocessed: the text region is proportionally mapped, via affine transformation, to an image 48 pixels high, as shown:
![](datasets/ch_street_rec_1.png)
(a) Label: 魅派集成吊顶
![](datasets/ch_street_rec_2.png)
(b) Label: 母婴用品连锁
- **Download**:
https://aistudio.baidu.com/aistudio/datasetdetail/8429
<a name="中文文档文字识别"></a>
#### 4. Chinese Document Text Recognition
- **Source**: https://github.com/YCG09/chinese_ocr
- **Description**:
  - About 3.64 million images in total, split into a training set and a validation set at a 99:1 ratio.
  - Generated randomly from a Chinese corpus (news + classical Chinese) by varying font, size, grayscale, blur, perspective, stretching, and so on
  - 5,990 characters in total, including Chinese characters, English letters, digits, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt)
  - Each sample is fixed at 10 characters, randomly cropped from sentences in the corpus
  - Image resolution is uniformly 280x32
![](datasets/ch_doc1.jpg)
![](datasets/ch_doc2.jpg)
![](datasets/ch_doc3.jpg)
- **Download**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)
<a name="ICDAR2019-ArT"></a>
#### 5. ICDAR2019-ArT
- **Source**: https://ai.baidu.com/broad/introduction?dataset=art
- **Description**: 10,166 images in total (5,603 training, 4,563 test). Composed of Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, covering horizontal, multi-oriented, and curved text.
![](datasets/ArT.jpg)
- **Download**: https://ai.baidu.com/broad/download?dataset=art

(Seven binary image files added, not shown; sizes: 3.1 MiB, 123 KiB, 94 KiB, 2.2 KiB, 2.4 KiB, 2.1 KiB, 100 KiB)

Some files were not shown because too many files have changed in this diff.
