Merge pull request #4 from PaddlePaddle/develop

merge paddleocr
5 years ago · 7c09c97d70
parent c1d19ce23f 5ee40948e0
commit 7c09c97d70
40 changed files with 4072 additions and 80 deletions
--- a/README.md
+++ b/README.md
@@ -4,12 +4,11 @@ English | [简体中文](README_cn.md)
 PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and put them into practice.

 **Recent updates**
+- 2020.8.16, Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23, Release the recording and slides of the PaddleOCR introduction live class on Bilibili, [link](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15, Add a mobile app demo supporting both iOS and Android (based on EasyEdge and Paddle-Lite)
 - 2020.7.15, Improve deployment: add C++ inference and serving deployment. In addition, benchmarks of the ultra-lightweight OCR model are provided.
 - 2020.7.15, Add several related datasets, data annotation and synthesis tools.
-- 2020.7.9, Add a new model that supports recognizing the space character.
-- 2020.7.9, Add data augmentation and learning rate decay strategies for training.
 - [more](./doc/doc_en/update_en.md)

 ## Features
@@ -91,7 +90,7 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
 PaddleOCR open source text detection algorithms list:
 - [x]  EAST([paper](https://arxiv.org/abs/1704.03155))
 - [x]  DB([paper](https://arxiv.org/abs/1911.08947))
-- [ ]  SAST([paper](https://arxiv.org/abs/1908.05498))(developed by Baidu, coming soon)
+- [x]  SAST([paper](https://arxiv.org/abs/1908.05498))(developed by Baidu)

 On the ICDAR2015 dataset, the text detection results are as follows:

@@ -101,6 +100,13 @@ On the ICDAR2015 dataset, the text detection result is as follows:
 |EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
 |DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
 |DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
+|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
+
+On the Total-Text dataset, the text detection results are as follows:
+
+|Model|Backbone|precision|recall|Hmean|Download link|
+|-|-|-|-|-|-|
+|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|

 For the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset, with 30,000 training images in total, the related configuration and pre-trained models for the text detection task are as follows:
 |Model|Backbone|Configuration file|Pre-trained model|
@@ -120,7 +126,7 @@ PaddleOCR open-source text recognition algorithms list:
 - [x]  Rosetta([paper](https://arxiv.org/abs/1910.05085))
 - [x]  STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
 - [x]  RARE([paper](https://arxiv.org/abs/1603.03915v1))
-- [ ]  SRN([paper](https://arxiv.org/abs/2003.12294))(developed by Baidu, coming soon)
+- [x]  SRN([paper](https://arxiv.org/abs/2003.12294))(developed by Baidu)

 Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition algorithms above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:

@@ -134,8 +140,14 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
 |RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
 |RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
+|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
+
+**Note:** The SRN model uses data perturbation to augment the two training sets mentioned above; the augmented data can be downloaded from [Baidu Drive](todo).
+
+The original paper reports an average accuracy of 89.74% with two-stage training, while PaddleOCR's one-stage training reaches 88.33%. Both sets of pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).

 We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop 300,000 training images from the original photos using the ground-truth positions, with calibration where needed. In addition, 5 million synthetic images are generated from the LSVT corpus to train the model. The related configuration and pre-trained models are as follows:
+
 |Model|Backbone|Configuration file|Pre-trained model|
 |-|-|-|-|
 |ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
--- a/README_cn.md
+++ b/README_cn.md
@@ -4,12 +4,11 @@
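 PaddleOCR aims to build a rich, leading, and practical OCR toolkit that helps users train better models and put them into production.

 **Recent updates**
+- 2020.8.16 Open-source the text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and the text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23 Release the recording and slides of the July 21 Bilibili live class, a full walkthrough of the PaddleOCR open-source package, [link](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15 Add a mobile demo based on EasyEdge and Paddle-Lite, supporting iOS and Android
 - 2020.7.15 Improve inference deployment: add C++ inference, serving deployment, and on-device deployment, plus inference-time benchmarks for the ultra-lightweight Chinese OCR model
 - 2020.7.15 Compile OCR-related datasets and common data annotation and synthesis tools
-- 2020.7.9 Add a recognition model that supports the space character; for its results, inference, and training, see the quick start and text recognition training docs
-- 2020.7.9 Add data augmentation and learning rate decay strategies; see the [configuration file](./doc/doc_ch/config.md) for details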
 - [more](./doc/doc_ch/update.md)


@@ -93,7 +92,7 @@ PaddleOCR aims to build a rich, leading, and practical OCR toolkit, helping
 List of text detection algorithms open-sourced by PaddleOCR:
 - [x]  EAST([paper](https://arxiv.org/abs/1704.03155))
 - [x]  DB([paper](https://arxiv.org/abs/1911.08947))
-- [ ]  SAST([paper](https://arxiv.org/abs/1908.05498))(developed by Baidu, coming soon)
+- [x]  SAST([paper](https://arxiv.org/abs/1908.05498))(developed by Baidu)

 On the public ICDAR2015 text detection dataset, the results are as follows:

@@ -103,8 +102,16 @@ List of text detection algorithms open-sourced by PaddleOCR:
 |EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
 |DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
 |DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
+|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
+
+On the public Total-Text text detection dataset, the results are as follows:
+
+|Model|Backbone|precision|recall|Hmean|Download link|
+|-|-|-|-|-|-|
+|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|

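 The configuration and pre-trained files for training the Chinese detection model on the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt) street view dataset (30,000 images in total) are as follows:
+
 |Model|Backbone|Configuration file|Pre-trained model|
 |-|-|-|-|
 |Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|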
@@ -122,7 +129,7 @@ List of text recognition algorithms open-sourced by PaddleOCR:
 - [x]  Rosetta([paper](https://arxiv.org/abs/1910.05085))
 - [x]  STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
 - [x]  RARE([paper](https://arxiv.org/abs/1603.03915v1))
-- [ ]  SRN([paper](https://arxiv.org/abs/2003.12294))(developed by Baidu, coming soon)
+- [x]  SRN([paper](https://arxiv.org/abs/2003.12294))(developed by Baidu)

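 Following the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation pipeline, the models are trained on the MJSynth and SynthText text recognition datasets and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows: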

@@ -136,6 +143,10 @@ List of text recognition algorithms open-sourced by PaddleOCR:
 |STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
 |RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
 |RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
+|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
+
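+**Note:** The SRN model augments the two training sets mentioned above with data perturbation; the augmented data can be downloaded from [Baidu Netdisk](todo).
+The original paper reports an average accuracy of 89.74% with two-stage training; PaddleOCR uses one-stage training and reaches 88.33%. Both sets of pre-trained weights are available at the [download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).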

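 Using the ground truth, 300,000 text images are cropped from the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt) street view dataset, with position calibration. In addition, 5 million synthetic images are generated from the LSVT corpus to train the Chinese model. The related configuration and pre-trained files are as follows: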

--- a/configs/det/det_r50_vd_sast_icdar15.yml
+++ b/configs/det/det_r50_vd_sast_icdar15.yml
@@ -0,0 +1,50 @@
+Global:
+  algorithm: SAST
+  use_gpu: true
+  epoch_num: 2000
+  log_smooth_window: 20
+  print_batch_step: 2
+  save_model_dir: ./output/det_sast/
+  save_epoch_step: 20
+  eval_batch_step: 5000
+  train_batch_size_per_card: 8
+  test_batch_size_per_card: 8
+  image_shape: [3, 512, 512]
+  reader_yml: ./configs/det/det_sast_icdar15_reader.yml
+  pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
+  save_res_path: ./output/det_sast/predicts_sast.txt
+  checkpoints: 
+  save_inference_dir:
+
+Architecture:
+  function: ppocr.modeling.architectures.det_model,DetModel
+
+Backbone:
+  function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
+  layers: 50
+
+Head:
+  function: ppocr.modeling.heads.det_sast_head,SASTHead
+  model_name: large
+  only_fpn_up: False
+#   with_cab: False
+  with_cab: True
+
+Loss:
+  function: ppocr.modeling.losses.det_sast_loss,SASTLoss
+
+Optimizer:
+  function: ppocr.optimizer,RMSProp
+  base_lr: 0.001
+  decay:
+    function: piecewise_decay
+    boundaries: [30000, 50000, 80000, 100000, 150000]
+    decay_rate: 0.3
+
+PostProcess:
+  function: ppocr.postprocess.sast_postprocess,SASTPostProcess
+  score_thresh: 0.5
+  sample_pts_num: 2
+  nms_thresh: 0.2
+  expand_scale: 1.0
+  shrink_ratio_of_width: 0.3
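Both SAST configs decay the RMSProp learning rate with `piecewise_decay`. The sketch below shows the schedule these settings would produce, assuming (this diff does not show `ppocr.optimizer`) that the rate is multiplied by `decay_rate` once for every boundary the global step has reached, in the style of Paddle's `fluid.layers.piecewise_decay`. The function name `piecewise_lr` is hypothetical.

```python
from typing import Sequence


def piecewise_lr(step: int,
                 base_lr: float = 0.001,
                 boundaries: Sequence[int] = (30000, 50000, 80000, 100000, 150000),
                 decay_rate: float = 0.3) -> float:
    """Learning rate at a global step under the assumed piecewise schedule."""
    # Count the boundaries the step has reached; each one scales the rate by decay_rate.
    passed = sum(1 for b in boundaries if step >= b)
    return base_lr * decay_rate ** passed


if __name__ == "__main__":
    for step in (0, 29999, 30000, 50000, 150000, 200000):
        print(step, piecewise_lr(step))
```

Under that assumption the rate falls from 0.001 to 0.0003 at step 30,000 and to about 2.4e-6 after step 150,000.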
--- a/configs/det/det_r50_vd_sast_totaltext.yml
+++ b/configs/det/det_r50_vd_sast_totaltext.yml
@@ -0,0 +1,50 @@
+Global:
+  algorithm: SAST
+  use_gpu: true
+  epoch_num: 2000
+  log_smooth_window: 20
+  print_batch_step: 2
+  save_model_dir: ./output/det_sast/
+  save_epoch_step: 20
+  eval_batch_step: 5000
+  train_batch_size_per_card: 8
+  test_batch_size_per_card: 1
+  image_shape: [3, 512, 512]
+  reader_yml: ./configs/det/det_sast_totaltext_reader.yml
+  pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
+  save_res_path: ./output/det_sast/predicts_sast.txt
+  checkpoints:
+  save_inference_dir:
+
+Architecture:
+  function: ppocr.modeling.architectures.det_model,DetModel
+
+Backbone:
+  function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
+  layers: 50
+
+Head:
+  function: ppocr.modeling.heads.det_sast_head,SASTHead
+  model_name: large
+  only_fpn_up: False
+  # with_cab: False
+  with_cab: True
+
+Loss:
+  function: ppocr.modeling.losses.det_sast_loss,SASTLoss
+
+Optimizer:
+  function: ppocr.optimizer,RMSProp
+  base_lr: 0.001
+  decay:
+    function: piecewise_decay
+    boundaries: [30000, 50000, 80000, 100000, 150000]
+    decay_rate: 0.3
+
+PostProcess:
+  function: ppocr.postprocess.sast_postprocess,SASTPostProcess
+  score_thresh: 0.5
+  sample_pts_num: 6
+  nms_thresh: 0.2
+  expand_scale: 1.2
+  shrink_ratio_of_width: 0.2
--- a/configs/det/det_sast_icdar15_reader.yml
+++ b/configs/det/det_sast_icdar15_reader.yml
@@ -0,0 +1,26 @@
+TrainReader:
+  reader_function: ppocr.data.det.dataset_traversal,TrainReader
+  process_function: ppocr.data.det.sast_process,SASTProcessTrain
+  num_workers: 8
+  img_set_dir: ./train_data/
+  label_file_path: [./train_data/icdar13/train_label_json.txt, ./train_data/icdar15/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt]
+  data_ratio_list: [0.1, 0.45, 0.3, 0.15]
+  min_crop_side_ratio: 0.3
+  min_crop_size: 24
+  min_text_size: 4
+  max_text_size: 512
+
+EvalReader:
+  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
+  process_function: ppocr.data.det.sast_process,SASTProcessTest
+  img_set_dir: ./train_data/icdar2015/text_localization/
+  label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
+  max_side_len: 1536
+  
+TestReader:
+  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
+  process_function: ppocr.data.det.sast_process,SASTProcessTest
+  infer_img: 
+  img_set_dir: ./train_data/icdar2015/text_localization/
+  label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
+  do_eval: True
--- a/configs/det/det_sast_totaltext_reader.yml
+++ b/configs/det/det_sast_totaltext_reader.yml
@@ -0,0 +1,24 @@
+TrainReader:
+  reader_function: ppocr.data.det.dataset_traversal,TrainReader
+  process_function: ppocr.data.det.sast_process,SASTProcessTrain
+  num_workers: 8
+  img_set_dir: ./train_data/
+  label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train/train_label_json.txt]
+  data_ratio_list: [0.5, 0.5]
+  min_crop_side_ratio: 0.3
+  min_crop_size: 24
+  min_text_size: 4
+  max_text_size: 512
+
+EvalReader:
+  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
+  process_function: ppocr.data.det.sast_process,SASTProcessTest
+  img_set_dir: ./train_data/afs/
+  label_file_path: ./train_data/afs/total_text/test_label_json.txt
+  max_side_len: 768
+  
+TestReader:
+  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
+  process_function: ppocr.data.det.sast_process,SASTProcessTest
+  infer_img: 
+  max_side_len: 768
--- a/configs/rec/rec_r50fpn_vd_none_srn.yml
+++ b/configs/rec/rec_r50fpn_vd_none_srn.yml
@@ -0,0 +1,49 @@
+Global:
+  algorithm: SRN
+  use_gpu: true
+  epoch_num: 72
+  log_smooth_window: 20
+  print_batch_step: 10
+  save_model_dir: output/rec_pvam_withrotate
+  save_epoch_step: 1
+  eval_batch_step: 8000
+  train_batch_size_per_card: 64
+  test_batch_size_per_card: 1
+  image_shape: [1, 64, 256]
+  max_text_length: 25
+  character_type: en
+  loss_type: srn
+  num_heads: 8
+  average_window: 0.15
+  max_average_window: 15625
+  min_average_window: 10000
+  reader_yml: ./configs/rec/rec_benchmark_reader.yml
+  pretrain_weights: 
+  checkpoints:
+  save_inference_dir:
+  infer_img:
+
+Architecture:
+  function: ppocr.modeling.architectures.rec_model,RecModel
+
+Backbone:
+  function: ppocr.modeling.backbones.rec_resnet50_fpn,ResNet
+  layers: 50
+ 
+Head:
+  function: ppocr.modeling.heads.rec_srn_all_head,SRNPredict
+  encoder_type: rnn
+  num_encoder_TUs: 2
+  num_decoder_TUs: 4
+  hidden_dims: 512
+  SeqRNN:
+    hidden_size: 256
+    
+Loss:
+  function: ppocr.modeling.losses.rec_srn_loss,SRNLoss
+
+Optimizer:
+  function: ppocr.optimizer,AdamDecay
+  base_lr: 0.0001
+  beta1: 0.9
+  beta2: 0.999
--- a/deploy/android_demo/gradle/wrapper/gradle-wrapper.properties
+++ b/deploy/android_demo/gradle/wrapper/gradle-wrapper.properties
@@ -1,4 +1,4 @@
-#Thu Aug 22 15:05:37 CST 2019
+#Wed Jul 22 23:48:44 CST 2020
 distributionBase=GRADLE_USER_HOME
 distributionPath=wrapper/dists
 zipStoreBase=GRADLE_USER_HOME
--- a/doc/doc_ch/config.md
+++ b/doc/doc_ch/config.md
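@@ -32,6 +32,9 @@
 |      loss_type           |    Loss type              |       ctc         |    Two losses are supported: ctc / attention |
 |       distort            |    Whether to use data augmentation          |       false       |  When true, random perturbations are applied during training; see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) for the supported operations                 |
 |       use_space_char     |    Whether to recognize spaces             |        false      |          Spaces are supported only when character_type=ch                 |
+|      average_window      |    Ratio used to compute the window length in the ModelAverage optimizer |  0.15       |       Currently used only by SRN |
+|      max_average_window  |    Maximum length of the averaging window   |   15625              | Recommended: the number of mini-batches in one epoch|
+|      min_average_window  |    Minimum length of the averaging window  |    10000              |      \          |
 |      reader_yml          |    Reader configuration file          |  ./configs/rec/rec_icdar15_reader.yml  |  \          |
 |      pretrain_weights    |    Path of the pre-trained model to load      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
 |      checkpoints         |    Path of the model parameters to load            |       None        |    Used to resume training after an interruption |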
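The three window options above are handed to Paddle's ModelAverage. As we read Paddle's `average_accumulates` op (an assumption; verify it against the Paddle version you use), the running sum restarts once it holds at least `min_average_window` updates and at least `min(max_average_window, num_updates * average_window_rate)` updates, so the effective averaging window behaves roughly like the clamp below. The helper `effective_window` is hypothetical.

```python
def effective_window(num_updates: int,
                     average_window_rate: float = 0.15,
                     min_average_window: int = 10000,
                     max_average_window: int = 15625) -> int:
    """Approximate ModelAverage window length under the assumed clamping rule."""
    proportional = int(num_updates * average_window_rate)
    # Never longer than max_average_window, never shorter than min_average_window.
    return max(min_average_window, min(max_average_window, proportional))
```

With the SRN defaults, this reading gives a window that stays at 10,000 updates early in training and grows toward the 15,625 cap later.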
--- a/doc/doc_ch/update.md
+++ b/doc/doc_ch/update.md
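@@ -1,4 +1,5 @@
 # Updates
+- 2020.8.16 Open-source the text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and the text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23 Release the recording and slides of the July 21 Bilibili live class, a full walkthrough of the PaddleOCR open-source package, [link](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15 Add a mobile demo based on EasyEdge and Paddle-Lite, supporting iOS and Android
 - 2020.7.15 Improve inference deployment: add C++ inference, serving deployment, and on-device deployment, plus inference-time benchmarks for the ultra-lightweight Chinese OCR model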
--- a/doc/doc_en/update_en.md
+++ b/doc/doc_en/update_en.md
@@ -1,4 +1,5 @@
 # RECENT UPDATES
+- 2020.8.16, Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23, Release the recording and slides of the PaddleOCR introduction live class on Bilibili, [link](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15, Add a mobile app demo supporting both iOS and Android (based on EasyEdge and Paddle-Lite)
 - 2020.7.15, Improve deployment: add C++ inference and serving deployment. In addition, benchmarks of the ultra-lightweight Chinese OCR model are provided.
--- a/docker/hubserving/cpu/Dockerfile
+++ b/docker/hubserving/cpu/Dockerfile
@@ -0,0 +1,28 @@
+# Version: 1.0.0
+FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev
+
+# PaddleOCR is based on Python 3.7
+RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN python3.7 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR /PaddleOCR
+
+WORKDIR /PaddleOCR
+
+RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN mkdir -p /PaddleOCR/inference
+# Download the OCR detection model (lightweight version). To use the standard version, change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer and update det_model_dir in deploy/hubserving/ocr_system/params.py
+ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
+RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference
+
+# Download the OCR recognition model (lightweight version). To use the standard version, change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer and update rec_model_dir in deploy/hubserving/ocr_system/params.py
+ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
+RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference
+
+EXPOSE 8866
+
+CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
--- a/docker/hubserving/gpu/Dockerfile
+++ b/docker/hubserving/gpu/Dockerfile
@@ -0,0 +1,28 @@
+# Version: 1.0.0
+FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev
+
+# PaddleOCR is based on Python 3.7
+RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN python3.7 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR /PaddleOCR
+
+WORKDIR /PaddleOCR
+
+RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+RUN mkdir -p /PaddleOCR/inference
+# Download the OCR detection model (lightweight version). To use the standard version, change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer and update det_model_dir in deploy/hubserving/ocr_system/params.py
+ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
+RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference
+
+# Download the OCR recognition model (lightweight version). To use the standard version, change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer and update rec_model_dir in deploy/hubserving/ocr_system/params.py
+ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
+RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference
+
+EXPOSE 8866
+
+CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
--- a/docker/hubserving/readme.md
+++ b/docker/hubserving/readme.md
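@@ -0,0 +1,55 @@
+# Deploying the service with Docker
+In day-to-day projects, you will usually want to package the PaddleOCR service as a Docker image so that it can be released quickly in a Docker or k8s environment.
+
+This document provides standardized code for that purpose. Following the steps below, you can quickly publish PaddleOCR as a callable RESTful API service. (Only HubServing-based deployment is implemented for now; PaddleServing-based deployment is planned.)
+
+## 1. Prerequisites
+
+Install the following components first:
+a. Docker
+b. GPU driver and CUDA 10.0+ (GPU only)
+c. NVIDIA Container Toolkit (GPU only; not needed with Docker 19.03 or later)
+d. cuDNN 7.6+ (GPU only)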
+
+## 2. Build the image
+a. Download the PaddleOCR source code
+```
+git clone https://github.com/PaddlePaddle/PaddleOCR.git
+```
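+b. Change to the Dockerfile directory (the CPU and GPU versions are in separate directories; the CPU version is used below, and for the GPU version replace `cpu` with `gpu`)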
+```
+cd PaddleOCR/docker/hubserving/cpu
+```
+c. Build the image
+```
+docker build -t paddleocr:cpu . 
+```
+
+## 3. Start the Docker container
+a. CPU version
+```
+sudo docker run -dp 8866:8866 --name paddle_ocr paddleocr:cpu
+```
+b. GPU version (via NVIDIA Container Toolkit)
+```
+sudo nvidia-docker run -dp 8866:8866 --name paddle_ocr paddleocr:gpu
+```
+c. GPU version (with Docker 19.03 or later, the following command can be used directly)
+```
+sudo docker run -dp 8866:8866 --gpus all --name paddle_ocr paddleocr:gpu
+```
+d. Check that the service is running (messages such as `Successfully installed ocr_system` and `Running on http://0.0.0.0:8866/` indicate success)
+```
+docker logs -f paddle_ocr
+```
+
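+## 4. Test the service
+a. Compute the Base64 encoding of the image to recognize (for a quick test, a free online tool such as http://tool.chinaz.com/tools/imgtobase/ will do)
+b. Send the request (see sample_request.txt for example values)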
+```
+curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"<Base64 of the image, with the 'data:image/jpg;base64,' prefix removed>\"]}" http://localhost:8866/predict/ocr_system
+```
+c. Response (a successful call returns a result like the following)
+```
+{"msg":"","results":[[{"confidence":0.8403433561325073,"text":"约定","text_region":[[345,377],[641,390],[634,540],[339,528]]},{"confidence":0.8131805658340454,"text":"最终相遇","text_region":[[356,532],[624,530],[624,596],[356,598]]}]],"status":"0"}
+```
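The curl request above can also be sent from Python. Below is a minimal sketch using only the standard library; the URL and the `{"images": [...]}` body come from the curl example, and the response shape from the sample result, while the helper names `build_payload`, `predict` and `extract_texts` are hypothetical.

```python
import base64
import json
import urllib.request

# Endpoint from the curl example; adjust host/port if the container is mapped differently.
OCR_URL = "http://localhost:8866/predict/ocr_system"


def build_payload(image_bytes: bytes) -> bytes:
    """JSON body carrying the raw Base64 string (no 'data:image/jpg;base64,' prefix)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"images": [b64]}).encode("utf-8")


def predict(image_path: str, url: str = OCR_URL, timeout: float = 30.0) -> dict:
    """POST one image to the running ocr_system service and return the decoded JSON."""
    with open(image_path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_texts(result: dict) -> list:
    """Recognized strings per image; results[i] is the list of boxes for image i."""
    return [[box["text"] for box in boxes] for boxes in result.get("results", [])]
```

For example, `extract_texts(predict("test.jpg"))` would return the recognized strings for a hypothetical `test.jpg` once the container from step 3 is running.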
--- a/docker/hubserving/sample_request.txt
+++ b/docker/hubserving/sample_request.txt
--- a/ppocr/data/det/dataset_traversal.py
+++ b/ppocr/data/det/dataset_traversal.py
@@ -31,22 +31,27 @@ class TrainReader(object):
    def __init__(self, params):
        self.num_workers = params['num_workers']
        self.label_file_path = params['label_file_path']
+        print(self.label_file_path)
+        self.use_mul_data = False
+        if isinstance(self.label_file_path, list):
+            self.use_mul_data = True
+            self.data_ratio_list = params['data_ratio_list']
        self.batch_size = params['train_batch_size_per_card']
        assert 'process_function' in params,\
            "absence process_function in Reader"
        self.process = create_module(params['process_function'])(params)

    def __call__(self, process_id):     
-        with open(self.label_file_path, "rb") as fin:
-            label_infor_list = fin.readlines()
-        img_num = len(label_infor_list)
-        img_id_list = list(range(img_num))
-        if sys.platform == "win32" and self.num_workers != 1:
-            print("multiprocess is not fully compatible with Windows."
-                  "num_workers will be 1.")
-            self.num_workers = 1
        def sample_iter_reader():
+            with open(self.label_file_path, "rb") as fin:
+                label_infor_list = fin.readlines()
+            img_num = len(label_infor_list)
+            img_id_list = list(range(img_num))
            random.shuffle(img_id_list)
+            if sys.platform == "win32" and self.num_workers != 1:
+                print("multiprocess is not fully compatible with Windows."
+                      "num_workers will be 1.")
+                self.num_workers = 1
            for img_id in range(process_id, img_num, self.num_workers):
                label_infor = label_infor_list[img_id_list[img_id]]
                outs = self.process(label_infor)
@@ -54,13 +59,64 @@ class TrainReader(object):
                    continue
                yield outs

+        def sample_iter_reader_mul():
+            batch_size = 1000
+            data_source_list = self.label_file_path
+            batch_size_list = list(map(int, [max(1.0, batch_size * x) for x in self.data_ratio_list]))
+            print(self.data_ratio_list, batch_size_list)
+
+            data_filename_list, data_size_list, fetch_record_list = [], [], []
+            for data_source in data_source_list:
+                image_files = open(data_source, "rb").readlines()
+                random.shuffle(image_files)
+                data_filename_list.append(image_files)
+                data_size_list.append(len(image_files))
+                fetch_record_list.append(0)
+
+            image_batch = []
+            # get a batch of img_fns and poly_fns
+            for i in range(0, len(batch_size_list)):
+                bs = batch_size_list[i]
+                ds = data_size_list[i]
+                image_names = data_filename_list[i]
+                fetch_record = fetch_record_list[i]
+                data_path = data_source_list[i]
+                for j in range(fetch_record, fetch_record + bs):
+                    index = j % ds
+                    image_batch.append(image_names[index])
+
+                if (fetch_record + bs) > ds:
+                    fetch_record_list[i] = 0
+                    random.shuffle(data_filename_list[i])
+                else:
+                    fetch_record_list[i] = fetch_record + bs
+
+            if sys.platform == "win32":
+                print("multiprocess is not fully compatible with Windows."
+                      "num_workers will be 1.")
+                self.num_workers = 1
+
+            for label_infor in image_batch:
+                outs = self.process(label_infor)
+                if outs is None:
+                    continue
+                yield outs
+
        def batch_iter_reader():
            batch_outs = []
-            for outs in sample_iter_reader():
-                batch_outs.append(outs)
-                if len(batch_outs) == self.batch_size:
-                    yield batch_outs
-                    batch_outs = []
+            if self.use_mul_data:
+                print("Sample data from multiple datasets!")
+                for outs in sample_iter_reader_mul():
+                    batch_outs.append(outs)
+                    if len(batch_outs) == self.batch_size:
+                        yield batch_outs
+                        batch_outs = []                
+            else:
+                for outs in sample_iter_reader():
+                    batch_outs.append(outs)
+                    if len(batch_outs) == self.batch_size:
+                        yield batch_outs
+                        batch_outs = []

        return batch_iter_reader
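When `label_file_path` is a list, `TrainReader` mixes datasets in proportion to `data_ratio_list`: each pass takes `int(max(1, 1000 * ratio))` lines per source, walks each file with a cursor, and reshuffles a file when the cursor wraps. A standalone sketch of that quota logic follows; `ratio_quotas` and `MixedSampler` are hypothetical names, not part of ppocr. Unlike the reader above, which yields a single pass per call and keeps samples grouped by source, this sketch can be called repeatedly and shuffles each round so that a mini-batch mixes sources; that shuffle is our design choice.

```python
import random
from typing import List, Sequence, Tuple


def ratio_quotas(ratios: Sequence[float], round_size: int = 1000) -> List[int]:
    """Per-source sample counts for one round, same rule as sample_iter_reader_mul."""
    return [int(max(1.0, round_size * r)) for r in ratios]


class MixedSampler:
    """Hypothetical helper: mixes several label lists by fixed per-round quotas."""

    def __init__(self, sources: Sequence[List[str]], ratios: Sequence[float],
                 round_size: int = 1000, seed: int = 0):
        assert len(sources) == len(ratios), "one ratio per source"
        assert all(sources), "every source needs at least one line"
        self.rng = random.Random(seed)
        self.sources = [list(s) for s in sources]
        for s in self.sources:
            self.rng.shuffle(s)
        self.quotas = ratio_quotas(ratios, round_size)
        self.cursors = [0] * len(self.sources)  # plays the role of fetch_record_list

    def next_round(self) -> List[Tuple[int, str]]:
        """Return (source_index, line) pairs for one round."""
        batch = []
        for i, (src, quota) in enumerate(zip(self.sources, self.quotas)):
            start = self.cursors[i]
            # Wrap around the source with a modulo, as the reader does with j % ds.
            batch.extend((i, src[j % len(src)]) for j in range(start, start + quota))
            if start + quota > len(src):
                self.cursors[i] = 0
                self.rng.shuffle(src)  # reshuffle once the source is exhausted
            else:
                self.cursors[i] = start + quota
        # Not in the original reader: shuffle so consecutive samples mix sources.
        self.rng.shuffle(batch)
        return batch
```

With the ICDAR15 reader settings, `ratio_quotas([0.1, 0.45, 0.3, 0.15])` gives `[100, 450, 300, 150]` lines per pass.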

--- a/ppocr/data/det/sast_process.py
+++ b/ppocr/data/det/sast_process.py
--- a/ppocr/data/rec/dataset_traversal.py
+++ b/ppocr/data/rec/dataset_traversal.py
@@ -26,7 +26,7 @@ from ppocr.utils.utility import initial_logger
 from ppocr.utils.utility import get_image_file_list
 logger = initial_logger()

-from .img_tools import process_image, get_img_data
+from .img_tools import process_image, process_image_srn, get_img_data


 class LMDBReader(object):
@@ -43,6 +43,9 @@ class LMDBReader(object):
        self.mode = params['mode']
        self.drop_last = False
        self.use_tps = False
+        self.num_heads = None
+        if "num_heads" in params:
+            self.num_heads = params['num_heads']
        if "tps" in params:
            self.use_tps = True
        self.use_distort = False
@@ -119,12 +122,19 @@ class LMDBReader(object):
                    img = cv2.imread(single_img)
                    if img.shape[-1] == 1 or len(list(img.shape)) == 2:
                        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
-                    norm_img = process_image(
-                        img=img,
-                        image_shape=self.image_shape,
-                        char_ops=self.char_ops,
-                        tps=self.use_tps,
-                        infer_mode=True)
+                    if self.loss_type == 'srn':
+                        norm_img = process_image_srn(
+                            img=img,
+                            image_shape=self.image_shape,
+                            num_heads=self.num_heads,
+                            max_text_length=self.max_text_length)
+                    else:
+                        norm_img = process_image(
+                            img=img,
+                            image_shape=self.image_shape,
+                            char_ops=self.char_ops,
+                            tps=self.use_tps,
+                            infer_mode=True)
                    yield norm_img
            else:
                lmdb_sets = self.load_hierarchical_lmdb_dataset()
@@ -144,14 +154,25 @@ class LMDBReader(object):
                            if sample_info is None:
                                continue
                            img, label = sample_info
-                            outs = process_image(
-                                img=img,
-                                image_shape=self.image_shape,
-                                label=label,
-                                char_ops=self.char_ops,
-                                loss_type=self.loss_type,
-                                max_text_length=self.max_text_length,
-                                distort=self.use_distort)
+                            outs = []
+                            if self.loss_type == "srn":
+                                outs = process_image_srn(
+                                    img=img,
+                                    image_shape=self.image_shape,
+                                    num_heads=self.num_heads,
+                                    max_text_length=self.max_text_length,
+                                    label=label,
+                                    char_ops=self.char_ops,
+                                    loss_type=self.loss_type)
+
+                            else:
+                                outs = process_image(
+                                    img=img,
+                                    image_shape=self.image_shape,
+                                    label=label,
+                                    char_ops=self.char_ops,
+                                    loss_type=self.loss_type,
+                                    max_text_length=self.max_text_length, distort=self.use_distort)
                            if outs is None:
                                continue
                            yield outs
--- a/ppocr/data/rec/img_tools.py
+++ b/ppocr/data/rec/img_tools.py
@@ -381,3 +381,84 @@ def process_image(img,
                assert False, "Unsupport loss_type %s in process_image"\
                    % loss_type
    return (norm_img)
+
+def resize_norm_img_srn(img, image_shape):
+    imgC, imgH, imgW = image_shape
+
+    img_black = np.zeros((imgH, imgW))
+    im_hei = img.shape[0]
+    im_wid = img.shape[1]
+
+    if im_wid <= im_hei * 1:
+        img_new = cv2.resize(img, (imgH * 1, imgH))
+    elif im_wid <= im_hei * 2:
+        img_new = cv2.resize(img, (imgH * 2, imgH))
+    elif im_wid <= im_hei * 3:
+        img_new = cv2.resize(img, (imgH * 3, imgH))
+    else:
+        img_new = cv2.resize(img, (imgW, imgH))
+
+    img_np = np.asarray(img_new)
+    img_np = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY)
+    img_black[:, 0:img_np.shape[1]] = img_np
+    img_black = img_black[:, :, np.newaxis]
+
+    row, col, c = img_black.shape
+    c = 1
+
+    return np.reshape(img_black, (c, row, col)).astype(np.float32)
+
+def srn_other_inputs(image_shape,
+                     num_heads,
+                     max_text_length):
+
+    imgC, imgH, imgW = image_shape
+    feature_dim = int((imgH / 8) * (imgW / 8))
+
+    encoder_word_pos = np.array(range(0, feature_dim)).reshape((feature_dim, 1)).astype('int64')
+    gsrm_word_pos = np.array(range(0, max_text_length)).reshape((max_text_length, 1)).astype('int64')
+
+    lbl_weight = np.array([37] * max_text_length).reshape((-1,1)).astype('int64')
+
+    gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length)) 
+    gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape([-1, 1, max_text_length, max_text_length])
+    gsrm_slf_attn_bias1 = np.tile(gsrm_slf_attn_bias1, [1, num_heads, 1, 1]) * [-1e9] 
+
+    gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape([-1, 1, max_text_length, max_text_length])
+    gsrm_slf_attn_bias2 = np.tile(gsrm_slf_attn_bias2, [1, num_heads, 1, 1]) * [-1e9] 
+
+    encoder_word_pos = encoder_word_pos[np.newaxis, :]
+    gsrm_word_pos = gsrm_word_pos[np.newaxis, :]
+
+    return [lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2]
+
+def process_image_srn(img,
+                      image_shape,
+                      num_heads,
+                      max_text_length,
+                      label=None,
+                      char_ops=None,
+                      loss_type=None):
+    norm_img = resize_norm_img_srn(img, image_shape)
+    norm_img = norm_img[np.newaxis, :]
+    [lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2] = \
+        srn_other_inputs(image_shape, num_heads, max_text_length)
+
+    if label is not None:
+        char_num = char_ops.get_char_num()
+        text = char_ops.encode(label)
+        if len(text) == 0 or len(text) > max_text_length:
+            return None
+        else:
+            if loss_type == "srn":
+                text_padded = [37] * max_text_length
+                for i in range(len(text)):
+                    text_padded[i] = text[i]
+                    lbl_weight[i] = [1.0]
+                text_padded = np.array(text_padded)
+                text = text_padded.reshape(-1, 1)
+                return (norm_img, text,encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2,lbl_weight)
+            else:
+                assert False, "Unsupported loss_type %s in process_image_srn"\
+                    % loss_type
+    return (norm_img, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2)
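`resize_norm_img_srn` above chooses one of four target widths from the crop's aspect ratio (H, 2H, 3H, or the full W). It then converts the crop to grayscale and left-aligns it on a zero canvas. The NumPy sketch below mirrors that bucketing so it can be checked without OpenCV. It rests on three assumptions. `srn_bucket_width`, `nearest_resize` and `resize_pad_gray` are hypothetical names. The nearest-neighbour resize is a swapped-in stand-in for `cv2.resize`, which defaults to bilinear. Like the original, the sketch assumes `3 * imgH <= imgW`, for example `image_shape = [1, 64, 256]`.

```python
import numpy as np


def srn_bucket_width(im_h, im_w, img_h, img_w):
    """Target width chosen by resize_norm_img_srn's aspect-ratio buckets."""
    for k in (1, 2, 3):
        if im_w <= im_h * k:
            return img_h * k
    return img_w


def nearest_resize(img, out_w, out_h):
    """Nearest-neighbour stand-in for cv2.resize(img, (out_w, out_h))."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]


def resize_pad_gray(img_bgr, image_shape):
    """Bucketed resize, BGR->gray, left-aligned zero padding; returns (1, H, W)."""
    _, img_h, img_w = image_shape
    new_w = srn_bucket_width(img_bgr.shape[0], img_bgr.shape[1], img_h, img_w)
    resized = nearest_resize(img_bgr, new_w, img_h).astype(np.float32)
    # Same luma weights as cv2.COLOR_BGR2GRAY (channel order B, G, R).
    gray = resized @ np.array([0.114, 0.587, 0.299], dtype=np.float32)
    canvas = np.zeros((img_h, img_w), dtype=np.float32)
    canvas[:, :new_w] = gray  # fails if new_w > img_w, as in the original
    return canvas[np.newaxis, :, :]
```

A 32x50 crop falls in the "width <= 2 * height" bucket. It is therefore resized to 128x64, and columns 128 to 255 of the canvas stay zero.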
--- a/ppocr/modeling/architectures/det_model.py
+++ b/ppocr/modeling/architectures/det_model.py
@ -97,6 +97,23 @@ class DetModel(object):
                    'shrink_mask':shrink_mask,\
                    'threshold_map':threshold_map,\
                    'threshold_mask':threshold_mask}
+            elif self.algorithm == "SAST":
+                input_score = fluid.layers.data(
+                    name='score', shape=[1, 128, 128], dtype='float32')
+                input_border = fluid.layers.data(
+                    name='border', shape=[5, 128, 128], dtype='float32')
+                input_mask = fluid.layers.data(
+                    name='mask', shape=[1, 128, 128], dtype='float32')
+                input_tvo = fluid.layers.data(
+                    name='tvo', shape=[9, 128, 128], dtype='float32')
+                input_tco = fluid.layers.data(
+                    name='tco', shape=[3, 128, 128], dtype='float32')
+                feed_list = [image, input_score, input_border, input_mask, input_tvo, input_tco]
+                labels = {'input_score': input_score,\
+                    'input_border': input_border,\
+                    'input_mask': input_mask,\
+                    'input_tvo': input_tvo,\
+                    'input_tco': input_tco}
            loader = fluid.io.DataLoader.from_generator(
                feed_list=feed_list,
                capacity=64,
--- a/ppocr/modeling/architectures/rec_model.py
+++ b/ppocr/modeling/architectures/rec_model.py
@ -58,6 +58,10 @@ class RecModel(object):
        self.loss_type = global_params['loss_type']
        self.image_shape = global_params['image_shape']
        self.max_text_length = global_params['max_text_length']
+        if "num_heads" in global_params:
+            self.num_heads = global_params["num_heads"]
+        else:
+            self.num_heads = None

    def create_feed(self, mode):
        image_shape = deepcopy(self.image_shape)
@ -77,6 +81,48 @@ class RecModel(object):
                    lod_level=1)
                feed_list = [image, label_in, label_out]
                labels = {'label_in': label_in, 'label_out': label_out}
+            elif self.loss_type == "srn":
+                encoder_word_pos = fluid.data(
+                    name="encoder_word_pos",
+                    shape=[
+                        -1, int((image_shape[-2] / 8) * (image_shape[-1] / 8)),
+                        1
+                    ],
+                    dtype="int64")
+                gsrm_word_pos = fluid.data(
+                    name="gsrm_word_pos",
+                    shape=[-1, self.max_text_length, 1],
+                    dtype="int64")
+                gsrm_slf_attn_bias1 = fluid.data(
+                    name="gsrm_slf_attn_bias1",
+                    shape=[
+                        -1, self.num_heads, self.max_text_length,
+                        self.max_text_length
+                    ],
+                    dtype="float32")
+                gsrm_slf_attn_bias2 = fluid.data(
+                    name="gsrm_slf_attn_bias2",
+                    shape=[
+                        -1, self.num_heads, self.max_text_length,
+                        self.max_text_length
+                    ],
+                    dtype="float32")
+                lbl_weight = fluid.layers.data(
+                    name="lbl_weight", shape=[-1, 1], dtype='int64')
+                label = fluid.data(
+                    name='label', shape=[-1, 1], dtype='int32', lod_level=1)
+                feed_list = [
+                    image, label, encoder_word_pos, gsrm_word_pos,
+                    gsrm_slf_attn_bias1, gsrm_slf_attn_bias2, lbl_weight
+                ]
+                labels = {
+                    'label': label,
+                    'encoder_word_pos': encoder_word_pos,
+                    'gsrm_word_pos': gsrm_word_pos,
+                    'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1,
+                    'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2,
+                    'lbl_weight': lbl_weight
+                }
            else:
                label = fluid.data(
                    name='label', shape=[None, 1], dtype='int32', lod_level=1)
@ -88,6 +134,8 @@ class RecModel(object):
                use_double_buffer=True,
                iterable=False)
        else:
+            labels = None
+            loader = None
            if self.char_type == "ch" and self.infer_img:
                image_shape[-1] = -1
                if self.tps != None:
@ -98,8 +146,42 @@ class RecModel(object):
                    )
                    image_shape = deepcopy(self.image_shape)
            image = fluid.data(name='image', shape=image_shape, dtype='float32')
-            labels = None
-            loader = None
+            if self.loss_type == "srn":
+                encoder_word_pos = fluid.data(
+                    name="encoder_word_pos",
+                    shape=[
+                        -1, int((image_shape[-2] / 8) * (image_shape[-1] / 8)),
+                        1
+                    ],
+                    dtype="int64")
+                gsrm_word_pos = fluid.data(
+                    name="gsrm_word_pos",
+                    shape=[-1, self.max_text_length, 1],
+                    dtype="int64")
+                gsrm_slf_attn_bias1 = fluid.data(
+                    name="gsrm_slf_attn_bias1",
+                    shape=[
+                        -1, self.num_heads, self.max_text_length,
+                        self.max_text_length
+                    ],
+                    dtype="float32")
+                gsrm_slf_attn_bias2 = fluid.data(
+                    name="gsrm_slf_attn_bias2",
+                    shape=[
+                        -1, self.num_heads, self.max_text_length,
+                        self.max_text_length
+                    ],
+                    dtype="float32")
+                feed_list = [
+                    image, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1,
+                    gsrm_slf_attn_bias2
+                ]
+                labels = {
+                    'encoder_word_pos': encoder_word_pos,
+                    'gsrm_word_pos': gsrm_word_pos,
+                    'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1,
+                    'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2
+                }
        return image, labels, loader

    def __call__(self, mode):
@ -117,13 +199,27 @@ class RecModel(object):
                label = labels['label_out']
            else:
                label = labels['label']
-            outputs = {'total_loss':loss, 'decoded_out':\
-                decoded_out, 'label':label}
+            if self.loss_type == 'srn':
+                total_loss, img_loss, word_loss = self.loss(predicts, labels)
+                outputs = {
+                    'total_loss': total_loss,
+                    'img_loss': img_loss,
+                    'word_loss': word_loss,
+                    'decoded_out': decoded_out,
+                    'label': label
+                }
+            else:
+                outputs = {'total_loss':loss, 'decoded_out':\
+                    decoded_out, 'label':label}
            return loader, outputs
+
        elif mode == "export":
            predict = predicts['predict']
            if self.loss_type == "ctc":
                predict = fluid.layers.softmax(predict)
+            if self.loss_type == "srn":
+                raise Exception(
+                    "SRN does not currently support model export")
            return [image, {'decoded_out': decoded_out, 'predicts': predict}]
        else:
            predict = predicts['predict']
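The SRN branch of `create_feed` declares `encoder_word_pos` with length `(H / 8) * (W / 8)`. It also declares two bias tensors of shape `[-1, num_heads, max_text_length, max_text_length]`, which `srn_other_inputs` in `img_tools.py` fills. The NumPy sketch below reproduces those values. It assumes the example settings `image_shape = [1, 64, 256]`, `num_heads = 8` and `max_text_length = 25`, which are illustrative and not read from any config here. `srn_aux_inputs` is a hypothetical helper, not part of the codebase.

```python
import numpy as np


def srn_aux_inputs(image_shape, num_heads, max_text_length):
    """Position indices and GSRM self-attention biases, mirroring srn_other_inputs."""
    _, img_h, img_w = image_shape
    feature_dim = int((img_h / 8) * (img_w / 8))  # backbone output is H/8 x W/8
    encoder_word_pos = np.arange(feature_dim, dtype=np.int64).reshape(1, feature_dim, 1)
    gsrm_word_pos = np.arange(max_text_length, dtype=np.int64).reshape(1, max_text_length, 1)

    ones = np.ones((max_text_length, max_text_length), dtype=np.float32)
    bias1 = np.triu(ones, 1) * np.float32(-1e9)   # mask j > i: attend to itself and earlier
    bias2 = np.tril(ones, -1) * np.float32(-1e9)  # mask j < i: attend to itself and later

    def per_head(bias):
        # (L, L) -> (1, num_heads, L, L), the shape declared in create_feed
        return np.tile(bias[np.newaxis, np.newaxis], (1, num_heads, 1, 1))

    return encoder_word_pos, gsrm_word_pos, per_head(bias1), per_head(bias2)
```

The bias is added to the attention scores before the softmax. A `-1e9` entry drives the corresponding weight to effectively zero. As a result, `gsrm_slf_attn_bias1` restricts each position to itself and earlier positions, and `gsrm_slf_attn_bias2` restricts it to itself and later ones.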
--- a/ppocr/modeling/backbones/det_resnet_vd_sast.py
+++ b/ppocr/modeling/backbones/det_resnet_vd_sast.py
--- a/ppocr/modeling/backbones/rec_resnet50_fpn.py
+++ b/ppocr/modeling/backbones/rec_resnet50_fpn.py
@ -0,0 +1,172 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+
+import paddle
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+
+
+__all__ = ["ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"]
+
+Trainable = True
+w_nolr = fluid.ParamAttr(
+        trainable = Trainable)
+train_parameters = {
+    "input_size": [3, 224, 224],
+    "input_mean": [0.485, 0.456, 0.406],
+    "input_std": [0.229, 0.224, 0.225],
+    "learning_strategy": {
+        "name": "piecewise_decay",
+        "batch_size": 256,
+        "epochs": [30, 60, 90],
+        "steps": [0.1, 0.01, 0.001, 0.0001]
+    }
+}
+
+class ResNet():
+    def __init__(self, params):
+        self.layers = params['layers']
+        self.params = train_parameters
+
+
+    def __call__(self, input):
+        layers = self.layers
+        supported_layers = [18, 34, 50, 101, 152]
+        assert layers in supported_layers, \
+            "supported layers are {} but input layer is {}".format(supported_layers, layers)
+
+        if layers == 18:
+            depth = [2, 2, 2, 2]
+        elif layers == 34 or layers == 50:
+            depth = [3, 4, 6, 3]
+        elif layers == 101:
+            depth = [3, 4, 23, 3]
+        elif layers == 152:
+            depth = [3, 8, 36, 3]
+        stride_list = [(2,2),(2,2),(1,1),(1,1)]
+        num_filters = [64, 128, 256, 512]
+
+        conv = self.conv_bn_layer(
+            input=input, num_filters=64, filter_size=7, stride=2, act='relu', name="conv1")
+        F = [] 
+        if layers >= 50:
+            for block in range(len(depth)):
+                for i in range(depth[block]):
+                    if layers in [101, 152] and block == 2:
+                        if i == 0:
+                            conv_name = "res" + str(block + 2) + "a"
+                        else:
+                            conv_name = "res" + str(block + 2) + "b" + str(i)
+                    else:
+                        conv_name = "res" + str(block + 2) + chr(97 + i)
+                    conv = self.bottleneck_block(
+                        input=conv,
+                        num_filters=num_filters[block],
+                        stride=stride_list[block]  if i == 0 else 1, name=conv_name)
+                F.append(conv)
+
+        base = F[-1]
+        for i in [-2, -3]:  
+            b, c, h, w = F[i].shape
+            if (h, w) == base.shape[2:]:
+                base = base
+            else:
+                base = fluid.layers.conv2d_transpose( input=base, num_filters=c,filter_size=4, stride=2,
+                    padding=1,act=None,
+                    param_attr=w_nolr,
+                    bias_attr=w_nolr)
+                base = fluid.layers.batch_norm(base, act = "relu", param_attr=w_nolr, bias_attr=w_nolr)
+            base = fluid.layers.concat([base, F[i]], axis=1)
+            base = fluid.layers.conv2d(base, num_filters=c, filter_size=1, param_attr=w_nolr, bias_attr=w_nolr)
+            base = fluid.layers.conv2d(base, num_filters=c, filter_size=3,padding = 1, param_attr=w_nolr, bias_attr=w_nolr)
+            base = fluid.layers.batch_norm(base, act = "relu", param_attr=w_nolr, bias_attr=w_nolr)
+
+        base = fluid.layers.conv2d(base, num_filters=512, filter_size=1,bias_attr=w_nolr,param_attr=w_nolr)
+
+        return base
+
+    def conv_bn_layer(self,
+                      input,
+                      num_filters,
+                      filter_size,
+                      stride=1,
+                      groups=1,
+                      act=None,
+                      name=None):
+        conv = fluid.layers.conv2d(
+            input=input,
+            num_filters=num_filters,
+            filter_size= 2  if stride==(1,1)  else filter_size,
+            dilation = 2 if stride==(1,1) else 1,
+            stride=stride,
+            padding=(filter_size - 1) // 2,
+            groups=groups,
+            act=None,
+            param_attr=ParamAttr(name=name + "_weights",trainable = Trainable),
+            bias_attr=False,
+            name=name + '.conv2d.output.1')
+
+        if name == "conv1":
+            bn_name = "bn_" + name
+        else:
+            bn_name = "bn" + name[3:]
+        return fluid.layers.batch_norm(input=conv,
+                                       act=act,
+                                       name=bn_name + '.output.1',
+                                       param_attr=ParamAttr(name=bn_name + '_scale',trainable = Trainable),
+                                       bias_attr=ParamAttr(bn_name + '_offset',trainable = Trainable),
+                                       moving_mean_name=bn_name + '_mean',
+                                       moving_variance_name=bn_name + '_variance', )
+
+    def shortcut(self, input, ch_out, stride, is_first, name):
+        ch_in = input.shape[1]
+        if ch_in != ch_out or stride != 1 or is_first == True:
+            if stride == (1,1):
+                return self.conv_bn_layer(input, ch_out, 1, 1, name=name)
+            else: #stride == (2,2)
+                return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
+                
+        else:
+            return input
+
+    def bottleneck_block(self, input, num_filters, stride, name):
+        conv0 = self.conv_bn_layer(
+            input=input, num_filters=num_filters, filter_size=1, act='relu', name=name + "_branch2a")
+        conv1 = self.conv_bn_layer(
+            input=conv0,
+            num_filters=num_filters,
+            filter_size=3,
+            stride=stride,
+            act='relu',
+            name=name + "_branch2b")
+        conv2 = self.conv_bn_layer(
+            input=conv1, num_filters=num_filters * 4, filter_size=1, act=None, name=name + "_branch2c")
+
+        short = self.shortcut(input, num_filters * 4, stride, is_first=False, name=name + "_branch1")
+
+        return fluid.layers.elementwise_add(x=short, y=conv2, act='relu', name=name + ".add.output.5")
+
+    def basic_block(self, input, num_filters, stride, is_first, name):
+        conv0 = self.conv_bn_layer(input=input, num_filters=num_filters, filter_size=3, act='relu', stride=stride,
+                                   name=name + "_branch2a")
+        conv1 = self.conv_bn_layer(input=conv0, num_filters=num_filters, filter_size=3, act=None,
+                                   name=name + "_branch2b")
+        short = self.shortcut(input, num_filters, stride, is_first, name=name + "_branch1")
+        return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
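`encoder_word_pos` assumes that the recognition backbone downsamples by exactly 8. In `rec_resnet50_fpn.py`, that factor comes from the stride-2 7x7 `conv1` plus `stride_list = [(2,2),(2,2),(1,1),(1,1)]`. The `(1,1)` stages switch to a 2x2 kernel with dilation 2 and padding 1, which keeps the spatial size unchanged. Because of this, the FPN loop never needs its `conv2d_transpose` branch for this stride pattern. The arithmetic below checks the factor with the standard convolution output-size formula. `conv_out` and `resnet_fpn_out_size` are hypothetical helpers. Only the first unit of each stage is modeled, since the remaining 1x1 and stride-1 3x3 convolutions preserve size.

```python
def conv_out(size, kernel, stride, padding, dilation=1):
    """Standard convolution output size along one spatial axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def resnet_fpn_out_size(size, stride_list=(2, 2, 1, 1)):
    """Track one spatial axis through conv1 and the first unit of each stage."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1
    for stride in stride_list:
        if stride == 2:
            # 3x3 conv, stride 2, padding (3 - 1) // 2 = 1
            size = conv_out(size, kernel=3, stride=2, padding=1)
        else:
            # stride (1,1) branch: 2x2 kernel, dilation 2, padding (3 - 1) // 2 = 1
            size = conv_out(size, kernel=2, stride=1, padding=1, dilation=2)
    return size


h_out, w_out = resnet_fpn_out_size(64), resnet_fpn_out_size(256)
print(h_out, w_out, h_out * w_out)  # 8 32 256
```

For a 64x256 input this gives an 8x32 feature map, which is 256 positions. That matches `int((64 / 8) * (256 / 8))`, the length used for `encoder_word_pos`.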
--- a/ppocr/modeling/heads/det_sast_head.py
+++ b/ppocr/modeling/heads/det_sast_head.py
@ -0,0 +1,228 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle.fluid as fluid
+from ..common_functions import conv_bn_layer, deconv_bn_layer
+from collections import OrderedDict
+
+
+class SASTHead(object):
+    """
+    SAST: A Single-Shot Arbitrarily-Shaped Text Detector
+        see arxiv: https://arxiv.org/abs/1908.05498
+    args:
+        params(dict): the hyperparameters used to build the network
+    """
+
+    def __init__(self, params):
+        self.model_name = params['model_name']
+        self.with_cab = params['with_cab']
+
+    def FPN_Up_Fusion(self, blocks):
+        """
+        blocks(dict): contains block_2, block_3, block_4, block_5, block_6, block_7
+                at 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 resolution; this branch
+                fuses block_6 through block_2 (top-down).
+        """
+        f = [blocks['block_6'], blocks['block_5'], blocks['block_4'], blocks['block_3'], blocks['block_2']]
+        num_outputs = [256, 256, 192, 192, 128]
+        g = [None, None, None, None, None]
+        h = [None, None, None, None, None] 
+        for i in range(5):
+            h[i] = conv_bn_layer(input=f[i], num_filters=num_outputs[i],
+                                filter_size=1, stride=1, act=None, name='fpn_up_h'+str(i))
+
+        for i in range(4):
+            if i == 0:
+                g[i] = deconv_bn_layer(input=h[i], num_filters=num_outputs[i + 1], act=None, name='fpn_up_g0')
+                # print("g[{}] shape: {}".format(i, g[i].shape))
+            else:
+                g[i] = fluid.layers.elementwise_add(x=g[i - 1], y=h[i])
+                g[i] = fluid.layers.relu(g[i])
+                #g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i],
+                #                    filter_size=1, stride=1, act='relu')
+                g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i],
+                                    filter_size=3, stride=1, act='relu', name='fpn_up_g%d_1'%i)
+                g[i] = deconv_bn_layer(input=g[i], num_filters=num_outputs[i + 1], act=None, name='fpn_up_g%d_2'%i)
+                # print("g[{}] shape: {}".format(i, g[i].shape))
+
+        g[4] = fluid.layers.elementwise_add(x=g[3], y=h[4])
+        g[4] = fluid.layers.relu(g[4])
+        g[4] = conv_bn_layer(input=g[4], num_filters=num_outputs[4],
+                            filter_size=3, stride=1, act='relu', name='fpn_up_fusion_1')
+        g[4] = conv_bn_layer(input=g[4], num_filters=num_outputs[4],
+                            filter_size=1, stride=1, act=None, name='fpn_up_fusion_2')
+        
+        return g[4]
+
+    def FPN_Down_Fusion(self, blocks):
+        """
+        blocks(dict): this branch uses block_0, block_1, block_2 with
+                1/1, 1/2, 1/4 resolution (bottom-up, stride-2 convolutions).
+        """
+        f = [blocks['block_0'], blocks['block_1'], blocks['block_2']]
+        num_outputs = [32, 64, 128]
+        g = [None, None, None]
+        h = [None, None, None] 
+        for i in range(3):
+            h[i] = conv_bn_layer(input=f[i], num_filters=num_outputs[i],
+                                filter_size=3, stride=1, act=None, name='fpn_down_h'+str(i))
+        for i in range(2):
+            if i == 0:
+                g[i] = conv_bn_layer(input=h[i], num_filters=num_outputs[i+1], filter_size=3, stride=2, act=None, name='fpn_down_g0')
+            else:
+                g[i] = fluid.layers.elementwise_add(x=g[i - 1], y=h[i])
+                g[i] = fluid.layers.relu(g[i])
+                g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i], filter_size=3, stride=1, act='relu', name='fpn_down_g%d_1'%i)
+                g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i+1], filter_size=3, stride=2, act=None, name='fpn_down_g%d_2'%i)
+            # print("g[{}] shape: {}".format(i, g[i].shape)) 
+        g[2] = fluid.layers.elementwise_add(x=g[1], y=h[2])
+        g[2] = fluid.layers.relu(g[2])
+        g[2] = conv_bn_layer(input=g[2], num_filters=num_outputs[2],
+                            filter_size=3, stride=1, act='relu', name='fpn_down_fusion_1')
+        g[2] = conv_bn_layer(input=g[2], num_filters=num_outputs[2],
+                            filter_size=1, stride=1, act=None, name='fpn_down_fusion_2')
+        return g[2]
+
+    def SAST_Header1(self, f_common):
+        """Detection head for the text center line (TCL) score map (f_score) and text border offset (TBO) map (f_border)."""
+        #f_score
+        f_score = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_score1')
+        f_score = conv_bn_layer(input=f_score, num_filters=64, filter_size=3, stride=1, act='relu', name='f_score2')
+        f_score = conv_bn_layer(input=f_score, num_filters=128, filter_size=1, stride=1, act='relu', name='f_score3')
+        f_score = conv_bn_layer(input=f_score, num_filters=1, filter_size=3, stride=1, name='f_score4')
+        f_score = fluid.layers.sigmoid(f_score)
+        # print("f_score shape: {}".format(f_score.shape))
+
+        #f_border
+        f_border = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_border1')
+        f_border = conv_bn_layer(input=f_border, num_filters=64, filter_size=3, stride=1, act='relu', name='f_border2')
+        f_border = conv_bn_layer(input=f_border, num_filters=128, filter_size=1, stride=1, act='relu', name='f_border3')
+        f_border = conv_bn_layer(input=f_border, num_filters=4, filter_size=3, stride=1, name='f_border4')
+        # print("f_border shape: {}".format(f_border.shape))
+        
+        return f_score, f_border
+
+    def SAST_Header2(self, f_common):
+        """Detection head for the text vertex offset (TVO) map (f_tvo) and text center offset (TCO) map (f_tco)."""
+        #f_tvo
+        f_tvo = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_tvo1')
+        f_tvo = conv_bn_layer(input=f_tvo, num_filters=64, filter_size=3, stride=1, act='relu', name='f_tvo2')
+        f_tvo = conv_bn_layer(input=f_tvo, num_filters=128, filter_size=1, stride=1, act='relu', name='f_tvo3')
+        f_tvo = conv_bn_layer(input=f_tvo, num_filters=8, filter_size=3, stride=1, name='f_tvo4')
+        # print("f_tvo shape: {}".format(f_tvo.shape))
+
+        #f_tco
+        f_tco = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_tco1')
+        f_tco = conv_bn_layer(input=f_tco, num_filters=64, filter_size=3, stride=1, act='relu', name='f_tco2')
+        f_tco = conv_bn_layer(input=f_tco, num_filters=128, filter_size=1, stride=1, act='relu', name='f_tco3')
+        f_tco = conv_bn_layer(input=f_tco, num_filters=2, filter_size=3, stride=1, name='f_tco4')
+        # print("f_tco shape: {}".format(f_tco.shape))
+        
+        return f_tvo, f_tco
+
+    def cross_attention(self, f_common):
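+        """
+        Attention block applied when with_cab is set: self-attention along each
+        row (horizontal) and each column (vertical) of f_common, each with a 1x1
+        conv shortcut; the two results are concatenated and fused by a 1x1 conv.
+        """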
+        f_shape = fluid.layers.shape(f_common)
+        f_theta = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_theta')
+        f_phi = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_phi')
+        f_g = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_g')
+        ### horizon
+        fh_theta = f_theta
+        fh_phi = f_phi
+        fh_g = f_g
+        #flatten
+        fh_theta = fluid.layers.transpose(fh_theta, [0, 2, 3, 1])
+        fh_theta = fluid.layers.reshape(fh_theta, [f_shape[0] * f_shape[2], f_shape[3], 128])
+        fh_phi = fluid.layers.transpose(fh_phi, [0, 2, 3, 1])
+        fh_phi = fluid.layers.reshape(fh_phi, [f_shape[0] * f_shape[2], f_shape[3], 128])
+        fh_g = fluid.layers.transpose(fh_g, [0, 2, 3, 1])
+        fh_g = fluid.layers.reshape(fh_g, [f_shape[0] * f_shape[2], f_shape[3], 128])
+        #correlation
+        fh_attn = fluid.layers.matmul(fh_theta, fluid.layers.transpose(fh_phi, [0, 2, 1]))
+        #scale
+        fh_attn = fh_attn / (128 ** 0.5)
+        fh_attn = fluid.layers.softmax(fh_attn)
+        #weighted sum
+        fh_weight = fluid.layers.matmul(fh_attn, fh_g)
+        fh_weight = fluid.layers.reshape(fh_weight, [f_shape[0], f_shape[2], f_shape[3], 128])
+        # print("fh_weight: {}".format(fh_weight.shape))
+        fh_weight = fluid.layers.transpose(fh_weight, [0, 3, 1, 2])
+        fh_weight = conv_bn_layer(input=fh_weight, num_filters=128, filter_size=1, stride=1, name='fh_weight')
+        #short cut
+        fh_sc = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, name='fh_sc')
+        f_h = fluid.layers.relu(fh_weight + fh_sc)
+        ######
+        #vertical
+        fv_theta = fluid.layers.transpose(f_theta, [0, 1, 3, 2])
+        fv_phi = fluid.layers.transpose(f_phi, [0, 1, 3, 2])
+        fv_g = fluid.layers.transpose(f_g, [0, 1, 3, 2])
+        #flatten
+        fv_theta = fluid.layers.transpose(fv_theta, [0, 2, 3, 1])
+        fv_theta = fluid.layers.reshape(fv_theta, [f_shape[0] * f_shape[3], f_shape[2], 128])
+        fv_phi = fluid.layers.transpose(fv_phi, [0, 2, 3, 1])
+        fv_phi = fluid.layers.reshape(fv_phi, [f_shape[0] * f_shape[3], f_shape[2], 128])
+        fv_g = fluid.layers.transpose(fv_g, [0, 2, 3, 1])
+        fv_g = fluid.layers.reshape(fv_g, [f_shape[0] * f_shape[3], f_shape[2], 128])
+        #correlation
+        fv_attn = fluid.layers.matmul(fv_theta, fluid.layers.transpose(fv_phi, [0, 2, 1]))
+        #scale
+        fv_attn = fv_attn / (128 ** 0.5)
+        fv_attn = fluid.layers.softmax(fv_attn)
+        #weighted sum
+        fv_weight = fluid.layers.matmul(fv_attn, fv_g)
+        fv_weight = fluid.layers.reshape(fv_weight, [f_shape[0], f_shape[3], f_shape[2], 128])
+        # print("fv_weight: {}".format(fv_weight.shape))
+        fv_weight = fluid.layers.transpose(fv_weight, [0, 3, 2, 1])
+        fv_weight = conv_bn_layer(input=fv_weight, num_filters=128, filter_size=1, stride=1, name='fv_weight')
+        #short cut
+        fv_sc = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, name='fv_sc')
+        f_v = fluid.layers.relu(fv_weight + fv_sc)
+        ######
+        f_attn = fluid.layers.concat([f_h, f_v], axis=1)
+        f_attn = conv_bn_layer(input=f_attn, num_filters=128, filter_size=1, stride=1, act='relu', name='f_attn')  
+        return f_attn
+        
+    def __call__(self, blocks, with_cab=False):
+        # for k, v in blocks.items():
+        #     print(k, v.shape)
+
+        #down fpn
+        f_down = self.FPN_Down_Fusion(blocks)
+        # print("f_down shape: {}".format(f_down.shape))
+        #up fpn
+        f_up = self.FPN_Up_Fusion(blocks)
+        # print("f_up shape: {}".format(f_up.shape))
+        #fusion
+        f_common = fluid.layers.elementwise_add(x=f_down, y=f_up)
+        f_common = fluid.layers.relu(f_common)
+        # print("f_common: {}".format(f_common.shape))
+        
+        if self.with_cab:
+            # print('enhance f_common with CAB.')
+            f_common = self.cross_attention(f_common)
+            
+        f_score, f_border= self.SAST_Header1(f_common)
+        f_tvo, f_tco = self.SAST_Header2(f_common)
+
+        predicts = OrderedDict()
+        predicts['f_score'] = f_score
+        predicts['f_border'] = f_border
+        predicts['f_tvo'] = f_tvo
+        predicts['f_tco'] = f_tco
+        return predicts
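The horizontal branch of `cross_attention` reshapes `(N, C, H, W)` to `(N*H, W, C)`, so that each row becomes an independent sequence. It then applies scaled dot-product attention with scale `1/sqrt(128)` and reshapes the result back. The vertical branch does the same after swapping H and W. The NumPy sketch below covers only that attention core and leaves out the `conv_bn_layer` projections and shortcuts. The function names are hypothetical. The scale uses the channel count `C`, which equals the hard-coded 128 in the head.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def horizontal_attention(theta, phi, g):
    """Row-wise scaled dot-product attention on (N, C, H, W) maps."""
    n, c, h, w = theta.shape

    def to_rows(x):
        # (N, C, H, W) -> (N*H, W, C): every row becomes a length-W sequence
        return x.transpose(0, 2, 3, 1).reshape(n * h, w, c)

    q, k, v = to_rows(theta), to_rows(phi), to_rows(g)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(c))  # (N*H, W, W)
    out = attn @ v                                          # (N*H, W, C)
    return out.reshape(n, h, w, c).transpose(0, 3, 1, 2), attn


def vertical_attention(theta, phi, g):
    """Column-wise attention: swap H and W, reuse the row version, swap back."""
    swap = (0, 1, 3, 2)
    out, _ = horizontal_attention(theta.transpose(swap), phi.transpose(swap), g.transpose(swap))
    return out.transpose(swap)
```

A quick sanity check is to use all-zero queries. The attention weights are then uniform, so each output position equals the mean of its row (horizontal) or column (vertical).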
--- a/ppocr/modeling/heads/rec_srn_all_head.py
+++ b/ppocr/modeling/heads/rec_srn_all_head.py
@ -0,0 +1,230 @@
+#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+
+import paddle
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+import numpy as np
+from .self_attention.model import wrap_encoder
+from .self_attention.model import wrap_encoder_forFeature
+gradient_clip = 10
+
+
+class SRNPredict(object):
+    def __init__(self, params):
+        super(SRNPredict, self).__init__()
+        self.char_num = params['char_num']
+        self.max_length = params['max_text_length']
+
+        self.num_heads = params['num_heads']
+        self.num_encoder_TUs = params['num_encoder_TUs']
+        self.num_decoder_TUs = params['num_decoder_TUs']
+        self.hidden_dims = params['hidden_dims']
+
+    def pvam(self, inputs, others):
+
+        b, c, h, w = inputs.shape
+        conv_features = fluid.layers.reshape(x=inputs, shape=[-1, c, h * w])
+        conv_features = fluid.layers.transpose(x=conv_features, perm=[0, 2, 1])
+
+        #===== Transformer encoder =====
+        b, t, c = conv_features.shape
+        encoder_word_pos = others["encoder_word_pos"]
+        gsrm_word_pos = others["gsrm_word_pos"]
+
+        enc_inputs = [conv_features, encoder_word_pos, None]
+        word_features = wrap_encoder_forFeature(
+            src_vocab_size=-1,
+            max_length=t,
+            n_layer=self.num_encoder_TUs,
+            n_head=self.num_heads,
+            d_key=int(self.hidden_dims / self.num_heads),
+            d_value=int(self.hidden_dims / self.num_heads),
+            d_model=self.hidden_dims,
+            d_inner_hid=self.hidden_dims,
+            prepostprocess_dropout=0.1,
+            attention_dropout=0.1,
+            relu_dropout=0.1,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            weight_sharing=True,
+            enc_inputs=enc_inputs, )
+        fluid.clip.set_gradient_clip(  # applies to the default main program as a whole
+            fluid.clip.GradientClipByValue(gradient_clip))  # clip values to [-10, 10]
+
+        #===== Parallel Visual Attention Module =====
+        b, t, c = word_features.shape
+
+        word_features = fluid.layers.fc(word_features, c, num_flatten_dims=2)
+        word_features_ = fluid.layers.reshape(word_features, [-1, 1, t, c])  # [b, 1, t, c]
+        word_features_ = fluid.layers.expand(word_features_,
+                                             [1, self.max_length, 1, 1])  # [b, max_length, t, c]
+        word_pos_feature = fluid.layers.embedding(gsrm_word_pos,
+                                                  [self.max_length, c])
+        word_pos_ = fluid.layers.reshape(word_pos_feature,
+                                         [-1, self.max_length, 1, c])  # [b, max_length, 1, c]
+        word_pos_ = fluid.layers.expand(word_pos_, [1, 1, t, 1])  # [b, max_length, t, c]
+        temp = fluid.layers.elementwise_add(
+            word_features_, word_pos_, act='tanh')
+
+        attention_weight = fluid.layers.fc(input=temp,
+                                           size=1,
+                                           num_flatten_dims=3,
+                                           bias_attr=False)
+        attention_weight = fluid.layers.reshape(
+            x=attention_weight, shape=[-1, self.max_length, t])
+        attention_weight = fluid.layers.softmax(input=attention_weight, axis=-1)
+
+        pvam_features = fluid.layers.matmul(attention_weight,
+                                            word_features)  #[b, max_length, c]
+
+        return pvam_features
+
+    def gsrm(self, pvam_features, others):
+
+        #===== GSRM Visual-to-semantic embedding block =====
+        b, t, c = pvam_features.shape
+        word_out = fluid.layers.fc(
+            input=fluid.layers.reshape(pvam_features, [-1, c]),
+            size=self.char_num,
+            act="softmax")
+        #word_out.stop_gradient = True
+        word_ids = fluid.layers.argmax(word_out, axis=1)
+        word_ids.stop_gradient = True
+        word_ids = fluid.layers.reshape(x=word_ids, shape=[-1, t, 1])
+
+        #===== GSRM Semantic reasoning block =====
+        """
+        This module is implemented with bi-directional transformers:
+        gsrm_feature1 is the forward one, gsrm_feature2 is the backward one.
+        """
+        pad_idx = self.char_num
+        gsrm_word_pos = others["gsrm_word_pos"]
+        gsrm_slf_attn_bias1 = others["gsrm_slf_attn_bias1"]
+        gsrm_slf_attn_bias2 = others["gsrm_slf_attn_bias2"]
+
+        def prepare_bi(word_ids):
+            """
+            Prepare the bidirectional inputs for GSRM: word1 (forward) is word_ids
+            shifted right by one step with pad_idx in front; word2 (backward) is word_ids.
+            """
+            word1 = fluid.layers.cast(word_ids, "float32")
+            word1 = fluid.layers.pad(word1, [0, 0, 1, 0, 0, 0],
+                                     pad_value=1.0 * pad_idx)
+            word1 = fluid.layers.cast(word1, "int64")
+            word1 = word1[:, :-1, :]
+            word2 = word_ids
+            return word1, word2
+
+        word1, word2 = prepare_bi(word_ids)
+        word1.stop_gradient = True
+        word2.stop_gradient = True
+        enc_inputs_1 = [word1, gsrm_word_pos, gsrm_slf_attn_bias1]
+        enc_inputs_2 = [word2, gsrm_word_pos, gsrm_slf_attn_bias2]
+
+        gsrm_feature1 = wrap_encoder(
+            src_vocab_size=self.char_num + 1,
+            max_length=self.max_length,
+            n_layer=self.num_decoder_TUs,
+            n_head=self.num_heads,
+            d_key=int(self.hidden_dims / self.num_heads),
+            d_value=int(self.hidden_dims / self.num_heads),
+            d_model=self.hidden_dims,
+            d_inner_hid=self.hidden_dims,
+            prepostprocess_dropout=0.1,
+            attention_dropout=0.1,
+            relu_dropout=0.1,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            weight_sharing=True,
+            enc_inputs=enc_inputs_1, )
+        gsrm_feature2 = wrap_encoder(
+            src_vocab_size=self.char_num + 1,
+            max_length=self.max_length,
+            n_layer=self.num_decoder_TUs,
+            n_head=self.num_heads,
+            d_key=int(self.hidden_dims / self.num_heads),
+            d_value=int(self.hidden_dims / self.num_heads),
+            d_model=self.hidden_dims,
+            d_inner_hid=self.hidden_dims,
+            prepostprocess_dropout=0.1,
+            attention_dropout=0.1,
+            relu_dropout=0.1,
+            preprocess_cmd="n",
+            postprocess_cmd="da",
+            weight_sharing=True,
+            enc_inputs=enc_inputs_2, )
+        gsrm_feature2 = fluid.layers.pad(gsrm_feature2, [0, 0, 0, 1, 0, 0],
+                                         pad_value=0.)
+        gsrm_feature2 = gsrm_feature2[:, 1:, :]  # step i now holds the backward feature of step i + 1
+        gsrm_features = gsrm_feature1 + gsrm_feature2
+
+        b, t, c = gsrm_features.shape
+
+        gsrm_out = fluid.layers.matmul(
+            x=gsrm_features,
+            y=fluid.default_main_program().global_block().var(
+                "src_word_emb_table"),
+            transpose_y=True)
+        b, t, c = gsrm_out.shape
+        gsrm_out = fluid.layers.softmax(input=fluid.layers.reshape(gsrm_out,
+                                                                   [-1, c]))
+
+        return gsrm_features, word_out, gsrm_out
+
+    def vsfd(self, pvam_features, gsrm_features):
+
+        #===== Visual-Semantic Fusion Decoder Module =====
+        b, t, c1 = pvam_features.shape
+        b, t, c2 = gsrm_features.shape
+        combine_features_ = fluid.layers.concat(
+            [pvam_features, gsrm_features], axis=2)
+        img_comb_features_ = fluid.layers.reshape(
+            x=combine_features_, shape=[-1, c1 + c2])
+        img_comb_features_map = fluid.layers.fc(input=img_comb_features_,
+                                                size=c1,
+                                                act="sigmoid")
+        img_comb_features_map = fluid.layers.reshape(
+            x=img_comb_features_map, shape=[-1, t, c1])
+        combine_features = img_comb_features_map * pvam_features + (
+            1.0 - img_comb_features_map) * gsrm_features
+        img_comb_features = fluid.layers.reshape(
+            x=combine_features, shape=[-1, c1])
+
+        fc_out = fluid.layers.fc(input=img_comb_features,
+                                 size=self.char_num,
+                                 act="softmax")
+        return fc_out
+
+    def __call__(self, inputs, others, mode=None):
+
+        pvam_features = self.pvam(inputs, others)
+        gsrm_features, word_out, gsrm_out = self.gsrm(pvam_features, others)
+        final_out = self.vsfd(pvam_features, gsrm_features)
+
+        _, decoded_out = fluid.layers.topk(input=final_out, k=1)
+        predicts = {
+            'predict': final_out,
+            'decoded_out': decoded_out,
+            'word_out': word_out,
+            'gsrm_out': gsrm_out
+        }
+
+        return predicts
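For readers less familiar with SRN, here is a minimal, framework-free sketch (plain Python, a single sample, no batch dimension) of a few tensor manipulations in `SRNPredict`: the additive positional attention in `pvam`, the one-step shifts that `gsrm` applies to its forward input and backward output, and the gated fusion in `vsfd`. It is an illustration of data flow under stated assumptions, not the PaddleOCR implementation: the learned `fc` and `embedding` layers are replaced by fixed inputs, and the helper names `pvam_attend`, `shift_forward`, `align_backward`, `vsfd_fuse` and the weight vector `w` are hypothetical.

```python
import math
from typing import List, Sequence

# Framework-free sketch of a few SRNPredict tensor manipulations for ONE sample
# (no batch dimension). Learned layers (fc, embedding) are replaced by fixed
# inputs; all function names here are hypothetical, not part of PaddleOCR.

Matrix = List[List[float]]


def pvam_attend(word_features: Sequence[Sequence[float]],
                pos_emb: Sequence[Sequence[float]],
                w: Sequence[float]) -> Matrix:
    """Additive attention as in pvam: for each reading-order slot n,
    score_t = w . tanh(f_t + e_n), softmax over t, then weighted sum of f_t."""
    width = len(word_features[0])
    out = []
    for e in pos_emb:
        scores = [sum(wk * math.tanh(fk + ek) for wk, fk, ek in zip(w, f, e))
                  for f in word_features]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [x / z for x in exps]
        out.append([sum(a * f[k] for a, f in zip(weights, word_features))
                    for k in range(width)])
    return out


def shift_forward(word_ids: Sequence[int], pad_idx: int) -> List[int]:
    """Like prepare_bi's word1: prepend pad_idx and drop the last id."""
    return [pad_idx] + list(word_ids[:-1])


def align_backward(features: Sequence[Sequence[float]]) -> Matrix:
    """Like the gsrm_feature2 pad-then-slice: drop row 0, append a zero row,
    so row i holds the backward feature originally computed at step i + 1."""
    width = len(features[0])
    return [list(row) for row in features[1:]] + [[0.0] * width]


def vsfd_fuse(pvam: Sequence[Sequence[float]],
              gsrm: Sequence[Sequence[float]],
              gate: Sequence[Sequence[float]]) -> Matrix:
    """Gated fusion as in vsfd: g * pvam + (1 - g) * gsrm, element-wise.
    In the head, gate = sigmoid(fc(concat(pvam, gsrm))); here it is given."""
    return [[g * v + (1.0 - g) * s for g, v, s in zip(g_row, v_row, s_row)]
            for g_row, v_row, s_row in zip(gate, pvam, gsrm)]


if __name__ == "__main__":
    # With w = 0 every score is equal, so attention is uniform (a plain mean).
    print(pvam_attend([[1.0, 0.0], [3.0, 2.0]], [[0.0, 0.0]], [0.0, 0.0]))  # [[2.0, 1.0]]
    print(shift_forward([5, 9, 2, 7], pad_idx=37))  # [37, 5, 9, 2]; 37 stands in for char_num
    print(align_backward([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0]]
    print(vsfd_fuse([[1.0, 0.0]], [[0.0, 1.0]], [[0.75, 0.25]]))  # [[0.75, 0.75]]
```

The two opposite shifts are what make the GSRM "bidirectional" without leaking the target: if the forward encoder attends only to earlier steps and the backward encoder only to later ones, then after the shifts step i combines context from characters 0..i-1 and i+1..T-1 but never character i itself. That the masks `gsrm_slf_attn_bias1`/`gsrm_slf_attn_bias2` are causal in exactly this way is an assumption here, since they are built outside this file.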