# Contents

- [Contents](#contents)
- [CRNN-Seq2Seq-OCR Description](#crnn-seq2seq-ocr-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
        - [Training Script Parameters](#training-script-parameters)
        - [Parameters Configuration](#parameters-configuration)
    - [Dataset Preparation](#dataset-preparation)
- [Training Process](#training-process)
    - [Training](#training)
    - [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
    - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
        - [Evaluation Performance](#evaluation-performance)

## [CRNN-Seq2Seq-OCR Description](#contents)

CRNN-Seq2Seq-OCR is a neural network model for image-based sequence recognition tasks, such as scene text recognition and optical character recognition (OCR). Its architecture combines a CNN with a sequence-to-sequence model that uses an attention mechanism.

## [Model Architecture](#contents)

CRNN-Seq2Seq-OCR applies a VGG structure to extract features from processed images, feeds those features to an attention-based encoder and decoder, and finally computes the loss with negative log-likelihood (NLL). See `src/attention_ocr.py` for details.
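To make the attention-plus-NLL description concrete, here is a minimal NumPy sketch of a single attention-decoding step and its per-step NLL loss. All shapes, weight matrices, and names below are illustrative assumptions for exposition only; the actual implementation lives in `src/seq2seq.py` and `src/attention_ocr.py`.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative sizes (assumptions, not the real model's dimensions):
T, H, V = 40, 128, 120   # encoder time steps, hidden size, vocabulary size
rng = np.random.default_rng(0)

encoder_outputs = rng.normal(size=(T, H))  # CNN features encoded per time step
decoder_state = rng.normal(size=(H,))      # current decoder hidden state
W_a = rng.normal(size=(H, H)) * 0.01       # attention projection (toy weights)
W_o = rng.normal(size=(V, 2 * H)) * 0.01   # output projection (toy weights)

# One attention step: score encoder outputs against the decoder state,
# form a context vector, and predict a distribution over characters.
scores = encoder_outputs @ (W_a @ decoder_state)         # (T,)
attn_weights = softmax(scores)                           # (T,)
context = attn_weights @ encoder_outputs                 # (H,)
logits = W_o @ np.concatenate([decoder_state, context])  # (V,)
probs = softmax(logits)

# Per-step NLL loss for the ground-truth character at this position.
target_char = 17                                         # toy label index
step_loss = -np.log(probs[target_char])
print(f"step NLL loss: {step_loss:.3f}")
```

During training, per-step losses like this one are accumulated over the target transcription to form the NLL objective described above.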
## [Dataset](#contents)

For training and evaluation, we use the French Street Name Signs (FSNS) dataset released by Google, which contains approximately 1 million training images and their corresponding ground-truth words.

## [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with an Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. You will be able to access the related resources once approved.
- Framework
    - [MindSpore](https://gitee.com/mindspore/mindspore)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)

## [Quick Start](#contents)

After the dataset is prepared, you may start running the training or the evaluation scripts as follows:

- Running on Ascend

```shell
# distributed training example on Ascend
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# evaluation example on Ascend
bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]

# standalone training example on Ascend
bash run_standalone_train.sh [DATASET_PATH]
```

For distributed training, an hccl configuration file in JSON format needs to be created in advance. Please follow the instructions in the link below: [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).

## [Script Description](#contents)

### [Script and Sample Code](#contents)

```shell
crnn-seq2seq-ocr
├── README.md                      # Descriptions about CRNN-Seq2Seq-OCR
├── scripts
│   ├── run_distribute_train.sh    # Launch distributed training on Ascend (8 pcs)
│   ├── run_eval_ascend.sh         # Launch evaluation on Ascend
│   └── run_standalone_train.sh    # Launch standalone training on Ascend (1 pc)
├── src
│   ├── attention_ocr.py           # CRNN-Seq2Seq-OCR training wrapper
│   ├── cnn.py                     # VGG network
│   ├── config.py                  # Parameter configuration
│   ├── create_mindrecord_files.py # Create MindRecord files from images and ground truth
│   ├── dataset.py                 # Data preprocessing for training and evaluation
│   ├── gru.py                     # GRU cell wrapper
│   ├── logger.py                  # Logger configuration
│   ├── lstm.py                    # LSTM cell wrapper
│   ├── seq2seq.py                 # CRNN-Seq2Seq-OCR model structure
│   ├── utils.py                   # Utility functions for training and data preprocessing
│   └── weight_init.py             # Weight initialization of LSTM and GRU
├── eval.py                        # Evaluation script
└── train.py                       # Training script
```

### [Script Parameters](#contents)

#### Training Script Parameters

```shell
# distributed training on Ascend
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# standalone training on Ascend
Usage: bash run_standalone_train.sh [DATASET_PATH]
```

#### Parameters Configuration

Parameters for both training and evaluation can be set in `config.py`.

### [Dataset Preparation](#contents)

- The FSNS images and their ground truth must be converted into MindRecord files before training and evaluation; `src/create_mindrecord_files.py` is provided for this purpose (see the sketch after this section for the general idea).
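For orientation, the following is a minimal sketch of writing image/label pairs into a MindRecord file with `mindspore.mindrecord.FileWriter`. The schema fields, file paths, and sample list are hypothetical illustrations, not the actual interface of `src/create_mindrecord_files.py`.

```python
import os
from mindspore.mindrecord import FileWriter

# Hypothetical locations; the real script derives these from its own arguments.
IMAGE_DIR = "/path/to/fsns/images"
OUTPUT_FILE = "/path/to/output/fsns_train.mindrecord"

# Illustrative schema: raw image bytes plus the ground-truth transcription.
schema = {
    "image": {"type": "bytes"},
    "annotation": {"type": "string"},
}

writer = FileWriter(file_name=OUTPUT_FILE, shard_num=1)
writer.add_schema(schema, "fsns_ocr")  # register the record layout

records = []
for name, text in [("img_0001.png", "rue de la paix")]:  # toy sample list
    with open(os.path.join(IMAGE_DIR, name), "rb") as f:
        records.append({"image": f.read(), "annotation": text})

writer.write_raw_data(records)  # buffer the serialized samples
writer.commit()                 # flush shards and the index to disk
```

## [Training Process](#contents)

- Set options in `config.py`, including the learning rate and other network hyperparameters. Click the [MindSpore dataset preparation tutorial](https://www.mindspore.cn/tutorial/training/zh-CN/master/use/data_preparation.html) for more information about datasets.

### [Training](#contents)

- Run `run_standalone_train.sh` for non-distributed training of the CRNN-Seq2Seq-OCR model. Only Ascend is supported for now.

```bash
bash run_standalone_train.sh [DATASET_PATH]
```

### [Distributed Training](#contents)

- Run `run_distribute_train.sh` for distributed training of the CRNN-Seq2Seq-OCR model on Ascend.

```bash
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]
```

Check `train_parallel0/log.txt` and you will get outputs as follows:

```shell
epoch: 20 step: 4080, loss is 1.56112
epoch: 20 step: 4081, loss is 1.6368448
epoch time: 1559886.096 ms, per step time: 382.231 ms
```

## [Evaluation Process](#contents)

### [Evaluation](#contents)

- Run `run_eval_ascend.sh` for evaluation on Ascend.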
```bash
bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]
```

Check `eval/log` and you will get outputs as follows:

```shell
character precision = 0.967522

Annotation precision precision = 0.635204
```

## [Model Description](#contents)

### [Performance](#contents)

#### Training Performance

| Parameters          | Ascend                                                       |
| ------------------- | ------------------------------------------------------------ |
| Model Version       | V1                                                           |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB           |
| Uploaded Date       | 02/11/2021 (month/day/year)                                  |
| MindSpore Version   | 1.2.0                                                        |
| Dataset             | FSNS                                                         |
| Training Parameters | epoch=20, batch_size=32                                      |
| Optimizer           | SGD                                                          |
| Loss Function       | Negative log-likelihood                                      |
| Speed               | 1pc: 355 ms/step; 8pcs: 385 ms/step                          |
| Total Time          | 1pc: 64 hours; 8pcs: 9 hours                                 |
| Parameters (M)      | 12                                                           |
| Scripts             | [crnn_seq2seq_ocr script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/crnn_seq2seq_ocr) |

#### Evaluation Performance

| Parameters          | Ascend                                                  |
| ------------------- | ------------------------------------------------------- |
| Model Version       | V1                                                      |
| Resource            | Ascend 910                                              |
| Uploaded Date       | 02/11/2021 (month/day/year)                             |
| MindSpore Version   | 1.2.0                                                   |
| Dataset             | FSNS                                                    |
| batch_size          | 32                                                      |
| Outputs             | Annotation precision, character precision               |
| Accuracy            | Annotation precision = 63.52%, character precision = 96.75% |
| Model for inference | 12M (.ckpt file)                                        |
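For reference, the two reported metrics can be understood as character-level accuracy (one minus the normalized edit distance) and the exact-match rate over full transcriptions. The plain-Python sketch below computes both under that assumption; it illustrates what the numbers mean and is not the code used by `eval.py`.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (rolling-row DP)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,         # deletion
                dp[j - 1] + 1,     # insertion
                prev + (ca != cb)  # substitution (free if characters match)
            )
    return dp[len(b)]

def ocr_metrics(predictions, references):
    """Character precision (edit-distance based) and annotation precision
    (exact full-sequence match), under the assumptions stated above."""
    total_chars, total_errors, exact = 0, 0, 0
    for pred, ref in zip(predictions, references):
        total_chars += len(ref)
        total_errors += edit_distance(pred, ref)
        exact += (pred == ref)
    char_precision = 1.0 - total_errors / max(total_chars, 1)
    annotation_precision = exact / max(len(references), 1)
    return char_precision, annotation_precision

# Toy usage:
preds = ["rue de la paix", "avenue foch"]
refs = ["rue de la paix", "avenue fochs"]
print(ocr_metrics(preds, refs))
```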