History

caojiewen da60f433f1 removed the useless link of apply form		4 years ago
..
scripts	Added crnn-seq2seq-ocr code	4 years ago
src	Added crnn-seq2seq-ocr code	4 years ago
README.md	removed the useless link of apply form	4 years ago
eval.py	Added crnn-seq2seq-ocr code	4 years ago
train.py	Added crnn-seq2seq-ocr code	4 years ago

CRNN-Seq2Seq-OCR is a neural network model for image based sequence recognition tasks, such as scene text recognition and optical character recognition (OCR). Its architecture is a combination of CNN and sequence to sequence model with attention mechanism.

Model Architecture

CRNN-Seq2Seq-OCR applies a vgg structure to extract features from processed images, following with attention-based encoder and decoder layer, finally utilizes NLL to calculate loss. See src/attention_ocr.py for details.

Dataset

For training and evaluation, we use the French Street Name Signs (FSNS) released by Google as the training data, which contains approximately 1 million training images and their corresponding ground truth words.

Environment Requirements

Hardware（Ascend）
- Prepare hardware environment with Ascend processor.
Framework
- MindSpore
For more information, please check the resources below：
- MindSpore Tutorials
- MindSpore Python API

Quick Start

After the dataset is prepared, you may start running the training or the evaluation scripts as follows:

Running on Ascend

# distribute training example in Ascend
$ bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# evaluation example in Ascend
$ bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]

# standalone training example in Ascend
$ bash run_standalone_train.sh [DATASET_NAME] [DATASET_PATH] [PLATFORM]

For distributed training, a hccl configuration file with JSON format needs to be created in advance.

Please follow the instructions in the link below: hccl_tools.

Script Description

Script and Sample Code

crnn-seq2seq-ocr
├── README.md                                   # Descriptions about CRNN-Seq2Seq-OCR
├── scripts
│   ├── run_distribute_train.sh                 # Launch distributed training on Ascend(8 pcs)
│   ├── run_eval_ascend.sh                      # Launch Ascend evaluation
│   └── run_standalone_train.sh                 # Launch standalone training on Ascend(1 pcs)
├── src
│   ├── attention_ocr.py                        # CRNN-Seq2Seq-OCR training wrapper
│   ├── cnn.py                                  # VGG network
│   ├── config.py                               # Parameter configuration
│   ├── create_mindrecord_files.py              # Create mindrecord files from images and ground truth
│   ├── dataset.py                              # Data preprocessing for training and evaluation
│   ├── gru.py                                  # GRU cell wrapper
│   ├── logger.py                               # Logger configuration
│   ├── lstm.py                                 # LSTM cell wrapper
│   ├── seq2seq.py                              # CRNN-Seq2Seq-OCR model structure
│   └── utils.py                                # Utility functions for training and data pre-processing
│   ├── weight_init.py                          # weight initialization of LSTM and GRU
└── train.py                                    # Training script
├── eval.py                                     # Evaluation Script

Script Parameters

Training Script Parameters

# distributed training on Ascend
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH]

Parameters Configuration

Parameters for both training and evaluation can be set in config.py.

Dataset Preparation

You may refer to "Generate dataset" in Quick Start to automatically generate a dataset, or you may choose to generate a text image dataset by yourself.

Training Process

Set options in config.py, including learning rate and other network hyperparameters. Click MindSpore dataset preparation tutorial for more information about dataset.

Training

Run run_standalone_train.sh for non-distributed training of CRNN-Seq2Seq-OCR model, only support Ascend now.

bash run_standalone_train.sh [DATASET_PATH]

Distributed Training

Run run_distribute_train.sh for distributed training of CRNN-Seq2Seq-OCR model on Ascend.

bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

Check the train_parallel0/log.txt and you will get outputs as following:

epoch: 20 step: 4080, loss is 1.56112
epoch: 20 step: 4081, loss is 1.6368448
epoch time: 1559886.096 ms, per step time: 382.231 ms

Evaluation Process

Evaluation

Run run_eval_ascend.sh for evaluation on Ascend.

bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]

Check the eval/log and you will get outputs as following:

character precision = 0.967522

Annotation precision precision = 0.635204

Model Description

Performance

Evaluation Performance

Parameters	Ascend
Model Version	V1
Resource	Ascend 910 ；CPU 2.60GHz，192cores；Memory，755G
uploaded Date	02/11/2021 (month/day/year)
MindSpore Version	1.2.0
Dataset	FSNS
Training Parameters	epoch=20, batch_size=32
Optimizer	SGD
Loss Function	Negative Log Likelihood
Speed	1pc: 355 ms/step; 8pcs: 385 ms/step
Total time	1pc: 64 hours; 8pcs: 9 hours
Parameters (M)	12
Scripts	crnn_seq2seq_ocr script

Inference Performance

Parameters	Ascend
Model Version	V1
Resource	Ascend 910
Uploaded Date	02/11/2021 (month/day/year)
MindSpore Version	1.2.0
Dataset	FSNS
batch_size	32
outputs	Annotation Precision, Character Precision
Accuracy	Annotation Precision=63.52%, Character Precision=96.75%
Model for inference	12M (.ckpt file)

README.md

Unescape Escape

Contents

CRNN-Seq2Seq-OCR Description

Model Architecture

Dataset

Environment Requirements

Quick Start

Script Description

Script and Sample Code

Script Parameters

Training Script Parameters

Parameters Configuration

Dataset Preparation

Training Process

Training

Distributed Training

Evaluation Process

Evaluation

Model Description

Performance

Evaluation Performance

Inference Performance

README.md Unescape Escape

Contents

Training Script Parameters

Parameters Configuration

Model Description

Performance

Evaluation Performance

Inference Performance

README.md

Unescape Escape