
CRNN-Seq2Seq-OCR Description

CRNN-Seq2Seq-OCR is a neural network model for image-based sequence recognition tasks, such as scene text recognition and optical character recognition (OCR). Its architecture combines a CNN with a sequence-to-sequence model that uses an attention mechanism.

Model Architecture

CRNN-Seq2Seq-OCR uses a VGG-style CNN to extract features from the preprocessed images, followed by an attention-based encoder and decoder, and computes the loss with negative log-likelihood (NLL). See src/attention_ocr.py for details.
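The NLL loss over a decoded sequence can be illustrated with a minimal pure-Python sketch. It assumes the decoder emits one vector of unnormalized scores over the character vocabulary per output step, and that the loss is the mean negative log-probability of the ground-truth character at each step. The helper names and the toy vocabulary are hypothetical; this is not the code in src/attention_ocr.py and does not use MindSpore's loss operators.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one decoder step's scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def sequence_nll(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target ids across decoder steps.

    Hypothetical helper: step_logits[t] holds the decoder scores at step t,
    and targets[t] is the ground-truth character id for that step.
    """
    if len(step_logits) != len(targets):
        raise ValueError("one target id is required per decoder step")
    total = 0.0
    for logits, target in zip(step_logits, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


# Toy example: 3 decoder steps over a 4-symbol vocabulary.
logits = [
    [2.0, 0.1, 0.1, 0.1],
    [0.1, 3.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 2.5],
]
print(round(sequence_nll(logits, [0, 1, 3]), 4))
```

Targets that the decoder scores highly give a small loss; a uniform distribution over 4 symbols gives exactly log(4) per step.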

Dataset

For training and evaluation, we use the French Street Name Signs (FSNS) dataset released by Google, which contains approximately 1 million training images and their corresponding ground-truth words.

Environment Requirements

Quick Start

  • After the dataset is prepared, you may start running the training or the evaluation scripts as follows:

    • Running on Ascend
    # distributed training example on Ascend
    $ bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]
    
    # evaluation example on Ascend
    $ bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]
    
    # standalone training example on Ascend
    $ bash run_standalone_train.sh [DATASET_PATH]
    

    For distributed training, an HCCL configuration file in JSON format must be created in advance.

    Please follow the instructions in hccl_tools.

Script Description

Script and Sample Code

crnn-seq2seq-ocr
├── README.md                                   # Descriptions about CRNN-Seq2Seq-OCR
├── scripts
│   ├── run_distribute_train.sh                 # Launch distributed training on Ascend (8 pcs)
│   ├── run_eval_ascend.sh                      # Launch Ascend evaluation
│   └── run_standalone_train.sh                 # Launch standalone training on Ascend (1 pc)
├── src
│   ├── attention_ocr.py                        # CRNN-Seq2Seq-OCR training wrapper
│   ├── cnn.py                                  # VGG network
│   ├── config.py                               # Parameter configuration
│   ├── create_mindrecord_files.py              # Create mindrecord files from images and ground truth
│   ├── dataset.py                              # Data preprocessing for training and evaluation
│   ├── gru.py                                  # GRU cell wrapper
│   ├── logger.py                               # Logger configuration
│   ├── lstm.py                                 # LSTM cell wrapper
│   ├── seq2seq.py                              # CRNN-Seq2Seq-OCR model structure
│   ├── utils.py                                # Utility functions for training and data pre-processing
│   └── weight_init.py                          # Weight initialization of LSTM and GRU
├── eval.py                                     # Evaluation script
└── train.py                                    # Training script

Script Parameters

Training Script Parameters

# distributed training on Ascend
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH]

Parameters Configuration

Parameters for both training and evaluation can be set in src/config.py.

Dataset Preparation

  • Prepare the FSNS dataset, or generate a text image dataset yourself. src/create_mindrecord_files.py creates MindRecord files from the images and their ground truth.
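Whichever dataset you use, a seq2seq decoder needs each ground-truth word turned into a sequence of token ids. The sketch below shows one common way to do this with a start token, an end token and padding. The special tokens `<go>`/`<eos>`/`<pad>`, the `max_len` value and the helper names are assumptions for illustration only, not the scheme implemented in src/dataset.py or src/create_mindrecord_files.py.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; the repository may use different names or ids.
PAD, GO, EOS = "<pad>", "<go>", "<eos>"


def build_vocab(words: List[str]) -> Dict[str, int]:
    """Map every character seen in the ground truth to an integer id."""
    vocab = {PAD: 0, GO: 1, EOS: 2}
    for word in words:
        for ch in word:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(word: str, vocab: Dict[str, int], max_len: int) -> Tuple[List[int], List[int]]:
    """Return (decoder_input, decoder_target), both padded to max_len.

    decoder_input starts with GO (as in teacher forcing); decoder_target holds
    the same characters shifted left by one and terminated by EOS.
    """
    ids = [vocab[ch] for ch in word][: max_len - 1]  # leave room for GO/EOS
    decoder_input = [vocab[GO]] + ids
    decoder_target = ids + [vocab[EOS]]
    pad = [vocab[PAD]] * (max_len - len(decoder_target))
    return decoder_input + pad, decoder_target + pad


vocab = build_vocab(["Rue", "Avenue"])
print(encode("Rue", vocab, max_len=6))
```

Words longer than `max_len - 1` characters are truncated in this sketch; a real pipeline might instead drop such samples.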

Training Process

Training

  • Run run_standalone_train.sh for non-distributed training of the CRNN-Seq2Seq-OCR model. Only Ascend is currently supported.
bash run_standalone_train.sh [DATASET_PATH]

Distributed Training

  • Run run_distribute_train.sh for distributed training of the CRNN-Seq2Seq-OCR model on Ascend.
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

Check train_parallel0/log.txt; the output should look like the following:

epoch: 20 step: 4080, loss is 1.56112
epoch: 20 step: 4081, loss is 1.6368448
epoch time: 1559886.096 ms, per step time: 382.231 ms
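Training logs in this format can be checked with a small parser. For example, dividing the epoch time by the last step number (4081, assuming the step counter restarts every epoch) reproduces the reported per-step time of about 382 ms. The regular expressions only cover the line shapes in the sample above, and `parse_log` is a hypothetical helper, not part of the repository.

```python
import re
from typing import Dict, List, Tuple

# Patterns for the two line shapes shown in the sample log.
STEP_RE = re.compile(r"epoch: (\d+) step: (\d+), loss is ([-+0-9.eE]+)")
TIME_RE = re.compile(r"epoch time: ([0-9.]+) ms, per step time: ([0-9.]+) ms")


def parse_log(lines: List[str]) -> Tuple[List[Tuple[int, int, float]], List[Dict[str, float]]]:
    """Collect (epoch, step, loss) tuples and per-epoch timing records."""
    steps, epochs = [], []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            steps.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
            continue
        m = TIME_RE.search(line)
        if m:
            epochs.append({"epoch_ms": float(m.group(1)), "step_ms": float(m.group(2))})
    return steps, epochs


sample = [
    "epoch: 20 step: 4080, loss is 1.56112",
    "epoch: 20 step: 4081, loss is 1.6368448",
    "epoch time: 1559886.096 ms, per step time: 382.231 ms",
]
steps, epochs = parse_log(sample)
last_step = steps[-1][1]
print(epochs[0]["epoch_ms"] / last_step)  # roughly the logged per-step time
```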

Evaluation Process

Evaluation

  • Run run_eval_ascend.sh for evaluation on Ascend.
bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]

Check eval/log; the output should look like the following:

character precision = 0.967522

Annotation precision precision = 0.635204
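The two metrics can be sketched as follows, under assumed definitions: character precision as the number of ground-truth characters predicted correctly at the same position divided by the total number of ground-truth characters, and annotation precision as the fraction of images whose whole predicted string equals the label. eval.py may compute them differently, and `ocr_precision` is a hypothetical helper.

```python
from typing import List, Tuple


def ocr_precision(preds: List[str], labels: List[str]) -> Tuple[float, float]:
    """Return (character_precision, annotation_precision).

    Assumed definitions (may differ from eval.py):
    - character precision: position-wise correct characters / ground-truth characters
    - annotation precision: exact string matches / number of samples
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    char_right = char_total = exact = 0
    for pred, label in zip(preds, labels):
        char_total += len(label)
        # zip stops at the shorter string, so missing characters count as wrong.
        char_right += sum(p == g for p, g in zip(pred, label))
        exact += pred == label
    char_precision = char_right / char_total if char_total else 0.0
    return char_precision, exact / len(labels)


print(ocr_precision(["Rue de Paris", "Avenue Foch"], ["Rue de Paris", "Avenue Hoche"]))
```

Because one wrong character makes a whole annotation wrong, annotation precision is naturally much lower than character precision, as in the sample output.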

Model Description

Performance

Evaluation Performance

| Parameters          | Ascend                                              |
| ------------------- | --------------------------------------------------- |
| Model Version       | V1                                                  |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB  |
| Uploaded Date       | 02/11/2021 (month/day/year)                         |
| MindSpore Version   | 1.2.0                                               |
| Dataset             | FSNS                                                |
| Training Parameters | epoch=20, batch_size=32                             |
| Optimizer           | SGD                                                 |
| Loss Function       | Negative Log Likelihood                             |
| Speed               | 1pc: 355 ms/step; 8pcs: 385 ms/step                 |
| Total time          | 1pc: 64 hours; 8pcs: 9 hours                        |
| Parameters (M)      | 12                                                  |
| Scripts             | crnn_seq2seq_ocr script                             |
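As a rough consistency check, the speeds in the table can be combined with the step count from the sample training log. Assuming 4081 steps per epoch on 8 devices (from the log), the same per-device batch size of 32 on 1 device (so 8 times as many steps per epoch), and ignoring evaluation and checkpointing overhead, the estimates come out at about 64 hours for 1 pc and about 9 hours for 8 pcs, and about 1.04 million samples per epoch, in line with the figures above. The constant names are illustrative.

```python
# Assumptions: 4081 steps/epoch on 8 devices (sample log), batch_size=32 per
# device, and 8x as many steps/epoch when training on a single device.
STEPS_PER_EPOCH_8PCS = 4081
EPOCHS = 20
BATCH_SIZE = 32
DEVICES = 8

steps_per_epoch_1pc = STEPS_PER_EPOCH_8PCS * DEVICES
samples_per_epoch = STEPS_PER_EPOCH_8PCS * BATCH_SIZE * DEVICES


def total_hours(steps_per_epoch: int, ms_per_step: float, epochs: int = EPOCHS) -> float:
    """Pure compute time: steps x ms/step x epochs, converted to hours."""
    return steps_per_epoch * ms_per_step * epochs / 1000 / 3600


hours_1pc = total_hours(steps_per_epoch_1pc, 355)
hours_8pcs = total_hours(STEPS_PER_EPOCH_8PCS, 385)
print(f"samples/epoch ~ {samples_per_epoch}, 1pc ~ {hours_1pc:.1f} h, 8pcs ~ {hours_8pcs:.1f} h")
```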

Inference Performance

| Parameters          | Ascend                                               |
| ------------------- | ---------------------------------------------------- |
| Model Version       | V1                                                   |
| Resource            | Ascend 910                                           |
| Uploaded Date       | 02/11/2021 (month/day/year)                          |
| MindSpore Version   | 1.2.0                                                |
| Dataset             | FSNS                                                 |
| batch_size          | 32                                                   |
| Outputs             | Annotation Precision, Character Precision            |
| Accuracy            | Annotation Precision=63.52%, Character Precision=96.75% |
| Model for inference | 12M (.ckpt file)                                     |