# CRNN-Seq2Seq-OCR Description
CRNN-Seq2Seq-OCR is a neural network model for image-based sequence recognition tasks, such as scene text recognition and optical character recognition (OCR). Its architecture combines a CNN with a sequence-to-sequence model and an attention mechanism.
## Model Architecture
CRNN-Seq2Seq-OCR applies a VGG structure to extract features from the processed images, followed by an attention-based encoder-decoder, and finally uses negative log-likelihood (NLL) to compute the loss. See src/attention_ocr.py for details.
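The pipeline above (CNN features, attention-based decoding, NLL loss) can be sketched in plain NumPy. This is an illustrative toy, not the actual implementation in src/attention_ocr.py or src/seq2seq.py: the shapes, the dot-product attention variant, and all variable names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, hidden, vocab = 10, 16, 40                   # hypothetical sizes

encoder_states = rng.normal(size=(T, hidden))   # encoded CNN feature sequence
dec_hidden = rng.normal(size=(hidden,))         # current decoder hidden state
W_out = rng.normal(size=(2 * hidden, vocab))    # output projection

# Attention: score each encoder time step against the decoder state,
# then form a context vector as the weighted sum of encoder states.
weights = softmax(encoder_states @ dec_hidden)  # (T,), sums to 1
context = weights @ encoder_states              # (hidden,)

# Character distribution from [decoder state; context], then the NLL loss
# for one hypothetical ground-truth character id.
log_probs = np.log(softmax(np.concatenate([dec_hidden, context]) @ W_out))
target = 7
nll = -log_probs[target]
```

In the real model this step runs once per output character, and the per-step NLL values are summed over the target sequence.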
## Dataset
For training and evaluation, we use the French Street Name Signs (FSNS) dataset released by Google, which contains approximately 1 million training images and their corresponding ground-truth words.
## Environment Requirements

- Hardware (Ascend)
  - Prepare the hardware environment with an Ascend processor. If you want to try Ascend, please send the application form to ascend@huawei.com. You will be able to access the related resources once approved.
- Framework
  - For more information, please check the resources below:
## Quick Start

After the dataset is prepared, you may start running the training or evaluation scripts as follows:

- Running on Ascend
```shell
# distributed training example on Ascend
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# evaluation example on Ascend
bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]

# standalone training example on Ascend
bash run_standalone_train.sh [DATASET_NAME] [DATASET_PATH] [PLATFORM]
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance. Please follow the instructions in hccl_tools.
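A common failure mode is launching distributed training with a missing or malformed rank table. A small sanity check like the following can fail early with a clear message; it only verifies that the file exists and is well-formed JSON, since the exact schema is produced by the hccl_tools generator and is not reproduced here.

```python
import json
from pathlib import Path

def check_rank_table(path: str) -> dict:
    """Load an HCCL rank table and fail early if it is missing or invalid.

    Only checks existence and JSON well-formedness; the schema itself is
    defined by hccl_tools and is not validated here.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"rank table not found: {path}")
    with p.open() as f:
        table = json.load(f)
    if not isinstance(table, dict):
        raise ValueError("rank table should be a JSON object")
    return table
```

The path that passes this check is what you would supply as `[RANK_TABLE_FILE]` above.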
## Script Description

### Script and Sample Code
```text
crnn-seq2seq-ocr
├── README.md                       # Descriptions about CRNN-Seq2Seq-OCR
├── scripts
│   ├── run_distribute_train.sh    # Launch distributed training on Ascend (8 pcs)
│   ├── run_eval_ascend.sh         # Launch Ascend evaluation
│   └── run_standalone_train.sh    # Launch standalone training on Ascend (1 pc)
├── src
│   ├── attention_ocr.py           # CRNN-Seq2Seq-OCR training wrapper
│   ├── cnn.py                     # VGG network
│   ├── config.py                  # Parameter configuration
│   ├── create_mindrecord_files.py # Create MindRecord files from images and ground truth
│   ├── dataset.py                 # Data preprocessing for training and evaluation
│   ├── gru.py                     # GRU cell wrapper
│   ├── logger.py                  # Logger configuration
│   ├── lstm.py                    # LSTM cell wrapper
│   ├── seq2seq.py                 # CRNN-Seq2Seq-OCR model structure
│   ├── utils.py                   # Utility functions for training and data preprocessing
│   └── weight_init.py             # Weight initialization of LSTM and GRU
├── eval.py                        # Evaluation script
└── train.py                       # Training script
```
### Script Parameters

#### Training Script Parameters

```shell
# distributed training on Ascend
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]

# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH]
```
#### Parameters Configuration

Parameters for both training and evaluation can be set in `config.py`.
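The authoritative keys and defaults live in `src/config.py`; the fragment below only illustrates the kind of parameters involved. The values mirror the ones reported in the performance table at the end of this README (epoch=20, batch_size=32, SGD), while the key names and the learning rate are hypothetical placeholders.

```python
# Illustrative only -- see src/config.py for the real keys and defaults.
config = {
    "epoch_size": 20,    # training epochs, as reported in the performance table
    "batch_size": 32,    # batch size, as reported in the performance table
    "optimizer": "SGD",  # optimizer, as reported in the performance table
    "lr": 0.1,           # hypothetical learning-rate placeholder
}
```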
## Dataset Preparation

- You may refer to "Generate dataset" in Quick Start to generate a dataset automatically, or you may choose to generate a text image dataset yourself.
## Training Process

- Set options in `config.py`, including the learning rate and other network hyperparameters. See the MindSpore dataset preparation tutorial for more information about the dataset.
### Training

- Run `run_standalone_train.sh` for non-distributed training of the CRNN-Seq2Seq-OCR model. Only Ascend is supported for now.

```shell
bash run_standalone_train.sh [DATASET_PATH]
```
### Distributed Training

- Run `run_distribute_train.sh` for distributed training of the CRNN-Seq2Seq-OCR model on Ascend.

```shell
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH]
```
Check `train_parallel0/log.txt` and you will get outputs like the following:

```text
epoch: 20 step: 4080, loss is 1.56112
epoch: 20 step: 4081, loss is 1.6368448
epoch time: 1559886.096 ms, per step time: 382.231 ms
```
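The two timing numbers in the log are consistent with each other: the per-step time is the epoch time divided by the number of steps, assuming the epoch has 4081 steps as the last printed step index suggests.

```python
# Cross-check the log above: per-step time = epoch time / number of steps.
epoch_time_ms = 1559886.096   # "epoch time" from the log
steps_per_epoch = 4081        # last step index printed for epoch 20
per_step_ms = epoch_time_ms / steps_per_epoch
print(round(per_step_ms, 3))  # 382.231, matching the "per step time" in the log
```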
## Evaluation Process

### Evaluation

- Run `run_eval_ascend.sh` for evaluation on Ascend.

```shell
bash run_eval_ascend.sh [DATASET_PATH] [CHECKPOINT_PATH]
```
Check `eval/log` and you will get outputs like the following:

```text
character precision = 0.967522
annotation precision = 0.635204
```
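The README does not define these two metrics; the exact definitions live in eval.py. A common interpretation, sketched below as an assumption, is that annotation precision counts exact transcription matches, while character precision discounts character-level edit errors (insertions, deletions, substitutions) against the total number of reference characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions, cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_precision(preds, refs):
    """Fraction of reference characters not consumed by edit errors."""
    total = sum(len(r) for r in refs)
    errors = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    return (total - errors) / total

def annotation_precision(preds, refs):
    """Fraction of predictions that exactly match their reference."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)
```

For example, predicting "abd" for reference "abc" gives a character precision of 2/3 and an annotation precision of 0; the evaluation script's definitions may differ in detail.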
## Model Description

### Performance

#### Evaluation Performance
| Parameters | Ascend |
| --- | --- |
| Model Version | V1 |
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; Memory 755 GB |
| Uploaded Date | 02/11/2021 (month/day/year) |
| MindSpore Version | 1.2.0 |
| Dataset | FSNS |
| Training Parameters | epoch=20, batch_size=32 |
| Optimizer | SGD |
| Loss Function | Negative Log Likelihood |
| Speed | 1 pc: 355 ms/step; 8 pcs: 385 ms/step |
| Total Time | 1 pc: 64 hours; 8 pcs: 9 hours |
| Parameters (M) | 12 |
| Scripts | crnn_seq2seq_ocr script |
#### Inference Performance
| Parameters | Ascend |
| --- | --- |
| Model Version | V1 |
| Resource | Ascend 910 |
| Uploaded Date | 02/11/2021 (month/day/year) |
| MindSpore Version | 1.2.0 |
| Dataset | FSNS |
| batch_size | 32 |
| Outputs | Annotation Precision, Character Precision |
| Accuracy | Annotation Precision = 63.52%, Character Precision = 96.75% |
| Model for Inference | 12M (.ckpt file) |