|
|
# DeepText for Ascend
|
|
|
|
|
|
- [DeepText Description](#deeptext-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Training Process](#training-process)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
        - [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
|
|
|
|
|
|
# [DeepText Description](#contents)
|
|
|
|
|
|
DeepText is a convolutional neural network architecture for text detection in unconstrained scenes. It builds on the elegant Faster R-CNN framework and was proposed in the paper "DeepText: A new approach for text proposal generation and text detection in natural images", published in 2017.
|
|
|
|
|
|
[Paper](https://arxiv.org/pdf/1605.07314v1.pdf): Zhuoyao Zhong, Lianwen Jin, Shuangping Huang (South China University of Technology, SCUT). Published in ICASSP 2017.
|
|
|
|
|
|
# [Model architecture](#contents)
|
|
|
|
|
|
The overall network architecture of DeepText is shown below:
|
|
|
|
|
|
[Link](https://arxiv.org/pdf/1605.07314v1.pdf)
|
|
|
|
|
|
# [Dataset](#contents)
|
|
|
|
|
|
Here we use 4 datasets for training and 1 dataset for evaluation.

- Dataset 1: ICDAR 2013: Focused Scene Text
    - Train: 142 MB, 229 images
    - Test: 110 MB, 233 images
- Dataset 2: ICDAR 2013: Born-Digital Images
    - Train: 27.7 MB, 410 images
- Dataset 3: SCUT-FORU: Flickr OCR Universal Database
    - Train: 388 MB, 1715 images
- Dataset 4: COCO-Text v2 (subset of MSCOCO 2017)
    - Train: 13 GB, 63686 images
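
Together, the four training sets contribute 229 + 410 + 1715 + 63686 = 66040 images, which matches the dataset size reported in the Training Performance table below.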
|
|
|
|
|
|
# [Features](#contents)
|
|
|
|
|
|
# [Environment Requirements](#contents)
|
|
|
|
|
|
- Hardware (Ascend)
    - Prepare a hardware environment with Ascend processors.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
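
Before launching any script, it can help to confirm that MindSpore is installed and can target Ascend. A minimal sanity-check sketch (the `device_id=0` value is an assumption; adjust it to your environment):

```python
# Sanity check: confirm the MindSpore version and configure the Ascend backend,
# matching the graph-mode setup the training/eval scripts assume.
import mindspore
from mindspore import context

print(mindspore.__version__)  # 1.1.0 was used for the results below
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=0)
```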
|
|
|
|
|
|
# [Script description](#contents)
|
|
|
|
|
|
## [Script and sample code](#contents)
|
|
|
|
|
|
```shell
.
└─deeptext
  ├─README.md
  ├─scripts
    ├─run_standalone_train_ascend.sh    # launch standalone training on Ascend (1p)
    ├─run_distribute_train_ascend.sh    # launch distributed training on Ascend (8p)
    └─run_eval_ascend.sh                # launch evaluation on Ascend
  ├─src
    ├─DeepText
      ├─__init__.py                     # package init file
      ├─anchor_genrator.py              # anchor generator
      ├─bbox_assign_sample.py           # proposal layer for stage 1
      ├─bbox_assign_sample_stage2.py    # proposal layer for stage 2
      ├─deeptext_vgg16.py               # main network definition
      ├─proposal_generator.py           # proposal generator
      ├─rcnn.py                         # RCNN head
      ├─roi_align.py                    # ROIAlign cell wrapper
      ├─rpn.py                          # region proposal network
      └─vgg16.py                        # backbone
    ├─config.py                         # training configuration
    ├─dataset.py                        # data preprocessing
    ├─lr_schedule.py                    # learning rate scheduler
    ├─network_define.py                 # network definition
    └─utils.py                          # commonly used utility functions
  ├─eval.py                             # evaluation script
  ├─export.py                           # export checkpoint to .onnx/.air/.mindir
  └─train.py                            # training script
```
|
|
|
|
|
|
## [Training process](#contents)
|
|
|
|
|
|
### Usage
|
|
|
|
|
|
- Ascend:
|
|
|
|
|
|
```bash
|
|
|
# distribute training example(8p)
|
|
|
sh run_distribute_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [RANK_TABLE_FILE] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH]
|
|
|
# standalone training
|
|
|
sh run_standalone_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
|
|
|
# evaluation:
|
|
|
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
|
|
|
```
|
|
|
|
|
|
> Notes:
>
> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/distributed_training_ascend.html), and the device_ip can be obtained as described in [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). For large models like DeepText, it is better to export the environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend the HCCL connection-check timeout from the default 120 seconds to 600 seconds; otherwise, the connection may time out because compilation time grows with model size.
>
> The `taskset` operations in `scripts/run_distribute_train_ascend.sh` bind processor cores according to `device_num` and the total number of processors. If you do not want this binding, remove the `taskset` operations from the script.
>
> PRETRAINED_PATH should be a checkpoint of VGG16 trained on ImageNet2012. The parameter names in the checkpoint dict must match the network exactly, and batch normalization must be enabled when training VGG16; otherwise, later steps will fail.
>
> COCO_TEXT_PARSER_PATH points to `coco_text.py`, which can be obtained from [Link](https://github.com/andreasveit/coco-text).
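
For reference, the COCO-Text parser is used roughly as follows. This is a minimal sketch based on the parser's documented interface; the annotation file name `cocotext.v2.json` is an assumption:

```python
# Sketch of COCO-Text parser usage (see the linked repository for details).
import coco_text

ct = coco_text.COCO_Text('cocotext.v2.json')   # COCO-Text v2 annotations
train_ids = ct.getImgIds(imgIds=ct.train,
                         catIds=[('legibility', 'legible')])
ann_ids = ct.getAnnIds(imgIds=train_ids)       # all annotations on those images
anns = ct.loadAnns(ann_ids)
print(len(train_ids), 'training images,', len(anns), 'text annotations')
```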
|
|
|
### Launch
|
|
|
|
|
|
```bash
|
|
|
# training example
|
|
|
shell:
|
|
|
Ascend:
|
|
|
# distribute training example(8p)
|
|
|
sh run_distribute_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [RANK_TABLE_FILE] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH]
|
|
|
# standalone training
|
|
|
sh run_standalone_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
|
|
|
```
|
|
|
|
|
|
### Result
|
|
|
|
|
|
Training results will be stored in the example path. Checkpoints are saved under `ckpt_path` by default, the training log is redirected to `./log`, and the loss is written to `./loss_0.log` as follows.
|
|
|
|
|
|
```log
|
|
|
469 epoch: 1 step: 982 ,rpn_loss: 0.03940, rcnn_loss: 0.48169, rpn_cls_loss: 0.02910, rpn_reg_loss: 0.00344, rcnn_cls_loss: 0.41943, rcnn_reg_loss: 0.06223, total_loss: 0.52109
|
|
|
659 epoch: 2 step: 982 ,rpn_loss: 0.03607, rcnn_loss: 0.32129, rpn_cls_loss: 0.02916, rpn_reg_loss: 0.00230, rcnn_cls_loss: 0.25732, rcnn_reg_loss: 0.06390, total_loss: 0.35736
|
|
|
847 epoch: 3 step: 982 ,rpn_loss: 0.07074, rcnn_loss: 0.40527, rpn_cls_loss: 0.03494, rpn_reg_loss: 0.01193, rcnn_cls_loss: 0.30591, rcnn_reg_loss: 0.09937, total_loss: 0.47601
|
|
|
```
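
To monitor convergence, the `total_loss` values can be pulled out of `loss_0.log` with a small script. A minimal sketch (a hypothetical helper, not part of the repository), assuming the log format shown above:

```python
# Extract total_loss values from loss_0.log and report the latest one.
import re

pattern = re.compile(r'total_loss:\s*([0-9.]+)')

losses = []
with open('loss_0.log') as f:
    for line in f:
        m = pattern.search(line)
        if m:
            losses.append(float(m.group(1)))

print('steps logged:', len(losses))
print('last total_loss:', losses[-1] if losses else 'n/a')
```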
|
|
|
|
|
|
## [Evaluation Process](#contents)
|
|
|
|
|
|
### Usage
|
|
|
|
|
|
You can start evaluation using Python or shell scripts. The usage of shell scripts is as follows:
|
|
|
|
|
|
- Ascend:
|
|
|
|
|
|
```bash
|
|
|
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
|
|
|
```
|
|
|
|
|
|
### Launch
|
|
|
|
|
|
```bash
|
|
|
# eval example
|
|
|
shell:
|
|
|
Ascend:
|
|
|
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
|
|
|
```
|
|
|
|
|
|
> The checkpoint can be produced during the training process.
|
|
|
|
|
|
### Result
|
|
|
|
|
|
Evaluation results will be stored in the example path; you can find results like the following in `log`.
|
|
|
|
|
|
```log
|
|
|
========================================
|
|
|
|
|
|
class 1 precision is 88.01%, recall is 82.77%
|
|
|
```
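
The script reports per-class precision and recall; the corresponding F-measure (harmonic mean) follows directly. A quick check using the numbers above:

```python
# Derive the F-measure from the reported precision and recall.
precision, recall = 0.8801, 0.8277
f_measure = 2 * precision * recall / (precision + recall)
print(f'F-measure: {f_measure:.2%}')  # ~85.31%
```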
|
|
|
|
|
|
# [Model description](#contents)
|
|
|
|
|
|
## [Performance](#contents)
|
|
|
|
|
|
### Training Performance
|
|
|
|
|
|
| Parameters                 | Ascend                                                       |
| -------------------------- | ------------------------------------------------------------ |
| Model Version              | DeepText                                                     |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB           |
| Uploaded Date              | 12/26/2020                                                   |
| MindSpore Version          | 1.1.0                                                        |
| Dataset                    | 66040 images                                                 |
| Batch_size                 | 2                                                            |
| Training Parameters        | src/config.py                                                |
| Optimizer                  | Momentum                                                     |
| Loss Function              | SoftmaxCrossEntropyWithLogits for classification, SmoothL2Loss for bbox regression |
| Loss                       | ~0.008                                                       |
| Total time (8p)            | 4h                                                           |
| Scripts                    | [deeptext script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/deeptext) |
|
|
|
|
|
|
### Inference Performance
|
|
|
|
|
|
| Parameters          | Ascend                                             |
| ------------------- | -------------------------------------------------- |
| Model Version       | DeepText                                           |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
| Uploaded Date       | 12/26/2020                                         |
| MindSpore Version   | 1.1.0                                              |
| Dataset             | 229 images                                         |
| Batch_size          | 2                                                  |
| Accuracy            | precision=0.8801, recall=0.8277                    |
| Total time          | 1 min                                              |
| Model for inference | 3492 MB (.ckpt file)                               |
|
|
|
|
|
|
### Training performance results

| **Ascend** | train performance |
| :--------: | :---------------: |
|     1p     |     14 img/s      |
|     8p     |     50 img/s      |
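
Scaling from 1p to 8p thus yields roughly a 3.6x throughput improvement (50 img/s vs. 14 img/s).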
|
|
|
|
|
|
# [Description of Random Situation](#contents)
|
|
|
|
|
|
We set the seed to 1 in `train.py`.
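
For reference, fixing the global seed in MindSpore looks like the following minimal sketch (see `train.py` for the authoritative version):

```python
# Sketch: fix MindSpore's global seed so weight initialization and other
# random operations are reproducible across runs.
from mindspore.common import set_seed

set_seed(1)
```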
|
|
|
|
|
|
# [ModelZoo Homepage](#contents)
|
|
|
|
|
|
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
|