From fc0d57e59d2f1d76e599c3d1ba522312c241534a Mon Sep 17 00:00:00 2001
From: jzg
Date: Sat, 19 Sep 2020 17:24:47 +0800
Subject: [PATCH] Amend deeplabv3 readme

---
 model_zoo/official/cv/deeplabv3/README.md | 337 +++++++++++++++-------
 1 file changed, 237 insertions(+), 100 deletions(-)

diff --git a/model_zoo/official/cv/deeplabv3/README.md b/model_zoo/official/cv/deeplabv3/README.md
index f28c1994ec..aaff70a9e2 100644
--- a/model_zoo/official/cv/deeplabv3/README.md
+++ b/model_zoo/official/cv/deeplabv3/README.md
@@ -1,88 +1,38 @@
-# DeepLabV3 for MindSpore
-
-DeepLab is a series of image semantic segmentation models, DeepLabV3 improves significantly over previous versions. Two keypoints of DeepLabV3:Its multi-grid atrous convolution makes it better to deal with segmenting objects at multiple scales, and augmented ASPP makes image-level features available to capture long range information.
+# Contents
+
+- [DeepLabV3 Description](#DeepLabV3-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Quick Start](#quick-start)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+    - [Training Process](#training-process)
+    - [Evaluation Process](#evaluation-process)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Evaluation Performance](#evaluation-performance)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+
+# [DeepLabV3 Description](#contents)
+## Description
+DeepLab is a series of image semantic segmentation models; DeepLabV3 improves significantly over previous versions. Two key points of DeepLabV3: its multi-grid atrous convolution deals better with segmenting objects at multiple scales, and its augmented ASPP makes image-level features available for capturing long-range information.
+This repository provides scripts and a recipe to train the DeepLabV3 model and achieve state-of-the-art performance.
 
-## Table Of Contents
-
-* [Model overview](#model-overview)
-  * [Model Architecture](#model-architecture)
-  * [Default configuration](#default-configuration)
-* [Setup](#setup)
-  * [Requirements](#requirements)
-* [Quick start guide](#quick-start-guide)
-* [Performance](#performance)
-  * [Results](#results)
-    * [Training accuracy](#training-accuracy)
-    * [Training performance](#training-performance)
-    * [One-hour performance](#one-hour-performance)
-
-
-
-
-## Model overview
-
 Refer to [this paper][1] for network details.
 
 `Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.`
 
 [1]: https://arxiv.org/abs/1706.05587
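+
+For intuition, the atrous idea can be sketched in a few lines of MindSpore. This is an illustration for the README only; the channel sizes and dilation rate below are example values, not this repository's code:
+
+```python
+# Minimal sketch: atrous (dilated) convolution enlarges the receptive field
+# without downsampling. dilation=2 spreads the 3x3 taps over a 5x5 window
+# while keeping the same number of weights. Values here are illustrative.
+import numpy as np
+import mindspore.nn as nn
+from mindspore import Tensor
+
+atrous_conv = nn.Conv2d(in_channels=2048, out_channels=256, kernel_size=3,
+                        pad_mode='same', dilation=2)
+x = Tensor(np.ones((1, 2048, 33, 33), np.float32))
+print(atrous_conv(x).shape)  # (1, 256, 33, 33): spatial size is preserved
+```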
-
-
-### Requirements
-
-Before running code of this project,please ensure you have the following environments:
-  - [MindSpore](https://www.mindspore.cn/)
-  - Hardware environment with the Ascend AI processor
-
-
-  - For more information about how to get started with MindSpore, see the following sections:
-    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
-    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
-
-
-## Quick Start Guide
-
-### 1. Clone the respository
-
-```
-git clone xxx
-cd ModelZoo_DeepLabV3_MS_MTI/00-access
-```
-### 2. Install python packages in requirements.txt
-
-### 3. Download and preprocess the dataset
+# [Model Architecture](#contents)
+ResNet-101 as backbone, with atrous convolution for dense feature extraction.
+# [Dataset](#contents)
+Pascal VOC datasets and Semantic Boundaries Dataset (SBD)
 
 - Download segmentation dataset.
 
 - Prepare the training data list file. The list file saves the relative path to image and annotation pairs. Lines are like:
@@ -95,7 +45,7 @@ cd ModelZoo_DeepLabV3_MS_MTI/00-access
 ......
 ```
 
- - Configure and run build_data.sh to convert dataset to mindrecords. Arguments in build_data.sh:
+ - Configure and run build_data.sh to convert the dataset to mindrecords. Arguments in script/build_data.sh:
 
 ```
 --data_root                 root path of training data
@@ -105,17 +55,136 @@ cd ModelZoo_DeepLabV3_MS_MTI/00-access
 --shuffle                   shuffle or not
 ```
 
-### 4. Generate config json file for 8-cards training
+# [Features](#contents)
+
+## Mixed Precision
+
+The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data types, while maintaining the network precision achieved by single-precision training. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
+For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and then searching for 'reduce precision'.
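+
+As a rough sketch of how this is typically wired up in MindSpore (assumed usage for illustration; the tiny placeholder network below is not the DeepLabV3 net, and train.py may differ in details):
+
+```python
+# Minimal sketch of enabling mixed precision via amp_level. The Dense layer
+# is a stand-in for the real network; lr/momentum/weight decay values are
+# taken from this README's training configuration.
+import mindspore.nn as nn
+from mindspore.train.model import Model
+
+net = nn.Dense(256, 21)
+loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
+opt = nn.Momentum(net.trainable_params(), learning_rate=0.08,
+                  momentum=0.9, weight_decay=0.0001)
+# amp_level="O3" casts the network to FP16 for Ascend training.
+model = Model(net, loss_fn=loss, optimizer=opt, amp_level="O3")
+```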
+
+# [Environment Requirements](#contents)
+
+- Hardware (Ascend)
+  - Prepare hardware environment with Ascend. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
+- Framework
+  - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below:
+  - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
+  - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
+- Install the python packages in requirements.txt
+- Generate the config json file for 8pcs training
+
+    ```
+    # From the root of this project
+    cd src/tools/
+    python3 get_multicards_json.py 10.111.*.*
+    # 10.111.*.* is the computer's ip address.
+    ```
+
+# [Quick Start](#contents)
+
+After installing MindSpore via the official website, you can start training and evaluation as follows:
+- Running on Ascend
 
-```
-# From the root of this projectcd tools
-python get_multicards_json.py 10.111.*.*
-# 10.111.*.* is the computer's ip address.
-```
 
-### 5. Train
+Based on the original DeepLabV3 paper, we reproduce two training experiments on the vocaug (also called trainaug) dataset and evaluate on the voc val dataset.
 
+For single device training, please configure the parameters; the training script is:
+```
+run_standalone_train.sh
+```
+For 8-device training, the training steps are as follows:
+1. Train s16 with the vocaug dataset, finetuning from the resnet101 pretrained model; the script is:
+```
+run_distribute_train_s16_r1.sh
+```
+2. Train s8 with the vocaug dataset, finetuning from the model of the previous step; the script is:
+```
+run_distribute_train_s8_r1.sh
+```
+3. Train s8 with the voctrain dataset, finetuning from the model of the previous step; the script is:
+```
+run_distribute_train_s8_r2.sh
+```
+For evaluation, the steps are as follows:
+1. Eval s16 with the voc val dataset; the eval script is:
+```
+run_eval_s16.sh
+```
+2. Eval s8 with the voc val dataset; the eval script is:
+```
+run_eval_s8.sh
+```
+3. Eval s8 multiscale with the voc val dataset; the eval script is:
+```
+run_eval_s8_multiscale.sh
+```
+4. Eval s8 multiscale and flip with the voc val dataset; the eval script is:
+```
+run_eval_s8_multiscale_flip.sh
+```
+
+# [Script Description](#contents)
+## [Script and Sample Code](#contents)
+```shell
+.
+└──deeplabv3
+  ├── README.md
+  ├── script
+    ├── build_data.sh                             # convert raw data to mindrecord dataset
+    ├── run_distribute_train_s16_r1.sh            # launch ascend distributed training (8 pcs) with vocaug dataset in s16 structure
+    ├── run_distribute_train_s8_r1.sh             # launch ascend distributed training (8 pcs) with vocaug dataset in s8 structure
+    ├── run_distribute_train_s8_r2.sh             # launch ascend distributed training (8 pcs) with voctrain dataset in s8 structure
+    ├── run_eval_s16.sh                           # launch ascend evaluation in s16 structure
+    ├── run_eval_s8.sh                            # launch ascend evaluation in s8 structure
+    ├── run_eval_s8_multiscale.sh                 # launch ascend evaluation with multiscale in s8 structure
+    ├── run_eval_s8_multiscale_flip.sh            # launch ascend evaluation with multiscale and flip in s8 structure
+    ├── run_standalone_train.sh                   # launch ascend standalone training (1 pc)
+  ├── src
+    ├── data
+      ├── data_generator.py                       # mindrecord data generator
+      ├── build_seg_data.py                       # data preprocessing
+    ├── loss
+      ├── loss.py                                 # loss definition for deeplabv3
+    ├── nets
+      ├── deeplab_v3
+        ├── deeplab_v3.py                         # DeepLabV3 network structure
+      ├── net_factory.py                          # set S16 and S8 structures
+    ├── tools
+      ├── get_multicards_json.py                  # get rank table file
+    └── utils
+      └── learning_rates.py                       # generate learning rate
+  ├── eval.py                                     # eval net
+  ├── train.py                                    # train net
+  └── requirements.txt                            # requirements file
+```
+
+## [Script Parameters](#contents)
 
-Based on original DeeplabV3 paper, we reproduce two training experiments on vocaug (also as trainaug) dataset and evaluate on voc val dataset.
+Default Configuration
+```
+"data_file":"/PATH/TO/MINDRECORD_NAME"            # dataset path
+"train_epochs":300                                # total epochs
+"batch_size":32                                   # batch size of input tensor
+"crop_size":513                                   # crop size
+"base_lr":0.08                                    # initial learning rate
+"lr_type":cos                                     # decay mode for generating learning rate
+"min_scale":0.5                                   # minimum scale of data augmentation
+"max_scale":2.0                                   # maximum scale of data augmentation
+"ignore_label":255                                # ignore label
+"num_classes":21                                  # number of classes
+"model":deeplab_v3_s16                            # select model
+"ckpt_pre_trained":"/PATH/TO/PRETRAIN_MODEL"      # path to load pretrained checkpoint
+"is_distributed":                                 # distributed training, it will be True if the parameter is set
+"save_steps":410                                  # steps interval for saving
+"freeze_bn":                                      # freeze_bn, it will be True if the parameter is set
+"keep_checkpoint_max":200                         # max checkpoint for saving
+```
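+
+For reference, the cos option of lr_type decays the learning rate from base_lr toward zero over half a cosine period. The function below is a simplified stand-in for the generator in src/utils/learning_rates.py, not its exact code:
+
+```python
+# Simplified sketch of a cosine learning-rate schedule.
+import math
+
+def cosine_lr(base_lr, total_steps, step):
+    # decays from base_lr at step 0 to ~0 at the last step
+    return base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))
+
+# With base_lr=0.08: step 0 -> 0.08, halfway -> 0.04, last step -> ~0.0
+print([round(cosine_lr(0.08, 100, s), 5) for s in (0, 50, 99)])
+```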
+
+## [Training Process](#contents)
+
+### Usage
+#### Running on Ascend
+Based on the original DeepLabV3 paper, we reproduce two training experiments on the vocaug (also called trainaug) dataset and evaluate on the voc val dataset.
 
 For single device training, please configure the parameters; the training script is as follows:
 ```
@@ -198,7 +267,7 @@ done
 ```
 3. Train s8 with voctrain dataset, finetuning from model in previous step, training script is as follows:
 ```
-# run_distribute_train_r2.sh
+# run_distribute_train_s8_r2.sh
 for((i=0;i<=$RANK_SIZE-1;i++));
 do
     export RANK_ID=$i
@@ -225,8 +294,64 @@ do
     --keep_checkpoint_max=200 >log 2>&1 &
 done
 ```
-### 6. Test
+### Result
+
+- Training vocaug in s16 structure
+```
+# distribute training result (8p)
+epoch: 1 step: 41, loss is 0.8319108
+Epoch time: 213856.477, per step time: 5216.012
+epoch: 2 step: 41, loss is 0.46052963
+Epoch time: 21233.183, per step time: 517.883
+epoch: 3 step: 41, loss is 0.45012417
+Epoch time: 21231.951, per step time: 517.852
+epoch: 4 step: 41, loss is 0.30687785
+Epoch time: 21199.911, per step time: 517.071
+epoch: 5 step: 41, loss is 0.22769661
+Epoch time: 21240.281, per step time: 518.056
+epoch: 6 step: 41, loss is 0.25470978
+...
+```
+
+- Training vocaug in s8 structure
+```
+# distribute training result (8p)
+epoch: 1 step: 82, loss is 0.024167
+Epoch time: 322663.456, per step time: 3934.920
+epoch: 2 step: 82, loss is 0.019832281
+Epoch time: 43107.238, per step time: 525.698
+epoch: 3 step: 82, loss is 0.021008959
+Epoch time: 43109.519, per step time: 525.726
+epoch: 4 step: 82, loss is 0.01912349
+Epoch time: 43177.287, per step time: 526.552
+epoch: 5 step: 82, loss is 0.022886964
+Epoch time: 43095.915, per step time: 525.560
+epoch: 6 step: 82, loss is 0.018708453
+Epoch time: 43107.458, per step time: 525.701
+...
+```
+- Training voctrain in s8 structure
+```
+# distribute training result (8p)
+epoch: 1 step: 11, loss is 0.00554624
+Epoch time: 199412.913, per step time: 18128.447
+epoch: 2 step: 11, loss is 0.007181881
+Epoch time: 6119.375, per step time: 556.307
+epoch: 3 step: 11, loss is 0.004980865
+Epoch time: 5996.978, per step time: 545.180
+epoch: 4 step: 11, loss is 0.0047651967
+Epoch time: 5987.412, per step time: 544.310
+epoch: 5 step: 11, loss is 0.006262637
+Epoch time: 5956.682, per step time: 541.517
+epoch: 6 step: 11, loss is 0.0060750707
+Epoch time: 5962.164, per step time: 542.015
+...
+```
+
+## [Evaluation Process](#contents)
+### Usage
+#### Running on Ascend
 Configure the checkpoint with --ckpt_path and run the script; the mIOU will be printed in eval_path/eval_log.
 
 ```
 ./run_eval_s16.sh                     # test s16
 ./run_eval_s8.sh                      # test s8
 ./run_eval_s8_multiscale.sh           # test s8 + multiscale
 ./run_eval_s8_multiscale_flip.sh      # test s8 + multiscale + flip
 ```
 Example of the test script is as follows:
 ```
 python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
                     --data_lst=/PATH/TO/DATA_lst.txt \
                     --batch_size=16 \
                     --crop_size=513 \
                     --ignore_label=255 \
                     --num_classes=21 \
                     --model=deeplab_v3_s8 \
                     --scales=0.5 \
                     --scales=0.75 \
                     --scales=1.0 \
                     --scales=1.25 \
                     --scales=1.75 \
                     --flip \
                     --freeze_bn \
                     --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
 ```
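+
+For reference, the mIOU metric itself can be sketched in a few lines of Python. This illustrates the metric only; it is not the actual implementation in eval.py:
+
+```python
+# Illustrative mIOU computation; pixels marked ignore_label (255) are excluded.
+import numpy as np
+
+def mean_iou(pred, label, num_classes=21, ignore_label=255):
+    valid = label != ignore_label
+    # confusion matrix as a joint histogram of (label, prediction) pairs
+    cm = np.bincount(label[valid] * num_classes + pred[valid],
+                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
+    inter = np.diag(cm)
+    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
+    with np.errstate(divide='ignore', invalid='ignore'):
+        iou = inter / union      # NaN for classes absent from both pred and label
+    return float(np.nanmean(iou))
+
+pred = np.array([[0, 1], [1, 1]]); label = np.array([[0, 1], [255, 0]])
+print(mean_iou(pred, label))     # IOU(class 0)=0.5, IOU(class 1)=0.5 -> 0.5
+```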
 
-## Performance
-
 ### Result
 
 Our results were obtained by running the applicable training script. To achieve the same results, follow the steps in the Quick Start Guide.
 
 #### Training accuracy
-
 | **Network** | OS=16 | OS=8 | MS | Flip | mIOU | mIOU in paper |
 | :----------: | :-----: | :----: | :----: | :-----: | :-----: | :-------------: |
 | deeplab_v3 | √ |      |     |      | 77.37 | 77.21 |
 | deeplab_v3 |   | √    | √   |      | 79.70 | 79.45 |
 | deeplab_v3 |   | √    | √   | √    | 79.89 | 79.77 |
 
-#### Training performance
-
-| **NPUs** | train performance |
-| :------: | :---------------: |
-|    1     |     26 img/s      |
-|    8     |     131 img/s     |
-
-
-
+Note: Here OS is the output stride and MS is multiscale.
+
+# [Model Description](#contents)
+## [Performance](#contents)
+
+### Evaluation Performance
+
+| Parameters                 | Ascend 910                              |
+| -------------------------- | --------------------------------------- |
+| Model Version              | DeepLabV3                               |
+| Resource                   | Ascend 910                              |
+| Uploaded Date              | 09/04/2020 (month/day/year)             |
+| MindSpore Version          | 0.7.0-alpha                             |
+| Dataset                    | PASCAL VOC2012 + SBD                    |
+| Training Parameters        | epoch = 300, batch_size = 32 (s16_r1) <br> epoch = 800, batch_size = 16 (s8_r1) <br> epoch = 300, batch_size = 16 (s8_r2) |
+| Optimizer                  | Momentum                                |
+| Loss Function              | Softmax Cross Entropy                   |
+| Outputs                    | probability                             |
+| Loss                       | 0.0065883575                            |
+| Speed                      | 31 ms/step (1pc, s8) <br> 234 ms/step (8pcs, s8) |
+| Checkpoint for Fine tuning | 443M (.ckpt file)                       |
+| Scripts                    | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/deeplabv3) |
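+
+A checkpoint like this can be restored for fine-tuning roughly as below. This is a minimal sketch: train.py already handles it through its --ckpt_pre_trained argument, and the stand-in network here is hypothetical:
+
+```python
+# Sketch: load a pretrained .ckpt into a network for fine-tuning.
+# The Conv2d is a placeholder; use the real DeepLabV3 network in practice.
+import mindspore.nn as nn
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+
+network = nn.Conv2d(3, 21, 3)
+param_dict = load_checkpoint("/PATH/TO/PRETRAIN_MODEL")  # placeholder path
+load_param_into_net(network, param_dict)
+```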
+
+# [ModelZoo Homepage](#contents)
+ Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).