!8899 Add OpenPose network to modelzoo

From: @zhanghuiyao
Reviewed-by: 
Signed-off-by:
pull/8899/MERGE
Committed by mindspore-ci-bot via Gitee
commit f4c126ddeb

@ -0,0 +1,225 @@
# Contents
- [Openpose Description](#openpose-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
# [Openpose Description](#contents)
The Openpose network proposes a bottom-up human pose estimation algorithm based on Part Affinity Fields (PAFs), in contrast to top-down algorithms, which first detect people and then regress the key-points and skeleton for each detection. The advantage of Openpose is that its computing time does not increase significantly as the number of people in the image grows, whereas the runtime of a top-down algorithm, which depends on the detection results, grows linearly with the number of people.
[Paper](https://arxiv.org/abs/1611.08050): Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
# [Model Architecture](#contents)
In the first step, the image is passed through a baseline CNN to extract feature maps of the input; in the paper, the authors use the first 10 layers of the VGG-19 network.
The feature maps are then processed by a multi-stage CNN pipeline to generate the Part Confidence Maps and Part Affinity Fields.
In the last step, the Confidence Maps and Part Affinity Fields generated above are processed by a greedy bipartite matching algorithm to obtain the pose for each person in the image.
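That pipeline can be sketched as a backbone followed by two-branch refinement stages. The snippet below is only a minimal illustration of the structure (the actual network lives in `src/openposenet.py`; the backbone and layer sizes here are simplified assumptions), using 38 PAF channels (19 limbs, x/y each) and 19 heatmap channels (18 keypoints plus background) as in the COCO setting:

```python
import mindspore.nn as nn
from mindspore.ops import operations as P


def two_branch_stage(in_ch, paf_ch=38, heat_ch=19):
    """One refinement stage: a PAF branch and a keypoint-heatmap branch."""
    paf = nn.SequentialCell([nn.Conv2d(in_ch, 128, 3), nn.ReLU(), nn.Conv2d(128, paf_ch, 1)])
    heat = nn.SequentialCell([nn.Conv2d(in_ch, 128, 3), nn.ReLU(), nn.Conv2d(128, heat_ch, 1)])
    return paf, heat


class TinyOpenPose(nn.Cell):
    """Backbone features feed stage 1; later stages refine the predictions from
    the backbone features concatenated with the previous stage's outputs."""
    def __init__(self):
        super(TinyOpenPose, self).__init__()
        # stands in for the first 10 layers of VGG-19 (plus the extra convolutions)
        self.backbone = nn.SequentialCell([nn.Conv2d(3, 128, 3), nn.ReLU()])
        self.paf1, self.heat1 = two_branch_stage(128)
        self.paf2, self.heat2 = two_branch_stage(128 + 38 + 19)
        self.concat = P.Concat(axis=1)

    def construct(self, img):
        feat = self.backbone(img)
        paf_1, heat_1 = self.paf1(feat), self.heat1(feat)
        x = self.concat((feat, paf_1, heat_1))
        paf_2, heat_2 = self.paf2(x), self.heat2(x)
        # training supervises every stage; inference uses the last stage only,
        # followed by the greedy bipartite matching step to assemble the poses
        return (paf_1, paf_2), (heat_1, heat_2)
```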
# [Dataset](#contents)
Prepare the datasets, including the training set, validation set, and annotations. The training and validation samples are located under the "dataset" directory; the supported datasets include the COCO2014 and COCO2017 datasets.
The provided training script uses the COCO2017 dataset as an example for data preprocessing during training. If you use a dataset in another format, please modify the dataset loading and preprocessing accordingly.
- Download the data from the official COCO2017 website and unzip it.
```bash
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
```
- Create the mask dataset.
Run the script gen_ignore_mask.py:
```bash
python gen_ignore_mask.py --train_ann ../dataset/annotations/person_keypoints_train2017.json --val_ann ../dataset/annotations/person_keypoints_val2017.json --train_dir train2017 --val_dir val2017
```
- The dataset folder is generated in the root directory and contains the following files:
```python
├── dataset
    ├── annotations
        ├─person_keypoints_train2017.json
        └─person_keypoints_val2017.json
    ├─ignore_mask_train
    ├─ignore_mask_val
    ├─train2017
    └─val2017
```
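To sanity-check the layout, the annotation files can be opened with `pycocotools`, the same library that `src/gen_ignore_mask.py` uses. A small check script, assuming the default `./dataset` root shown above:

```python
import os
from pycocotools.coco import COCO

data_dir = './dataset'  # assumed dataset root; adjust to your own layout
coco = COCO(os.path.join(data_dir, 'annotations', 'person_keypoints_val2017.json'))
img_ids = sorted(coco.getImgIds(catIds=coco.getCatIds()))
print('val2017 images with person annotations:', len(img_ids))

# every image referenced by the annotation file should exist on disk
first = coco.loadImgs(img_ids[:1])[0]
print(os.path.exists(os.path.join(data_dir, 'val2017', first['file_name'])))
```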
# [Features](#contents)
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and then searching for `reduce precision`.
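In this repository, mixed-precision training is driven by a fixed loss scale: the optimizer and the `Model` are built with the `loss_scale` value from `src/config.py`, as done in `train.py`. A minimal single-device sketch of that wiring (it mirrors `train.py` but skips the grouped learning rates):

```python
from mindspore.train import Model
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.nn.optim import Adam
from src.openposenet import OpenPoseNet
from src.loss import openpose_loss, BuildTrainNetwork
from src.config import params

net = OpenPoseNet(vggpath=params['vgg_path'])
train_net = BuildTrainNetwork(net, openpose_loss())
# a fixed loss scale keeps small FP16 gradients from underflowing
loss_scale = params['loss_scale']
opt = Adam(train_net.trainable_params(), learning_rate=params['lr'], loss_scale=loss_scale)
model = Model(train_net, optimizer=opt,
              loss_scale_manager=FixedLossScaleManager(loss_scale, drop_overflow_update=False))
```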
# [Environment Requirements](#contents)
- Hardware (Ascend)
- Prepare hardware environment with Ascend. If you want to try, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- Download the VGG19 model of the MindSpore version:
- [vgg19-0-97_5004.ckpt](http://10.154.33.38:51203/tutorials/image_classification.html)
- For more information, please check the resources below
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation as follows:
```bash
# run training example
python train.py --train_dir train2017 --train_ann person_keypoints_train2017.json > train.log 2>&1 &

# run distributed training example
bash run_distribute_train.sh [RANK_TABLE_FILE]

# run evaluation example
python eval.py --model_path path_to_eval_model.ckpt --imgpath_val ./dataset/val2017 --ann ./dataset/annotations/person_keypoints_val2017.json > eval.log 2>&1 &
# OR
bash scripts/run_eval_ascend.sh
```
[RANK_TABLE_FILE] is the path of the multi-device information configuration table in the environment. The configuration table can be generated automatically by the tool [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```python
├── ModelZoo_openpose_MS_MIT
├── README.md // descriptions about openpose
├── scripts
│ ├──run_standalone_train.sh // shell script for standalone training on Ascend
│ ├──run_distribute_train.sh // shell script for distributed on Ascend with 8p
│ ├──run_eval_ascend.sh // shell script for evaluation on Ascend
├── src
│ ├──openposenet.py // Openpose architecture
│ ├──loss.py // Loss function
│ ├──config.py // parameter configuration
│ ├──dataset.py // Data preprocessing
│ ├──utils.py // Utils
│ ├──gen_ignore_mask.py // Generating mask data script
├── export.py // model conversion script
├── train.py // training script
├── eval.py // evaluation script
```
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in `config.py`.
- config for openpose
```python
'data_dir': 'path to dataset'               # absolute full path to the training and evaluation datasets
'vgg_path': 'path to vgg model'             # absolute full path to the vgg19 model
'save_model_path': 'path of saving models'  # absolute full path for output models
'load_pretrain': 'False'                    # whether to train based on a pre-trained model
'pretrained_model_path': ''                 # path of the pre-trained model to load
'lr': 1e-4                                  # initial learning rate
'batch_size': 10                            # training batch size
'lr_gamma': 0.1                             # factor by which lr is scaled at each step in lr_steps
'lr_steps': '100000,200000,250000'          # steps at which lr is multiplied by lr_gamma
'loss_scale': 16386                         # loss scale for mixed precision
'max_epoch_train': 60                       # total training epochs
'insize': 368                               # image size used as the model input
'keep_checkpoint_max': 5                    # only keep the last keep_checkpoint_max checkpoints
'log_interval': 100                         # interval (in steps) for printing logs
'ckpt_interval': 5000                       # interval (in steps) for saving checkpoints
```
For more configuration details, please refer to the script `config.py`.
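The parameters are plain Python values in the `params` dictionary of `src/config.py`, so they can be inspected or overridden in code before training starts; for example, `train.py` turns the `lr_steps` string into a list of step indices as sketched below:

```python
from src.config import params

params['batch_size'] = 8  # override before the dataset and model are created
lr_steps = list(map(int, params['lr_steps'].split(',')))
print(lr_steps)           # [100000, 200000, 250000]
```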
## [Training Process](#contents)
### Training
- running on Ascend
```bash
python train.py --train_dir train2017 --train_ann person_keypoints_train2017.json > train.log 2>&1 &
```
The python command above will run in the background; you can view the results through the file `train.log`.
After training, you will get some checkpoint files under the script folder by default. The loss values will be displayed as follows:
```python
# grep "epoch " train.log
epoch[0], iter[0], loss[0.29211228793809957], 0.13 imgs/sec, vgglr=0.0,baselr=2.499999936844688e-05,stagelr=9.999999747378752e-05
epoch[0], iter[100], loss[0.060355084178521694], 24.92 imgs/sec, vgglr=0.0,baselr=2.499999936844688e-05,stagelr=9.999999747378752e-05
epoch[0], iter[200], loss[0.026628130997662272], 26.20 imgs/sec, vgglr=0.0,baselr=2.499999936844688e-05,stagelr=9.999999747378752e-05
...
```
The model checkpoints will be saved in the directory specified by 'save_model_path' in config.py.
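A checkpoint saved there can later be restored the same way `eval.py` and `export.py` do; in the sketch below the checkpoint file name is only a placeholder:

```python
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.openposenet import OpenPoseNet

net = OpenPoseNet()
param_dict = load_checkpoint('./checkpoints/ckpt_0/0-60_663.ckpt')  # placeholder file name
load_param_into_net(net, param_dict)
net.set_train(False)
```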
## [Evaluation Process](#contents)
### Evaluation
- running on Ascend
Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to an absolute full path, e.g., "username/openpose/outputs/\*time*\/0-6_30000.ckpt".
```bash
python eval.py --model_path path_to_eval_model.ckpt --imgpath_val ./dataset/val2017 --ann ./dataset/annotations/person_keypoints_val2017.json > eval.log 2>&1 &
# OR
bash scripts/run_eval_ascend.sh
```
The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows:
```python
# grep "AP" eval.log
{'AP': 0.40030956300341397, 'Ap .5': 0.6658941566481336, 'AP .75': 0.396047897339743, 'AP (M)': 0.3075356543635785, 'AP (L)': 0.533772768618845, 'AR': 0.4519836272040302, 'AR .5': 0.693639798488665, 'AR .75': 0.4570214105793451, 'AR (M)': 0.32155148866429945, 'AR (L)': 0.6330360460795242}
```
# [Model Description](#contents)
## [Performance](#contents)
### Evaluation Performance
| Parameters                 | Ascend                                                |
| -------------------------- | ----------------------------------------------------- |
| Model Version              | openpose                                              |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory, 755G      |
| Uploaded Date              | 10/20/2020 (month/day/year)                           |
| MindSpore Version          | 1.0.1-alpha                                           |
| Training Parameters        | epoch = 60, steps = 30k, batch_size = 10, lr = 0.0001 |
| Optimizer                  | Adam                                                  |
| Loss Function              | MSE                                                   |
| Outputs                    | pose                                                  |
| Speed                      | 1pc: 29 imgs/s                                        |
| Total time                 | 1pc: 30h                                              |
| Checkpoint for Fine tuning | 602.33M (.ckpt file)                                  |

File diff suppressed because it is too large.

@ -0,0 +1,38 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""export"""
import argparse
import numpy as np
from mindspore import Tensor
from mindspore import context
from mindspore.train.serialization import load_checkpoint, load_param_into_net, export
from src.openposenet import OpenPoseNet
parser = argparse.ArgumentParser(description='checkpoint export')
parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
args_opt = parser.parse_args()
if __name__ == '__main__':
context.set_context(mode=context.GRAPH_MODE, save_graphs=False)
# define net
net = OpenPoseNet()
# load checkpoint
param_dict = load_checkpoint(args_opt.checkpoint_path)
load_param_into_net(net, param_dict)
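# dummy NCHW input matching the model's expected 1 x 3 x 368 x 368 shape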
inputs = np.random.uniform(0.0, 1.0, size=[1, 3, 368, 368]).astype(np.float32)
export(net, Tensor(inputs), file_name="openpose.air", file_format='AIR')

@ -0,0 +1,61 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# != 1 ]
then
echo "Usage: sh run_distribute_train.sh [RANK_TABLE_FILE]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
RANK_TABLE_FILE=$(get_real_path $1)
echo $RANK_TABLE_FILE
if [ ! -f $RANK_TABLE_FILE ]
then
echo "error: RANK_TABLE_FILE=$RANK_TABLE_FILE is not a file"
exit 1
fi
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE=$RANK_TABLE_FILE
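# launch one training process per device, each in its own working directory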
for((i=0; i<${DEVICE_NUM}; i++))
do
export DEVICE_ID=$i
export RANK_ID=$i
rm -rf ./train_parallel$i
mkdir ./train_parallel$i
cp ../*.py ./train_parallel$i
cp -r ../src ./train_parallel$i
cd ./train_parallel$i || exit
echo "start training for rank $RANK_ID, device $DEVICE_ID"
env > env.log
python train.py \
--train_dir train2017 \
--group_size 8 \
--train_ann person_keypoints_train2017.json > log.txt 2>&1 &
cd ..
done

@ -0,0 +1,23 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
export DEVICE_ID=0
export RANK_ID=0
python eval.py \
--model_path ./scripts/train_parallel0/checkpoints/ckpt_0/0-60_663.ckpt \
--imgpath_val /data0/zhy/dataset/coco/val2017 \
--ann /data0/zhy/dataset/coco/annotations/person_keypoints_val2017.json \
> eval.log 2>&1 &

@ -0,0 +1,18 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
cd ..
python train.py --train_dir train2017 --train_ann person_keypoints_train2017.json > scripts/train.log 2>&1 &

@ -0,0 +1,171 @@
# Copyright 2020 Huawei Technologies Co., Ltd
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
from enum import IntEnum
class JointType(IntEnum):
Nose = 0
Neck = 1
RightShoulder = 2
RightElbow = 3
RightHand = 4
LeftShoulder = 5
LeftElbow = 6
LeftHand = 7
RightWaist = 8
RightKnee = 9
RightFoot = 10
LeftWaist = 11
LeftKnee = 12
LeftFoot = 13
RightEye = 14
LeftEye = 15
RightEar = 16
LeftEar = 17
params = {
# paths
'data_dir': '/data0/zhy/dataset/coco',
'vgg_path': '/data0/zhy/dataset/coco/vgg19-0-97_5004.ckpt',
'save_model_path': './checkpoints/',
'load_pretrain': False,
'pretrained_model_path': "",
# training params
'batch_size': 10,
'lr': 1e-4,
'lr_gamma': 0.1,
'lr_steps': '100000,200000,250000',
'lr_steps_NP': '250000',
'loss_scale': 16386,
'max_epoch_train': 60,
'min_keypoints': 5,
'min_area': 32 * 32,
'insize': 368,
'downscale': 8,
'paf_sigma': 8,
'heatmap_sigma': 7,
'eva_num': 100,
'keep_checkpoint_max': 5,
'log_interval': 100,
'ckpt_interval': 663, # 5000,
'min_box_size': 64,
'max_box_size': 512,
'min_scale': 0.5,
'max_scale': 2.0,
'max_rotate_degree': 40,
'center_perterb_max': 40,
# inference params
'inference_img_size': 368,
'inference_scales': [0.5, 1, 1.5, 2],
# 'inference_scales': [1.0],
'heatmap_size': 320,
'gaussian_sigma': 2.5,
'ksize': 17,
'n_integ_points': 10,
'n_integ_points_thresh': 8,
'heatmap_peak_thresh': 0.05,
'inner_product_thresh': 0.05,
'limb_length_ratio': 1.0,
'length_penalty_value': 1,
'n_subset_limbs_thresh': 3,
'subset_score_thresh': 0.2,
'limbs_point': [
[JointType.Neck, JointType.RightWaist],
[JointType.RightWaist, JointType.RightKnee],
[JointType.RightKnee, JointType.RightFoot],
[JointType.Neck, JointType.LeftWaist],
[JointType.LeftWaist, JointType.LeftKnee],
[JointType.LeftKnee, JointType.LeftFoot],
[JointType.Neck, JointType.RightShoulder],
[JointType.RightShoulder, JointType.RightElbow],
[JointType.RightElbow, JointType.RightHand],
[JointType.RightShoulder, JointType.RightEar],
[JointType.Neck, JointType.LeftShoulder],
[JointType.LeftShoulder, JointType.LeftElbow],
[JointType.LeftElbow, JointType.LeftHand],
[JointType.LeftShoulder, JointType.LeftEar],
[JointType.Neck, JointType.Nose],
[JointType.Nose, JointType.RightEye],
[JointType.Nose, JointType.LeftEye],
[JointType.RightEye, JointType.RightEar],
[JointType.LeftEye, JointType.LeftEar]
],
'joint_indices': [
JointType.Nose,
JointType.LeftEye,
JointType.RightEye,
JointType.LeftEar,
JointType.RightEar,
JointType.LeftShoulder,
JointType.RightShoulder,
JointType.LeftElbow,
JointType.RightElbow,
JointType.LeftHand,
JointType.RightHand,
JointType.LeftWaist,
JointType.RightWaist,
JointType.LeftKnee,
JointType.RightKnee,
JointType.LeftFoot,
JointType.RightFoot
],
# face params
'face_inference_img_size': 368,
'face_heatmap_peak_thresh': 0.1,
'face_crop_scale': 1.5,
'face_line_indices': [
[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12], [12, 13], [13, 14], [14, 15], [15, 16], # face outline
[17, 18], [18, 19], [19, 20], [20, 21],
[22, 23], [23, 24], [24, 25], [25, 26],
[27, 28], [28, 29], [29, 30],
[31, 32], [32, 33], [33, 34], [34, 35],
[36, 37], [37, 38], [38, 39], [39, 40], [40, 41], [41, 36],
[42, 43], [43, 44], [44, 45], [45, 46], [46, 47], [47, 42],
[48, 49], [49, 50], [50, 51], [51, 52], [52, 53], [53, 54], [54, 55], [55, 56], [56, 57], [57, 58], [58, 59], [59, 48], # outer lip contour
[60, 61], [61, 62], [62, 63], [63, 64], [64, 65], [65, 66], [66, 67], [67, 60]
],
# hand params
'hand_inference_img_size': 368,
'hand_heatmap_peak_thresh': 0.1,
'fingers_indices': [
[[0, 1], [1, 2], [2, 3], [3, 4]],
[[0, 5], [5, 6], [6, 7], [7, 8]],
[[0, 9], [9, 10], [10, 11], [11, 12]],
[[0, 13], [13, 14], [14, 15], [15, 16]],
[[0, 17], [17, 18], [18, 19], [19, 20]],
],
}

File diff suppressed because it is too large.

@ -0,0 +1,133 @@
# Copyright 2020 Huawei Technologies Co., Ltd
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import os
import argparse
import cv2
import numpy as np
from tqdm import tqdm
from pycocotools.coco import COCO as ReadJson
from config import params
class DataLoader():
def __init__(self, coco, dir_name, data_mode='train'):
self.train = coco
self.dir_name = dir_name
assert data_mode in ['train', 'val'], 'Data loading mode is invalid.'
self.mode = data_mode
self.catIds = coco.getCatIds() # catNms=['person']
self.imgIds = sorted(coco.getImgIds(catIds=self.catIds))
def __len__(self):
return len(self.imgIds)
def gen_masks(self, image, anns):
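# Build two binary masks over the image:
#   mask_all  - union of every person segmentation
#   mask_miss - crowd regions plus persons with too few keypoints or too
#               small an area, i.e. regions the training loss should ignore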
_mask_all = np.zeros(image.shape[:2], 'bool')
_mask_miss = np.zeros(image.shape[:2], 'bool')
for ann in anns:
mask = self.train.annToMask(ann).astype('bool')
if ann['iscrowd'] == 1:
intxn = _mask_all & mask
_mask_miss = np.bitwise_or(_mask_miss.astype(int), np.subtract(mask, intxn, dtype=np.int32))
_mask_all = np.bitwise_or(_mask_all.astype(int), mask.astype(int))
elif ann['num_keypoints'] < params['min_keypoints'] or ann['area'] <= params['min_area']:
_mask_all = np.bitwise_or(_mask_all.astype(int), mask.astype(int))
_mask_miss = np.bitwise_or(_mask_miss.astype(int), mask.astype(int))
else:
_mask_all = np.bitwise_or(_mask_all.astype(int), mask.astype(int))
return _mask_all, _mask_miss
def dwaw_gen_masks(self, image, mask, color=(0, 0, 1)):
bimsk = np.repeat(mask[:, :, np.newaxis], 3, axis=2)
mskd = image * bimsk.astype(np.int32)
clmsk = np.ones(bimsk.shape) * bimsk
for ind in range(3):
clmsk[:, :, ind] = clmsk[:, :, ind] * color[ind] * 255
image = image + 0.7 * clmsk - 0.7 * mskd
return image.astype(np.uint8)
def draw_masks_and_keypoints(self, image, anns):
for ann in anns:
# masks
mask = self.train.annToMask(ann).astype(np.uint8)
if ann['iscrowd'] == 1:
color = (0, 0, 1)
elif ann['num_keypoints'] == 0:
color = (0, 1, 0)
else:
color = (1, 0, 0)
bimsk = np.repeat(mask[:, :, np.newaxis], 3, axis=2)
mskd = image * bimsk.astype(np.int32)
clmsk = np.ones(bimsk.shape) * bimsk
for ind in range(3):
clmsk[:, :, ind] = clmsk[:, :, ind] * color[ind] * 255
image = image + 0.7 * clmsk - 0.7 * mskd
# keypoints
for x, y, v in np.array(ann['keypoints']).reshape(-1, 3):
if v == 1:
cv2.circle(image, (x, y), 3, (255, 255, 0), -1)
elif v == 2:
cv2.circle(image, (x, y), 3, (255, 0, 255), -1)
return image.astype(np.uint8)
def get_img_annotation(self, ind=None, image_id=None):
if ind is not None:
image_id = self.imgIds[ind]
anno_ids = self.train.getAnnIds(imgIds=[image_id])
_annotations = self.train.loadAnns(anno_ids)
img_file = os.path.join(params['data_dir'], self.dir_name, self.train.loadImgs([image_id])[0]['file_name'])
_image = cv2.imread(img_file)
return _image, _annotations, image_id
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--vis', action='store_true', help='visualize annotations and ignore masks')
parser.add_argument('--train_ann', type=str, help='train annotations json')
parser.add_argument('--val_ann', type=str, help='val annotations json')
parser.add_argument('--train_dir', type=str, help='name of train dir')
parser.add_argument('--val_dir', type=str, help='name of val dir')
args = parser.parse_args()
path_list = [args.train_ann, args.val_ann, args.train_dir, args.val_dir]
for index, mode in enumerate(['train', 'val']):
train = ReadJson(path_list[index])
data_loader = DataLoader(train, path_list[index+2], data_mode=mode)
save_dir = os.path.join(params['data_dir'], 'ignore_mask_{}'.format(mode))
if not os.path.exists(save_dir):
os.makedirs(save_dir)
for i in tqdm(range(len(data_loader))):
img, annotations, img_id = data_loader.get_img_annotation(ind=i)
mask_all, mask_miss = data_loader.gen_masks(img, annotations)
if args.vis:
ann_img = data_loader.draw_masks_and_keypoints(img, annotations)
msk_img = data_loader.dwaw_gen_masks(img, mask_miss)
cv2.imshow('image', np.hstack((ann_img, msk_img)))
k = cv2.waitKey()
if k == ord('q'):
break
elif k == ord('s'):
cv2.imwrite('aaa.png', np.hstack((ann_img, msk_img)))
if np.any(mask_miss) and not args.vis:
mask_miss = mask_miss.astype(np.uint8) * 255
save_path = os.path.join(save_dir, '{:012d}.png'.format(img_id))
cv2.imwrite(save_path, mask_miss)

@ -0,0 +1,207 @@
# Copyright 2020 Huawei Technologies Co., Ltd
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import time
import mindspore.nn as nn
from mindspore.ops import operations as P
from mindspore.nn.loss.loss import _Loss
from mindspore.train.callback import Callback
from mindspore.ops import functional as F
from mindspore.ops import composite as C
from mindspore.communication.management import get_group_size
from mindspore.context import ParallelMode
from mindspore import context
from mindspore import RowTensor
context.set_context(mode=context.GRAPH_MODE, save_graphs=True)
time_stamp_init = False
time_stamp_first = 0
grad_scale = C.MultitypeFuncGraph("grad_scale")
reciprocal = P.Reciprocal()
GRADIENT_CLIP_TYPE = 1
GRADIENT_CLIP_VALUE = 1.0
@grad_scale.register("Tensor", "Tensor")
def tensor_grad_scale(scale, grad):
return grad * F.cast(reciprocal(scale), F.dtype(grad))
@grad_scale.register("Tensor", "RowTensor")
def tensor_grad_scale_row_tensor(scale, grad):
return RowTensor(grad.indices,
grad.values * F.cast(reciprocal(scale), F.dtype(grad.values)),
grad.dense_shape)
clip_grad = C.MultitypeFuncGraph("clip_grad")
@clip_grad.register("Number", "Number", "Tensor")
class openpose_loss(_Loss):
def __init__(self):
super(openpose_loss, self).__init__()
self.expand_dims = P.ExpandDims()
self.tile = P.Tile()
self.mul = P.Mul()
self.l2_loss = P.L2Loss()
self.square = P.Square()
self.reduceMean = P.ReduceMean()
self.reduceSum = P.ReduceSum()
self.print = P.Print()
self.shape = P.Shape()
self.maxoftensor = P.ArgMaxWithValue(-1)
def mean_square_error(self, map1, map2, mask=None):
# print("mask", mask)
# import pdb; pdb.set_trace()
if mask is None:
mse = self.reduceMean((map1 - map2) ** 2)
return mse
squareMap = self.square(map1 - map2)
squareMap_mask = self.mul(squareMap, mask)
mse = self.reduceMean(squareMap_mask)
return mse
def construct(self, logit_paf, logit_heatmap, gt_paf, gt_heatmap, ignore_mask):
# Input
# ignore_mask, make sure the ignore_mask the 0-1 array instead of the bool-false array
heatmaps_loss = []
pafs_loss = []
total_loss = 0
paf_masks = self.tile(self.expand_dims(ignore_mask, 1), (1, self.shape(gt_paf)[1], 1, 1))
heatmap_masks = self.tile(self.expand_dims(ignore_mask, 1), (1, self.shape(gt_heatmap)[1], 1, 1))
paf_masks = F.stop_gradient(paf_masks)
heatmap_masks = F.stop_gradient(heatmap_masks)
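# accumulate the MSE of every refinement stage (intermediate supervision),
# masking out the ignore regions in both the PAF and heatmap terms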
for logit_paf_t, logit_heatmap_t in zip(logit_paf, logit_heatmap):
# TEST
# tensor1 -- tuple
# tensor1 = self.maxoftensor(logit_paf_t)[1]
# tensor2 = self.maxoftensor(logit_heatmap_t)[1]
# tensor3 = self.maxoftensor(tensor1)[1]
# tensor4 = self.maxoftensor(tensor2)[1]
# self.print("paf",tensor3)
# self.print("heatmaps",tensor2)
pafs_loss_t = self.mean_square_error(logit_paf_t, gt_paf, paf_masks)
heatmaps_loss_t = self.mean_square_error(logit_heatmap_t, gt_heatmap, heatmap_masks)
total_loss += pafs_loss_t + heatmaps_loss_t
heatmaps_loss.append(heatmaps_loss_t)
pafs_loss.append(pafs_loss_t)
return total_loss, heatmaps_loss, pafs_loss
class Depend_network(nn.Cell):
def __init__(self, network):
super(Depend_network, self).__init__()
self.network = network
def construct(self, *args):
loss, _, _ = self.network(*args) # loss, heatmaps_loss, pafs_loss
return loss
class TrainingWrapper(nn.Cell):
def __init__(self, network, optimizer, sens=1):
super(TrainingWrapper, self).__init__(auto_prefix=False)
self.network = network
self.depend_network = Depend_network(network)
# self.weights = ms.ParameterTuple(network.trainable_params())
self.weights = optimizer.parameters
self.optimizer = optimizer
self.grad = C.GradOperation(get_by_list=True, sens_param=True)
self.sens = sens
self.reducer_flag = False
self.grad_reducer = None
self.print = P.Print()
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
self.reducer_flag = True
if self.reducer_flag:
mean = context.get_auto_parallel_context("gradients_mean")
#if mean.get_device_num_is_set():
# if mean:
#degree = context.get_auto_parallel_context("device_num")
# else:
degree = get_group_size()
self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
def construct(self, *args):
weights = self.weights
loss, heatmaps_loss, pafs_loss = self.network(*args)
sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
#grads = self.grad(self.network, weights)(*args, sens)
grads = self.grad(self.depend_network, weights)(*args, sens)
if self.reducer_flag:
grads = self.grad_reducer(grads)
#return F.depend(loss, self.optimizer(grads))
# for grad in grads:
# self.print(grad)
loss = F.depend(loss, self.optimizer(grads))
return loss, heatmaps_loss, pafs_loss
class BuildTrainNetwork(nn.Cell):
def __init__(self, network, criterion):
super(BuildTrainNetwork, self).__init__()
self.network = network
self.criterion = criterion
def construct(self, input_data, gt_paf, gt_heatmap, mask):
logit_pafs, logit_heatmap = self.network(input_data)
loss, _, _ = self.criterion(logit_pafs, logit_heatmap, gt_paf, gt_heatmap, mask)
return loss
class LossCallBack(Callback):
"""
Monitor the loss in training.
If the loss is NAN or INF terminating training.
Note:
If per_print_times is 0 do not print loss.
Args:
per_print_times (int): Print loss every times. Default: 1.
"""
def __init__(self, per_print_times=1):
super(LossCallBack, self).__init__()
if not isinstance(per_print_times, int) or per_print_times < 0:
raise ValueError("print_step must be int and >= 0.")
self._per_print_times = per_print_times
self.count = 0
self.loss_sum = 0
global time_stamp_init, time_stamp_first
if not time_stamp_init:
time_stamp_first = time.time()
time_stamp_init = True
def step_end(self, run_context):
cb_params = run_context.original_args()
loss = cb_params.net_outputs.asnumpy()
self.count += 1
self.loss_sum += float(loss)
cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num + 1
if self.count >= 1:
global time_stamp_first
time_stamp_current = time.time()
loss = self.loss_sum/self.count
loss_file = open("./loss.log", "a+")
loss_file.write("%lu epoch: %s step: %s ,loss: %.5f" %
(time_stamp_current - time_stamp_first, cb_params.cur_epoch_num, cur_step_in_epoch,
loss))
loss_file.write("\n")
loss_file.close()
self.count = 0
self.loss_sum = 0

File diff suppressed because it is too large.

@ -0,0 +1,157 @@
# Copyright 2020 Huawei Technologies Co., Ltd
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import os
import time
import numpy as np
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.train.callback import LossMonitor
from mindspore.common.tensor import Tensor
from src.config import params
class MyLossMonitor(LossMonitor):
def __init__(self, per_print_times=1):
super(MyLossMonitor, self).__init__()
self._per_print_times = per_print_times
self._start_time = time.time()
self._loss_list = []
def step_end(self, run_context):
cb_params = run_context.original_args()
loss = cb_params.net_outputs
if isinstance(loss, (tuple, list)):
if isinstance(loss[0], Tensor) and isinstance(loss[0].asnumpy(), np.ndarray):
loss = loss[0]
if isinstance(loss, Tensor) and isinstance(loss.asnumpy(), np.ndarray):
loss = np.mean(loss.asnumpy())
cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num + 1
if isinstance(loss, float) and (np.isnan(loss) or np.isinf(loss)):
raise ValueError("epoch: {} step: {}. Invalid loss, terminating training.".format(
cb_params.cur_epoch_num, cur_step_in_epoch))
if self._per_print_times != 0 and cb_params.cur_step_num % self._per_print_times == 0:
# print("epoch: %s step: %s, loss is %s, step time: %.3f s." % (cb_params.cur_epoch_num, cur_step_in_epoch,
# loss,
# (time.time() - self._start_time)), flush=True)
self._loss_list.append(loss)
if cb_params.cur_step_num % 100 == 0:
print("epoch: %s, steps: [%s] mean loss is: %s"%(cb_params.cur_epoch_num, cur_step_in_epoch,
np.array(self._loss_list).mean()), flush=True)
self._loss_list = []
self._start_time = time.time()
def parse_args():
"""Parse train arguments."""
parser = argparse.ArgumentParser('mindspore openpose training')
# dataset related
parser.add_argument('--train_dir', type=str, default='train2017', help='train data dir')
parser.add_argument('--train_ann', type=str, default='person_keypoints_train2017.json',
help='train annotations json')
parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')
args, _ = parser.parse_known_args()
args.jsonpath_train = os.path.join(params['data_dir'], 'annotations/' + args.train_ann)
args.imgpath_train = os.path.join(params['data_dir'], args.train_dir)
args.maskpath_train = os.path.join(params['data_dir'], 'ignore_mask_train')
return args
def get_lr(lr, lr_gamma, steps_per_epoch, max_epoch_train, lr_steps, group_size):
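# Build three per-step learning-rate schedules:
#   lr_stage - schedule for the refinement stages, scaled by lr_gamma at each step in lr_steps
#   lr_base  - lr_stage / 4, applied to the extra base convolutions
#   lr_vgg   - same as lr_base but kept at 0 for the first 2000 steps so the VGG layers start frozen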
lr_stage = np.array([lr] * steps_per_epoch * max_epoch_train).astype('f')
for step in lr_steps:
step //= group_size
lr_stage[step:] *= lr_gamma
lr_base = lr_stage.copy()
lr_base = lr_base / 4
lr_vgg = lr_base.copy()
vgg_freeze_step = 2000
lr_vgg[:vgg_freeze_step] = 0
return lr_stage, lr_base, lr_vgg
# zhang add
def adjust_learning_rate(init_lr, lr_gamma, steps_per_epoch, max_epoch_train, stepvalues):
lr_stage = np.array([init_lr] * steps_per_epoch * max_epoch_train).astype('f')
for epoch in stepvalues:
lr_stage[epoch * steps_per_epoch:] *= lr_gamma
lr_base = lr_stage.copy()
lr_base = lr_base / 4
lr_vgg = lr_base.copy()
vgg_freeze_step = 2000
lr_vgg[:vgg_freeze_step] = 0
return lr_stage, lr_base, lr_vgg
def load_model(test_net, model_path):
if model_path:
param_dict = load_checkpoint(model_path)
# print(type(param_dict))
param_dict_new = {}
for key, values in param_dict.items():
# print('key:', key)
if key.startswith('moment'):
continue
elif key.startswith('network.'):
param_dict_new[key[8:]] = values
# else:
# param_dict_new[key] = values
load_param_into_net(test_net, param_dict_new)
class show_loss_list():
def __init__(self, name):
self.loss_list = np.zeros(6).astype('f')
self.sums = 0
self.name = name
def add(self, list_of_tensor):
self.sums += 1
for i, loss_tensor in enumerate(list_of_tensor):
self.loss_list[i] += loss_tensor.asnumpy()
def show(self):
print(self.name + ' stage_loss:', self.loss_list / (self.sums + 1e-8), flush=True)
self.loss_list = np.zeros(6).astype('f')
self.sums = 0
class AverageMeter():
def __init__(self):
self.loss = 0
self.sum = 0
def add(self, tensor):
self.sum += 1
self.loss += tensor.asnumpy()
def meter(self):
avergeLoss = self.loss / (self.sum + 1e-8)
self.loss = 0
self.sum = 0
return avergeLoss

@ -0,0 +1,124 @@
# Copyright 2020 Huawei Technologies Co., Ltd
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import os
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, TimeMonitor
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.nn.optim import Adam
from src.dataset import create_dataset
from src.openposenet import OpenPoseNet
from src.loss import openpose_loss, BuildTrainNetwork
from src.config import params
from src.utils import parse_args, get_lr, load_model, MyLossMonitor
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False)
def train():
"""Train function."""
args = parse_args()
args.outputs_dir = params['save_model_path']
if args.group_size > 1:
init()
context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
args.outputs_dir = os.path.join(args.outputs_dir, "ckpt_{}/".format(str(get_rank())))
args.rank = get_rank()
else:
args.outputs_dir = os.path.join(args.outputs_dir, "ckpt_0/")
args.rank = 0
# with out loss_scale
if args.group_size > 1:
args.loss_scale = params['loss_scale'] / 2
args.lr_steps = list(map(int, params["lr_steps_NP"].split(',')))
else:
args.loss_scale = params['loss_scale']
args.lr_steps = list(map(int, params["lr_steps"].split(',')))
# create network
print('start create network')
criterion = openpose_loss()
criterion.add_flags_recursive(fp32=True)
network = OpenPoseNet(vggpath=params['vgg_path'])
# network.add_flags_recursive(fp32=True)
if params["load_pretrain"]:
print("load pretrain model:", params["pretrained_model_path"])
load_model(network, params["pretrained_model_path"])
train_net = BuildTrainNetwork(network, criterion)
# create dataset
if os.path.exists(args.jsonpath_train) and os.path.exists(args.imgpath_train) \
and os.path.exists(args.maskpath_train):
print('start create dataset')
else:
print('Error: wrong data path')
num_worker = 20 if args.group_size > 1 else 48
de_dataset_train = create_dataset(args.jsonpath_train, args.imgpath_train, args.maskpath_train,
batch_size=params['batch_size'],
rank=args.rank,
group_size=args.group_size,
num_worker=num_worker,
multiprocessing=True,
shuffle=True,
repeat_num=1)
steps_per_epoch = de_dataset_train.get_dataset_size()
print("steps_per_epoch: ", steps_per_epoch)
# lr scheduler
lr_stage, lr_base, lr_vgg = get_lr(params['lr'] * args.group_size,
params['lr_gamma'],
steps_per_epoch,
params["max_epoch_train"],
args.lr_steps,
args.group_size)
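# split the trainable parameters into three groups so each group follows its
# own learning-rate schedule: VGG backbone, extra base convolutions, stages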
vgg19_base_params = list(filter(lambda x: 'base.vgg_base' in x.name, train_net.trainable_params()))
base_params = list(filter(lambda x: 'base.conv' in x.name, train_net.trainable_params()))
stages_params = list(filter(lambda x: 'base' not in x.name, train_net.trainable_params()))
group_params = [{'params': vgg19_base_params, 'lr': lr_vgg},
{'params': base_params, 'lr': lr_base},
{'params': stages_params, 'lr': lr_stage}]
opt = Adam(group_params, loss_scale=args.loss_scale)
train_net.set_train(True)
loss_scale_manager = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False)
model = Model(train_net, optimizer=opt, loss_scale_manager=loss_scale_manager)
params['ckpt_interval'] = max(steps_per_epoch, params['ckpt_interval'])
config_ck = CheckpointConfig(save_checkpoint_steps=params['ckpt_interval'],
keep_checkpoint_max=params["keep_checkpoint_max"])
ckpoint_cb = ModelCheckpoint(prefix='{}'.format(args.rank), directory=args.outputs_dir, config=config_ck)
time_cb = TimeMonitor(data_size=de_dataset_train.get_dataset_size())
callback_list = [MyLossMonitor(), time_cb, ckpoint_cb]
print("============== Starting Training ==============")
model.train(params["max_epoch_train"], de_dataset_train, callbacks=callback_list,
dataset_sink_mode=False)
if __name__ == "__main__":
# mindspore.common.seed.set_seed(1)
train()