!12184 Fix some spelling mistakes in model_zoo training script

From: @zlq2020
Reviewed-by: @guoqi1024
Signed-off-by: @guoqi1024
pull/12184/MERGE
Committed by mindspore-ci-bot via Gitee, 4 years ago
commit adfe6e1bc2

@@ -20,7 +20,6 @@
# [MobileNetV2 Description](#contents)
MobileNetV2 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances.
[Paper](https://arxiv.org/pdf/1905.02244) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for MobileNetV3." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019.
@@ -38,11 +37,10 @@ The overall network architecture of MobileNetV2 is shown below:
Dataset used: [imagenet](http://www.image-net.org/)
- Dataset size: ~125G, 1.2W colorful images in 1000 classes
    - Train: 120G, 1.2W images
    - Test: 5G, 50000 images
- Data format: RGB images.
    - Note: Data will be processed in src/dataset.py
# [Features](#contents)
@@ -54,14 +52,13 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
# [Environment Requirements](#contents)
- Hardware: Ascend
    - Prepare hardware environment with Ascend. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Script description](#contents)
## [Script and sample code](#contents)
@@ -84,7 +81,6 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
├── export.py # export checkpoint files into air/onnx
```
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in config.py.
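For orientation, the following is a minimal sketch of the usual model_zoo config pattern (an `EasyDict` of hyperparameters); the field names and values here are illustrative assumptions, not the actual MobileNetV2 settings:

```python
# Hypothetical sketch of a model_zoo-style config.py (illustrative values only).
from easydict import EasyDict as ed

config = ed({
    "num_classes": 1000,       # ImageNet classes
    "image_height": 224,       # input height
    "image_width": 224,        # input width
    "batch_size": 256,         # per-device batch size
    "epoch_size": 200,         # total training epochs
    "lr": 0.1,                 # initial learning rate
    "momentum": 0.9,           # SGD momentum
    "weight_decay": 4e-5,      # L2 penalty
    "save_checkpoint": True,   # write .ckpt files during training
})
```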
@@ -113,13 +109,11 @@ Parameters for both training and evaluation can be set in config.py
### Usage
You can start training using python or shell scripts. The usage of shell scripts is as follows:
- bash run_train.sh [Ascend] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH]\(optional)
- bash run_train.sh [GPU] [DEVICE_ID_LIST] [DATASET_PATH] [PRETRAINED_CKPT_PATH]\(optional)
### Launch
``` bash
@@ -133,7 +127,7 @@ You can start training using python or shell scripts. The usage of shell scripts
Training result will be stored in the example path. Checkpoints trained by `Ascend` will be stored at `./train/device$i/checkpoint` by default, and training log will be redirected to `./train/device$i/train.log`. Checkpoints trained by `GPU` will be stored in `./train/checkpointckpt_$i` by default, and training log will be redirected to `./train/train.log`.
`train.log` is as follows:
-```
+``` bash
epoch: [ 0/200], step:[ 624/ 625], loss:[5.258/5.258], time:[140412.236], lr:[0.100]
epoch time: 140522.500, per step time: 224.836, avg loss: 5.258
epoch: [ 1/200], step:[ 624/ 625], loss:[3.917/3.917], time:[138221.250], lr:[0.200]
@@ -150,7 +144,7 @@ You can start training using python or shell scripts. The usage of shell scripts
### Launch
-```
+``` bash
# infer example
shell:
Ascend: sh run_infer_quant.sh Ascend ~/imagenet/val/ ~/train/mobilenet-60_1601.ckpt
@@ -160,9 +154,9 @@ You can start training using python or shell scripts. The usage of shell scripts
### Result
-Inference result will be stored in the example path, you can find result like the followings in `./val/infer.log`.
+Inference result will be stored in the example path, you can find result like the following in `./val/infer.log`.
-```
+``` bash
result: {'acc': 0.71976314102564111}
```

@@ -35,7 +35,7 @@ def parse_args():
    >>> parse_args()
    """
    parser = ArgumentParser(description="mindspore distributed training launch "
-                                        "helper utilty that will spawn up "
+                                        "helper utility that will spawn up "
                                        "multiple distributed processes")
    parser.add_argument("--nproc_per_node", type=int, default=1,
                        help="The number of processes to launch on each node, "

@@ -37,19 +37,18 @@ The overall network architecture of ResNet50 is shown below:
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: 224*224 colorful images in 1000 classes
    - Train: 1,281,167 images
    - Test: 50,000 images
    - Data format: jpeg
    - Note: Data will be processed in dataset.py
- Download the dataset, the directory structure is as follows:
-```
+```python
└─dataset
    ├─ilsvrc                # train dataset
    └─validation_preprocess # evaluate dataset
```
# [Features](#contents)
## [Mixed Precision](#contents)
@@ -60,14 +59,13 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
# [Environment Requirements](#contents)
- Hardware: Ascend
    - Prepare hardware environment with Ascend. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Script description](#contents)
## [Script and sample code](#contents)
@@ -124,18 +122,19 @@ Parameters for both training and evaluation can be set in config.py
### Usage
- Ascend: sh run_train.sh Ascend [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH]\(optional)
### Launch
-```
+```bash
# training example
Ascend: bash run_train.sh Ascend ~/hccl.json ~/imagenet/train/ ~/pretrained_checkpoint
```
### Result
-Training result will be stored in the example path. Checkpoints will be stored at `./train/device$i/` by default, and training log will be redirected to `./train/device$i/train.log` like followings.
+Training result will be stored in the example path. Checkpoints will be stored at `./train/device$i/` by default, and training log will be redirected to `./train/device$i/train.log` like the following.
-```
+```bash
epoch: 1 step: 5004, loss is 4.8995576
epoch: 2 step: 5004, loss is 3.9235563
epoch: 3 step: 5004, loss is 3.833077
@@ -153,7 +152,7 @@ You can start training using python or shell scripts. The usage of shell scripts
### Launch
-```
+```bash
# infer example
shell:
Ascend: sh run_infer.sh Ascend ~/imagenet/val/ ~/train/Resnet50-30_5004.ckpt
@@ -163,9 +162,9 @@ You can start training using python or shell scripts. The usage of shell scripts
### Result
-Inference result will be stored in the example path, you can find result like the followings in `./eval/infer.log`.
+Inference result will be stored in the example path, you can find result like the following in `./eval/infer.log`.
-```
+```bash
result: {'acc': 0.76576314102564111}
```
@@ -191,7 +190,7 @@ result: {'acc': 0.76576314102564111}
| Total time | 8pcs: 17 hours (30 epochs with pretrained) |
| Parameters (M) | 25.5 |
| Checkpoint for Fine tuning | 197M (.ckpt file) |
| Scripts | [resnet50-quant script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet50_quant) |
### Inference Performance

@@ -34,7 +34,7 @@ def parse_args():
    >>> parse_args()
    """
    parser = ArgumentParser(description="mindspore distributed training launch "
-                                        "helper utilty that will spawn up "
+                                        "helper utility that will spawn up "
                                        "multiple distributed processes")
    parser.add_argument("--nproc_per_node", type=int, default=1,
                        help="The number of processes to launch on each node, "

@@ -93,7 +93,7 @@ class DetectionEngine:
    def _nms(self, dets, thresh):
        """Calculate NMS."""
-        # conver xywh -> xmin ymin xmax ymax
+        # convert xywh -> xmin ymin xmax ymax
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = x1 + dets[:, 2]
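The hunk above shows only the xywh-to-corner conversion. For context, here is a minimal NumPy sketch of the greedy IoU-based suppression such a `_nms` method typically implements (`nms_sketch` is an illustrative stand-in, not the model_zoo code; the `dets` layout is assumed from the converted lines):

```python
import numpy as np

def nms_sketch(dets, thresh):
    """Greedy NMS over dets = [N, 5] rows of [x, y, w, h, score]."""
    x1, y1 = dets[:, 0], dets[:, 1]
    x2, y2 = x1 + dets[:, 2], y1 + dets[:, 3]   # xywh -> xmin ymin xmax ymax
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]              # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[np.where(iou <= thresh)[0] + 1]  # drop heavy overlaps
    return keep
```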
@@ -185,7 +185,7 @@ class DetectionEngine:
        x_top_left = x - w / 2.
        y_top_left = y - h / 2.
-        # creat all False
+        # create all False
        flag = np.random.random(cls_emb.shape) > sys.maxsize
        for i in range(flag.shape[0]):
            c = cls_argmax[i]
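An aside on the line below the fixed comment: `np.random.random(...)` draws values in [0, 1), which are always smaller than `sys.maxsize`, so the comparison is a roundabout way to build an all-False boolean array of `cls_emb`'s shape; `np.zeros(cls_emb.shape, dtype=bool)` would be the idiomatic equivalent.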

@@ -59,7 +59,7 @@ cp ../*.py ./eval$3
cp -r ../src ./eval$3
cd ./eval$3 || exit
env > env.log
-echo "start infering for device $DEVICE_ID"
+echo "start inferring for device $DEVICE_ID"
python eval.py \
    --data_dir=$DATASET_PATH \
    --pretrained=$CHECKPOINT_PATH \

@@ -94,7 +94,7 @@ def get_interp_method(interp, sizes=()):
            Neighbors method. (used by default).
        4: Lanczos interpolation over 8x8 pixel neighborhood.
        9: Cubic for enlarge, area for shrink, bilinear for others.
-        10: Random select from interpolation method metioned above.
+        10: Randomly select from the interpolation methods mentioned above.
    Note:
        When shrinking an image, it will generally look best with AREA-based
        interpolation, whereas, when enlarging an image, it will generally look best
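The numeric codes in this docstring match OpenCV's interpolation constants. A sketch of how such a selector could be implemented, assuming those constants (`pick_interp` is an illustrative name, not the model_zoo function):

```python
import random
import cv2

# Codes 0-4 follow the cv2 constants the docstring refers to.
CV2_INTERP = {0: cv2.INTER_NEAREST, 1: cv2.INTER_LINEAR, 2: cv2.INTER_CUBIC,
              3: cv2.INTER_AREA, 4: cv2.INTER_LANCZOS4}

def pick_interp(interp, sizes=()):
    if interp == 10:  # random choice among the fixed methods
        return CV2_INTERP[random.choice(list(CV2_INTERP))]
    if interp == 9 and len(sizes) == 4:  # auto: cubic to enlarge, area to shrink
        oh, ow, nh, nw = sizes
        if nh > oh and nw > ow:
            return cv2.INTER_CUBIC
        if nh < oh and nw < ow:
            return cv2.INTER_AREA
        return cv2.INTER_LINEAR
    return CV2_INTERP.get(interp, cv2.INTER_LINEAR)
```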

@@ -58,7 +58,7 @@ class YoloBlock(nn.Cell):
    Args:
        in_channels: Integer. Input channel.
-        out_chls: Interger. Middle channel.
+        out_chls: Integer. Middle channel.
        out_channels: Integer. Output channel.
    Returns:
@@ -108,7 +108,7 @@ class YOLOv3(nn.Cell):
    Args:
        backbone_shape: List. Darknet output channels shape.
        backbone: Cell. Backbone Network.
-        out_channel: Interger. Output channel.
+        out_channel: Integer. Output channel.
    Returns:
        Tensor, output tensor.

@@ -44,7 +44,7 @@ def has_valid_annotation(anno):
    # if all boxes have close to zero area, there is no annotation
    if _has_only_empty_bbox(anno):
        return False
-    # keypoints task have a slight different critera for considering
+    # keypoints task has slightly different criteria for considering
    # if an annotation is valid
    if "keypoints" not in anno[0]:
        return True
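This check follows the COCO filtering convention popularized by maskrcnn-benchmark; assuming that lineage, the helpers it relies on look roughly like the sketch below (bodies are assumptions, not necessarily the exact model_zoo source):

```python
def _has_only_empty_bbox(anno):
    # COCO boxes are [x, y, w, h]; an object is "empty" if w or h <= 1 pixel
    return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno)

def _count_visible_keypoints(anno):
    # every third entry of "keypoints" is a visibility flag; v > 0 means labeled
    return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno)
```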

@@ -121,7 +121,7 @@ def parse_args():
    args.rank = get_rank()
    args.group_size = get_group_size()
-    # select for master rank save ckpt or all rank save, compatiable for model parallel
+    # select whether only the master rank saves ckpt or all ranks save, compatible with model parallel
    args.rank_save_ckpt_flag = 0
    if args.is_save_on_master:
        if args.rank == 0:
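The hunk cuts off inside the first branch; the usual model_zoo pattern for this flag is sketched below (assumed from the visible lines, not the verbatim continuation):

```python
# sketch: decide which ranks write checkpoints
args.rank_save_ckpt_flag = 0
if args.is_save_on_master:
    if args.rank == 0:                # save on master: only rank 0 writes ckpt
        args.rank_save_ckpt_flag = 1
else:
    args.rank_save_ckpt_flag = 1      # otherwise every rank writes its own ckpt
```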
@@ -140,6 +140,14 @@ def conver_training_shape(args):
        training_shape = [int(args.training_shape), int(args.training_shape)]
    return training_shape
+
+def build_quant_network(network):
+    quantizer = QuantizationAwareTraining(bn_fold=True,
+                                          per_channel=[True, False],
+                                          symmetric=[True, False],
+                                          one_conv_fold=False)
+    network = quantizer.quantize(network)
+    return network
+
def train():
    """Train function."""
@@ -168,11 +176,7 @@ def train():
    config = ConfigYOLOV3DarkNet53()
    # convert fusion network to quantization aware network
    if config.quantization_aware:
-        quantizer = QuantizationAwareTraining(bn_fold=True,
-                                              per_channel=[True, False],
-                                              symmetric=[True, False],
-                                              one_conv_fold=False)
-        network = quantizer.quantize(network)
+        network = build_quant_network(network)
    network = YoloWithLossCell(network)
    args.logger.info('finish get network')
@@ -198,11 +202,8 @@ def train():
    lr = get_lr(args)
-    opt = Momentum(params=get_param_groups(network),
-                   learning_rate=Tensor(lr),
-                   momentum=args.momentum,
-                   weight_decay=args.weight_decay,
-                   loss_scale=args.loss_scale)
+    opt = Momentum(params=get_param_groups(network), learning_rate=Tensor(lr), momentum=args.momentum,
+                   weight_decay=args.weight_decay, loss_scale=args.loss_scale)
    network = TrainingWrapper(network, opt)
    network.set_train()
@@ -213,9 +214,7 @@ def train():
    ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval,
                                   keep_checkpoint_max=ckpt_max_num)
    save_ckpt_path = os.path.join(args.outputs_dir, 'ckpt_' + str(args.rank) + '/')
-    ckpt_cb = ModelCheckpoint(config=ckpt_config,
-                              directory=save_ckpt_path,
-                              prefix='{}'.format(args.rank))
+    ckpt_cb = ModelCheckpoint(config=ckpt_config, directory=save_ckpt_path, prefix='{}'.format(args.rank))
    cb_params = _InternalCallbackParam()
    cb_params.train_network = network
    cb_params.epoch_num = ckpt_max_num

@@ -4,7 +4,7 @@
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
@@ -19,7 +19,6 @@
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [YOLOv3_ResNet18 Description](#contents)
YOLOv3 network based on ResNet-18, with support for training and evaluation.
@@ -30,24 +29,25 @@ YOLOv3 network based on ResNet-18, with support for training and evaluation.
The overall network architecture of YOLOv3 is shown below:
We use ResNet18 as the backbone of YOLOv3_ResNet18. The ResNet18 architecture has 4 stages. It performs the initial convolution and max-pooling using 7×7 and 3×3 kernel sizes respectively. Afterward, every stage of the network has residual blocks (2, 2, 2, 2), each containing two 3×3 conv layers. Finally, the network has an average pooling layer followed by a fully connected layer.
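Restating that layout compactly (an illustrative summary, not the model_zoo source):

```python
# ResNet18 layout as described above: 7x7 conv + 3x3 max-pool stem, four
# stages of (2, 2, 2, 2) residual blocks (two 3x3 convs each), then the head.
resnet18_layout = {
    "stem":   [("conv7x7", 64, "stride 2"), ("maxpool3x3", "stride 2")],
    "stages": [(2, 64), (2, 128), (2, 256), (2, 512)],  # (blocks, channels)
    "head":   ["global average pooling", "fully connected (num_classes)"],
}
```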
# [Dataset](#contents)
Note that you can run the scripts based on the dataset mentioned in the original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related dataset below.
Dataset used: [COCO2017](<http://images.cocodataset.org/>)
- Dataset size: 19G
    - Train: 18G, 118,000 images
    - Val: 1G, 5,000 images
    - Annotations: 241M, instances, captions, person_keypoints, etc.
- Data format: image and json files
    - Note: Data will be processed in dataset.py
- Dataset
    1. The directory structure is as follows:
    ```
    .
    ├── annotations  # annotation jsons
@@ -55,7 +55,7 @@ Dataset used: [COCO2017](<http://images.cocodataset.org/>)
    └── val2017  # infer dataset
    ```
-2. Organize the dataset infomation into a TXT file, each row in the file is as follows:
+2. Organize the dataset information into a TXT file, each row in the file is as follows:
    ```
    train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2
@@ -63,63 +63,61 @@ Dataset used: [COCO2017](<http://images.cocodataset.org/>)
Each row is an image annotation split by spaces: the first column is the relative path of the image, and the rest are boxes and class information in the format [xmin,ymin,xmax,ymax,class]. `dataset.py` is the parsing script; we read the image from the path joined from `image_dir` (the dataset directory) and the relative path in `anno_path` (the TXT file path). `image_dir` and `anno_path` are external inputs.
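As a worked example of that format, a small parsing sketch (illustrative; the real logic lives in `dataset.py`):

```python
def parse_annotation_row(row):
    """Split one annotation row into an image path and [xmin, ymin, xmax, ymax, class] boxes."""
    parts = row.strip().split()
    boxes = [tuple(int(v) for v in box.split(",")) for box in parts[1:]]
    return parts[0], boxes

# parse_annotation_row("train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2")
# -> ("train2017/0000001.jpg", [(0, 259, 401, 459, 7), (35, 28, 324, 201, 2)])
```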
# [Environment Requirements](#contents)
- Hardware: Ascend
    - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation on Ascend as follows:
-- runing on Ascend
+- running on Ascend
    ```shell script
    # run standalone training example
    sh run_standalone_train.sh [DEVICE_ID] [EPOCH_SIZE] [MINDRECORD_DIR] [IMAGE_DIR] [ANNO_PATH]
    # run distributed training example
    sh run_distribute_train.sh [DEVICE_NUM] [EPOCH_SIZE] [MINDRECORD_DIR] [IMAGE_DIR] [ANNO_PATH] [RANK_TABLE_FILE]
    # run evaluation example
    sh run_eval.sh [DEVICE_ID] [CKPT_PATH] [MINDRECORD_DIR] [IMAGE_DIR] [ANNO_PATH]
    ```
# [Script Description](#contents)
## [Script and Sample Code](#contents)
-```
+```python
└── cv
    ├── README.md                   // descriptions about all the models
    ├── mindspore_hub_conf.md       // config for mindspore hub
    └── yolov3_resnet18
        ├── README.md               // descriptions about yolov3_resnet18
        ├── scripts
            ├── run_distribute_train.sh   // shell script for distributed training on Ascend
            ├── run_standalone_train.sh   // shell script for standalone training on Ascend
            └── run_eval.sh               // shell script for evaluation on Ascend
        ├── src
            ├── dataset.py          // creating dataset
            ├── yolov3.py           // yolov3 architecture
            ├── config.py           // parameter configuration
            └── utils.py            // util function
        ├── train.py                // training script
        └── eval.py                 // evaluation script
```
## [Script Parameters](#contents)
-```
Major parameters in train.py and config.py are as follows:
+```python
device_num: Use device nums, default is 1.
lr: Learning rate, default is 0.001.
epoch_size: Epoch size, default is 50.
@@ -133,24 +131,23 @@ After installing MindSpore via the official website, you can start training and
img_shape: Image height and width used as input to the model.
```
## [Training Process](#contents)
### Training on Ascend
To train the model, run `train.py` with the dataset `image_dir`, `anno_path` and `mindrecord_dir`. If the `mindrecord_dir` is empty, it will generate [mindrecord](https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/convert_dataset.html) files from `image_dir` and `anno_path` (the absolute image path is joined from the `image_dir` and the relative path in `anno_path`). **Note that if `mindrecord_dir` isn't empty, it will use `mindrecord_dir` rather than `image_dir` and `anno_path`.**
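That selection rule can be sketched as follows (illustrative; `create_mindrecord` is a hypothetical helper standing in for the conversion step in the real scripts):

```python
import os

def resolve_data_source(mindrecord_dir, image_dir, anno_path):
    # a non-empty mindrecord_dir wins; image_dir and anno_path are then ignored
    if os.path.isdir(mindrecord_dir) and os.listdir(mindrecord_dir):
        return mindrecord_dir
    # otherwise convert image_dir + anno_path into mindrecord files first
    return create_mindrecord(image_dir, anno_path, mindrecord_dir)  # hypothetical helper
```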
- Stand-alone mode
-    ```
+    ```bash
    sh run_standalone_train.sh 0 50 ./Mindrecord_train ./dataset ./dataset/train.txt
    ```
    The input variables are device id, epoch size, mindrecord directory path, dataset directory path and train TXT file path.
- Distributed mode
-    ```
+    ```bash
    sh run_distribute_train.sh 8 150 /data/Mindrecord_train /data /data/train.txt /data/hccl.json
    ```
@@ -158,7 +155,7 @@ To train the model, run `train.py` with the dataset `image_dir`, `anno_path` and
You will get the loss value and time of each step as follows:
-```
+```bash
epoch: 145 step: 156, loss is 12.202981
epoch time: 25599.22742843628, per step time: 164.0976117207454
epoch: 146 step: 156, loss is 16.91706
@@ -168,19 +165,20 @@ You will get the loss value and time of each step as follows:
epoch: 148 step: 156, loss is 10.431475
epoch time: 23634.241580963135, per step time: 151.50154859591754
epoch: 149 step: 156, loss is 14.665991
epoch time: 24118.8325881958, per step time: 154.60790120638333
epoch: 150 step: 156, loss is 10.779521
epoch time: 25319.57221031189, per step time: 162.30495006610187
```
-Note the results is two-classification(person and face) used our own annotations with coco2017, you can change `num_classes` in `config.py` to train your dataset. And we will suport 80 classifications in coco2017 the near future.
+Note that the results are for two-class classification (person and face) using our own annotations with COCO2017; you can change `num_classes` in `config.py` to train on your own dataset. We will support the 80 COCO2017 classes in the near future.
## [Evaluation Process](#contents)
### Evaluation on Ascend
To evaluate, run `eval.py` with the dataset `image_dir`, `anno_path` (eval TXT), `mindrecord_dir` and `ckpt_path`. `ckpt_path` is the path of a [checkpoint](https://www.mindspore.cn/tutorial/training/en/master/use/save_model.html) file.
-```
+```bash
sh run_eval.sh 0 yolo.ckpt ./Mindrecord_eval ./dataset ./dataset/eval.txt
```
@@ -188,18 +186,18 @@ The input variables are device id, checkpoint path, mindrecord directory path, d
You will get the precision and recall value of each class:
-```
+```bash
class 0 precision is 88.18%, recall is 66.00%
class 1 precision is 85.34%, recall is 79.13%
```
Note that the precision and recall values are results of two-class classification (person and face) using our own annotations with COCO2017.
# [Model Description](#contents)
## [Performance](#contents)
### Evaluation Performance
| Parameters | Ascend |
| -------------------------- | ----------------------------------------------------------- |
@@ -217,7 +215,6 @@ Note the precision and recall values are results of two-classification(person an
| Parameters (M) | 189 |
| Scripts | [yolov3_resnet18 script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/yolov3_resnet18) |
### Inference Performance
| Parameters | Ascend |
@@ -233,9 +230,9 @@ Note the precision and recall values are results of two-classification(person an
# [Description of Random Situation](#contents)
In dataset.py, we set the seed inside the "create_dataset" function. We also use a random seed in train.py.
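A minimal sketch of those two seed settings (API locations assumed from MindSpore's `dataset.config` and `common` modules; exact call sites in the scripts may differ):

```python
import mindspore.dataset as ds
from mindspore.common import set_seed

ds.config.set_seed(1)  # fixes shuffle order inside create_dataset
set_seed(1)            # global seed used by train.py for init and random ops
```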
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

@@ -15,7 +15,7 @@
# ============================================================================
echo "======================================================================================================================================================="
-echo "Please run the scipt as: "
+echo "Please run the script as: "
echo "sh run_distribute_train.sh DEVICE_NUM EPOCH_SIZE MINDRECORD_DIR IMAGE_DIR ANNO_PATH RANK_TABLE_FILE PRE_TRAINED PRE_TRAINED_EPOCH_SIZE"
echo "For example: sh run_distribute_train.sh 8 150 /data/Mindrecord_train /data /data/train.txt /data/hccl.json /opt/yolov3-150.ckpt(optional) 100(optional)"
echo "It is better to use absolute path."
@@ -47,7 +47,7 @@ then
    exit 1
fi
-echo "After running the scipt, the network runs in the background. The log will be generated in LOGx/log.txt"
+echo "After running the script, the network runs in the background. The log will be generated in LOGx/log.txt"
export RANK_TABLE_FILE=$6
export RANK_SIZE=$1

@@ -15,7 +15,7 @@
# ============================================================================
echo "=============================================================================================================="
-echo "Please run the scipt as: "
+echo "Please run the script as: "
echo "sh run_eval.sh DEVICE_ID CKPT_PATH MINDRECORD_DIR IMAGE_DIR ANNO_PATH"
echo "for example: sh run_eval.sh 0 yolo.ckpt ./Mindrecord_eval ./dataset ./dataset/eval.txt"
echo "=============================================================================================================="

@@ -15,7 +15,7 @@
# ============================================================================
echo "========================================================================================================================================="
-echo "Please run the scipt as: "
+echo "Please run the script as: "
echo "sh run_standalone_train.sh DEVICE_ID EPOCH_SIZE MINDRECORD_DIR IMAGE_DIR ANNO_PATH PRE_TRAINED PRE_TRAINED_EPOCH_SIZE"
echo "for example: sh run_standalone_train.sh 0 50 ./Mindrecord_train ./dataset ./dataset/train.txt /opt/yolov3-50.ckpt(optional) 30(optional)"
echo "========================================================================================================================================="
