diff --git a/model_zoo/README.md b/model_zoo/README.md
index 6c6319ab00..45368c3f24 100644
--- a/model_zoo/README.md
+++ b/model_zoo/README.md
@@ -15,6 +15,7 @@ In order to facilitate developers to enjoy the benefits of MindSpore framework,
 - [Official](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official)
     - [Computer Vision](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv)
         - [Image Classification](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv)
+            - [DenseNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/densenet/README.md)
             - [GoogleNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/googlenet/README.md)
             - [ResNet50[benchmark]](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet/README.md)
             - [ResNet50_Quant](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet50_quant/README.md)
diff --git a/model_zoo/README_CN.md b/model_zoo/README_CN.md
index 57ee515dc5..6f87ed75f4 100644
--- a/model_zoo/README_CN.md
+++ b/model_zoo/README_CN.md
@@ -15,6 +15,7 @@
 - [官方](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official)
     - [计算机视觉](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv)
         - [图像分类](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv)
+            - [DenseNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/densenet/README.md)
             - [GoogleNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/googlenet/README.md)
             - [ResNet-50[基准]](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet/README.md)
             - [ResNet50_Quant](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet50_quant/README.md)
diff --git a/model_zoo/official/cv/densenet121/README.md b/model_zoo/official/cv/densenet/README.md
similarity index 65%
rename from model_zoo/official/cv/densenet121/README.md
rename
to model_zoo/official/cv/densenet/README.md index 1efa2a9a70..90462eaed8 100644 --- a/model_zoo/official/cv/densenet121/README.md +++ b/model_zoo/official/cv/densenet/README.md @@ -1,6 +1,6 @@ # Contents -- [DenseNet121 Description](#densenet121-description) +- [DenseNet Description](#densenet-description) - [Model Architecture](#model-architecture) - [Dataset](#dataset) - [Features](#features) @@ -16,25 +16,28 @@ - [Evaluation Process](#evaluation-process) - [Evaluation](#evaluation) - [Model Description](#model-description) - - [Performance](#performance) + - [Performance](#performance) - [Training accuracy results](#training-accuracy-results) - [Training performance results](#training-performance-results) - [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) -# [DenseNet121 Description](#contents) +# [DenseNet Description](#contents) -DenseNet121 is a convolution based neural network for the task of image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). HuaWei’s DenseNet121 is a implementation on [MindSpore](https://www.mindspore.cn/). +DenseNet is a convolution based neural network for the task of image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). HuaWei’s DenseNet is a implementation on [MindSpore](https://www.mindspore.cn/). The repository also contains scripts to launch training and inference routines. # [Model Architecture](#contents) -DenseNet121 builds on 4 densely connected block. In every dense block, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Concatenation is used. Each layer is receiving a “collective knowledge” from all preceding layers. +DenseNet supports two kinds of implementations: DenseNet100 and DenseNet121, where the number represents number of layers in the network. 
+ +DenseNet121 builds on 4 densely connected block and DenseNet100 builds on 3. In every dense block, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Concatenation is used. Each layer is receiving a “collective knowledge” from all preceding layers. # [Dataset](#contents) -Dataset used: ImageNet +Dataset used in DenseNet121: ImageNet + The default configuration of the Dataset are as follows: - Training Dataset preprocess: @@ -49,11 +52,27 @@ The default configuration of the Dataset are as follows: - Input size of images is 224\*224 (Resize to 256\*256 then crops images at the center) - Normalize the input image with respect to mean and standard deviation +Dataset used in DenseNet100: Cifar-10 + +The default configuration of the Dataset are as follows: + +- Training Dataset preprocess: + - Input size of images is 32\*32 + - Randomly cropping is applied to the image with padding=4 + - Probability of the image being flipped set to 0.5 + - Randomly adjust the brightness, contrast, saturation (0.4, 0.4, 0.4) + - Normalize the input image with respect to mean and standard deviation + +- Test Dataset preprocess: + - Input size of images is 32\*32 + - Normalize the input image with respect to mean and standard deviation + # [Features](#contents) ## Mixed Precision The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. 
+
 For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’.

 # [Environment Requirements](#contents)
@@ -74,15 +93,15 @@ After installing MindSpore via the official website, you can start training and
   ```python
   # run training example
-  python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
+  python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &

   # run distributed training example
-  sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
+  sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT

   # run evaluation example
-  python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
+  python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
   OR
-  sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
+  sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
   ```

   For distributed training, a hccl configuration file with JSON format needs to be created in advance.
@@ -95,17 +114,19 @@ After installing MindSpore via the official website, you can start training and For running on GPU, please change `platform` from `Ascend` to `GPU` + ```python # run training example export CUDA_VISIBLE_DEVICES=0 - python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & + python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & # run distributed training example - sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] + sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] # run evaluation example - python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & + python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & OR - sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] + sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] + ``` # [Script Description](#contents) @@ -114,8 +135,8 @@ After installing MindSpore via the official website, you can start training and ```text ├── model_zoo ├── README.md // descriptions about all the models - ├── densenet121 - ├── README.md // descriptions about densenet121 + ├── densenet + ├── README.md // descriptions about densenet ├── scripts │ ├── run_distribute_train.sh // shell script for distributed on Ascend │ ├── run_distribute_train_gpu.sh // shell script for distributed on GPU @@ -144,9 +165,9 @@ You can modify the training behaviour through the various flags in the `train.py ```python --data_dir train data dir - --num_classes num of classes in dataset(default:1000) + --num_classes num of classes in dataset(default:1000 for densenet121; 10 for densenet100) --image_size image size of the dataset - --per_batch_size mini-batch size 
(default: 256) per gpu + --per_batch_size mini-batch size (default: 32 for densenet121; 64 for densenet100) per gpu --pretrained path of pretrained model --lr_scheduler type of LR schedule: exponential, cosine_annealing --lr initial learning rate @@ -176,10 +197,10 @@ You can modify the training behaviour through the various flags in the `train.py - running on Ascend ```python - python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & + python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & ``` - The python command above will run in the background, The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss value will be achieved as follows: + The python command above will run in the background, The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss value of training DenseNet121 on ImageNet will be achieved as follows: ```shell 2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec @@ -195,22 +216,30 @@ You can modify the training behaviour through the various flags in the `train.py ```python export CUDA_VISIBLE_DEVICES=0 - python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & + python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & ``` The python command above will run in the background, you can view the results through the file `train.log`. After training, you'll get some checkpoint files under the folder `./ckpt_0/` by default. 
+- running on CPU + + ```python + python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 & + ``` + + The python command above will run in the background, The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. + ### Distributed Training - running on Ascend ```bash - sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT + sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT ``` - The above shell script will run distribute training in the background. You can view the results log and model checkpoint through the file `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss value will be achieved as follows: + The above shell script will run distribute training in the background. You can view the results log and model checkpoint through the file `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss value of training DenseNet121 on ImageNet will be achieved as follows: ```log 2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec @@ -227,7 +256,7 @@ You can modify the training behaviour through the various flags in the `train.py ```bash cd scripts - sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] + sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] ``` The above shell script will run distribute training in the background. You can view the results through the file `train/train.log`. @@ -241,14 +270,14 @@ You can modify the training behaviour through the various flags in the `train.py running the command below for evaluation. 
```python - python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & + python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & OR - sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT + sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT ``` - The above python command will run in the background. You can view the results through the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy of the test dataset will be as follows: + The above python command will run in the background. You can view the results through the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows: - ```shell + ```log 2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43% 2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60% ``` @@ -258,27 +287,49 @@ You can modify the training behaviour through the various flags in the `train.py running the command below for evaluation. ```python - python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & + python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & OR - sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] + sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] ``` - The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of the test dataset will be as follows: + The above python command will run in the background. You can view the results through the file "eval/eval.log". 
The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows: - ```shell + ```log 2021-02-04 14:20:50,551:INFO:after allreduce eval: top1_correct=37637, tot=49984, acc=75.30% 2021-02-04 14:20:50,551:INFO:after allreduce eval: top5_correct=46370, tot=49984, acc=92.77% ``` + The accuracy of evaluating DenseNet100 on the test dataset of Cifar-10 will be as follows: + + ```log + 2021-03-12 18:04:07,893:INFO:after allreduce eval: top1_correct=9536, tot=9984, acc=95.51% + ``` + +- evaluation on CPU + + running the command below for evaluation. + + ```python + python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='CPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & + ``` + + The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of evaluating DenseNet100 on the test dataset of Cifar-10 will be as follows: + + ```log + 2021-03-18 09:06:43,247:INFO:after allreduce eval: top1_correct=9492, tot=9984, acc=95.07% + ``` + # [Model Description](#contents) ## [Performance](#contents) +### DenseNet121 + ### Training accuracy results | Parameters | Ascend | GPU | | ------------------- | --------------------------- | --------------------------- | -| Model Version | Inception V1 | Inception V1 | +| Model Version | DenseNet121 | DenseNet121 | | Resource | Ascend 910 | Tesla V100-PCIE | | Uploaded Date | 09/15/2020 (month/day/year) | 01/27/2021 (month/day/year) | | MindSpore Version | 1.0.0 | 1.1.0 | @@ -291,7 +342,7 @@ You can modify the training behaviour through the various flags in the `train.py | Parameters | Ascend | GPU | | ------------------- | --------------------------- | ---------------------------- | -| Model Version | Inception V1 | Inception V1 | +| Model Version | DenseNet121 | DenseNet121 | | Resource | Ascend 910 | Tesla V100-PCIE | | Uploaded Date | 09/15/2020 (month/day/year) | 02/04/2021 (month/day/year) | | MindSpore 
Version | 1.0.0 | 1.1.1 | @@ -300,6 +351,23 @@ You can modify the training behaviour through the various flags in the `train.py | outputs | probability | probability | | speed | 1pc:760 img/s;8pc:6000 img/s| 1pc:161 img/s;8pc:1288 img/s | +### DenseNet100 + +### Training performance + +| Parameters | GPU | +| ------------------- | ---------------------------- | +| Model Version | DenseNet100 | +| Resource | Tesla V100-PCIE | +| Uploaded Date | 03/18/2021 (month/day/year) | +| MindSpore Version | 1.2.0 | +| Dataset | Cifar-10 | +| batch_size | 64 | +| epochs | 300 | +| outputs | probability | +| accuracy | 95.31% | +| speed | 1pc: 600.07 img/sec | + # [Description of Random Situation](#contents) In dataset.py, we set the seed inside “create_dataset" function. We also use random seed in train.py. diff --git a/model_zoo/official/cv/densenet121/README_CN.md b/model_zoo/official/cv/densenet/README_CN.md similarity index 60% rename from model_zoo/official/cv/densenet121/README_CN.md rename to model_zoo/official/cv/densenet/README_CN.md index fff0b7e7d1..07e00f4513 100644 --- a/model_zoo/official/cv/densenet121/README_CN.md +++ b/model_zoo/official/cv/densenet/README_CN.md @@ -3,7 +3,7 @@ - [目录](#目录) -- [DenseNet121描述](#densenet121描述) +- [DenseNet描述](#densenet描述) - [模型架构](#模型架构) - [数据集](#数据集) - [特性](#特性) @@ -27,32 +27,50 @@ -# DenseNet121描述 +# DenseNet描述 -DenseNet-121是一个基于卷积的神经网络,用于图像分类。有关该模型的描述,可查阅[此论文](https://arxiv.org/abs/1608.06993)。华为的DenseNet-121是[MindSpore](https://www.mindspore.cn/)上的一个实现。 +DenseNet是一个基于卷积的神经网络,用于图像分类。有关该模型的描述,可查阅[此论文](https://arxiv.org/abs/1608.06993)。华为的DenseNet是[MindSpore](https://www.mindspore.cn/)上的一个实现。 仓库中还包含用于启动训练和推理例程的脚本。 # 模型架构 -DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都会接受其前面所有层作为其额外的输入,并将自己的特征映射传递给后续所有层。会使用到级联。每一层都从前几层接受“集体知识”。 +DenseNet模型支持两种模式:DenseNet-100 和DenseNet-121。数字表示网络中包含的卷积层数量。 + +DenseNet-121构建在4个密集连接块上, DenseNet-100则构建在3个密集连接块上。各个密集块中,每个层都会接受其前面所有层作为其额外的输入,并将自己的特征映射传递给后续所有层。会使用到级联。每一层都从前几层接受“集体知识”。 # 数据集 -使用的数据集: ImageNet 
+DenseNet-121使用的数据集: ImageNet + +数据集的默认配置如下: + +- 训练数据集预处理: + - 图像的输入尺寸:224\*224 + - 裁剪的原始尺寸大小范围(最小值,最大值):(0.08, 1.0) + - 裁剪的宽高比范围(最小值,最大值):(0.75, 1.333) + - 图像翻转概率:0.5 + - 随机调节亮度、对比度、饱和度:(0.4, 0.4, 0.4) + - 根据平均值和标准偏差对输入图像进行归一化 + +- 测试数据集预处理: + - 图像的输入尺寸:224\*224(将图像缩放到256\*256,然后在中央区域裁剪图像) + - 根据平均值和标准偏差对输入图像进行归一化 + +DenseNet-100使用的数据集: Cifar-10 + 数据集的默认配置如下: - 训练数据集预处理: -- 图像的输入尺寸:224\*224 -- 裁剪的原始尺寸大小范围(最小值,最大值):(0.08, 1.0) -- 裁剪的宽高比范围(最小值,最大值):(0.75, 1.333) -- 图像翻转概率:0.5 -- 随机调节亮度、对比度、饱和度:(0.4, 0.4, 0.4) -- 根据平均值和标准偏差对输入图像进行归一化 + - 图像的输入尺寸:32\*32 + - 随机裁剪的边界填充值:4 + - 图像翻转概率:0.5 + - 随机调节亮度、对比度、饱和度:(0.4, 0.4, 0.4) + - 根据平均值和标准偏差对输入图像进行归一化 - 测试数据集预处理: -- 图像的输入尺寸:224\*224(将图像缩放到256\*256,然后在中央区域裁剪图像) -- 根据平均值和标准偏差对输入图像进行归一化 + - 图像的输入尺寸:32\*32 + - 根据平均值和标准偏差对输入图像进行归一化 # 特性 @@ -79,15 +97,15 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 ```python # 训练示例 - python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & + python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & # 分布式训练示例 - sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT + sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT # 评估示例 - python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & + python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & OR - sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT + sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT ``` 分布式训练需要提前创建JSON格式的HCCL配置文件。 @@ -101,15 +119,15 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 ```python # 训练示例 export CUDA_VISIBLE_DEVICES=0 - python train.py 
--data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & + python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & # 分布式训练示例 - sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] + sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] # 评估示例 - python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & + python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & OR - sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] + sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] ``` # 脚本说明 @@ -119,8 +137,8 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 ```shell ├── model_zoo ├── README.md // 所有模型的说明 - ├── densenet121 - ├── README.md // DenseNet-121相关说明 + ├── densenet + ├── README.md // DenseNet相关说明 ├── scripts │ ├── run_distribute_train.sh // Ascend分布式shell脚本 │ ├── run_distribute_train_gpu.sh // GPU分布式shell脚本 @@ -148,10 +166,10 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 可通过`train.py`脚本中的参数修改训练行为。`train.py`脚本中的参数如下: ```param - --Data_dir 训练数据目录 - --num_classes 数据集中的类个数(默认为1000) + --data_dir 训练数据目录 + --num_classes 数据集中的类个数(DenseNet-121中默认为1000,DenseNet-100中默认为10) --image_size 数据集图片大小 - --per_batch_size 每GPU的迷你批次大小(默认为256) + --per_batch_size 每GPU的迷你批次大小(DenseNet-121中默认为32, DenseNet-100中默认为64) --pretrained 预训练模型的路径 --lr_scheduler LR调度类型,取值包括 exponential,cosine_annealing --lr 初始学习率 @@ -181,10 +199,10 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 - Ascend处理器环境运行 ```python - python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & + python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > 
train.log 2>&1 & ``` - 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。损失值的实现如下: + 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。在ImageNet数据集上训练DenseNet-121的损失值的实现如下: ```log 2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec @@ -200,7 +218,15 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 ```python export CUDA_VISIBLE_DEVICES=0 - python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & + python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & + ``` + + 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。 + +- CPU处理器环境运行 + + ```python + python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 & ``` 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。 @@ -210,10 +236,10 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 - Ascend处理器环境运行 ```shell - sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT + sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT ``` - 上述shell脚本将在后台进行分布式训练。可以通过文件`train[X]/output/202x-xx-xx_time_xx_xx_xx/`查看结果日志和模型检查点。损失值的实现如下: + 上述shell脚本将在后台进行分布式训练。可以通过文件`train[X]/output/202x-xx-xx_time_xx_xx_xx/`查看结果日志和模型检查点。在ImageNet数据集上训练DenseNet-121的损失值的实现如下: ```log 2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec @@ -230,7 +256,7 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 ```bash cd scripts - sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] + sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] ``` 上述shell脚本将在后台进行分布式训练。可以通过文件`train[X]/output/202x-xx-xx_time_xx_xx_xx/`查看结果日志和模型检查点。 @@ -244,12 +270,12 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 运行以下命令进行评估。 ```eval - python eval.py --data_dir 
/PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & + python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & OR - sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT + sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT ``` - 上述python命令在后台运行。可以通过“output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log”文件查看结果。测试数据集的准确率如下: + 上述python命令在后台运行。可以通过“output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log”文件查看结果。DenseNet-121在ImageNet的测试数据集的准确率如下: ```log 2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43% @@ -261,27 +287,49 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 运行以下命令进行评估。 ```eval - python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & + python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & OR - sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] + sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] ``` - 上述python命令在后台运行。可以通过“eval/eval.log”文件查看结果。测试数据集的准确率如下: + 上述python命令在后台运行。可以通过“eval/eval.log”文件查看结果。DenseNet-121在ImageNet的测试数据集的准确率如下: ```log 2021-02-04 14:20:50,551:INFO:after allreduce eval: top1_correct=37637, tot=49984, acc=75.30% 2021-02-04 14:20:50,551:INFO:after allreduce eval: top5_correct=46370, tot=49984, acc=92.77% ``` + DenseNet-100在Cifar-10的测试数据集的准确率如下: + + ```log + 2021-03-12 18:04:07,893:INFO:after allreduce eval: top1_correct=9536, tot=9984, acc=95.51% + ``` + +- CPU处理器环境 + + 运行以下命令进行评估。 + + ```eval + python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='CPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & + ``` + + 
上述python命令在后台运行。可以通过“eval/eval.log”文件查看结果。DenseNet-100在Cifar-10的测试数据集的准确率如下: + + ```log + 2021-03-18 09:06:43,247:INFO:after allreduce eval: top1_correct=9492, tot=9984, acc=95.07% + ``` + # 模型描述 ## 性能 +### DenseNet121 + ### 训练准确率结果 | 参数 | Ascend | GPU | | ------------------- | -------------------------- | -------------------------- | -| 模型版本 | Inception V1 | Inception V1 | +| 模型版本 | DenseNet-121 | DenseNet-121 | | 资源 | Ascend 910 | Tesla V100-PCIE | | 上传日期 | 2020/9/15 | 2021/2/4 | | MindSpore版本 | 1.0.0 | 1.1.1 | @@ -294,7 +342,7 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | 参数 | Ascend | GPU | | ------------------- | -------------------------------- | -------------------------------- | -| 模型版本 | Inception V1 | Inception V1 | +| 模型版本 | DenseNet-121 | DenseNet-121 | | 资源 | Ascend 910 | Tesla V100-PCIE | | 上传日期 | 2020/9/15 | 2021/2/4 | | MindSpore版本 | 1.0.0 | 1.1.1 | @@ -303,6 +351,23 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | 输出 | 概率 | 概率 | | 速度 | 单卡:760 img/s;8卡:6000 img/s | 单卡:161 img/s;8卡:1288 img/s | +### DenseNet100 + +### 训练结果 + +| 参数 | GPU | +| ------------------- | -------------------------------- | +| 模型版本 | DenseNet-100 | +| 资源 | Tesla V100-PCIE | +| 上传日期 | 2021/03/18 | +| MindSpore版本 | 1.2.0 | +| 数据集 | Cifar-10 | +| 轮次 | 300 | +| batch_size | 64 | +| 输出 | 概率 | +| 训练性能 | Top1:95.28% | +| 速度 | 单卡:600.07 img/sec | + # 随机情况说明 dataset.py中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 @@ -310,4 +375,3 @@ dataset.py中设置了“create_dataset”函数内的种子,同时还使用 # ModelZoo主页 请浏览官网[主页](https://gitee.com/mindspore/mindspore/tree/master/model_zoo)。 - diff --git a/model_zoo/official/cv/densenet121/eval.py b/model_zoo/official/cv/densenet/eval.py similarity index 86% rename from model_zoo/official/cv/densenet121/eval.py rename to model_zoo/official/cv/densenet/eval.py index 6e44317efe..4cf23f5ae3 100644 --- a/model_zoo/official/cv/densenet121/eval.py +++ b/model_zoo/official/cv/densenet/eval.py @@ -1,4 +1,4 @@ -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei 
Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,7 +15,7 @@ """ ##############test densenet example################# -python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT +python eval.py --net densenet121 --dataset imagenet --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT """ import os @@ -34,10 +34,6 @@ from mindspore.ops import functional as F from mindspore.common import dtype as mstype from src.utils.logging import get_logger -from src.datasets import classification_dataset -from src.network import DenseNet121 -from src.config import config - class ParameterReduce(nn.Cell): """ @@ -61,10 +57,13 @@ def parse_args(cloud_args=None): """ parser = argparse.ArgumentParser('mindspore classification test') + # network and dataset choices + parser.add_argument('--net', type=str, default='', help='Densenet Model, densenet100 or densenet121') + parser.add_argument('--dataset', type=str, default='', help='Dataset, either cifar10 or imagenet') + # dataset related parser.add_argument('--data_dir', type=str, default='', help='eval data dir') - parser.add_argument('--num_classes', type=int, default=1000, help='num of classes in dataset') - parser.add_argument('--image_size', type=str, default='224,224', help='image size of the dataset') + # network related parser.add_argument('--backbone', default='resnet50', help='backbone') parser.add_argument('--pretrained', default='', type=str, help='fully path of pretrained model to load.' 
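Note on the new `--net` and `--dataset` flags: in this patch both default to an empty string, and the `if ... else` branches in `parse_args` and `test` treat anything other than `densenet100` or `cifar10` as DenseNet121 on ImageNet. The following is a minimal sketch, not code from the patch, of how the flags could be validated with argparse `choices` so that a mistyped name fails fast instead of silently selecting DenseNet121. `ASSUMED_DEFAULTS` and its values are illustrative assumptions: the class counts and batch sizes follow the README text in this diff, the `32,32` image size is inferred from the Cifar-10 preprocessing description, and the real values live in `src.config`, which this diff does not show.

```python
import argparse

# Hypothetical stand-in for config_100 / config_121 in src.config (not shown in this diff).
ASSUMED_DEFAULTS = {
    "densenet121": {"num_classes": 1000, "per_batch_size": 32, "image_size": "224,224"},
    "densenet100": {"num_classes": 10, "per_batch_size": 64, "image_size": "32,32"},
}


def parse_args(argv=None):
    """Parse --net/--dataset strictly and attach per-network defaults."""
    parser = argparse.ArgumentParser("densenet flag validation sketch")
    # `choices` makes argparse reject unknown names with a usage error (exit code 2).
    parser.add_argument("--net", type=str, required=True,
                        choices=sorted(ASSUMED_DEFAULTS),
                        help="Densenet Model, densenet100 or densenet121")
    parser.add_argument("--dataset", type=str, required=True,
                        choices=("cifar10", "imagenet"),
                        help="Dataset, either cifar10 or imagenet")
    # parse_known_args mirrors the patch: unrelated flags are ignored, not rejected.
    args, _ = parser.parse_known_args(argv)

    defaults = ASSUMED_DEFAULTS[args.net]
    args.num_classes = defaults["num_classes"]
    args.per_batch_size = defaults["per_batch_size"]
    # Same "H,W" string to [H, W] conversion that eval.py applies to image_size.
    args.image_size = list(map(int, defaults["image_size"].split(",")))
    return args
```

Calling `parse_args(["--net", "densenet100", "--dataset", "cifar10"])` would return a namespace carrying the assumed Cifar-10 settings, while an unknown network name would stop with an argparse usage error. Whether stricter validation is wanted here is a design choice for the patch author; the fallback in the diff keeps older command lines working.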
@@ -80,12 +79,21 @@ def parse_args(cloud_args=None):
     parser.add_argument('--train_url', type=str, default="", help='train url')

     # platform
-    parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU'), help='device target')
+    parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU', 'CPU'),
+                        help='device target')

     args, _ = parser.parse_known_args()
     args = merge_args(args, cloud_args)

+    if args.net == "densenet100":
+        from src.config import config_100 as config
+    else:
+        from src.config import config_121 as config
+
     args.per_batch_size = config.per_batch_size
+    args.image_size = config.image_size
+    args.num_classes = config.num_classes
+
     args.image_size = list(map(int, args.image_size.split(',')))

     return args
@@ -151,7 +159,8 @@ def generate_results(model, rank, group_size, top1_correct, top5_correct, img_to

 def test(cloud_args=None):
     """
-    network eval function. Get top1 and top5 ACC from classification.
+    network eval function. Get top1 and top5 ACC from classification for imagenet,
+    and top1 ACC for cifar10.
     The result will be save at [./outputs] by default.
     """
     args = parse_args(cloud_args)
@@ -185,13 +194,23 @@ def test(cloud_args=None):
     else:
         args.models = [args.pretrained,]

+    if args.net == "densenet100":
+        from src.network.densenet import DenseNet100 as DenseNet
+    else:
+        from src.network.densenet import DenseNet121 as DenseNet
+
+    if args.dataset == "cifar10":
+        from src.datasets import classification_dataset_cifar10 as classification_dataset
+    else:
+        from src.datasets import classification_dataset_imagenet as classification_dataset
+
     for model in args.models:
         de_dataset = classification_dataset(args.data_dir, image_size=args.image_size,
                                             per_batch_size=args.per_batch_size,
                                             max_epoch=1, rank=args.rank, group_size=args.group_size,
                                             mode='eval')
         eval_dataloader = de_dataset.create_tuple_iterator()
-        network = DenseNet121(args.num_classes)
+        network = DenseNet(args.num_classes)

         param_dict = load_checkpoint(model)
         param_dict_new = {}
@@ -240,15 +259,13 @@ def test(cloud_args=None):
         img_tot = results[2, 0]
         acc1 = 100.0 * top1_correct / img_tot
         acc5 = 100.0 * top5_correct / img_tot
-        args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct,
-                                                                                             img_tot,
+        args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct, img_tot,
                                                                                              acc1))
-        args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct,
-                                                                                             img_tot,
-                                                                                             acc5))
+        if args.dataset == 'imagenet':
+            args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct, img_tot,
+                                                                                                 acc5))
     if args.is_distributed:
         release()

-
 if __name__ == "__main__":
     test()
diff --git a/model_zoo/official/cv/densenet121/export.py b/model_zoo/official/cv/densenet/export.py
similarity index 79%
rename from model_zoo/official/cv/densenet121/export.py
rename to model_zoo/official/cv/densenet/export.py
index d4211994b3..6c6104b918 100644
--- a/model_zoo/official/cv/densenet121/export.py
+++ b/model_zoo/official/cv/densenet/export.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -20,14 +20,13 @@ from mindspore.common import dtype as mstype
 from mindspore import context, Tensor
 from mindspore.train.serialization import export, load_checkpoint, load_param_into_net

-from src.network import DenseNet121
-from src.config import config
+parser = argparse.ArgumentParser(description="densenet export")

-parser = argparse.ArgumentParser(description="densenet121 export")
+parser.add_argument("--net", type=str, default='', help="Densenet Model, densenet100 or densenet121")
 parser.add_argument("--device_id", type=int, default=0, help="Device id")
 parser.add_argument("--batch_size", type=int, default=32, help="batch size")
 parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file path.")
-parser.add_argument("--file_name", type=str, default="densenet121", help="output file name.")
+parser.add_argument("--file_name", type=str, default="densenet", help="output file name.")
 parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="AIR", help="file format")
 parser.add_argument("--device_target", type=str, choices=["Ascend", "GPU", "CPU"], default="Ascend",
                     help="device target")
@@ -37,8 +36,15 @@ context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
 if args.device_target == "Ascend":
     context.set_context(device_id=args.device_id)

+if args.net == "densenet100":
+    from src.config import config_100 as config
+    from src.network.densenet import DenseNet100 as DenseNet
+else:
+    from src.config import config_121 as config
+    from src.network.densenet import DenseNet121 as DenseNet
+
 if __name__ == "__main__":
-    network = DenseNet121(config.num_classes)
+    network = DenseNet(config.num_classes)

     param_dict = load_checkpoint(args.ckpt_file)

diff --git a/model_zoo/official/cv/densenet121/mindspore_hub_conf.py b/model_zoo/official/cv/densenet/mindspore_hub_conf.py
similarity index 82%
rename from model_zoo/official/cv/densenet121/mindspore_hub_conf.py
rename to model_zoo/official/cv/densenet/mindspore_hub_conf.py
index 5e1ed149c0..29931ba26e 100644
--- a/model_zoo/official/cv/densenet121/mindspore_hub_conf.py
+++ b/model_zoo/official/cv/densenet/mindspore_hub_conf.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -13,9 +13,11 @@
 # limitations under the License.
 # ============================================================================
 """hub config."""
-from src.network import DenseNet121
+from src.network import DenseNet121, DenseNet100

 def create_network(name, *args, **kwargs):
     if name == 'densenet121':
         return DenseNet121(*args, **kwargs)
+    if name == 'densenet100':
+        return DenseNet100(*args, **kwargs)
     raise NotImplementedError(f"{name} is not implemented in the repo")
diff --git a/model_zoo/official/cv/densenet121/scripts/run_distribute_eval.sh b/model_zoo/official/cv/densenet/scripts/run_distribute_eval.sh
similarity index 86%
rename from model_zoo/official/cv/densenet121/scripts/run_distribute_eval.sh
rename to model_zoo/official/cv/densenet/scripts/run_distribute_eval.sh
index 74c52226e2..5e980353ed 100644
--- a/model_zoo/official/cv/densenet121/scripts/run_distribute_eval.sh
+++ b/model_zoo/official/cv/densenet/scripts/run_distribute_eval.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -16,8 +16,8 @@ echo "==============================================================================================================" echo "Please run the script as: " -echo "sh scripts/run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_PATH" -echo "for example: sh scripts/run_distribute_train.sh 8 /data/hccl.json /path/to/dataset /path/to/ckpt" +echo "sh scripts/run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE NET_NAME DATASET_NAME DATASET CKPT_PATH" +echo "for example: sh scripts/run_distribute_eval.sh 8 /data/hccl.json densenet121 imagenet /path/to/dataset /path/to/ckpt" echo "It is better to use absolute path." echo "=================================================================================================================" @@ -25,8 +25,10 @@ echo "After running the script, the network runs in the background. The log will export RANK_SIZE=$1 export RANK_TABLE_FILE=$2 -DATASET=$3 -CKPT_PATH=$4 +NET_NAME=$3 +DATASET_NAME=$4 +DATASET=$5 +CKPT_PATH=$6 for((i=0;i env.log python eval.py \ + --net=$NET_NAME \ + --dataset=$DATASET_NAME \ --data_dir=$DATASET \ --pretrained=$CKPT_PATH > log.txt 2>&1 & cd ../ done - diff --git a/model_zoo/official/cv/densenet121/scripts/run_distribute_eval_gpu.sh b/model_zoo/official/cv/densenet/scripts/run_distribute_eval_gpu.sh similarity index 70% rename from model_zoo/official/cv/densenet121/scripts/run_distribute_eval_gpu.sh rename to model_zoo/official/cv/densenet/scripts/run_distribute_eval_gpu.sh index 03f3446fcf..f9857d494f 100644 --- a/model_zoo/official/cv/densenet121/scripts/run_distribute_eval_gpu.sh +++ b/model_zoo/official/cv/densenet/scripts/run_distribute_eval_gpu.sh @@ -14,26 +14,26 @@ # limitations under the License. 
# ============================================================================ -if [ $# -lt 4 ] +if [ $# -lt 6 ] then - echo "Usage: sh run_distribute_eval_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CHECKPOINT_PATH]" -exit 1 + echo "Usage: sh run_distribute_eval_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]" + exit 1 fi if [ $1 -lt 1 ] && [ $1 -gt 8 ] then echo "error: DEVICE_NUM=$1 is not in (1-8)" -exit 1 + exit 1 fi export DEVICE_NUM=$1 export RANK_SIZE=$1 # check checkpoint file -if [ ! -f $4 ] +if [ ! -f $6 ] then - echo "error: CHECKPOINT_PATH=$4 is not a file" -exit 1 + echo "error: CHECKPOINT_PATH=$6 is not a file" + exit 1 fi BASEPATH=$(cd "`dirname $0`" || exit; pwd) @@ -51,13 +51,17 @@ export CUDA_VISIBLE_DEVICES="$2" if [ $1 -gt 1 ] then mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../eval.py \ - --data_dir=$3 \ + --net=$3 \ + --dataset=$4 \ + --data_dir=$5 \ --device_target='GPU' \ - --pretrained=$4 > eval.log 2>&1 & + --pretrained=$6 > eval.log 2>&1 & else python3 ${BASEPATH}/../eval.py \ - --data_dir=$3 \ + --net=$3 \ + --dataset=$4 \ + --data_dir=$5 \ --device_target='GPU' \ - --pretrained=$4 > eval.log 2>&1 & + --pretrained=$6 > eval.log 2>&1 & fi diff --git a/model_zoo/official/cv/densenet121/scripts/run_distribute_train.sh b/model_zoo/official/cv/densenet/scripts/run_distribute_train.sh similarity index 78% rename from model_zoo/official/cv/densenet121/scripts/run_distribute_train.sh rename to model_zoo/official/cv/densenet/scripts/run_distribute_train.sh index 0a597c11d5..1d70b6100c 100644 --- a/model_zoo/official/cv/densenet121/scripts/run_distribute_train.sh +++ b/model_zoo/official/cv/densenet/scripts/run_distribute_train.sh @@ -1,5 +1,5 @@ #!/bin/bash -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this 
file except in compliance with the License. @@ -16,8 +16,8 @@ echo "==============================================================================================================" echo "Please run the script as: " -echo "sh scripts/run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_FILE" -echo "for example: sh scripts/run_distribute_train.sh 8 /data/hccl.json /path/to/dataset ckpt_file" +echo "sh scripts/run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE NET_NAME DATASET_NAME DATASET CKPT_FILE" +echo "for example: sh scripts/run_distribute_train.sh 8 /data/hccl.json densenet121 imagenet /path/to/dataset ckpt_file" echo "It is better to use absolute path." echo "=================================================================================================================" @@ -25,8 +25,10 @@ echo "After running the script, the network runs in the background. The log will export RANK_SIZE=$1 export RANK_TABLE_FILE=$2 -DATASET=$3 -CKPT_FILE=$4 +NET_NAME=$3 +DATASET_NAME=$4 +DATASET=$5 +CKPT_FILE=$6 for((i=0;i env.log if [ -f $CKPT_FILE ] then - python train.py --data_dir=$DATASET --pretrained=$CKPT_FILE > log.txt 2>&1 & + python train.py --net=$NET_NAME --dataset=$DATASET_NAME --data_dir=$DATASET --pretrained=$CKPT_FILE > log.txt 2>&1 & else - python train.py --data_dir=$DATASET > log.txt 2>&1 & + python train.py --net=$NET_NAME --dataset=$DATASET_NAME --data_dir=$DATASET > log.txt 2>&1 & fi cd ../ diff --git a/model_zoo/official/cv/densenet121/scripts/run_distribute_train_gpu.sh b/model_zoo/official/cv/densenet/scripts/run_distribute_train_gpu.sh similarity index 67% rename from model_zoo/official/cv/densenet121/scripts/run_distribute_train_gpu.sh rename to model_zoo/official/cv/densenet/scripts/run_distribute_train_gpu.sh index 2683c67dbe..8972da6b3c 100644 --- a/model_zoo/official/cv/densenet121/scripts/run_distribute_train_gpu.sh +++ b/model_zoo/official/cv/densenet/scripts/run_distribute_train_gpu.sh @@ -14,16 +14,16 @@ # limitations under the 
License. # ============================================================================ -if [ $# -lt 3 ] +if [ $# -lt 5 ] then - echo "Usage: sh run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [PRE_TRAINED](optional)" -exit 1 + echo "Usage: sh run_distribute_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [NET_NAME] [DATASET_NAME] [DATASET_PATH] [PRE_TRAINED](optional)" + exit 1 fi if [ $1 -lt 1 ] && [ $1 -gt 8 ] then echo "error: DEVICE_NUM=$1 is not in (1-8)" -exit 1 + exit 1 fi export DEVICE_NUM=$1 @@ -40,30 +40,38 @@ cd ../train || exit export CUDA_VISIBLE_DEVICES="$2" -if [ -f $4 ] # pretrained ckpt -then +if [ -f "$6" ] # pretrained ckpt +then if [ $1 -gt 1 ] then mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../train.py \ - --data_dir=$3 \ + --net=$3 \ + --dataset=$4 \ + --data_dir=$5 \ --device_target='GPU' \ - --pretrained=$4 > train.log 2>&1 & + --pretrained=$6 > train.log 2>&1 & else python3 ${BASEPATH}/../train.py \ - --data_dir=$3 \ + --net=$3 \ + --dataset=$4 \ + --data_dir=$5 \ --is_distributed=0 \ --device_target='GPU' \ - --pretrained=$4 > train.log 2>&1 & + --pretrained=$6 > train.log 2>&1 & fi else if [ $1 -gt 1 ] then mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../train.py \ - --data_dir=$3 \ + --net=$3 \ + --dataset=$4 \ + --data_dir=$5 \ --device_target='GPU' > train.log 2>&1 & else python3 ${BASEPATH}/../train.py \ - --data_dir=$3 \ + --net=$3 \ + --dataset=$4 \ + --data_dir=$5 \ --is_distributed=0 \ --device_target='GPU' > train.log 2>&1 & fi diff --git a/model_zoo/official/cv/densenet/scripts/run_eval_cpu.sh b/model_zoo/official/cv/densenet/scripts/run_eval_cpu.sh new file mode 100644 index 0000000000..a9fab58b9f --- /dev/null +++ b/model_zoo/official/cv/densenet/scripts/run_eval_cpu.sh @@ -0,0 +1,46 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in 
compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# -lt 4 ] +then + echo "Usage: sh run_eval_cpu.sh [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]" + exit 1 +fi + +# check checkpoint file +if [ ! -f $4 ] +then + echo "error: CHECKPOINT_PATH=$4 is not a file" + exit 1 +fi + +BASEPATH=$(cd "`dirname $0`" || exit; pwd) +export PYTHONPATH=${BASEPATH}:$PYTHONPATH + +if [ -d "../eval" ]; +then + rm -rf ../eval +fi +mkdir ../eval +cd ../eval || exit + +python ${BASEPATH}/../eval.py \ + --net=$1 \ + --dataset=$2 \ + --data_dir=$3 \ + --device_target='CPU' \ + --is_distributed=0 \ + --pretrained=$4 > eval.log 2>&1 & diff --git a/model_zoo/official/cv/densenet/scripts/run_train_cpu.sh b/model_zoo/official/cv/densenet/scripts/run_train_cpu.sh new file mode 100644 index 0000000000..14f0e12d97 --- /dev/null +++ b/model_zoo/official/cv/densenet/scripts/run_train_cpu.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +if [ $# -lt 3 ] +then + echo "Usage: sh run_train_cpu.sh [NET_NAME] [DATASET_NAME] [DATASET_PATH] [PRE_TRAINED](optional)" + exit 1 +fi + +BASEPATH=$(cd "`dirname $0`" || exit; pwd) +export PYTHONPATH=${BASEPATH}:$PYTHONPATH +if [ -d "../train" ]; +then + rm -rf ../train +fi +mkdir ../train +cd ../train || exit + + +if [ -f "$4" ] # pretrained ckpt +then + python ${BASEPATH}/../train.py \ + --net=$1 \ + --dataset=$2 \ + --data_dir=$3 \ + --is_distributed=0 \ + --device_target='CPU' \ + --pretrained=$4 > train.log 2>&1 & +else + python ${BASEPATH}/../train.py \ + --net=$1 \ + --dataset=$2 \ + --data_dir=$3 \ + --is_distributed=0 \ + --device_target='CPU' > train.log 2>&1 & +fi diff --git a/model_zoo/official/cv/densenet121/src/__init__.py b/model_zoo/official/cv/densenet/src/__init__.py similarity index 100% rename from model_zoo/official/cv/densenet121/src/__init__.py rename to model_zoo/official/cv/densenet/src/__init__.py diff --git a/model_zoo/official/cv/densenet121/src/config.py b/model_zoo/official/cv/densenet/src/config.py similarity index 61% rename from model_zoo/official/cv/densenet121/src/config.py rename to model_zoo/official/cv/densenet/src/config.py index f90dea09bb..840682c8a3 100644 --- a/model_zoo/official/cv/densenet121/src/config.py +++ b/model_zoo/official/cv/densenet/src/config.py @@ -1,4 +1,4 @@ -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,7 +15,39 @@ """config""" from easydict import EasyDict as ed -config = ed({ +# config for densenet100 and cifar10 +config_100 = ed({ + "image_size": '32, 32', + "num_classes": 10, + + "lr": 0.1, + "lr_scheduler": 'exponential', + "lr_epochs": '150, 225, 300', + "lr_gamma": 0.1, + "eta_min": 0, + "T_max": 120, + "max_epoch": 300, + "per_batch_size": 64, + "warmup_epochs": 0, + + "weight_decay": 0.0001, + "momentum": 0.9, + "is_dynamic_loss_scale": 0, + "loss_scale": 1024, + "label_smooth": 0, + "label_smooth_factor": 0.1, + + "log_interval": 100, + "ckpt_interval": 3124, + "ckpt_path": 'outputs_cifar10/', + "is_save_on_master": 1, + + "rank": 0, + "group_size": 1 +}) + +# config for densenet121 and imagenet +config_121 = ed({ "image_size": '224,224', "num_classes": 1000, @@ -38,7 +70,7 @@ config = ed({ "log_interval": 100, "ckpt_interval": 50000, - "ckpt_path": 'outputs/', + "ckpt_path": 'outputs_imagenet/', "is_save_on_master": 1, "rank": 0, diff --git a/model_zoo/official/cv/densenet121/src/datasets/__init__.py b/model_zoo/official/cv/densenet/src/datasets/__init__.py similarity index 74% rename from model_zoo/official/cv/densenet121/src/datasets/__init__.py rename to model_zoo/official/cv/densenet/src/datasets/__init__.py index a1e6a79422..76d899208f 100644 --- a/model_zoo/official/cv/densenet121/src/datasets/__init__.py +++ b/model_zoo/official/cv/densenet/src/datasets/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,6 +17,6 @@ read dataset for classification """ -from .classification import classification_dataset +from .classification import classification_dataset_cifar10, classification_dataset_imagenet -__all__ = ["classification_dataset"] +__all__ = ["classification_dataset_cifar10", "classification_dataset_imagenet"] diff --git a/model_zoo/official/cv/densenet121/src/datasets/classification.py b/model_zoo/official/cv/densenet/src/datasets/classification.py similarity index 61% rename from model_zoo/official/cv/densenet121/src/datasets/classification.py rename to model_zoo/official/cv/densenet/src/datasets/classification.py index 59cbd74a0e..4386899926 100644 --- a/model_zoo/official/cv/densenet121/src/datasets/classification.py +++ b/model_zoo/official/cv/densenet/src/datasets/classification.py @@ -1,4 +1,4 @@ -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -50,17 +50,10 @@ class TxtDataset(): return len(self.imgs) -def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, - mode='train', - input_mode='folder', - root='', - num_parallel_workers=None, - shuffle=None, - sampler=None, - class_indexing=None, - drop_remainder=True, - transform=None, - target_transform=None): +def classification_dataset_imagenet(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, mode='train', + input_mode='folder', root='', num_parallel_workers=None, shuffle=None, + sampler=None, class_indexing=None, drop_remainder=True, transform=None, + target_transform=None): """ A function that returns a dataset for classification. The mode of input dataset could be "folder" or "txt". If it is "folder", all images within one folder have the same label. 
If it is "txt", all paths of images @@ -88,7 +81,7 @@ def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank unique index starting from 0). Examples: - >>> from src.datasets.classification import classification_dataset + >>> from src.datasets.classification import classification_dataset_imagenet >>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images >>> data_dir = "/path/to/imagefolder_directory" >>> de_dataset = classification_dataset(data_dir, image_size=[224, 244], @@ -152,3 +145,77 @@ def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank de_dataset = de_dataset.repeat(1) return de_dataset + + +def classification_dataset_cifar10(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, mode='train', + num_parallel_workers=None, shuffle=None, sampler=None, drop_remainder=True, + transform=None, target_transform=None): + """ + A function that returns the cifar10 dataset for classification. + + Args: + data_dir (str): Path to the root directory that contains the dataset's bin files. + image_size (Union(int, sequence)): Size of the input images. + per_batch_size (int): the batch size of every step during training. + max_epoch (int): the number of epochs. + rank (int): The shard ID within num_shards (default=None). + group_size (int): Number of shards that the dataset should be divided + into (default=None). + mode (str): "train" or others. Default: "train". + input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder". + root (str): the images path for "input_mode="txt"". Default: " ". + num_parallel_workers (int): Number of workers to read the data. Default: None. + shuffle (bool): Whether or not to perform shuffle on the dataset + (default=None, performs shuffle). + sampler (Sampler): Object used to choose samples from the dataset. Default: None. 
+ + Examples: + >>> from src.datasets.classification import classification_dataset_cifar10 + >>> # path to imagefolder directory. This directory needs to contain bin files of data. + >>> data_dir = "/path/to/datafolder_directory" + >>> de_dataset = classification_dataset_cifar10(data_dir, image_size=[32, 32], + >>> per_batch_size=64, max_epoch=100, + >>> rank=0, group_size=1) + """ + + mean = [0.5 * 255, 0.5 * 255, 0.5 * 255] + std = [0.5 * 255, 0.5 * 255, 0.5 * 255] + + if transform is None: + if mode == 'train': + transform_img = [ + vision_C.RandomCrop(image_size, padding=4), + vision_C.RandomHorizontalFlip(prob=0.5), + vision_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4), + vision_C.Normalize(mean=mean, std=std), + vision_C.HWC2CHW() + ] + else: + transform_img = [ + vision_C.Normalize(mean=mean, std=std), + vision_C.HWC2CHW() + ] + else: + transform_img = transform + + if target_transform is None: + transform_label = [ + normal_C.TypeCast(mstype.int32) + ] + else: + transform_label = target_transform + + de_dataset = de.Cifar10Dataset(data_dir, num_parallel_workers=num_parallel_workers, shuffle=shuffle, + sampler=sampler, num_shards=group_size, + shard_id=rank) + + de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img) + de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label) + + columns_to_project = ["image", "label"] + de_dataset = de_dataset.project(columns=columns_to_project) + + de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder) + de_dataset = de_dataset.repeat(1) + + return de_dataset diff --git a/model_zoo/official/cv/densenet121/src/datasets/sampler.py b/model_zoo/official/cv/densenet/src/datasets/sampler.py similarity index 100% rename from model_zoo/official/cv/densenet121/src/datasets/sampler.py rename to model_zoo/official/cv/densenet/src/datasets/sampler.py diff --git 
a/model_zoo/official/cv/densenet121/src/losses/__init__.py b/model_zoo/official/cv/densenet/src/losses/__init__.py similarity index 100% rename from model_zoo/official/cv/densenet121/src/losses/__init__.py rename to model_zoo/official/cv/densenet/src/losses/__init__.py diff --git a/model_zoo/official/cv/densenet121/src/losses/crossentropy.py b/model_zoo/official/cv/densenet/src/losses/crossentropy.py similarity index 100% rename from model_zoo/official/cv/densenet121/src/losses/crossentropy.py rename to model_zoo/official/cv/densenet/src/losses/crossentropy.py diff --git a/model_zoo/official/cv/densenet121/src/lr_scheduler/__init__.py b/model_zoo/official/cv/densenet/src/lr_scheduler/__init__.py similarity index 100% rename from model_zoo/official/cv/densenet121/src/lr_scheduler/__init__.py rename to model_zoo/official/cv/densenet/src/lr_scheduler/__init__.py diff --git a/model_zoo/official/cv/densenet121/src/lr_scheduler/lr_scheduler.py b/model_zoo/official/cv/densenet/src/lr_scheduler/lr_scheduler.py similarity index 100% rename from model_zoo/official/cv/densenet121/src/lr_scheduler/lr_scheduler.py rename to model_zoo/official/cv/densenet/src/lr_scheduler/lr_scheduler.py diff --git a/model_zoo/official/cv/densenet121/src/network/__init__.py b/model_zoo/official/cv/densenet/src/network/__init__.py similarity index 86% rename from model_zoo/official/cv/densenet121/src/network/__init__.py rename to model_zoo/official/cv/densenet/src/network/__init__.py index bb1727b063..bc048c2f96 100644 --- a/model_zoo/official/cv/densenet121/src/network/__init__.py +++ b/model_zoo/official/cv/densenet/src/network/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,4 +15,4 @@ """ densenet network """ -from .densenet import DenseNet121 +from .densenet import DenseNet121, DenseNet100 diff --git a/model_zoo/official/cv/densenet121/src/network/densenet.py b/model_zoo/official/cv/densenet/src/network/densenet.py similarity index 72% rename from model_zoo/official/cv/densenet121/src/network/densenet.py rename to model_zoo/official/cv/densenet/src/network/densenet.py index 27708faf73..f529471eb6 100644 --- a/model_zoo/official/cv/densenet121/src/network/densenet.py +++ b/model_zoo/official/cv/densenet/src/network/densenet.py @@ -1,4 +1,4 @@ -# Copyright 2020 Huawei Technologies Co., Ltd +# Copyright 2020-2021 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -25,7 +25,7 @@ from mindspore.ops import operations as P from mindspore.common import initializer as init from src.utils.var_init import default_recurisive_init, KaimingNormal -__all__ = ["DenseNet121"] +__all__ = ["DenseNet121", "DenseNet100"] class GlobalAvgPooling(nn.Cell): """ @@ -123,13 +123,17 @@ class _Transition(nn.Cell): """ the transition layer """ - def __init__(self, num_input_features, num_output_features): + def __init__(self, num_input_features, num_output_features, avgpool=False): super(_Transition, self).__init__() + if avgpool: + poollayer = nn.AvgPool2d(kernel_size=2, stride=2) + else: + poollayer = nn.MaxPool2d(kernel_size=2, stride=2) self.features = nn.SequentialCell(OrderedDict([ ('norm', nn.BatchNorm2d(num_input_features)), ('relu', nn.ReLU()), ('conv', conv1x1(num_input_features, num_output_features)), - ('pool', nn.MaxPool2d(kernel_size=2, stride=2)) + ('pool', poollayer) ])) def construct(self, x): @@ -142,17 +146,23 @@ class Densenet(nn.Cell): """ __constants__ = ['features'] - def __init__(self, growth_rate, block_config, num_init_features, bn_size=4, drop_rate=0): + def __init__(self, growth_rate, block_config, 
num_init_features=None, bn_size=4, drop_rate=0): super(Densenet, self).__init__() layers = OrderedDict() - layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3) - layers['norm0'] = nn.BatchNorm2d(num_init_features) - layers['relu0'] = nn.ReLU() - layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same') + if num_init_features: + layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3) + layers['norm0'] = nn.BatchNorm2d(num_init_features) + layers['relu0'] = nn.ReLU() + layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same') + num_features = num_init_features + else: + layers['conv0'] = conv3x3(3, growth_rate*2, stride=1, padding=1) + layers['norm0'] = nn.BatchNorm2d(growth_rate*2) + layers['relu0'] = nn.ReLU() + num_features = growth_rate * 2 # Each denseblock - num_features = num_init_features for i, num_layers in enumerate(block_config): block = _DenseBlock( num_layers=num_layers, @@ -165,8 +175,12 @@ class Densenet(nn.Cell): num_features = num_features + num_layers*growth_rate if i != len(block_config)-1: - trans = _Transition(num_input_features=num_features, - num_output_features=num_features // 2) + if num_init_features: + trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2, + avgpool=False) + else: + trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2, + avgpool=True) layers['transition%d'%(i+1)] = trans num_features = num_features // 2 @@ -184,6 +198,11 @@ class Densenet(nn.Cell): def get_out_channels(self): return self.out_channels + +def _densenet100(**kwargs): + return Densenet(growth_rate=12, block_config=(16, 16, 16), **kwargs) + + def _densenet121(**kwargs): return Densenet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, **kwargs) @@ -200,6 +219,38 @@ def _densenet201(**kwargs): return Densenet(growth_rate=32, block_config=(6, 12, 48, 32), num_init_features=64, **kwargs) +class 
DenseNet100(nn.Cell):
+    """
+    the densenet100 architecture
+    """
+    def __init__(self, num_classes, include_top=True):
+        super(DenseNet100, self).__init__()
+        self.backbone = _densenet100()
+        out_channels = self.backbone.get_out_channels()
+        self.include_top = include_top
+        if self.include_top:
+            self.head = CommonHead(num_classes, out_channels)
+
+        default_recurisive_init(self)
+        for _, cell in self.cells_and_names():
+            if isinstance(cell, nn.Conv2d):
+                cell.weight.set_data(init.initializer(KaimingNormal(a=math.sqrt(5), mode='fan_out',
+                                                                    nonlinearity='relu'),
+                                                      cell.weight.shape,
+                                                      cell.weight.dtype))
+            elif isinstance(cell, nn.BatchNorm2d):
+                cell.gamma.set_data(init.initializer('ones', cell.gamma.shape))
+                cell.beta.set_data(init.initializer('zeros', cell.beta.shape))
+            elif isinstance(cell, nn.Dense):
+                cell.bias.set_data(init.initializer('zeros', cell.bias.shape))
+
+    def construct(self, x):
+        x = self.backbone(x)
+        if not self.include_top:
+            return x
+        x = self.head(x)
+        return x
+
 class DenseNet121(nn.Cell):
     """
diff --git a/model_zoo/official/cv/densenet121/src/optimizers/__init__.py b/model_zoo/official/cv/densenet/src/optimizers/__init__.py
similarity index 100%
rename from model_zoo/official/cv/densenet121/src/optimizers/__init__.py
rename to model_zoo/official/cv/densenet/src/optimizers/__init__.py
diff --git a/model_zoo/official/cv/densenet121/src/utils/__init__.py b/model_zoo/official/cv/densenet/src/utils/__init__.py
similarity index 100%
rename from model_zoo/official/cv/densenet121/src/utils/__init__.py
rename to model_zoo/official/cv/densenet/src/utils/__init__.py
diff --git a/model_zoo/official/cv/densenet121/src/utils/logging.py b/model_zoo/official/cv/densenet/src/utils/logging.py
similarity index 100%
rename from model_zoo/official/cv/densenet121/src/utils/logging.py
rename to model_zoo/official/cv/densenet/src/utils/logging.py
diff --git a/model_zoo/official/cv/densenet121/src/utils/var_init.py b/model_zoo/official/cv/densenet/src/utils/var_init.py
similarity index 100%
rename from model_zoo/official/cv/densenet121/src/utils/var_init.py
rename to model_zoo/official/cv/densenet/src/utils/var_init.py
diff --git a/model_zoo/official/cv/densenet121/train.py b/model_zoo/official/cv/densenet/train.py
similarity index 82%
rename from model_zoo/official/cv/densenet121/train.py
rename to model_zoo/official/cv/densenet/train.py
index d0034d44c9..72c669de33 100644
--- a/model_zoo/official/cv/densenet121/train.py
+++ b/model_zoo/official/cv/densenet/train.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -32,21 +32,18 @@ from mindspore.context import ParallelMode
 from mindspore.common import set_seed
 
 from src.optimizers import get_param_groups
-from src.network import DenseNet121
-from src.datasets import classification_dataset
 from src.losses.crossentropy import CrossEntropy
 from src.lr_scheduler import MultiStepLR, CosineAnnealingLR
 from src.utils.logging import get_logger
-from src.config import config
 
 set_seed(1)
 
 class BuildTrainNetwork(nn.Cell):
     """build training network"""
-    def __init__(self, network, criterion):
+    def __init__(self, net, crit):
         super(BuildTrainNetwork, self).__init__()
-        self.network = network
-        self.criterion = criterion
+        self.network = net
+        self.criterion = crit
 
     def construct(self, input_data, label):
         output = self.network(input_data)
@@ -108,6 +105,10 @@ def parse_args(cloud_args=None):
     """parameters"""
     parser = argparse.ArgumentParser('mindspore classification training')
 
+    # network and dataset choices
+    parser.add_argument('--net', type=str, default='', help='Densenet Model, densenet100 or densenet121')
+    parser.add_argument('--dataset', type=str, default='', help='Dataset, either cifar10 or imagenet')
+
     # dataset related
     parser.add_argument('--data_dir', type=str, default='', help='train data dir')
 
@@ -121,10 +122,17 @@ def parse_args(cloud_args=None):
     parser.add_argument('--train_url', type=str, default="", help='train url')
 
     # platform
-    parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU'), help='device target')
+    parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU', 'CPU'),
+                        help='device target')
 
     args, _ = parser.parse_known_args()
     args = merge_args(args, cloud_args)
+
+    if args.net == "densenet100":
+        from src.config import config_100 as config
+    else:
+        from src.config import config_121 as config
+
     args.image_size = config.image_size
     args.num_classes = config.num_classes
     args.lr = config.lr
@@ -158,15 +166,26 @@ def merge_args(args, cloud_args):
     """dictionary"""
     args_dict = vars(args)
     if isinstance(cloud_args, dict):
-        for key in cloud_args.keys():
-            val = cloud_args[key]
-            if key in args_dict and val:
-                arg_type = type(args_dict[key])
+        for k in cloud_args.keys():
+            val = cloud_args[k]
+            if k in args_dict and val:
+                arg_type = type(args_dict[k])
                 if arg_type is not type(None):
                     val = arg_type(val)
-                args_dict[key] = val
+                args_dict[k] = val
     return args
 
+def get_lr_scheduler(args):
+    if args.lr_scheduler == 'exponential':
+        lr_scheduler = MultiStepLR(args.lr, args.lr_epochs, args.lr_gamma, args.steps_per_epoch, args.max_epoch,
+                                   warmup_epochs=args.warmup_epochs)
+    elif args.lr_scheduler == 'cosine_annealing':
+        lr_scheduler = CosineAnnealingLR(args.lr, args.T_max, args.steps_per_epoch, args.max_epoch,
+                                         warmup_epochs=args.warmup_epochs, eta_min=args.eta_min)
+    else:
+        raise NotImplementedError(args.lr_scheduler)
+    return lr_scheduler
+
 def train(cloud_args=None):
     """training process"""
     args = parse_args(cloud_args)
@@ -200,9 +219,18 @@ def train(cloud_args=None):
                                     datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
     args.logger = get_logger(args.outputs_dir, args.rank)
 
+    if args.net == "densenet100":
+        from src.network.densenet import DenseNet100 as DenseNet
+    else:
+        from src.network.densenet import DenseNet121 as DenseNet
+
+    if args.dataset == "cifar10":
+        from src.datasets import classification_dataset_cifar10 as classification_dataset
+    else:
+        from src.datasets import classification_dataset_imagenet as classification_dataset
+
     # dataloader
-    de_dataset = classification_dataset(args.data_dir, args.image_size,
-                                        args.per_batch_size, args.max_epoch,
+    de_dataset = classification_dataset(args.data_dir, args.image_size, args.per_batch_size, args.max_epoch,
                                         args.rank, args.group_size)
     de_dataset.map_model = 4
     args.steps_per_epoch = de_dataset.get_dataset_size()
@@ -212,12 +240,11 @@ def train(cloud_args=None):
     # network
     args.logger.important_info('start create network')
     # get network and init
-    network = DenseNet121(args.num_classes)
+    network = DenseNet(args.num_classes)
     # loss
     if not args.label_smooth:
         args.label_smooth_factor = 0.0
-    criterion = CrossEntropy(smooth_factor=args.label_smooth_factor,
-                             num_classes=args.num_classes)
+    criterion = CrossEntropy(smooth_factor=args.label_smooth_factor, num_classes=args.num_classes)
 
     # load pretrain model
     if os.path.isfile(args.pretrained):
@@ -234,30 +261,12 @@ def train(cloud_args=None):
         args.logger.info('load model {} success'.format(args.pretrained))
 
     # lr scheduler
-    if args.lr_scheduler == 'exponential':
-        lr_scheduler = MultiStepLR(args.lr,
-                                   args.lr_epochs,
-                                   args.lr_gamma,
-                                   args.steps_per_epoch,
-                                   args.max_epoch,
-                                   warmup_epochs=args.warmup_epochs)
-    elif args.lr_scheduler == 'cosine_annealing':
-        lr_scheduler = CosineAnnealingLR(args.lr,
-                                         args.T_max,
-                                         args.steps_per_epoch,
-                                         args.max_epoch,
-                                         warmup_epochs=args.warmup_epochs,
-                                         eta_min=args.eta_min)
-    else:
-        raise NotImplementedError(args.lr_scheduler)
+    lr_scheduler = get_lr_scheduler(args)
     lr_schedule = lr_scheduler.get_lr()
 
     # optimizer
-    opt = Momentum(params=get_param_groups(network),
-                   learning_rate=Tensor(lr_schedule),
-                   momentum=args.momentum,
-                   weight_decay=args.weight_decay,
-                   loss_scale=args.loss_scale)
+    opt = Momentum(params=get_param_groups(network), learning_rate=Tensor(lr_schedule),
+                   momentum=args.momentum, weight_decay=args.weight_decay, loss_scale=args.loss_scale)
 
     # mixed precision training
     criterion.add_flags_recursive(fp32=True)
@@ -280,6 +289,8 @@ def train(cloud_args=None):
         model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O3")
     elif args.device_target == 'GPU':
         model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O0")
+    elif args.device_target == 'CPU':
+        model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O0")
     else:
         raise ValueError("Unsupported device target.")
 
@@ -290,8 +301,7 @@ def train(cloud_args=None):
         ckpt_max_num = args.max_epoch * args.steps_per_epoch // args.ckpt_interval
         ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval,
                                        keep_checkpoint_max=ckpt_max_num)
-        ckpt_cb = ModelCheckpoint(config=ckpt_config,
-                                  directory=args.outputs_dir,
+        ckpt_cb = ModelCheckpoint(config=ckpt_config, directory=args.outputs_dir,
                                   prefix='{}'.format(args.rank))
         callbacks.append(ckpt_cb)