!8570 add TinyNet-A, B, D, E to model_zoo

From: @wangrao124
Reviewed-by: 
Signed-off-by:
pull/8570/MERGE
Committed by mindspore-ci-bot via Gitee
commit 16b34e5e58

@@ -8,11 +8,9 @@
 - [Script and Sample Code](#script-and-sample-code)
 - [Training Process](#training-process)
 - [Evaluation Process](#evaluation-process)
-- [Evaluation](#evaluation)
 - [Model Description](#model-description)
 - [Performance](#performance)
-- [Training Performance](#evaluation-performance)
-- [Inference Performance](#evaluation-performance)
+- [Evaluation Performance](#evaluation-performance)
 - [Description of Random Situation](#description-of-random-situation)
 - [ModelZoo Homepage](#modelzoo-homepage)
@@ -22,7 +20,6 @@ TinyNets are a series of lightweight models obtained by twisting resolution, dep
 [Paper](https://arxiv.org/abs/2010.14819): Kai Han, Yunhe Wang, Qiulin Zhang, Wei Zhang, Chunjing Xu, Tong Zhang. Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets. In NeurIPS 2020.
-Note: We have only released TinyNet-C for now, and will release other TinyNets soon.
 # [Model architecture](#contents)
 The overall network architecture of TinyNet is shown below:
@@ -44,42 +41,45 @@ Dataset used: [ImageNet 2012](http://image-net.org/challenges/LSVRC/2012/)
 - Hardware (GPU)
 - Framework
     - [MindSpore](https://www.mindspore.cn/install/en)
-- For more information, please check the resources below
+- For more information, please check the resources below:
     - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
     - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
-# [Script description](#contents)
-## [Script and sample code](#contents)
-```
+# [Script Description](#contents)
+## [Script and Sample Code](#contents)
+```markdown
 .tinynet
-├── Readme.md # descriptions about tinynet
+├── README.md # descriptions about tinynet
 ├── script
 │   ├── eval.sh # evaluation script
 │   ├── train_1p_gpu.sh # training script on single GPU
 │   └── train_distributed_gpu.sh # distributed training script on multiple GPUs
 ├── src
-│   ├── callback.py # loss and checkpoint callbacks
-│   ├── dataset.py # data processing
+│   ├── callback.py # loss, ema, and checkpoint callbacks
+│   ├── dataset.py # data preprocessing
 │   ├── loss.py # label-smoothing cross-entropy loss function
 │   ├── tinynet.py # tinynet architecture
 │   └── utils.py # utility functions
 ├── eval.py # evaluation interface
 └── train.py # training interface
 ```
-## [Training process](#contents)
-### Launch
-```
+### [Training process](#contents)
+#### Launch
+```bash
 # training on single GPU
 sh train_1p_gpu.sh
 # training on multiple GPUs, the number after -n indicates how many GPUs will be used for training
 sh train_distributed_gpu.sh -n 8
 ```
 Inside train.sh, there are hyperparameters that can be adjusted during training, for example:
-```
+```python
 --model tinynet_c            model to be used for training
 --drop 0.2                   dropout rate
 --drop-connect 0             drop connect rate
@@ -102,22 +102,25 @@ Inside train.sh, there are hyperparameters that can be adjusted during training,
 --GPU                        using GPU for training
 --dataset_sink               using sink mode
 ```
-The config above was used to train tinynets on ImageNet (change drop-connect to 0.2 for training tinynet-b)
+The config above was used to train tinynets on ImageNet (change drop-connect to 0.1 for training tinynet_b)
 > checkpoints will be saved in the ./device_{rank_id} folder (single GPU)
 > or ./device_parallel folder (multiple GPUs)
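The flags listed above are plain command-line options consumed by train.py. A minimal sketch of how such options might be declared with standard argparse; the parser below is illustrative only, not the repository's actual parser, and the defaults simply mirror the list above:

```python
# Illustrative sketch: declaring TinyNet training flags with argparse.
import argparse

parser = argparse.ArgumentParser(description="TinyNet training (sketch)")
parser.add_argument("--model", default="tinynet_c",
                    help="model to be used for training")
parser.add_argument("--drop", type=float, default=0.2, help="dropout rate")
parser.add_argument("--drop-connect", type=float, default=0.0,
                    help="drop connect rate")
parser.add_argument("--GPU", action="store_true", help="use GPU for training")
parser.add_argument("--dataset_sink", action="store_true", help="use sink mode")
args = parser.parse_args()  # --drop-connect is read back as args.drop_connect
```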
-## [Eval process](#contents)
-### Launch
-```
+### [Evaluation Process](#contents)
+#### Launch
+```bash
 # infer example
 sh eval.sh
 ```
 Inside the eval.sh, there are configs that can be adjusted during inference, for example:
-```
+```python
 --num-classes 1000
 --batch-size 128
 --workers 8
@@ -126,13 +129,14 @@ Inside the eval.sh, there are configs that can be adjusted during inference, for
 --ckpt /path_to_EMA_checkpoint/
 --dataset_sink > tinynet_c_eval.log 2>&1 &
 ```
 > checkpoints can be produced during the training process.
 # [Model Description](#contents)
 ## [Performance](#contents)
-#### Evaluation Performance
+### Evaluation Performance
 | Model | FLOPs | Latency* | ImageNet Top-1 |
 | ------------------- | ----- | -------- | -------------- |

@@ -36,7 +36,7 @@ def load_nparray_into_net(net, array_dict):
     for _, param in net.parameters_and_names():
         if param.name in array_dict:
             new_param = array_dict[param.name]
-            param.set_data(Parameter(new_param.copy(), name=param.name))
+            param.set_data(Parameter(Tensor(deepcopy(new_param)), name=param.name))
         else:
            param_not_load.append(param.name)
     return param_not_load
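The changed line wraps the stored numpy array in a Tensor before building the Parameter, since set_data expects MindSpore data rather than a raw numpy buffer, and deepcopy keeps the EMA shadow dict and the network from aliasing the same array. A self-contained sketch of the patched helper in use; the Conv2d stand-in network is illustrative, not from the patch:

```python
from copy import deepcopy

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, Parameter

def load_nparray_into_net(net, array_dict):
    """Patched helper from the hunk above: numpy dict -> net parameters."""
    param_not_load = []
    for _, param in net.parameters_and_names():
        if param.name in array_dict:
            new_param = array_dict[param.name]
            # wrap in Tensor so set_data receives MindSpore data, not raw numpy
            param.set_data(Parameter(Tensor(deepcopy(new_param)), name=param.name))
        else:
            param_not_load.append(param.name)
    return param_not_load

net = nn.Conv2d(3, 8, 3)  # placeholder network for the sketch
shadow = {p.name: np.zeros(p.data.shape, dtype=np.float32)
          for _, p in net.parameters_and_names()}
print("parameters not loaded:", load_nparray_into_net(net, shadow))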
@@ -48,8 +48,8 @@ class EmaEvalCallBack(Callback):
     the end of training epoch.

     Args:
-        model: Mindspore model instance.
-        ema_network: step-wise exponential moving average for ema_network.
+        network: tinynet network instance.
+        ema_network: step-wise exponential moving average of network.
         eval_dataset: the evaluation dataset.
         decay (float): ema decay.
         save_epoch (int): defines how often to save checkpoint.
@@ -57,9 +57,9 @@ class EmaEvalCallBack(Callback):
         start_epoch (int): which epoch to start/resume training.
     """
-    def __init__(self, model, ema_network, eval_dataset, loss_fn, decay=0.999,
+    def __init__(self, network, ema_network, eval_dataset, loss_fn, decay=0.999,
                  save_epoch=1, dataset_sink_mode=True, start_epoch=0):
-        self.model = model
+        self.network = network
         self.ema_network = ema_network
         self.eval_dataset = eval_dataset
         self.loss_fn = loss_fn
@@ -80,14 +80,12 @@ class EmaEvalCallBack(Callback):
     def begin(self, run_context):
         """Initialize the EMA parameters."""
-        cb_params = run_context.original_args()
-        for _, param in cb_params.network.parameters_and_names():
+        for _, param in self.network.parameters_and_names():
             self.shadow[param.name] = deepcopy(param.data.asnumpy())

     def step_end(self, run_context):
         """Update the EMA parameters."""
-        cb_params = run_context.original_args()
-        for _, param in cb_params.network.parameters_and_names():
+        for _, param in self.network.parameters_and_names():
             new_average = (1.0 - self.decay) * param.data.asnumpy().copy() + \
                 self.decay * self.shadow[param.name]
             self.shadow[param.name] = new_average
@@ -98,24 +96,20 @@ class EmaEvalCallBack(Callback):
         cur_epoch = cb_params.cur_epoch_num + self._start_epoch - 1
         save_ckpt = (cur_epoch % self.save_epoch == 0)
-        acc = self.model.eval(
-            self.eval_dataset, dataset_sink_mode=self.dataset_sink_mode)
-        print("Model Accuracy:", acc)
         load_nparray_into_net(self.ema_network, self.shadow)
-        self.ema_network.set_train(False)
+        model = Model(self.network, loss_fn=self.loss_fn, metrics=self.eval_metrics)
         model_ema = Model(self.ema_network, loss_fn=self.loss_fn,
                           metrics=self.eval_metrics)
+        acc = model.eval(
+            self.eval_dataset, dataset_sink_mode=self.dataset_sink_mode)
         ema_acc = model_ema.eval(
             self.eval_dataset, dataset_sink_mode=self.dataset_sink_mode)
+        print("Model Accuracy:", acc)
         print("EMA-Model Accuracy:", ema_acc)
-        self.ema_accuracy[cur_epoch] = ema_acc["Top1-Acc"]
         output = [{"name": k, "data": Tensor(v)}
                   for k, v in self.shadow.items()]
+        self.ema_accuracy[cur_epoch] = ema_acc["Top1-Acc"]
         if self.best_ema_accuracy < ema_acc["Top1-Acc"]:
             self.best_ema_accuracy = ema_acc["Top1-Acc"]
             self.best_ema_epoch = cur_epoch
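Stripped of MindSpore specifics, the callback's bookkeeping is: begin() snapshots every parameter into a shadow dict, step_end() blends each new value into the shadow, and epoch_end() evaluates a network rebuilt from the shadow. A minimal numpy sketch of that update rule, with illustrative names:

```python
# Numpy sketch of the EMA bookkeeping EmaEvalCallBack performs each step.
import numpy as np

decay = 0.999
params = {"conv.weight": np.random.randn(8, 3, 3, 3).astype(np.float32)}
shadow = {k: v.copy() for k, v in params.items()}  # begin(): snapshot

def ema_step(shadow, params, decay):
    """step_end(): shadow <- decay * shadow + (1 - decay) * current value."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value

ema_step(shadow, params, decay)  # called once per training step
```

With decay=0.999 the shadow weights track a long running average, which is why the EMA checkpoint (the `--ckpt /path_to_EMA_checkpoint/` used at eval time) typically scores higher than the raw weights.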

@@ -65,12 +65,12 @@ def create_dataset(batch_size, train_data_url='', workers=8, distributed=False,
                                                  contrast=adjust_range,
                                                  saturation=adjust_range)
     to_tensor = py_vision.ToTensor()
-    nromlize_op = py_vision.Normalize(
+    normalize_op = py_vision.Normalize(
         IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD)
     # assemble all the transforms
     image_ops = py_transforms.Compose([decode_op, random_resize_crop_bicubic,
-                                       random_horizontal_flip_op, random_color_jitter_op, to_tensor, nromlize_op])
+                                       random_horizontal_flip_op, random_color_jitter_op, to_tensor, normalize_op])
     rank_id = get_rank() if distributed else 0
     rank_size = get_group_size() if distributed else 1
@@ -125,11 +125,11 @@ def create_dataset_val(batch_size=128, val_data_url='', workers=8, distributed=F
     resize_op = py_vision.Resize(size=scale_size, interpolation=Inter.BICUBIC)
     center_crop = py_vision.CenterCrop(size=input_size)
     to_tensor = py_vision.ToTensor()
-    nromlize_op = py_vision.Normalize(
+    normalize_op = py_vision.Normalize(
         IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD)
     image_ops = py_transforms.Compose([decode_op, resize_op, center_crop,
-                                       to_tensor, nromlize_op])
+                                       to_tensor, normalize_op])
     dataset = dataset.map(input_columns=["label"], operations=type_cast_op,
                           num_parallel_workers=workers)
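The rename from nromlize_op to normalize_op is purely cosmetic. For reference, py_vision.Normalize applies the standard per-channel transform (x - mean) / std; a numpy sketch of the same operation:

```python
# What the normalize_op above computes, sketched in numpy.
import numpy as np

IMAGENET_DEFAULT_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_DEFAULT_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(img_chw):
    """img_chw: float32 CHW image in [0, 1], as produced by ToTensor."""
    mean = IMAGENET_DEFAULT_MEAN[:, None, None]  # broadcast over H, W
    std = IMAGENET_DEFAULT_STD[:, None, None]
    return (img_chw - mean) / std
```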

@@ -18,10 +18,12 @@ import re
 from copy import deepcopy

 import mindspore.nn as nn
+import mindspore.common.dtype as mstype
 from mindspore.ops import operations as P
 from mindspore.common.initializer import Normal, Zero, One, initializer, Uniform
 from mindspore import context, ms_function
 from mindspore.common.parameter import Parameter
+from mindspore import Tensor

 # Imagenet constant values
 IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
@@ -29,12 +31,14 @@ IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
 # model structure configurations for TinyNets, values are
 # (resolution multiplier, channel multiplier, depth multiplier)
-# only tinynet-c is availiable for now, we will release other tinynet
-# models soon
 # codes are inspired and partially adapted from
 # https://github.com/rwightman/gen-efficientnet-pytorch
-TINYNET_CFG = {"c": (0.825, 0.54, 0.85)}
+TINYNET_CFG = {"a": (0.86, 1.0, 1.2),
+               "b": (0.84, 0.75, 1.1),
+               "c": (0.825, 0.54, 0.85),
+               "d": (0.68, 0.54, 0.695),
+               "e": (0.475, 0.51, 0.60)}

 relu = P.ReLU()
 sigmoid = P.Sigmoid()
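Each tuple is a (resolution, channel, depth) multiplier applied to an EfficientNet-B0-style base network. A sketch of how such multipliers act on base dimensions; the rounding rules and base values here are illustrative assumptions, the real logic lives in the adapted gen-efficientnet code:

```python
# Illustrative scaling only: how (r, c, d) multipliers shrink a base stage.
import math

TINYNET_CFG = {"a": (0.86, 1.0, 1.2),
               "b": (0.84, 0.75, 1.1),
               "c": (0.825, 0.54, 0.85),
               "d": (0.68, 0.54, 0.695),
               "e": (0.475, 0.51, 0.60)}

def scaled_dims(variant, base_resolution=224, base_channels=32, base_repeats=2):
    """Apply the variant's multipliers to one hypothetical base stage."""
    r, c, d = TINYNET_CFG[variant]
    resolution = int(r * base_resolution)       # input image side length
    channels = int(round(c * base_channels))    # stage output channels
    repeats = int(math.ceil(d * base_repeats))  # stage block repeats
    return resolution, channels, repeats

print(scaled_dims("a"))  # (192, 32, 3) under these assumed base values
```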
@@ -524,13 +528,15 @@ class DropConnect(nn.Cell):
         self.dtype = P.DType()
         self.keep_prob = 1 - drop_connect_rate
         self.dropout = P.Dropout(keep_prob=self.keep_prob)
+        self.keep_prob_tensor = Tensor(self.keep_prob, dtype=mstype.float32)

     def construct(self, x):
         shape = self.shape(x)
         dtype = self.dtype(x)
         ones_tensor = P.Fill()(dtype, (shape[0], 1, 1, 1), 1)
-        _, mask_ = self.dropout(ones_tensor)
-        x = x * mask_
+        _, mask = self.dropout(ones_tensor)
+        x = x * mask
+        x = x / self.keep_prob_tensor
         return x
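The two added lines give the mask inverted-dropout scaling: after zeroing whole samples, dividing by keep_prob keeps the expected activation unchanged, so no rescaling is needed at inference. In numpy terms, a sketch of the same behavior (not the MindSpore cell):

```python
# Numpy sketch of rescaled drop connect (stochastic depth on samples).
import numpy as np

def drop_connect(x, drop_rate, rng=None):
    """x: NCHW batch; drop whole samples, rescale survivors by 1/keep_prob."""
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_rate
    mask = (rng.random((x.shape[0], 1, 1, 1)) < keep_prob).astype(x.dtype)
    return x * mask / keep_prob  # E[output] == x, matching eval-time behavior

x = np.ones((4, 3, 2, 2), dtype=np.float32)
print(drop_connect(x, drop_rate=0.2)[:, 0, 0, 0])  # surviving rows scaled to 1.25
```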

@@ -227,7 +227,7 @@ def main():
     net_ema.set_train(False)
     assert args.ema_decay > 0, "EMA should be used in tinynet training."
-    ema_cb = EmaEvalCallBack(model=model,
+    ema_cb = EmaEvalCallBack(network=net,
                              ema_network=net_ema,
                              loss_fn=loss,
                              eval_dataset=val_dataset,

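With the keyword renamed, the callback now receives the raw network rather than a Model and is wired into training in the usual MindSpore way. A hedged, self-contained sketch of that wiring; the tiny Dense networks and generated dataset are stand-ins, and only the EmaEvalCallBack keywords follow the patched signature:

```python
# Placeholder pipeline to show the callback wiring; real code uses tinynet + ImageNet.
import numpy as np
import mindspore.nn as nn
import mindspore.dataset as ds
from mindspore.train.model import Model
from src.callback import EmaEvalCallBack  # assumed import path per the repo tree

net = nn.Dense(16, 10)       # stand-in for the tinynet network
net_ema = nn.Dense(16, 10)   # stand-in for its EMA copy
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

def gen():
    for _ in range(64):
        yield (np.random.randn(16).astype(np.float32),
               np.random.randint(0, 10, (), dtype=np.int32))

train_dataset = ds.GeneratorDataset(gen, ["data", "label"]).batch(8)
val_dataset = ds.GeneratorDataset(gen, ["data", "label"]).batch(8)

model = Model(net, loss_fn=loss, optimizer=opt, metrics={"accuracy"})
ema_cb = EmaEvalCallBack(network=net, ema_network=net_ema,
                         loss_fn=loss, eval_dataset=val_dataset,
                         decay=0.999, save_epoch=1,
                         dataset_sink_mode=False, start_epoch=0)
model.train(2, train_dataset, callbacks=[ema_cb], dataset_sink_mode=False)
```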