# ResNext50 Example

## Description

This is an example of training ResNext50 with the ImageNet dataset in MindSpore.

## Requirements

- Install [MindSpore](http://www.mindspore.cn/install/en).
- Download the ImageNet2012 dataset.

## Structure

```shell
.
└─resnext50
  ├─README.md
  ├─scripts
    ├─run_standalone_train.sh         # launch standalone training(1p)
    ├─run_distribute_train.sh         # launch distributed training(8p)
    └─run_eval.sh                     # launch evaluating
  ├─src
    ├─backbone
      ├─__init__.py                   # initialize
      ├─resnet.py                     # resnext50 backbone
    ├─utils
      ├─__init__.py                   # initialize
      ├─cunstom_op.py                 # network operation
      ├─logging.py                    # print log
      ├─optimizers_init_.py           # get parameters
      ├─sampler.py                    # distributed sampler
      ├─var_init_.py                  # calculate gain value
    ├─__init__.py                     # initialize
    ├─config.py                       # parameter configuration
    ├─crossentropy.py                 # CrossEntropy loss function
    ├─dataset.py                      # data preprocessing
    ├─head.py                         # common head
    ├─image_classification.py         # get resnet
    ├─linear_warmup.py                # linear warmup learning rate
    ├─warmup_cosine_annealing.py      # learning rate each step
    ├─warmup_step_lr.py               # warmup step learning rate
  ├─eval.py                           # eval net
  └─train.py                          # train net
```

## Parameter Configuration

Parameters for both training and evaluating can be set in `config.py`.

```
"image_size": '224,224',              # image size
"num_classes": 1000,                  # dataset class number
"per_batch_size": 128,                # batch size of input tensor
"lr": 0.05,                           # base learning rate
"lr_scheduler": 'cosine_annealing',   # learning rate mode
"lr_epochs": '30,60,90,120',          # epochs at which lr changes
"lr_gamma": 0.1,                      # decrease lr by this factor in the exponential lr_scheduler
"eta_min": 0,                         # eta_min in cosine_annealing scheduler
"T_max": 150,                         # T_max in cosine_annealing scheduler
"max_epoch": 150,                     # max epoch num to train the model
"backbone": 'resnext50',              # backbone network
"warmup_epochs": 1,                   # warmup epochs
"weight_decay": 0.0001,               # weight decay
"momentum": 0.9,                      # momentum
"is_dynamic_loss_scale": 0,           # dynamic loss scale
"loss_scale": 1024,                   # loss scale
"label_smooth": 1,                    # whether to apply label smoothing
"label_smooth_factor": 0.1,           # label smoothing factor
"ckpt_interval": 2000,                # checkpoint save interval
"ckpt_path": 'outputs/',              # checkpoint save location
"is_save_on_master": 1,
"rank": 0,                            # local rank of distributed
"group_size": 1                       # world size of distributed
```

## Running the example

### Train

#### Usage

```
# distribute training example(8p)
sh run_distribute_train.sh MINDSPORE_HCCL_CONFIG_PATH DATA_PATH
# standalone training
sh run_standalone_train.sh DEVICE_ID DATA_PATH
```

#### Launch

```bash
# distributed training example(8p)
sh scripts/run_distribute_train.sh MINDSPORE_HCCL_CONFIG_PATH /ImageNet/train
# standalone training example
sh scripts/run_standalone_train.sh 0 /ImageNet_Original/train
```

#### Result

Checkpoint files and training results can be found in the log.

### Evaluation

#### Usage

```
# Evaluation
sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH
```

#### Launch

```bash
# Evaluation with checkpoint
sh scripts/run_eval.sh 0 /opt/npu/datasets/classification/val /resnext50_100.ckpt
```

> The checkpoint can be produced during the training process.

#### Result

Evaluation results are stored in the scripts path. There you can find results like the following in the log.

```
acc=78.16%(TOP1)
acc=93.88%(TOP5)
```
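
As a reference for the `lr`, `warmup_epochs`, `T_max`, `eta_min`, and `max_epoch` settings above, the following is a minimal Python sketch of how a linear-warmup plus cosine-annealing schedule is typically computed per step. The function name and signature here are illustrative and may differ from what `warmup_cosine_annealing.py` actually implements.

```python
import math

def warmup_cosine_annealing_lr(base_lr, steps_per_epoch, warmup_epochs,
                               max_epoch, t_max, eta_min=0.0):
    """Return a per-step learning-rate list: linear warmup, then cosine annealing.

    Illustrative sketch only; names and details may differ from the repo code.
    """
    total_steps = steps_per_epoch * max_epoch
    warmup_steps = steps_per_epoch * warmup_epochs
    lr_each_step = []
    for step in range(total_steps):
        if step < warmup_steps:
            # linear warmup from 0 up to base_lr
            lr = base_lr * (step + 1) / warmup_steps
        else:
            # cosine annealing from base_lr down to eta_min over t_max epochs
            cur_epoch = step // steps_per_epoch
            lr = eta_min + (base_lr - eta_min) * (
                1.0 + math.cos(math.pi * cur_epoch / t_max)) / 2.0
        lr_each_step.append(lr)
    return lr_each_step

# Example with the defaults above (steps_per_epoch depends on your dataset/batch size):
# schedule = warmup_cosine_annealing_lr(0.05, 1251, 1, 150, 150)
```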
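
Similarly, `label_smooth` and `label_smooth_factor` control label smoothing in the cross-entropy loss. The sketch below (plain NumPy, illustrative only; see `crossentropy.py` for the actual implementation) shows one common way smoothed one-hot targets are built.

```python
import numpy as np

def smooth_one_hot(labels, num_classes, smooth_factor=0.1):
    """Turn integer labels into smoothed one-hot targets:
    the true class gets 1 - smooth_factor, the other classes share smooth_factor.

    Illustrative sketch only; not the repo's actual loss code.
    """
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    targets = np.full((len(labels), num_classes), off_value, dtype=np.float32)
    targets[np.arange(len(labels)), labels] = on_value
    return targets

# e.g. smooth_one_hot(np.array([3]), num_classes=1000, smooth_factor=0.1)
```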