diff --git a/mindspore/lite/example/train_lenet/README.md b/mindspore/lite/example/train_lenet/README.md new file mode 100644 index 0000000000..8d0c7647b3 --- /dev/null +++ b/mindspore/lite/example/train_lenet/README.md @@ -0,0 +1,134 @@ +# Contents + + + +- [Overview](#overview) +- [Model Architecture](#model-architecture) +- [Dataset](#dataset) +- [Environment Requirements](#environment-requirements) +- [Quick Start](#quick-start) +- [Script Detailed Description](#script-detailed-description) + + + +# Overview + +This folder holds code for Training-on-Device (ToD) of a LeNet model. One part of the code runs on a server using the MindSpore infrastructure, another part uses the MindSpore Lite conversion utility, and the last part performs the actual training of the model on an Android-based device. + +# Model Architecture + +LeNet is a very simple network composed of only 5 layers: 2 convolutional layers and 3 fully connected layers. Such a small network can be fully trained (from scratch) on a device in a short time, which makes it a good example. 
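To make the "5 layers" concrete, the following minimal sketch works through the shape and parameter arithmetic for a classic LeNet-5 on the 1x32x32 input used by `lenet_export.py`. The layer sizes (6 and 16 convolution channels, 5x5 kernels, 2x2 pooling, 120 and 84 hidden units) are the textbook LeNet-5 values and are an assumption here; the `LeNet5` class exported by this example is defined elsewhere and may differ. The helper names `conv_out` and `lenet5_summary` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_summary(input_size=32, num_classes=10):
    """Return (name, parameter_count, output_shape) for each trainable layer.

    Assumes textbook LeNet-5 sizes; not read from the MindSpore model.
    """
    layers = []

    # conv1: 1 -> 6 channels, 5x5 kernel, no padding, followed by 2x2 pooling
    s = conv_out(input_size, 5)
    layers.append(("conv1", 6 * (1 * 5 * 5) + 6, (6, s, s)))
    s = conv_out(s, 2, stride=2)

    # conv2: 6 -> 16 channels, 5x5 kernel, no padding, followed by 2x2 pooling
    s = conv_out(s, 5)
    layers.append(("conv2", 16 * (6 * 5 * 5) + 16, (16, s, s)))
    s = conv_out(s, 2, stride=2)

    # Three fully connected layers on the flattened feature map
    flat = 16 * s * s
    for name, n_in, n_out in (("fc1", flat, 120), ("fc2", 120, 84), ("fc3", 84, num_classes)):
        layers.append((name, n_in * n_out + n_out, (n_out,)))

    return layers


for name, params, shape in lenet5_summary():
    print(f"{name}: output {shape}, {params} parameters")
```

Under these assumptions the whole network has only a few tens of thousands of parameters, which is consistent with the claim that it can be trained from scratch on a device.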
+ +# Dataset + +In this example we use the MNIST dataset of handwritten digits as published in [THE MNIST DATABASE]() + +- Dataset size: 52.4M, 60,000 28*28 images in 10 classes + - Test: 10,000 images + - Train: 60,000 images +- Data format: binary files + - Note: Data will be processed in dataset.cc + +- The dataset directory structure is as follows: + +```text +mnist/ +├── test +│   ├── t10k-images-idx3-ubyte +│   └── t10k-labels-idx1-ubyte +└── train + ├── train-images-idx3-ubyte + └── train-labels-idx1-ubyte +``` + +# Environment Requirements + +- Server side + - [MindSpore Framework](https://www.mindspore.cn/install/en): it is recommended to install a docker image + - [MindSpore ToD Framework](https://www.mindspore.cn/tutorial/tod/en/use/prparation.html) + - [Android NDK r20b](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip) + - [Android SDK](https://developer.android.com/studio?hl=zh-cn#cmdline-tools) +- A connected Android device + +# Quick Start + +After installing all of the above, run the `prepare_and_run.sh` script in this example's home directory with the following arguments: + +```bash +sh ./prepare_and_run.sh DATASET_PATH [MINDSPORE_DOCKER] [RELEASE.tar.gz] +``` + +where: + +- DATASET_PATH is the path to the [dataset](#dataset), +- MINDSPORE_DOCKER is the name of the docker image that runs [MindSpore](#environment-requirements). If it is not provided, MindSpore is run locally, +- and RELEASE.tar.gz is the path to the MindSpore ToD release tarball. If it is not provided, the script attempts to find the MindSpore ToD compilation output. 
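Because the whole pipeline (export, conversion, cross-compilation, push to device) runs before the dataset is first read, it can be worth checking DATASET_PATH up front. The sketch below checks for the four files listed in the Dataset section; the function name `missing_mnist_files` is hypothetical and the check is not part of the provided scripts.

```python
import os

# Relative paths of the four MNIST files, as listed in the Dataset section.
MNIST_FILES = (
    os.path.join("test", "t10k-images-idx3-ubyte"),
    os.path.join("test", "t10k-labels-idx1-ubyte"),
    os.path.join("train", "train-images-idx3-ubyte"),
    os.path.join("train", "train-labels-idx1-ubyte"),
)


def missing_mnist_files(dataset_path):
    """Return the expected MNIST files that are absent under dataset_path."""
    return [
        rel for rel in MNIST_FILES
        if not os.path.isfile(os.path.join(dataset_path, rel))
    ]
```

An empty list means the directory has the expected layout; otherwise the returned relative paths show which files still need to be downloaded or moved.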
+ +# Script Detailed Description + +The provided `prepare_and_run.sh` script performs the following steps: + +- Prepare the trainable LeNet model in the `.ms` format +- Prepare the folder that should be pushed to the device +- Copy this folder to the device and run the scripts on the device + +See the [Quick Start Section](#quick-start) for how to run the script and for the parameter definitions. + +## Preparing the Model + +Within the model folder, the `prepare_model.sh` script uses the MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the Python script is run locally. +The script then converts the `.mindir` file to the `.ms` format using the MindSpore ToD converter. +The script looks for the converter in the MindSpore ToD build output directory and, if it is not found there, in the system path. + +## Preparing the Folder + +The `lenet_tod.ms` model file is then copied into the `package` folder, together with the scripts, the MindSpore ToD library and the MNIST dataset. +Finally, the code (in `src`) is compiled for arm64 and the binary is copied into the `package` folder. + +## Running the Code on the Device + +To run the code on the device, the script first uses the `adb` tool to push the `package` folder to the device. It then runs training (which takes some time) and finally evaluates the trained model using the test data. 
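The hand-off between the training run and the evaluation run relies on a file-naming rule in `net_runner.cc`: after training, the model is saved under the input model's name with its last extension removed and `_trained_<cycles>.ms` appended. The sketch below mirrors that rule in Python to show why `eval.sh` loads `model/lenet_tod_trained_3000.ms` after `train.sh` runs with `-e 3000`; the function name `trained_model_name` is hypothetical.

```python
def trained_model_name(ms_file, cycles):
    """Mirror net_runner's naming: drop the last extension, append _trained_<cycles>.ms."""
    dot = ms_file.rfind(".")
    # If there is no '.', the whole name is kept, as with substr(0, npos) in C++.
    base = ms_file if dot == -1 else ms_file[:dot]
    return f"{base}_trained_{cycles}.ms"


print(trained_model_name("model/lenet_tod.ms", 3000))
```

If the number of training cycles in `train.sh` is changed, the model path in `eval.sh` has to be changed to match.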
+ +# Folder Directory Tree + +```text +train_lenet/ +├── Makefile # Makefile of src code +├── model +│   ├── lenet_export.py # Python script that exports the LeNet model to .mindir +│   ├── prepare_model.sh # script that exports the model (optionally using docker) and then converts it +│   └── train_utils.py # utility function used during the export +├── prepare_and_run.sh # main script that creates the model, compiles the code and sends both to the device for running +├── README.md # this manual +├── scripts +│   ├── eval.sh # on-device script that loads the trained model and evaluates its accuracy +│   ├── run_eval.sh # adb script that launches eval.sh +│   ├── run_train.sh # adb script that launches train.sh +│   └── train.sh # on-device script that loads the initial model and trains it +├── src +│   ├── dataset.cc # dataset handler +│   ├── dataset.h # dataset class header +│   ├── net_runner.cc # program that runs training/evaluation of models +│   └── net_runner.h # net_runner header +``` + +When the `prepare_and_run.sh` script is run, the following folder is prepared. 
It is pushed to the device, where training and then evaluation are run. + +```text +├── package +│   ├── bin +│   │   └── net_runner # the executable that performs the training/evaluation +│   ├── dataset +│   │   ├── test +│   │   │   ├── t10k-images-idx3-ubyte # test images +│   │   │   └── t10k-labels-idx1-ubyte # test labels +│   │   └── train +│   │   ├── train-images-idx3-ubyte # train images +│   │   └── train-labels-idx1-ubyte # train labels +│   ├── eval.sh # on-device script that loads the trained model and evaluates its accuracy +│   ├── lib +│   │   └── libmindspore-lite.so # MindSpore Lite library +│   ├── model +│   │   └── lenet_tod.ms # model to train +│   └── train.sh # on-device script that loads the initial model and trains it +``` diff --git a/mindspore/lite/example/train_lenet/model/lenet_export.py b/mindspore/lite/example/train_lenet/model/lenet_export.py new file mode 100644 index 0000000000..eddcc47eb0 --- /dev/null +++ b/mindspore/lite/example/train_lenet/model/lenet_export.py @@ -0,0 +1,37 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""lenet_export.""" + +import sys +import numpy as np +from mindspore import context, Tensor +import mindspore.common.dtype as mstype +from mindspore.train.serialization import export +from train_utils import TrainWrap + +sys.path.append('../../../cv/lenet/src/')  # must come before the lenet import below +from lenet import LeNet5 + +n = LeNet5() +n.set_train() +context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU", save_graphs=False) + +batch_size = 32 +x = Tensor(np.ones((batch_size, 1, 32, 32)), mstype.float32) +label = Tensor(np.zeros([batch_size, 10]).astype(np.float32)) +net = TrainWrap(n) +export(net, x, label, file_name="lenet_tod.mindir", file_format='MINDIR') + +print("finished exporting") diff --git a/mindspore/lite/example/train_lenet/model/prepare_model.sh b/mindspore/lite/example/train_lenet/model/prepare_model.sh new file mode 100755 index 0000000000..742fbd02c0 --- /dev/null +++ b/mindspore/lite/example/train_lenet/model/prepare_model.sh @@ -0,0 +1,24 @@ +CONVERTER="../../../../../mindspore/lite/build/tools/converter/converter_lite" +if [ ! -f "$CONVERTER" ]; then + if ! 
command -v converter_lite &> /dev/null + then + echo "converter_lite could not be found in the MindSpore build directory or in the system path" + exit 1 + else + CONVERTER=converter_lite + fi +fi + +echo "============Exporting==========" +if [ -n "$1" ]; then + DOCKER_IMG=$1 + docker run -w "$PWD" --runtime=nvidia -v /home/$USER:/home/$USER --privileged=true ${DOCKER_IMG} /bin/bash -c "python lenet_export.py; chmod 444 lenet_tod.mindir; rm -rf __pycache__" +else + echo "MindSpore docker was not provided, attempting to run locally" + python lenet_export.py +fi + + +echo "============Converting=========" +$CONVERTER --fmk=MINDIR --trainModel=true --modelFile=lenet_tod.mindir --outputFile=lenet_tod + diff --git a/mindspore/lite/example/train_lenet/model/train_utils.py b/mindspore/lite/example/train_lenet/model/train_utils.py new file mode 100644 index 0000000000..7d26d9dac7 --- /dev/null +++ b/mindspore/lite/example/train_lenet/model/train_utils.py @@ -0,0 +1,34 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""train_utils.""" + +import mindspore.nn as nn +from mindspore.common.parameter import ParameterTuple + +def TrainWrap(net, loss_fn=None, optimizer=None, weights=None): + """ + TrainWrap + """ + if loss_fn is None: + loss_fn = nn.SoftmaxCrossEntropyWithLogits() + loss_net = nn.WithLossCell(net, loss_fn) + loss_net.set_train() + if weights is None: + weights = ParameterTuple(net.trainable_params()) + if optimizer is None: + optimizer = nn.Adam(weights, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, use_locking=False, + use_nesterov=False, weight_decay=0.0, loss_scale=1.0) + train_net = nn.TrainOneStepCell(loss_net, optimizer) + return train_net diff --git a/mindspore/lite/example/train_lenet/prepare_and_run.sh b/mindspore/lite/example/train_lenet/prepare_and_run.sh new file mode 100755 index 0000000000..c7f88d1a71 --- /dev/null +++ b/mindspore/lite/example/train_lenet/prepare_and_run.sh @@ -0,0 +1,82 @@ +#!/bin/bash + +display_usage() { + echo -e "\nUsage: prepare_and_run.sh dataset_path [mindspore_docker] [release.tar.gz]\n" + } + +if [ -n "$1" ]; then + MNIST_DATA_PATH=$1 +else + echo "MNIST Dataset directory path was not provided" + display_usage + exit 0 +fi + +if [ -n "$2" ]; then + DOCKER=$2 +else + DOCKER="" + #echo "MindSpore docker was not provided" + #display_usage + #exit 0 +fi + +if [ -n "$3" ]; then + TARBALL=$3 +else + if [ -f ../../../../output/mindspore-lite-*-runtime-arm64-cpu-train.tar.gz ]; then + TARBALL="../../../../output/mindspore-lite-*-runtime-arm64-cpu-train.tar.gz" + else + echo "release.tar.gz was not found" + display_usage + exit 0 + fi +fi + + +# Prepare the model +cd model/ +rm -f *.ms +./prepare_model.sh $DOCKER +cd - + +# Copy the .ms model to the package folder +rm -rf package +mkdir -p package/model +cp model/*.ms package/model + +# Copy the running script to the package +cp scripts/train.sh package/ +cp scripts/eval.sh package/ + +# Copy the 
shared MindSpore ToD library +tar -xzvf ${TARBALL} --wildcards --no-anchored libmindspore-lite.so +tar -xzvf ${TARBALL} --wildcards --no-anchored include +mv mindspore-*/lib package/ +mkdir msl +mv mindspore-*/* msl/ +rm -rf mindspore-* + +# Copy the dataset to the package +cp -r ${MNIST_DATA_PATH} package/dataset + +# Compile program +make TARGET=arm64 + +# Copy the executable to the package +mv bin package/ + +# Push the folder to the device +adb push package /data/local/tmp/ + +echo "Training on Device" +adb shell < scripts/run_train.sh + +echo +echo "Load trained model and evaluate accuracy" +adb shell < scripts/run_eval.sh +echo + +#rm -rf src/*.o package model/__pycache__ model/*.ms + +#./prepare_and_run.sh /opt/share/dataset/mnist mindspore_dev:5 diff --git a/mindspore/lite/example/train_lenet/scripts/eval.sh b/mindspore/lite/example/train_lenet/scripts/eval.sh new file mode 100755 index 0000000000..43890c4028 --- /dev/null +++ b/mindspore/lite/example/train_lenet/scripts/eval.sh @@ -0,0 +1,19 @@ +#!/bin/bash +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +# A simple example; more parameters can be set (-e 0 runs evaluation only) +DATA_PATH=$1 +LD_LIBRARY_PATH=./lib/ bin/net_runner -f model/lenet_tod_trained_3000.ms -e 0 -d dataset diff --git a/mindspore/lite/example/train_lenet/scripts/run_eval.sh b/mindspore/lite/example/train_lenet/scripts/run_eval.sh new file mode 100644 index 0000000000..8ed3afbe26 --- /dev/null +++ b/mindspore/lite/example/train_lenet/scripts/run_eval.sh @@ -0,0 +1,2 @@ +cd /data/local/tmp/package +/system/bin/sh eval.sh diff --git a/mindspore/lite/example/train_lenet/scripts/run_train.sh b/mindspore/lite/example/train_lenet/scripts/run_train.sh new file mode 100644 index 0000000000..d265183a36 --- /dev/null +++ b/mindspore/lite/example/train_lenet/scripts/run_train.sh @@ -0,0 +1,2 @@ +cd /data/local/tmp/package +/system/bin/sh train.sh diff --git a/mindspore/lite/example/train_lenet/scripts/train.sh b/mindspore/lite/example/train_lenet/scripts/train.sh new file mode 100755 index 0000000000..ea04581ed5 --- /dev/null +++ b/mindspore/lite/example/train_lenet/scripts/train.sh @@ -0,0 +1,21 @@ +#!/bin/bash +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +# A simple example; more parameters can be set (-e 3000 trains for 3000 cycles) +script_self=$(readlink -f "$0") +self_path=$(dirname "${script_self}") +DATA_PATH=$1 +LD_LIBRARY_PATH=./lib/ bin/net_runner -f model/lenet_tod.ms -e 3000 -d dataset diff --git a/mindspore/lite/example/train_lenet/src/dataset.cc b/mindspore/lite/example/train_lenet/src/dataset.cc new file mode 100644 index 0000000000..b7f0d95767 --- /dev/null +++ b/mindspore/lite/example/train_lenet/src/dataset.cc @@ -0,0 +1,200 @@ +/** + * Copyright 2020 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include "src/dataset.h" +#include +#include +#include +#include +#include +#include +#include + +using LabelId = std::map; + +char *ReadFile(const std::string &file, size_t *size) { + assert(size != nullptr); + std::string realPath(file); + std::ifstream ifs(realPath); + if (!ifs.good()) { + std::cerr << "file: " << realPath << " does not exist"; + return nullptr; + } + + if (!ifs.is_open()) { + std::cerr << "file: " << realPath << " open failed"; + return nullptr; + } + + ifs.seekg(0, std::ios::end); + *size = ifs.tellg(); + std::unique_ptr buf(new (std::nothrow) char[*size]); + if (buf == nullptr) { + std::cerr << "malloc buf failed, file: " << realPath; + ifs.close(); + return nullptr; + } + + ifs.seekg(0, std::ios::beg); + ifs.read(buf.get(), *size); + ifs.close(); + + return buf.release(); +} + +DataSet::~DataSet() { + for (auto itr = train_data_.begin(); itr != train_data_.end(); ++itr) { + auto ptr = std::get<0>(*itr); + delete[] ptr; + } + for (auto itr = test_data_.begin(); itr != test_data_.end(); ++itr) { + auto ptr = std::get<0>(*itr); + delete[] ptr; + } +} + +int DataSet::Init(const std::string &data_base_directory, database_type type) { + InitializeMNISTDatabase(data_base_directory); + return 0; +} + +void DataSet::InitializeMNISTDatabase(std::string dpath) { + // int total_data = 0; + num_of_classes_ = 10; + // total_data += + ReadMNISTFile(dpath + "/train/train-images-idx3-ubyte", dpath + "/train/train-labels-idx1-ubyte", &train_data_); + // total_data += + ReadMNISTFile(dpath + "/test/t10k-images-idx3-ubyte", dpath + "/test/t10k-labels-idx1-ubyte", &test_data_); +} + +int DataSet::ReadMNISTFile(const std::string &ifile_name, const std::string &lfile_name, + std::vector *dataset) { + std::ifstream lfile(lfile_name, std::ios::binary); + if (!lfile.is_open()) { + std::cerr << "Cannot open label file " << lfile_name << std::endl; + return 0; + } + + std::ifstream ifile(ifile_name, std::ios::binary); + if (!ifile.is_open()) { + std::cerr << 
"Cannot open data file " << ifile_name << std::endl; + return 0; + } + + int magic_number = 0; + lfile.read(reinterpret_cast(&magic_number), sizeof(magic_number)); + magic_number = ntohl(magic_number); + if (magic_number != 2049) { + std::cout << "Invalid MNIST label file!" << std::endl; + return 0; + } + + int number_of_labels = 0; + lfile.read(reinterpret_cast(&number_of_labels), sizeof(number_of_labels)); + number_of_labels = ntohl(number_of_labels); + + ifile.read(reinterpret_cast(&magic_number), sizeof(magic_number)); + magic_number = ntohl(magic_number); + if (magic_number != 2051) { + std::cout << "Invalid MNIST image file!" << std::endl; + return 0; + } + + int number_of_images = 0; + ifile.read(reinterpret_cast(&number_of_images), sizeof(number_of_images)); + number_of_images = ntohl(number_of_images); + + int n_rows = 0; + ifile.read(reinterpret_cast(&n_rows), sizeof(n_rows)); + n_rows = ntohl(n_rows); + + int n_cols = 0; + ifile.read(reinterpret_cast(&n_cols), sizeof(n_cols)); + n_cols = ntohl(n_cols); + + if (number_of_labels != number_of_images) { + std::cout << "number of records in labels and images files does not match" << std::endl; + return 0; + } + + int image_size = n_rows * n_cols; + unsigned char labels[number_of_labels]; + unsigned char data[image_size]; + lfile.read(reinterpret_cast(labels), number_of_labels); + + for (int i = 0; i < number_of_labels; ++i) { + std::unique_ptr hwc_bin_image(new (std::nothrow) float[32 * 32]); + ifile.read(reinterpret_cast(data), image_size); + + for (size_t r = 0; r < 32; r++) { + for (size_t c = 0; c < 32; c++) { + if (r < 2 || r > 29 || c < 2 || c > 29) + hwc_bin_image[r * 32 + c] = 0.0; + else + hwc_bin_image[r * 32 + c] = (static_cast(data[(r - 2) * 28 + (c - 2)])) / 255.0; + } + } + DataLabelTuple data_entry = std::make_tuple(reinterpret_cast(hwc_bin_image.release()), labels[i]); + dataset->push_back(data_entry); + } + return number_of_labels; +} + +std::vector DataSet::ReadFileList(std::string dpath) { 
+ std::vector vec; + std::ifstream ifs(dpath + "/file_list.txt"); + std::string file_name; + if (ifs.is_open()) { + int label; + while (!ifs.eof()) { + ifs >> label >> file_name; + vec.push_back(make_tuple(label, file_name)); + } + } + return vec; +} + +std::vector DataSet::ReadDir(const std::string dpath) { + std::filesystem::directory_iterator dir(dpath); + std::vector vec; + LabelId label_id; + int class_id = 0; + int class_label; + for (const auto p : dir) { + if (p.is_directory()) { + std::string path = p.path().stem().string(); + auto label = label_id.find(path); + if (label == label_id.end()) { + label_id[path] = class_id; + class_label = class_id; + class_id++; + num_of_classes_ = class_id; + } else { + class_label = label->second; + } + std::filesystem::directory_iterator ndir(dpath + "/" + path); + for (const auto np : ndir) { + if (np.path().extension().string() == ".bin") { + std::string entry = + dpath + "/" + np.path().parent_path().stem().string() + "/" + np.path().filename().string(); + FileTuple ft = make_tuple(class_label, entry); + vec.push_back(ft); + } + } + } + } + return vec; +} diff --git a/mindspore/lite/example/train_lenet/src/dataset.h b/mindspore/lite/example/train_lenet/src/dataset.h new file mode 100644 index 0000000000..e409c9a933 --- /dev/null +++ b/mindspore/lite/example/train_lenet/src/dataset.h @@ -0,0 +1,56 @@ +/** + * Copyright 2020 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#ifndef MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ +#define MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ + +#include +#include +#include + +using DataLabelTuple = std::tuple; +using FileTuple = std::tuple; + +enum database_type { DS_CIFAR10_BINARY = 0, DS_MNIST_BINARY, DS_OTHER }; + +char *ReadFile(const std::string &file, size_t *size); // utility function + +class DataSet { + public: + DataSet() {} + ~DataSet(); + + int Init(const std::string &data_base_directory, database_type type = DS_OTHER); + + const std::vector &train_data() const { return train_data_; } + const std::vector &test_data() const { return test_data_; } + unsigned int num_of_classes() { return num_of_classes_; } + void set_expected_data_size(unsigned int expected_data_size) { expected_data_size_ = expected_data_size; } + unsigned int expected_data_size() { return expected_data_size_; } + + private: + std::vector ReadFileList(std::string dpath); + std::vector ReadDir(const std::string dpath); + int ReadMNISTFile(const std::string &ifile, const std::string &lfile, std::vector *dataset); + void InitializeMNISTDatabase(std::string dpath); + + std::vector train_data_; + std::vector test_data_; + unsigned int num_of_classes_ = 0; + unsigned int expected_data_size_ = 0; +}; + +#endif // MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ diff --git a/mindspore/lite/example/train_lenet/src/net_runner.cc b/mindspore/lite/example/train_lenet/src/net_runner.cc new file mode 100644 index 0000000000..f3c64d5eb3 --- /dev/null +++ b/mindspore/lite/example/train_lenet/src/net_runner.cc @@ -0,0 +1,247 @@ +/** + * Copyright 2020 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include "src/net_runner.h" +#include +#include +#include +#include +#include "include/context.h" + +unsigned int NetRunner::seed_ = time(NULL); +// Definition of callback function after forwarding operator. +bool after_callback(const std::vector &after_inputs, + const std::vector &after_outputs, + const mindspore::CallBackParam &call_param) { + printf("%s\n", call_param.node_name.c_str()); + for (size_t i = 0; i < after_inputs.size(); i++) { + int num2p = (after_inputs.at(i)->ElementsNum()); + printf("in%zu(%d): ", i, num2p); + if (num2p > 10) num2p = 10; + if (after_inputs.at(i)->data_type() == mindspore::kNumberTypeInt32) { + auto d = reinterpret_cast(after_inputs.at(i)->MutableData()); + for (int j = 0; j < num2p; j++) printf("%d, ", d[j]); + } else { + auto d = reinterpret_cast(after_inputs.at(i)->MutableData()); + for (int j = 0; j < num2p; j++) printf("%f, ", d[j]); + } + printf("\n"); + } + for (size_t i = 0; i < after_outputs.size(); i++) { + auto d = reinterpret_cast(after_outputs.at(i)->MutableData()); + int num2p = (after_outputs.at(i)->ElementsNum()); + printf("ou%zu(%d): ", i, num2p); + if (num2p > 10) num2p = 10; + for (int j = 0; j < num2p; j++) printf("%f, ", d[j]); + printf("\n"); + } + return true; +} + +NetRunner::~NetRunner() { + if (session_ != nullptr) delete session_; +} + +void NetRunner::InitAndFigureInputs() { + mindspore::lite::Context context; + context.device_list_[0].device_info_.cpu_device_info_.cpu_bind_mode_ = mindspore::lite::NO_BIND; + context.thread_num_ = 1; + + session_ = 
mindspore::session::TrainSession::CreateSession(ms_file_, &context); + assert(nullptr != session_); + + auto inputs = session_->GetInputs(); + assert(inputs.size() > 1); + data_index_ = 0; + label_index_ = 1; + batch_size_ = inputs[data_index_]->shape()[0]; + data_size_ = inputs[data_index_]->Size() / batch_size_; // in bytes + if (verbose_) { + std::cout << "data size: " << data_size_ << std::endl << "batch size: " << batch_size_ << std::endl; + } +} + +mindspore::tensor::MSTensor *NetRunner::SearchOutputsForSize(size_t size) const { + auto outputs = session_->GetOutputs(); + for (auto it = outputs.begin(); it != outputs.end(); ++it) { + if (it->second->ElementsNum() == size) return it->second; + } + std::cout << "Model does not have an output tensor with size " << size << std::endl; + return nullptr; +} + +std::vector<int> NetRunner::FillInputData(const std::vector<DataLabelTuple> &dataset, bool serially) const { + std::vector<int> labels_vec; + static unsigned int idx = 1; + int total_size = dataset.size(); + + auto inputs = session_->GetInputs(); + char *input_data = reinterpret_cast<char *>(inputs.at(data_index_)->MutableData()); + auto labels = reinterpret_cast<float *>(inputs.at(label_index_)->MutableData()); + assert(total_size > 0); + assert(input_data != nullptr); + std::fill(labels, labels + inputs.at(label_index_)->ElementsNum(), 0.f); + for (int i = 0; i < batch_size_; i++) { + if (serially) { + idx = ++idx % total_size; + } else { + idx = rand_r(&seed_) % total_size; + } + int label = 0; + char *data = nullptr; + std::tie(data, label) = dataset[idx]; + memcpy(input_data + i * data_size_, data, data_size_); + labels[i * num_of_classes_ + label] = 1.0; // Model expects labels in onehot representation + labels_vec.push_back(label); + } + + return labels_vec; +} + +float NetRunner::CalculateAccuracy(int max_tests) const { + float accuracy = 0.0; + const std::vector<DataLabelTuple> test_set = ds_.test_data(); + int tests = test_set.size() / batch_size_; + if (max_tests != -1 && tests > max_tests) tests = 
max_tests; + + session_->Eval(); + for (int i = 0; i < tests; i++) { + auto labels = FillInputData(test_set, (max_tests == -1)); + session_->RunGraph(); + auto outputsv = SearchOutputsForSize(batch_size_ * num_of_classes_); + assert(outputsv != nullptr); + auto scores = reinterpret_cast(outputsv->MutableData()); + for (int b = 0; b < batch_size_; b++) { + int max_idx = 0; + float max_score = scores[num_of_classes_ * b]; + for (int c = 0; c < num_of_classes_; c++) { + if (scores[num_of_classes_ * b + c] > max_score) { + max_score = scores[num_of_classes_ * b + c]; + max_idx = c; + } + } + if (labels[b] == max_idx) accuracy += 1.0; + } + } + session_->Train(); + accuracy /= static_cast(batch_size_ * tests); + return accuracy; +} + +int NetRunner::InitDB() { + if (data_size_ != 0) ds_.set_expected_data_size(data_size_); + int ret = ds_.Init(data_dir_, DS_MNIST_BINARY); + num_of_classes_ = ds_.num_of_classes(); + if (ds_.test_data().size() == 0) { + std::cout << "No relevant data was found in " << data_dir_ << std::endl; + assert(ds_.test_data().size() != 0); + } + + return ret; +} + +float NetRunner::GetLoss() const { + auto outputsv = SearchOutputsForSize(1); // Search for Loss which is a single value tensor + assert(outputsv != nullptr); + auto loss = reinterpret_cast(outputsv->MutableData()); + return loss[0]; +} + +int NetRunner::TrainLoop() { + session_->Train(); + float min_loss = 1000.; + float max_acc = 0.; + for (int i = 0; i < cycles_; i++) { + FillInputData(ds_.train_data()); + session_->RunGraph(nullptr, verbose_ ? 
after_callback : nullptr); + float loss = GetLoss(); + if (min_loss > loss) min_loss = loss; + + if (save_checkpoint_ != 0 && (i + 1) % save_checkpoint_ == 0) { + auto cpkt_fn = ms_file_.substr(0, ms_file_.find_last_of('.')) + "_trained_" + std::to_string(i + 1) + ".ms"; + session_->SaveToFile(cpkt_fn); + } + + if ((i + 1) % 100 == 0) { + float acc = CalculateAccuracy(10); + if (max_acc < acc) max_acc = acc; + std::cout << i + 1 << ":\tLoss is " << std::setw(7) << loss << " [min=" << min_loss << "] " + << " max_acc=" << max_acc << std::endl; + } + } + return 0; +} + +int NetRunner::Main() { + InitAndFigureInputs(); + + InitDB(); + + TrainLoop(); + + float acc = CalculateAccuracy(); + std::cout << "accuracy = " << acc << std::endl; + + if (cycles_ > 0) { + auto trained_fn = ms_file_.substr(0, ms_file_.find_last_of('.')) + "_trained_" + std::to_string(cycles_) + ".ms"; + session_->SaveToFile(trained_fn); + } + return 0; +} + +void NetRunner::Usage() { + std::cout << "Usage: net_runner -f <.ms model file> -d [-c ] " + << "[-v (verbose mode)] [-s ]" << std::endl; +} + +bool NetRunner::ReadArgs(int argc, char *argv[]) { + int opt; + while ((opt = getopt(argc, argv, "f:e:d:s:ihc:v")) != -1) { + switch (opt) { + case 'f': + ms_file_ = std::string(optarg); + break; + case 'e': + cycles_ = atoi(optarg); + break; + case 'd': + data_dir_ = std::string(optarg); + break; + case 'v': + verbose_ = true; + break; + case 's': + save_checkpoint_ = atoi(optarg); + break; + case 'h': + default: + Usage(); + return false; + } + } + return true; +} + +int main(int argc, char **argv) { + NetRunner nr; + + if (nr.ReadArgs(argc, argv)) { + nr.Main(); + } else { + return -1; + } + return 0; +} diff --git a/mindspore/lite/example/train_lenet/src/net_runner.h b/mindspore/lite/example/train_lenet/src/net_runner.h new file mode 100644 index 0000000000..b6d743d87a --- /dev/null +++ b/mindspore/lite/example/train_lenet/src/net_runner.h @@ -0,0 +1,61 @@ +/** + * Copyright 2020 Huawei Technologies 
Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ +#define MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ + +#include +#include +#include +#include +#include +#include "include/train_session.h" +#include "include/ms_tensor.h" +#include "src/dataset.h" + +class NetRunner { + public: + int Main(); + bool ReadArgs(int argc, char *argv[]); + ~NetRunner(); + + private: + void Usage(); + void InitAndFigureInputs(); + int InitDB(); + int TrainLoop(); + std::vector FillInputData(const std::vector &dataset, bool is_train_set = false) const; + float CalculateAccuracy(int max_tests = -1) const; + float GetLoss() const; + mindspore::tensor::MSTensor *SearchOutputsForSize(size_t size) const; + + DataSet ds_; + mindspore::session::TrainSession *session_ = nullptr; + + std::string ms_file_ = ""; + std::string data_dir_ = ""; + size_t data_size_ = 0; + size_t batch_size_ = 0; + unsigned int cycles_ = 100; + int data_index_ = 0; + int label_index_ = -1; + int num_of_classes_ = 0; + bool verbose_ = false; + int save_checkpoint_ = 0; + static unsigned int seed_; +}; + +#endif // MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_