From: @ling_qiao_min
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @hangangqiang
pull/14593/MERGE
mindspore-ci-bot 4 years ago committed by Gitee
commit 18dfbc9fdc

@ -64,8 +64,8 @@ where:
- DATASET_PATH is the path to the [dataset](#dataset),
- MINDSPORE_DOCKER is the image name of the docker that runs [MindSpore](#environment-requirements). If not provided MindSpore will be run locally
- REALEASE.tar.gz is a pointer to the MindSpore ToD release tar ball. If not provided, the script will attempt to find MindSpore ToD compilation output
- target is defaulted to arm64, i.e., on-device. If x86 is provided, the demo will be run locally. Note that infrastructure is not optimized for running on x86. Also, note that user needs to call "make clean" when switching betweeen targets.
- RELEASE.tar.gz is a pointer to the MindSpore ToD release tar ball. If not provided, the script will attempt to find MindSpore ToD compilation output
- target is defaulted to arm64, i.e., on-device. If x86 is provided, the demo will be run locally. Note that infrastructure is not optimized for running on x86. Also, note that user needs to call "make clean" when switching between targets.
# Script Detailed Description
@ -75,11 +75,11 @@ The provided `prepare_and_run.sh` script is performing the followings:
- Prepare the folder that should be pushed into the device
- Copy this folder into the device and run the scripts on the device
See how to run the script and paramaters definitions in the [Quick Start Section](#quick-start)
See how to run the script and parameters definitions in the [Quick Start Section](#quick-start)
## Preparing the model
Within the model folder a `prepare_model.sh` script uses MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the pyhton script will be run locally.
Within the model folder a `prepare_model.sh` script uses MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the python script will be run locally.
The script then converts the `.mindir` to a `.ms` format using the MindSpore ToD converter.
The script accepts a tar ball where the converter resides. Otherwise, the script will attempt to find the converter in the MindSpore ToD build output directory.

@ -67,28 +67,28 @@ where:
- DATASET_PATH is the path to the [dataset](#dataset),
- MINDSPORE_DOCKER is the image name of the docker that runs [MindSpore](#environment-requirements). If not provided MindSpore will be run locally
- REALEASE.tar.gz is a pointer to the MindSpore ToD release tar ball. If not provided, the script will attempt to find MindSpore ToD compilation output
- target is defaulted to arm64, i.e., on-device. If x86 is provided, the demo will be run locally. Note that infrastructure is not optimized for running on x86. Also, note that user needs to call "make clean" when switching betweeen targets.
- RELEASE.tar.gz is a pointer to the MindSpore ToD release tar ball. If not provided, the script will attempt to find MindSpore ToD compilation output
- target is defaulted to arm64, i.e., on-device. If x86 is provided, the demo will be run locally. Note that infrastructure is not optimized for running on x86. Also, note that user needs to call "make clean" when switching between targets.
# Script Detailed Description
The provided `prepare_and_run.sh` script is performing the followings:
- Prepare the trainable transsfer learning model in a `.ms` format
- Prepare the trainable transfer learning model in a `.ms` format
- Prepare the folder that should be pushed into the device
- Copy this folder into the device and run the scripts on the device
See how to run the script and paramaters definitions in the [Quick Start Section](#quick-start)
See how to run the script and parameters definitions in the [Quick Start Section](#quick-start)
## Preparing the model
Within the model folder a `prepare_model.sh` script uses MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the pyhton script will be run locally. As explained above, the head of the network is pretrained and a `.ckpt` file should be loaded to the head network. In the first time the script is run, it attempts to download the `.ckpt` file using `wget` command.
Within the model folder a `prepare_model.sh` script uses MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the python script will be run locally. As explained above, the head of the network is pre-trained and a `.ckpt` file should be loaded to the head network. In the first time the script is run, it attempts to download the `.ckpt` file using `wget` command.
The script then converts the `.mindir` to a `.ms` format using the MindSpore ToD converter.
The script accepts a tar ball where the converter resides. Otherwise, the script will attempt to find the converter in the MindSpore ToD build output directory.
## Preparing the Folder
The `transfer_learning_tod.ms` model file is then copied into the `package` folder as well as scripts, the MindSpore ToD library and a subset of the Places dataset. This dataset undergoes preprocessing on the server prior to the packaging.
The `transfer_learning_tod.ms` model file is then copied into the `package` folder as well as scripts, the MindSpore ToD library and a subset of the Places dataset. This dataset undergoes pre-processing on the server prior to the packaging.
Finally, the code (in src) is compiled for the target and the binary is copied into the `package` folder.
### Running the code on the device

@ -10,7 +10,7 @@
// void ConvDwFp32Avx3x3(float *output, float **input, const float *weights, const float *bias, size_t channels, size_t output_width,
// size_t input_stride, size_t relum, szie_t relu6)
// in linux x64 platfrom:
// in linux x64 platform:
// rdi: output
// rsi: input
// rdx: weights
@ -21,7 +21,7 @@
// 16: relu
// 24: relu6
// in win x64 platfrom: "shadow space" needs to be opened up for first four parameters ==> 32 bites
// in win x64 platform: "shadow space" needs to be opened up for first four parameters ==> 32 bites
// rcx: output
// rdx: input
// r8: weights

@ -22,7 +22,7 @@
// 24: stride
// 32: writeNhwc/writeWino
// parameters pass in win x64 platfrom: "shadow space" needs to be opened up for first four parameters ==> 32 bites
// parameters pass in win x64 platform: "shadow space" needs to be opened up for first four parameters ==> 32 bites
// rcx: a
// rdx: b
// r8: c

@ -41,12 +41,8 @@ int PackDeConvWgDataFp32(const float *nhwc_weight, DeConvComputeUnit *unit, cons
if (unit->use_winograd_) {
/* Generate winograd */
float matrix_g[64];
float matrix_gt[64];
float matrix_a[64];
float matrix_at[64];
float matrix_b[64];
float matrix_bt[64];
float matrix_g[64], matrix_a[64], matrix_b[64];
float matrix_gt[64], matrix_at[64], matrix_bt[64];
int ret = CookToomFilter(matrix_a, matrix_at, matrix_b, matrix_bt, matrix_g, matrix_gt, 0.5f,
DECONV_WINOGRAD_DEFAULT_UNIT, unit->h_size_);
if (ret != NNACL_OK) {
@ -189,6 +185,67 @@ void TiledC4MatmulFp32(float *dst, const float *src, const float *weight, size_t
}
#endif
#ifdef ENABLE_ARM32
void DeConvWgMergeArm32(const float *src_ptr, float *dst_ptr, size_t src_step, size_t dst_step) {
asm volatile(
"mov r11, %[src_ptr]\n"
"mov r8, %[dst_ptr]\n"
"mov r10, r8\n"
"vld1.32 {q0}, [r11], %[src_step]\n"
"vld1.32 {q1}, [r8], %[dst_step]\n"
"vld1.32 {q2}, [r11], %[src_step]\n"
"vld1.32 {q3}, [r8], %[dst_step]\n"
"vadd.f32 q0, q0, q1\n"
"vld1.32 {q8}, [r11], %[src_step]\n"
"vadd.f32 q2, q2, q3\n"
"vst1.32 {q0}, [r10], %[dst_step]\n"
"vst1.32 {q2}, [r10], %[dst_step]\n"
"vld1.32 {q9}, [r8], %[dst_step]\n"
"vld1.32 {q10}, [r11], %[src_step]\n"
"vadd.f32 q8, q8, q9\n"
"vld1.32 {q11}, [r8], %[dst_step]\n"
"vadd.f32 q10, q10, q11\n"
"vld1.32 {q0}, [r11], %[src_step]\n"
"vst1.32 {q8}, [r10], %[dst_step]\n"
"vst1.32 {q10}, [r10], %[dst_step]\n"
"vld1.32 {q1}, [r8], %[dst_step]\n"
"vld1.32 {q2}, [r11], %[src_step]\n"
"vld1.32 {q3}, [r8], %[dst_step]\n"
"vadd.f32 q0, q0, q1\n"
"vadd.f32 q2, q2, q3\n"
"vst1.32 {q0}, [r10], %[dst_step]\n"
"vst1.32 {q2}, [r10], %[dst_step]\n"
"vld1.32 {q8}, [r11], %[src_step]\n"
"vld1.32 {q9}, [r8], %[dst_step]\n"
"vld1.32 {q10}, [r11], %[src_step]\n"
"vld1.32 {q11}, [r8], %[dst_step]\n"
"vadd.f32 q8, q8, q9\n"
"vadd.f32 q10, q10, q11\n"
"vst1.32 {q8}, [r10], %[dst_step]\n"
"vst1.32 {q10}, [r10], %[dst_step]\n"
:
: [ src_ptr ] "r"(src_ptr), [ dst_ptr ] "r"(dst_ptr), [ src_step ] "r"(src_step), [ dst_step ] "r"(dst_step)
: "r8", "r10", "r11", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11");
return;
}
#endif
void DeConvWgMerge(const float *src, float *dst, size_t src_stride, size_t dst_stride, size_t count) {
const float *src_ptr = src;
float *dst_ptr = dst;
@ -257,61 +314,7 @@ void DeConvWgMerge(const float *src, float *dst, size_t src_stride, size_t dst_s
#elif ENABLE_ARM32
size_t src_step = src_stride * sizeof(float);
size_t dst_step = dst_stride * sizeof(float);
asm volatile(
"mov r11, %[src_ptr]\n"
"mov r8, %[dst_ptr]\n"
"mov r10, r8\n"
"vld1.32 {q0}, [r11], %[src_step]\n"
"vld1.32 {q1}, [r8], %[dst_step]\n"
"vld1.32 {q2}, [r11], %[src_step]\n"
"vld1.32 {q3}, [r8], %[dst_step]\n"
"vadd.f32 q0, q0, q1\n"
"vld1.32 {q8}, [r11], %[src_step]\n"
"vadd.f32 q2, q2, q3\n"
"vst1.32 {q0}, [r10], %[dst_step]\n"
"vst1.32 {q2}, [r10], %[dst_step]\n"
"vld1.32 {q9}, [r8], %[dst_step]\n"
"vld1.32 {q10}, [r11], %[src_step]\n"
"vadd.f32 q8, q8, q9\n"
"vld1.32 {q11}, [r8], %[dst_step]\n"
"vadd.f32 q10, q10, q11\n"
"vld1.32 {q0}, [r11], %[src_step]\n"
"vst1.32 {q8}, [r10], %[dst_step]\n"
"vst1.32 {q10}, [r10], %[dst_step]\n"
"vld1.32 {q1}, [r8], %[dst_step]\n"
"vld1.32 {q2}, [r11], %[src_step]\n"
"vld1.32 {q3}, [r8], %[dst_step]\n"
"vadd.f32 q0, q0, q1\n"
"vadd.f32 q2, q2, q3\n"
"vst1.32 {q0}, [r10], %[dst_step]\n"
"vst1.32 {q2}, [r10], %[dst_step]\n"
"vld1.32 {q8}, [r11], %[src_step]\n"
"vld1.32 {q9}, [r8], %[dst_step]\n"
"vld1.32 {q10}, [r11], %[src_step]\n"
"vld1.32 {q11}, [r8], %[dst_step]\n"
"vadd.f32 q8, q8, q9\n"
"vadd.f32 q10, q10, q11\n"
"vst1.32 {q8}, [r10], %[dst_step]\n"
"vst1.32 {q10}, [r10], %[dst_step]\n"
:
: [ src_ptr ] "r"(src_ptr), [ dst_ptr ] "r"(dst_ptr), [ src_step ] "r"(src_step), [ dst_step ] "r"(dst_step)
: "r8", "r10", "r11", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11");
DeConvWgMergeArm32(src_ptr, dst_ptr, src_step, dst_step);
#else
for (int j = 0; j < 8; j++) {
const float *s = src_ptr + j * src_stride;
@ -342,16 +345,16 @@ void DeConvWgMerge(const float *src, float *dst, size_t src_stride, size_t dst_s
}
void DeConvWgCalWgFp32(const float *tile_in, float *tile_out, const float *weight_buf, float *tmp_buf,
const float *at_buf, float *a_mid_buf, float *trans_a_buf, bool *transfered, const float *bt_buf,
float *b_tmp_buf, int unit_size, int w_start, int h_start, const ConvParameter *conv_param,
const DeConvParam *deconv_param) {
const float *at_buf, float *a_mid_buf, float *trans_a_buf, bool *transferred,
const float *bt_buf, float *b_tmp_buf, int unit_size, int w_start, int h_start,
const ConvParameter *conv_param, const DeConvParam *deconv_param) {
int winograd_plane = unit_size * unit_size;
if (!transfered[unit_size]) {
if (!transferred[unit_size]) {
WinogradTransLeft(tile_in, at_buf, a_mid_buf, DECONV_WINOGRAD_DEFAULT_UNIT, unit_size, DECONV_WINOGRAD_DEFAULT_UNIT,
deconv_param->ic_div4_ * DECONV_WINOGRAD_DEFAULT_TILE);
WinogradTransRight(a_mid_buf, at_buf, trans_a_buf, unit_size, unit_size, DECONV_WINOGRAD_DEFAULT_UNIT,
deconv_param->ic_div4_ * DECONV_WINOGRAD_DEFAULT_TILE);
transfered[unit_size] = true;
transferred[unit_size] = true;
}
for (int index = 0; index < winograd_plane; index++) {
@ -449,7 +452,7 @@ void DeconvWg(const float *nhwc_input_, float *tile_in, float *tile_out, int sta
}
/* compute */
bool transfered[DECONV_WINOGRAD_BUFFER_COUNT] = {false};
bool transferred[DECONV_WINOGRAD_BUFFER_COUNT] = {false};
for (int i = 0; i < deconv_param->compute_size_; i++) {
DeConvComputeUnit *unit = &deconv_param->compute_units_[i];
if (unit->use_winograd_) {
@ -465,7 +468,7 @@ void DeconvWg(const float *nhwc_input_, float *tile_in, float *tile_out, int sta
float *tmp_b_buf = (float *)unit->winograd_.b_buffer_ + task_id * unit->winograd_.kh_ * unit->winograd_.kw_ *
deconv_param->oc_up4_ * DECONV_WINOGRAD_DEFAULT_TILE;
DeConvWgCalWgFp32(tile_in, tile_out, (float *)unit->weight_, tmp_buf, unit->winograd_.AT_, wg_mid_a_buf,
wg_dst_a_buf, transfered, unit->winograd_.BT_, tmp_b_buf, unit->winograd_.kh_, unit->w_start_,
wg_dst_a_buf, transferred, unit->winograd_.BT_, tmp_b_buf, unit->winograd_.kh_, unit->w_start_,
unit->h_start_, conv_param, deconv_param);
} else {
float *tmp_buf = (float *)unit->tmp_buffer_ + task_id * deconv_param->oc_div4_ * unit->w_size_ * unit->h_size_ *

@ -31,8 +31,8 @@ void ScaleInner(const float *in_data, float *out_data, const float *scale, const
float32x4_t data = vld1q_f32(in_data + in_offset);
float32x4_t scale_4 = vdupq_n_f32(scale[i]);
float32x4_t offset_4 = vdupq_n_f32(offset[i]);
float32x4_t reslut = vfmaq_f32(offset_4, data, scale_4);
vst1q_f32(out_data + in_offset, reslut);
float32x4_t result = vfmaq_f32(offset_4, data, scale_4);
vst1q_f32(out_data + in_offset, result);
}
#endif
for (; in_index < inner_size; in_index++) {
@ -54,8 +54,8 @@ void ScaleAxis(const float *in_data, float *out_data, const float *scale, const
float32x4_t data = vld1q_f32(in_data + in_offset);
float32x4_t scale_4 = vld1q_f32(scale + index);
float32x4_t offset_4 = vld1q_f32(offset + index);
float32x4_t reslut = vfmaq_f32(offset_4, data, scale_4);
vst1q_f32(out_data + in_offset, reslut);
float32x4_t result = vfmaq_f32(offset_4, data, scale_4);
vst1q_f32(out_data + in_offset, result);
}
#endif
for (; index < axis_size; index++) {

@ -130,11 +130,11 @@ int SaturatingRoundingMultiplyByPOT(int32_t x, int exponent) {
const int min = INT32_MIN;
const int max = INT32_MAX;
const int scalar_int_bits = 8 * sizeof(int32_t);
const int thresold = ((1 << (uint32_t)(scalar_int_bits - 1 - exponent)) - 1);
const int postive_mask = x > thresold ? BitNot(0) : 0;
const int negative_mask = x < -thresold ? BitNot(0) : 0;
const int threshold = ((1 << (uint32_t)(scalar_int_bits - 1 - exponent)) - 1);
const int positive_mask = x > threshold ? BitNot(0) : 0;
const int negative_mask = x < -threshold ? BitNot(0) : 0;
int result = x * ((int32_t)(1) << (uint32_t)exponent);
result = BitsSelect(postive_mask, max, result);
result = BitsSelect(positive_mask, max, result);
result = BitsSelect(negative_mask, min, result);
return result;
} else if (exponent < 0) {

@ -37,7 +37,7 @@ int SaturatingRoundingDoublingHighMul(int a, int b);
int16_t SaturatingRoundingDoublingHighMulInt16(int16_t a, int16_t b);
// division by a 2^exponent with rounding
// or arithmetic right shift with rouding
// or arithmetic right shift with rounding
int RoundingDivideByPOT(int x, int exponent);
int UpwardRounding(int x, int exponent);

@ -53,7 +53,7 @@ void QuantizeRoundParameterWithSinglePrecision(double double_multiplier, int32_t
int *right_shift) {
int shift = 0;
const uint32_t scale_bits = (uint32_t)(double_multiplier);
/* multipiler is in[0x40000000, 0x7FFFFF80] range */
/* multiplier is in[0x40000000, 0x7FFFFF80] range */
*quantized_multiplier = (int32_t)(((scale_bits & UINT32_C(0x007FFFFF)) | UINT32_C(0x00800000)) << 7);
if (quantized_multiplier[0] < INT32_C(0x40000000) || quantized_multiplier[0] > INT32_C(0x7FFFFF80)) {
return;

@ -186,7 +186,7 @@ int BatchNormOpenCLKernel::Run() {
int arg_cn = 0;
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, in_tensors_.at(0)->data_c()); // input tensor
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, scale_, lite::opencl::MemType::BUF); // scale
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, offset_, lite::opencl::MemType::BUF); // offest
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, offset_, lite::opencl::MemType::BUF); // offset
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, mean_, lite::opencl::MemType::BUF); // mean
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, variance_, lite::opencl::MemType::BUF); // variance
ocl_runtime_->SetKernelArg(kernel_, arg_cn++, out_tensors_.at(0)->data_c()); // out tensor

@ -57,7 +57,7 @@ class NoSupportOp {
void PrintOps() const {
if (!noSupportOps.empty()) {
MS_LOG(ERROR) << "===========================================";
MS_LOG(ERROR) << "UNSUPPORT OP LIST:";
MS_LOG(ERROR) << "UNSUPPORTED OP LIST:";
for (auto &op_name : noSupportOps) {
MS_LOG(ERROR) << "FMKTYPE: " << fmkType << ", OP TYPE: " << op_name;
}

@ -439,7 +439,7 @@ message TransformationParameter {
optional uint32 crop_size = 3 [default = 0];
// mean_file and mean_value cannot be specified at the same time
optional string mean_file = 4;
// if specified can be repeated once (would substract it from all the channels)
// if specified can be repeated once (would subtract it from all the channels)
// or can be repeated the same number of times as channels
// (would subtract them from the corresponding channel)
repeated float mean_value = 5;
@ -1499,10 +1499,10 @@ message PriorBoxParameter {
optional bool flip = 4 [default = true];
// If true, will clip the prior so that it is within [0, 1]
optional bool clip = 5 [default = false];
// Variance for adjusting the prior bboxes.
// Variance for adjusting the prior boxes.
repeated float variance = 6;
// By default, we calculate img_height, img_width, step_x, step_y based on
// bottom[0] (feat) and bottom[1] (img). Unless these values are explicitely
// bottom[0] (feat) and bottom[1] (img). Unless these values are explicitly
// provided.
// Explicitly provide the img_size.
optional uint32 img_size = 7;

@ -134,10 +134,10 @@ message AttributeProto {
// The type field MUST be present for this version of the IR.
// For 0.0.1 versions of the IR, this field was not defined, and
// implementations needed to use has_field hueristics to determine
// implementations needed to use has_field heuristics to determine
// which value field was in use. For IR_VERSION 0.0.2 or later, this
// field MUST be set and match the f|i|s|t|... field in use. This
// change was made to accomodate proto3 implementations.
// change was made to accommodate proto3 implementations.
optional AttributeType type = 20; // discriminator that indicates which field below is in use
// Exactly ONE of the following fields must be present for this version of the IR

Loading…
Cancel
Save