Added linter fix for album dataset
Added testDataset
Adding signature
Added JsonDataset example API
Example dataset
Resolving format
More fixing
Refactor
Small fix
Added compiling album dataset
Running tests
Added linter fix #1
Passing UT
Added dataset API
Addressing clang
Clang part 2
Fixing pass
Fixed tree check
lint fix
Added lint fix part 2

pull/4772/head
parent e06dfaa80d
commit c79db93c48
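The new public C++ API is exercised in the pipeline tests below along these lines; a minimal sketch distilled from those tests, not an authoritative usage guide (the two paths are placeholders, everything else appears verbatim in the tests):

// Minimal usage sketch: build an Album dataset from a folder of JSON
// sample files plus a schema, then iterate it row by row.
std::string folder_path = "/path/to/testAlbum/images";              // placeholder
std::string schema_file = "/path/to/testAlbum/datasetSchema.json";  // placeholder
std::vector<std::string> column_names = {"image", "label", "id"};
std::shared_ptr<Dataset> ds = Album(folder_path, schema_file, column_names);
std::shared_ptr<Iterator> iter = ds->CreateIterator();
std::unordered_map<std::string, std::shared_ptr<Tensor>> row;
iter->GetNextRow(&row);
while (row.size() != 0) {
  MS_LOG(INFO) << "Tensor image shape: " << row["image"]->shape();
  iter->GetNextRow(&row);
}
iter->Stop();  // manually terminate the pipeline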
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -0,0 +1,208 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include "common/common.h"
#include "minddata/dataset/core/client.h"
#include "minddata/dataset/core/global_context.h"
#include "minddata/dataset/engine/datasetops/source/album_op.h"
#include "minddata/dataset/engine/datasetops/source/sampler/distributed_sampler.h"
#include "minddata/dataset/engine/datasetops/source/sampler/pk_sampler.h"
#include "minddata/dataset/engine/datasetops/source/sampler/random_sampler.h"
#include "minddata/dataset/engine/datasetops/source/sampler/sampler.h"
#include "minddata/dataset/engine/datasetops/source/sampler/sequential_sampler.h"
#include "minddata/dataset/engine/datasetops/source/sampler/subset_random_sampler.h"
#include "minddata/dataset/engine/datasetops/source/sampler/weighted_random_sampler.h"
#include "minddata/dataset/util/path.h"
#include "minddata/dataset/util/status.h"
#include "gtest/gtest.h"
#include "utils/log_adapter.h"
#include "securec.h"
#include "minddata/dataset/include/datasets.h"
#include "minddata/dataset/include/transforms.h"

using namespace mindspore::dataset;
using mindspore::MsLogLevel::ERROR;
using mindspore::ExceptionType::NoExceptionType;
using mindspore::LogStream;

// Helpers defined in the shared test utilities.
std::shared_ptr<BatchOp> Batch(int batch_size = 1, bool drop = false, int rows_per_buf = 2);
std::shared_ptr<RepeatOp> Repeat(int repeat_cnt);
std::shared_ptr<ExecutionTree> Build(std::vector<std::shared_ptr<DatasetOp>> ops);

// Builds an AlbumOp that scans `path` for .json sample files.
std::shared_ptr<AlbumOp> Album(int64_t num_works, int64_t rows, int64_t conns, std::string path,
                               bool shuf = false, std::unique_ptr<Sampler> sampler = nullptr,
                               bool decode = false) {
  std::shared_ptr<AlbumOp> so;
  AlbumOp::Builder builder;
  Status rc = builder.SetNumWorkers(num_works)
                .SetAlbumDir(path)
                .SetRowsPerBuffer(rows)
                .SetOpConnectorSize(conns)
                .SetExtensions({".json"})
                .SetSampler(std::move(sampler))
                .SetDecode(decode)
                .Build(&so);
  return so;
}

// Same as Album(), but loads the given columns through a schema file.
std::shared_ptr<AlbumOp> AlbumSchema(int64_t num_works, int64_t rows, int64_t conns, std::string path,
                                     std::string schema_file, std::vector<std::string> column_names = {},
                                     bool shuf = false, std::unique_ptr<Sampler> sampler = nullptr,
                                     bool decode = false) {
  std::shared_ptr<AlbumOp> so;
  AlbumOp::Builder builder;
  Status rc = builder.SetNumWorkers(num_works)
                .SetSchemaFile(schema_file)
                .SetColumnsToLoad(column_names)
                .SetAlbumDir(path)
                .SetRowsPerBuffer(rows)
                .SetOpConnectorSize(conns)
                .SetExtensions({".json"})
                .SetSampler(std::move(sampler))
                .SetDecode(decode)
                .Build(&so);
  return so;
}

class MindDataTestAlbum : public UT::DatasetOpTesting {
 protected:
};

TEST_F(MindDataTestAlbum, TestSequentialAlbumWithSchema) {
  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  std::string schema_file = datasets_root_path_ + "/testAlbum/datasetSchema.json";
  std::vector<std::string> column_names = {"image", "label", "id"};
  auto tree = Build({AlbumSchema(16, 2, 32, folder_path, schema_file, column_names, false), Repeat(2)});
  tree->Prepare();
  Status rc = tree->Launch();
  if (rc.IsError()) {
    MS_LOG(ERROR) << "Return code error detected during tree launch.";
    EXPECT_TRUE(false);
  } else {
    DatasetIterator di(tree);
    TensorMap tensor_map;
    rc = di.GetNextAsMap(&tensor_map);
    EXPECT_TRUE(rc.IsOk());
    uint64_t i = 0;
    int32_t label = 0;
    while (tensor_map.size() != 0) {
      tensor_map["label"]->GetItemAt<int32_t>(&label, {});
      MS_LOG(DEBUG) << "row: " << i << "\t" << tensor_map["image"]->shape() << " label: " << label
                    << " label shape: " << tensor_map["label"]->shape() << "\n";
      i++;
      di.GetNextAsMap(&tensor_map);
    }
    MS_LOG(INFO) << "got rows: " << i << "\n";
    EXPECT_EQ(i, 14);
  }
}

TEST_F(MindDataTestAlbum, TestSequentialAlbumWithSchemaNoOrder) {
  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  std::string schema_file = datasets_root_path_ + "/testAlbum/datasetSchema.json";
  auto tree = Build({AlbumSchema(16, 2, 32, folder_path, schema_file), Repeat(2)});
  tree->Prepare();
  Status rc = tree->Launch();
  if (rc.IsError()) {
    MS_LOG(ERROR) << "Return code error detected during tree launch.";
    EXPECT_TRUE(false);
  } else {
    DatasetIterator di(tree);
    TensorMap tensor_map;
    rc = di.GetNextAsMap(&tensor_map);
    EXPECT_TRUE(rc.IsOk());
    uint64_t i = 0;
    int32_t label = 0;
    while (tensor_map.size() != 0) {
      tensor_map["label"]->GetItemAt<int32_t>(&label, {});
      MS_LOG(DEBUG) << "row: " << i << "\t" << tensor_map["image"]->shape() << " label: " << label
                    << " label shape: " << tensor_map["label"]->shape() << "\n";
      i++;
      di.GetNextAsMap(&tensor_map);
    }
    MS_LOG(INFO) << "got rows: " << i << "\n";
    EXPECT_EQ(i, 14);
  }
}

TEST_F(MindDataTestAlbum, TestSequentialAlbumWithSchemaFloat) {
  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  // add the priority column
  std::string schema_file = datasets_root_path_ + "/testAlbum/floatSchema.json";
  auto tree = Build({AlbumSchema(16, 2, 32, folder_path, schema_file), Repeat(2)});
  tree->Prepare();
  Status rc = tree->Launch();
  if (rc.IsError()) {
    MS_LOG(ERROR) << "Return code error detected during tree launch.";
    EXPECT_TRUE(false);
  } else {
    DatasetIterator di(tree);
    TensorMap tensor_map;
    rc = di.GetNextAsMap(&tensor_map);
    EXPECT_TRUE(rc.IsOk());
    uint64_t i = 0;
    int32_t label = 0;
    double priority = 0;
    while (tensor_map.size() != 0) {
      tensor_map["label"]->GetItemAt<int32_t>(&label, {});
      tensor_map["_priority"]->GetItemAt<double>(&priority, {});
      MS_LOG(DEBUG) << "row: " << i << "\t" << tensor_map["image"]->shape() << " label: " << label
                    << " label shape: " << tensor_map["label"]->shape() << " priority: " << priority << "\n";
      i++;
      di.GetNextAsMap(&tensor_map);
    }
    MS_LOG(INFO) << "got rows: " << i << "\n";
    EXPECT_EQ(i, 14);
  }
}

TEST_F(MindDataTestAlbum, TestSequentialAlbumWithFullSchema) {
  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  // add the priority column
  std::string schema_file = datasets_root_path_ + "/testAlbum/fullSchema.json";
  auto tree = Build({AlbumSchema(16, 2, 32, folder_path, schema_file), Repeat(2)});
  tree->Prepare();
  Status rc = tree->Launch();
  if (rc.IsError()) {
    MS_LOG(ERROR) << "Return code error detected during tree launch.";
    EXPECT_TRUE(false);
  } else {
    DatasetIterator di(tree);
    TensorMap tensor_map;
    rc = di.GetNextAsMap(&tensor_map);
    EXPECT_TRUE(rc.IsOk());
    uint64_t i = 0;
    int32_t label = 0;
    double priority = 0;
    while (tensor_map.size() != 0) {
      tensor_map["label"]->GetItemAt<int32_t>(&label, {});
      tensor_map["_priority"]->GetItemAt<double>(&priority, {});
      MS_LOG(DEBUG) << "row: " << i << "\t" << tensor_map["image"]->shape() << " label: " << label
                    << " label shape: " << tensor_map["label"]->shape() << " priority: " << priority
                    << " embedding: " << tensor_map["_embedding"]->shape() << "\n";
      i++;
      di.GetNextAsMap(&tensor_map);
    }
    MS_LOG(INFO) << "got rows: " << i << "\n";
    EXPECT_EQ(i, 14);
  }
}
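The schema-based tests above pull typed values out of a row with Tensor::GetItemAt; the rank-0 "id" column selected in TestSequentialAlbumWithSchema is never read directly. A hedged sketch of how it would be read, assuming the schema's int64 type maps to GetItemAt<int64_t> the same way "label" maps to GetItemAt<int32_t> and "_priority" to GetItemAt<double> (this particular call is an assumption, not taken from the tests):

int64_t id = 0;
// tensor_map is a row obtained from DatasetIterator::GetNextAsMap, as above.
// Rank-0 (scalar) columns are indexed with an empty coordinate list.
tensor_map["id"]->GetItemAt<int64_t>(&id, {});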
@@ -0,0 +1,136 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "common/common.h"
#include "minddata/dataset/include/datasets.h"

using namespace mindspore::dataset::api;
using mindspore::dataset::Tensor;

class MindDataTestPipeline : public UT::DatasetOpTesting {
 protected:
};

TEST_F(MindDataTestPipeline, TestAlbumBasic) {
  MS_LOG(INFO) << "Doing MindDataTestPipeline-TestAlbumBasic.";

  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  std::string schema_file = datasets_root_path_ + "/testAlbum/datasetSchema.json";
  std::vector<std::string> column_names = {"image", "label", "id"};
  // Create an Album Dataset
  std::shared_ptr<Dataset> ds = Album(folder_path, schema_file, column_names);
  EXPECT_NE(ds, nullptr);

  // Create an iterator over the result of the above dataset.
  // This will trigger the creation of the Execution Tree and launch it.
  std::shared_ptr<Iterator> iter = ds->CreateIterator();
  EXPECT_NE(iter, nullptr);

  // Iterate the dataset and get each row
  std::unordered_map<std::string, std::shared_ptr<Tensor>> row;
  iter->GetNextRow(&row);

  uint64_t i = 0;
  while (row.size() != 0) {
    i++;
    auto image = row["image"];
    MS_LOG(INFO) << "Tensor image shape: " << image->shape();
    iter->GetNextRow(&row);
  }

  EXPECT_EQ(i, 7);

  // Manually terminate the pipeline
  iter->Stop();
}

TEST_F(MindDataTestPipeline, TestAlbumDecode) {
  MS_LOG(INFO) << "Doing MindDataTestPipeline-TestAlbumDecode.";
  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  std::string schema_file = datasets_root_path_ + "/testAlbum/datasetSchema.json";
  std::vector<std::string> column_names = {"image", "label", "id"};
  // Create an Album Dataset with decode enabled
  std::shared_ptr<Dataset> ds = Album(folder_path, schema_file, column_names, true);
  EXPECT_NE(ds, nullptr);

  // Create an iterator over the result of the above dataset.
  // This will trigger the creation of the Execution Tree and launch it.
  std::shared_ptr<Iterator> iter = ds->CreateIterator();
  EXPECT_NE(iter, nullptr);

  // Iterate the dataset and get each row
  std::unordered_map<std::string, std::shared_ptr<Tensor>> row;
  iter->GetNextRow(&row);

  uint64_t i = 0;
  while (row.size() != 0) {
    i++;
    auto image = row["image"];
    auto shape = image->shape();
    MS_LOG(INFO) << "Tensor image shape size: " << shape.Size();
    MS_LOG(INFO) << "Tensor image shape: " << image->shape();
    EXPECT_GT(shape.Size(), 1);  // Verify decode=true took effect
    iter->GetNextRow(&row);
  }

  EXPECT_EQ(i, 7);

  // Manually terminate the pipeline
  iter->Stop();
}

TEST_F(MindDataTestPipeline, TestAlbumNumSamplers) {
  MS_LOG(INFO) << "Doing MindDataTestPipeline-TestAlbumNumSamplers.";

  std::string folder_path = datasets_root_path_ + "/testAlbum/images";
  std::string schema_file = datasets_root_path_ + "/testAlbum/datasetSchema.json";
  std::vector<std::string> column_names = {"image", "label", "id"};
  // Create an Album Dataset with a sequential sampler that takes one sample
  std::shared_ptr<Dataset> ds = Album(folder_path, schema_file, column_names, true, SequentialSampler(0, 1));
  EXPECT_NE(ds, nullptr);

  // Create an iterator over the result of the above dataset.
  // This will trigger the creation of the Execution Tree and launch it.
  std::shared_ptr<Iterator> iter = ds->CreateIterator();
  EXPECT_NE(iter, nullptr);

  // Iterate the dataset and get each row
  std::unordered_map<std::string, std::shared_ptr<Tensor>> row;
  iter->GetNextRow(&row);

  uint64_t i = 0;
  while (row.size() != 0) {
    i++;
    auto image = row["image"];
    MS_LOG(INFO) << "Tensor image shape: " << image->shape();
    iter->GetNextRow(&row);
  }

  EXPECT_EQ(i, 1);

  // Manually terminate the pipeline
  iter->Stop();
}

TEST_F(MindDataTestPipeline, TestAlbumError) {
  MS_LOG(INFO) << "Doing MindDataTestPipeline-TestAlbumError.";
  // Point at a directory that does not exist
  std::string folder_path = datasets_root_path_ + "/testAlbum/ima";
  std::string schema_file = datasets_root_path_ + "/testAlbum/datasetSchema.json";
  std::vector<std::string> column_names = {"image", "label", "id"};
  // Creating the Album Dataset should fail and return a null pointer
  std::shared_ptr<Dataset> ds = Album(folder_path, schema_file, column_names, true, SequentialSampler(0, 1));

  EXPECT_EQ(ds, nullptr);
}
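TestAlbumError relies on the factory contract that invalid input yields a null dataset rather than an exception. A caller-side sketch of that contract (the bad path is a placeholder):

// If construction fails (e.g. the album directory does not exist), the
// Album factory returns nullptr, as TestAlbumError asserts above.
std::shared_ptr<Dataset> ds = Album("/no/such/dir", schema_file, column_names);
if (ds == nullptr) {
  MS_LOG(ERROR) << "Failed to create Album dataset; check the inputs.";
}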
@@ -0,0 +1 @@
just some random stuff
@@ -0,0 +1,16 @@
{
    "columns": {
        "image": {
            "type": "uint8",
            "rank": 1
        },
        "label": {
            "type": "string",
            "rank": 1
        },
        "id": {
            "type": "int64",
            "rank": 0
        }
    }
}
@@ -0,0 +1 @@
{"dataset": "", "image": "original/apple_expect_decoded.jpg", "label": ["3", "2"], "_priority": 0.8, "_embedding": "sample.bin", "_processed_image": "original/apple_expect_decoded.jpg"}
@@ -1 +1 @@
-{"dataset": "", "image": "imagefolder/apple_expect_decoded.jpg", "label": [1, 2], "_priority": 0.8, "_embedding": "sample.bin", "_segmented_image": "imagefolder/apple_expect_decoded.jpg", "_processed_image": "imagefolder/apple_expect_decoded.jpg"}
+{"dataset": "", "image": "testAlbum//testAlbum/original/apple_expect_resize_bilinear.jpg", "label": ["3", "2"], "_priority": 0.8, "_embedding": "testAlbum//testAlbum/sample.bin", "_processed_image": "testAlbum//testAlbum/original/apple_expect_resize_bilinear.jpg"}
@@ -1 +1 @@
-{"dataset": "", "image": "imagefolder/apple_expect_resize_bilinear.jpg", "label": [1, 2], "_priority": 0.8, "_embedding": "sample.bin", "_segmented_image": "imagefolder/apple_expect_resize_bilinear.jpg", "_processed_image": "imagefolder/apple_expect_resize_bilinear.jpg"}
+{"dataset": "", "image": "testAlbum//testAlbum/original/apple_expect_changemode.jpg", "label": ["3", "2"], "_priority": 0.8, "_embedding": "testAlbum//testAlbum/sample.bin", "_processed_image": "testAlbum//testAlbum/original/apple_expect_changemode.jpg"}
Some files were not shown because too many files have changed in this diff