!1768 dataset: repair format and type problem in split

Merge pull request !1768 from ms_yan/format_and_type
pull/1768/MERGE
mindspore-ci-bot 5 years ago committed by Gitee
commit b250b08781

@@ -613,7 +613,7 @@ class Dataset:
 # if we still need more rows, give them to the first split.
 # if we have too many rows, remove the extras from the first split that has
 # enough rows.
-size_difference = dataset_size - absolute_sizes_sum
+size_difference = int(dataset_size - absolute_sizes_sum)
 if size_difference > 0:
     absolute_sizes[0] += size_difference
 else:
@@ -647,10 +647,14 @@ class Dataset:
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
--Any size equals 0, an error will occur.
--The sum of split sizes < K, the difference will be added to the first split.
--The sum of split sizes > K, the difference will be removed from the first large
+
+- Any size equals 0, an error will occur.
+
+- The sum of split sizes < K, the difference will be added to the first split.
+
+- The sum of split sizes > K, the difference will be removed from the first large
 enough split such that it will have atleast 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
 If true, the data will be randomly split. Otherwise, each split will be created with
 consecutive rows from the dataset.
@@ -1282,10 +1286,14 @@ class MappableDataset(SourceDataset):
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
--Any size equals 0, an error will occur.
--The sum of split sizes < K, the difference will be added to the first split.
--The sum of split sizes > K, the difference will be removed from the first large
+
+- Any size equals 0, an error will occur.
+
+- The sum of split sizes < K, the difference will be added to the first split.
+
+- The sum of split sizes > K, the difference will be removed from the first large
 enough split such that it will have atleast 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
 If true, the data will be randomly split. Otherwise, each split will be created with
 consecutive rows from the dataset.
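
For readers of this change, the rounding rules in the docstring can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code in the repository: the name `compute_split_sizes` and the choice of `RuntimeError` are hypothetical, and only the `size_difference` lines are taken from the diff above.

```python
def compute_split_sizes(fractions, dataset_size):
    """Hypothetical helper: turn fractional split sizes into absolute row counts."""
    # round(f * K) for each fraction, as described in the docstring.
    absolute_sizes = [round(f * dataset_size) for f in fractions]

    # Rule 1: a split that rounds to 0 rows is an error.
    # (The exception type is an assumption made for this sketch.)
    if any(size == 0 for size in absolute_sizes):
        raise RuntimeError("a split size rounded to 0 rows")

    # The int() cast mirrors the line changed in this merge request.
    size_difference = int(dataset_size - sum(absolute_sizes))

    if size_difference > 0:
        # Rule 2: too few rows assigned, give the remainder to the first split.
        absolute_sizes[0] += size_difference
    elif size_difference < 0:
        # Rule 3: too many rows assigned, take the surplus from the first
        # split that still keeps at least 1 row afterwards.
        surplus = -size_difference
        for i, size in enumerate(absolute_sizes):
            if size - surplus >= 1:
                absolute_sizes[i] -= surplus
                break
        else:
            raise RuntimeError("no split is large enough to absorb the surplus")

    return absolute_sizes


if __name__ == "__main__":
    print(compute_split_sizes([0.8, 0.2], 10))
    print(compute_split_sizes([0.5, 0.5], 5))
```

With fractions `[0.5, 0.5]` and 5 rows, both sizes round to 2 (Python rounds halves to the nearest even integer), so the one missing row goes to the first split.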
