repair format and type problem in split

pull/1768/head
ms_yan 5 years ago
parent 32a72c1979
commit a3f7531500

@@ -613,7 +613,7 @@ class Dataset:
 # if we still need more rows, give them to the first split.
 # if we have too many rows, remove the extras from the first split that has
 # enough rows.
-size_difference = dataset_size - absolute_sizes_sum
+size_difference = int(dataset_size - absolute_sizes_sum)
 if size_difference > 0:
     absolute_sizes[0] += size_difference
 else:
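
The commit does not say which type was wrong, so the following is only a guess at the failure mode the `int(...)` cast guards against. It assumes `dataset_size` (or the sum) can arrive as a float, so that the uncast difference turns a split size into a float as well. The values below are made up for illustration.

```python
# Assumption: the dataset size arrives as a float (for example from an
# earlier arithmetic step). This is illustrative, not taken from the commit.
dataset_size = 7.0
absolute_sizes = [3, 3]
absolute_sizes_sum = sum(absolute_sizes)

# Without the cast, the difference is a float and contaminates the split size.
float_difference = dataset_size - absolute_sizes_sum
sizes_without_cast = [absolute_sizes[0] + float_difference, absolute_sizes[1]]

# With the cast, the split sizes stay integers.
int_difference = int(dataset_size - absolute_sizes_sum)
sizes_with_cast = [absolute_sizes[0] + int_difference, absolute_sizes[1]]

# A float row count cannot be used where an integer is required.
try:
    range(sizes_without_cast[0])
    float_size_rejected = False
except TypeError:
    float_size_rejected = True
```

Casting once, at the point where the difference is computed, keeps every later use of `absolute_sizes` integer-valued without touching the rest of the method.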
@@ -647,10 +647,14 @@ class Dataset:
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
--Any size equals 0, an error will occur.
--The sum of split sizes < K, the difference will be added to the first split.
--The sum of split sizes > K, the difference will be removed from the first large
-enough split such that it will have atleast 1 row after removing the difference.
+
+- Any size equals 0, an error will occur.
+
+- The sum of split sizes < K, the difference will be added to the first split.
+
+- The sum of split sizes > K, the difference will be removed from the first large
+  enough split such that it will have atleast 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
 If true, the data will be randomly split. Otherwise, each split will be created with
 consecutive rows from the dataset.
@@ -1282,10 +1286,14 @@ class MappableDataset(SourceDataset):
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
--Any size equals 0, an error will occur.
--The sum of split sizes < K, the difference will be added to the first split.
--The sum of split sizes > K, the difference will be removed from the first large
-enough split such that it will have atleast 1 row after removing the difference.
+
+- Any size equals 0, an error will occur.
+
+- The sum of split sizes < K, the difference will be added to the first split.
+
+- The sum of split sizes > K, the difference will be removed from the first large
+  enough split such that it will have atleast 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
 If true, the data will be randomly split. Otherwise, each split will be created with
 consecutive rows from the dataset.
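
The rounding rule described in the docstring can be sketched as a standalone function. This is a minimal illustration, not the code from the repository: `compute_split_sizes` is a hypothetical name, and the exception types and messages are assumptions. Only the three rules themselves (zero-size error, add the shortfall to the first split, remove the excess from the first split that keeps at least one row) come from the docstring.

```python
def compute_split_sizes(fractions, dataset_size):
    """Hypothetical helper: convert split fractions into row counts."""
    # Round each fraction of the dataset to a whole number of rows.
    sizes = [int(round(f * dataset_size)) for f in fractions]

    # Rule 1: any size that rounds to 0 is an error.
    if any(size == 0 for size in sizes):
        raise RuntimeError("a split has 0 rows after rounding")

    # Cast to int, mirroring the change in the first hunk of this commit.
    size_difference = int(dataset_size - sum(sizes))

    if size_difference > 0:
        # Rule 2: too few rows assigned, so give the remainder to the first split.
        sizes[0] += size_difference
    elif size_difference < 0:
        # Rule 3: too many rows assigned, so take the excess from the first
        # split that still has at least 1 row afterwards.
        extra = -size_difference
        for index, size in enumerate(sizes):
            if size > extra:
                sizes[index] -= extra
                break
        else:
            raise RuntimeError("no split is large enough to absorb the extra rows")
    return sizes


under = compute_split_sizes([0.33, 0.33, 0.34], 10)  # rounds to 3, 3, 3
over = compute_split_sizes([0.26, 0.26, 0.48], 10)   # rounds to 3, 3, 5
```

In the first call the rounded sizes sum to 9, so the missing row goes to the first split. In the second they sum to 11, so one row is removed from the first split, which still keeps two rows.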
