Merge branch 'develop' into improve-prefetch-on-server

Authored by Qiao Longfei 7 years ago; committed via GitHub
commit e2bb405252

@ -34,7 +34,7 @@ addons:
- automake
- libtool
- ccache
ssh_known_hosts: 52.76.173.135
ssh_known_hosts: 13.229.163.131
before_install:
- if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
# Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python

@ -0,0 +1,226 @@
# API Doc Standard
- [API Doc Structure](#api-doc-structure)
- [Format and Examples](#format-and-examples)
- [Complete Example](#complete-example)
## API Doc Structure
An API doc should contain the following parts (please write them in this order):
- Python API Definition
The definition of the API.
- Function Description
Description of the API's function.
The description covers the API's meaning, its purpose and its effect on the input; references and corresponding links (if any); the formula (if necessary); and explanations of the key variables in the formula.
- Args Description
Description of the API's parameters.
Introduce the parameters one by one, following their order in the API definition.
For each parameter, give its data type, default value (if any), meaning, etc.
- Returns
Description of the API's return value.
Explain the meaning of the returned value and provide its format if necessary.
If the returned value is a tuple containing multiple values, introduce them one by one in order.
- Raises (if any)
Exceptions and errors that may occur, with their possible causes. If more than one exception or error is possible, list them in order.
- Note (if any)
Matters needing attention. If there is more than one, list them in order.
- Examples
Examples of how to use the API.
## Format and Examples
API documentation must follow the reStructuredText format; please refer to [here](http://sphinx-doc-zh.readthedocs.io/en/latest/rest.html).
The format and an example of each part of the API documentation follow (taking fc as the example):
- Python API Definition
- Format
[Python API Definition]
- Example
```
fc(input,
size,
num_flatten_dims=1,
param_attr=None,
bias_attr=None,
act=None,
name=None,
main_program=None,
startup_program=None)
```
- Function Description
- Format
This part contains (please write them in order):
[Function Description]
[Formula]
[Symbols' Descriptions if necessary]
[References if necessary]
- Example
[Function Description]
```
**Fully Connected Layer**
The fully connected layer can take multiple tensors as its inputs. It
creates a variable called weights for each input tensor, which represents
a fully connected weight matrix from each input unit to each output unit.
The fully connected layer multiplies each input tensor with its coresponding
weight to produce an output Tensor. If multiple input tensors are given,
the results of multiple multiplications will be summed up. If bias_attr is
not None, a bias variable will be created and added to the output. Finally,
if activation is not None, it will be applied to the output as well.
```
[Formula]
```
This process can be formulated as follows:
.. math::
Out = Act({\sum_{i=0}^{N-1}X_iW_i + b})
```
[Symbols' Descriptions if necessary]
```
In the above equation:
* :math:`N`: Number of the input.
* :math:`X_i`: The input tensor.
* :math:`W`: The weights created by this layer.
* :math:`b`: The bias parameter created by this layer (if needed).
* :math:`Act`: The activation function.
* :math:`Out`: The output tensor.
```
[References if necessary]
Since fc needs no references, we omit them here. In other circumstances, please provide an explicit reference and link; take layer_norm for example:
```
Refer to `Layer Normalization <https://arxiv.org/pdf/1607.06450v1.pdf>`_ for more details.
```
- Args Description
- Format
\[Arg's Name\][(Data Type, Default Value)][Description]
- Example
Some of the fc parameters are documented as follows:
```
Args:
input (Variable|list of Variable): The input tensor(s) of this layer, and the dimension of
the input tensor(s) is at least 2.
param_attr (ParamAttr|list of ParamAttr, default None): The parameter attribute for learnable
parameters/weights of this layer.
name (str, default None): The name of this layer.
```
- Returns
- Format
[Name][Shape]
- Example
```
Returns:
A tensor variable storing the transformation result.
```
When the returned value is a tuple containing multiple values, introduce each of them in order; take dynamic_lstm for example:
```
Returns:
A tuple containing:
The hidden state of LSTM whose shape is (T X D).
The cell state of LSTM whose shape is (T X D).
```
- Raises
- Format
[Exception Type][Condition]
- Example
```
Raises:
ValueError: If the rank of the input is less than 2.
```
- Note
- Format
[Note]
- Example
fc has no Note, so we omit this part. If there are notes, please write them clearly; if there is more than one note, list them in order. Take scaled\_dot\_product\_attention for example:
```
Note:
1. When num_heads > 1, three linear projections are learned respectively
to map input queries, keys and values into queries', keys' and values'.
queries', keys' and values' have the same shapes with queries, keys
and values.
2. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.
```
- Examples
- Format
\[Python Code Snippet]
- Example
```
Examples:
.. code-block:: python
data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
fc = fluid.layers.fc(input=data, size=1000, act="tanh")
```
## Complete Example
For the complete example of fc, please see [here](src/fc.py).
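
Putting the pieces together, a skeleton of the assembled fc docstring might look as follows. This is a sketch stitched from the fragments above, with the elided parts marked by ellipses; the authoritative version is the linked src/fc.py. Note the raw string (`r"""`) so that LaTeX escapes such as `\sum` survive unmangled:
```
def fc(input,
       size,
       num_flatten_dims=1,
       param_attr=None,
       bias_attr=None,
       act=None,
       name=None,
       main_program=None,
       startup_program=None):
    r"""
    **Fully Connected Layer**

    ...

    This process can be formulated as follows:

    .. math::

        Out = Act({\sum_{i=0}^{N-1}X_iW_i + b})

    Args:
        input (Variable|list of Variable): The input tensor(s) of this layer, and the dimension of
            the input tensor(s) is at least 2.
        ...

    Returns:
        A tensor variable storing the transformation result.

    Raises:
        ValueError: If the rank of the input is less than 2.

    Examples:
        .. code-block:: python

          data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
          fc = fluid.layers.fc(input=data, size=1000, act="tanh")
    """
```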

@ -104,7 +104,7 @@ cc_test(init_test SRCS init_test.cc DEPS init)
cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto)
cc_test(cow_ptr_tests SRCS details/cow_ptr_test.cc)
# cc_test(channel_test SRCS channel_test.cc)
cc_test(channel_test SRCS channel_test.cc)
cc_test(tuple_test SRCS tuple_test.cc )
cc_test(concurrency_test SRCS concurrency_test.cc DEPS go_op channel_close_op channel_create_op
channel_send_op channel_recv_op sum_op select_op elementwise_add_op compare_op

@ -138,8 +138,8 @@ void ChannelImpl<T>::Send(T *item) {
// If channel is closed, throw exception
if (closed_) {
lock.unlock();
send_return();
lock.unlock();
PADDLE_THROW("Cannot send on closed channel");
}
@ -152,11 +152,9 @@ void ChannelImpl<T>::Send(T *item) {
if (m != nullptr) {
*(m->data) = std::move(*item);
m->Notify();
lock.unlock();
send_return();
return;
} else {
lock.unlock();
Send(item);
send_return();
return;
@ -169,8 +167,6 @@ void ChannelImpl<T>::Send(T *item) {
if (buf_.size() < cap_) {
// Copy to buffer
buf_.push_back(std::move(*item));
// Release lock and return true
lock.unlock();
send_return();
return;
}
@ -181,8 +177,8 @@ void ChannelImpl<T>::Send(T *item) {
sendq.push_back(m);
m->Wait(lock);
if (m->chan_closed) {
lock.unlock();
send_return();
lock.unlock();
PADDLE_THROW("Cannot send on closed channel");
}
send_return();
@ -195,10 +191,7 @@ bool ChannelImpl<T>::Receive(T *item) {
// If channel is closed and buffer is empty or
// channel is unbuffered
if (closed_ && buf_.empty()) {
lock.unlock();
return recv_return(false);
}
if (closed_ && buf_.empty()) return recv_return(false);
// If there is a sender, directly receive the value we want
// from the sender. In case of a buffered channel, read from
@ -229,7 +222,6 @@ bool ChannelImpl<T>::Receive(T *item) {
} else
return recv_return(Receive(item));
}
lock.unlock();
return recv_return(true);
}
@ -238,8 +230,7 @@ bool ChannelImpl<T>::Receive(T *item) {
// Directly read from buffer
*item = std::move(buf_.front());
buf_.pop_front();
// Release lock and return true
lock.unlock();
// return true
return recv_return(true);
}
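
The common thread in these channel hunks is that the bookkeeping calls send_return()/recv_return() now run while the channel mutex is still held, and most explicit lock.unlock() calls disappear because the std::unique_lock releases the mutex when it goes out of scope. Below is a minimal Python analogue of the corrected Receive path; the Channel class and _recv_return hook are hypothetical illustrations of the pattern, not PaddlePaddle API:
```
import threading

class Channel:
    """Hypothetical analogue of ChannelImpl<T>, for illustration only."""

    def __init__(self):
        self.buf = []
        self.closed = False
        self.lock = threading.Lock()

    def _recv_return(self, ok):
        # Bookkeeping (e.g. counters consulted on destruction) happens here,
        # while the mutex is still held -- the point of the reordering.
        return ok

    def receive(self, out):
        # 'with' releases the lock on every exit path, playing the role of
        # the std::unique_lock destructor; no explicit unlock() calls remain.
        with self.lock:
            if self.closed and not self.buf:
                return self._recv_return(False)
            out.append(self.buf.pop(0))
            return self._recv_return(True)

ch = Channel()
ch.buf.append(42)
result = []
assert ch.receive(result) and result == [42]
```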

@ -2,8 +2,8 @@ if(WITH_DISTRIBUTE)
grpc_library(sendrecvop_grpc SRCS bytebuffer_stream.cc sendrecvop_utils.cc grpc_client.cc
grpc_server.cc variable_response.cc PROTO send_recv.proto DEPS lod_tensor selected_rows)
set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
set_source_files_properties(test_serde.cc grpc_server_test.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
cc_test(serde_test SRCS test_serde.cc variable_response.cc DEPS grpc++_unsecure grpc_unsecure gpr
set_source_files_properties(serde_test.cc grpc_server_test.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
cc_test(serde_test SRCS serde_test.cc variable_response.cc DEPS grpc++_unsecure grpc_unsecure gpr
cares zlib protobuf sendrecvop_grpc)
cc_test(grpc_server_test SRCS grpc_server_test.cc DEPS sendrecvop_grpc grpc++_unsecure grpc_unsecure gpr cares zlib protobuf)
endif()

@ -156,12 +156,12 @@ class RequestPrefetch final : public RequestBase {
virtual void Process() {
// prefetch process...
::grpc::ByteBuffer relay;
::grpc::ByteBuffer reply;
// TODO(Yancey1989): execute the Block which containers prefetch ops
VLOG(3) << "RequestPrefetch Process in";
responder_.Finish(relay, ::grpc::Status::OK, this);
responder_.Finish(reply, ::grpc::Status::OK, this);
status_ = FINISH;
}

@ -73,12 +73,13 @@ add_custom_target(paddle_python ALL DEPENDS ${paddle_python_deps})
set(PADDLE_PYTHON_PACKAGE_DIR ${CMAKE_CURRENT_BINARY_DIR}/dist/)
if (WITH_TESTING)
add_subdirectory(paddle/reader/tests)
add_subdirectory(paddle/dataset/tests)
if(NOT WITH_FLUID_ONLY)
add_subdirectory(paddle/trainer_config_helpers/tests)
if (WITH_SWIG_PY)
# enable v2 API unittest only when paddle swig api is compiled
add_subdirectory(paddle/v2/tests)
add_subdirectory(paddle/v2/reader/tests)
add_subdirectory(paddle/v2/plot/tests)
endif()
endif()

@ -14,8 +14,14 @@
try:
from version import full_version as __version__
from version import commit as __git_commit__
except ImportError:
import sys
sys.stderr.write('''Warning with import paddle: you should not
import paddle from the source directory; please install paddlepaddle*.whl firstly.'''
)
import reader
import dataset
import batch
batch = batch.batch

@ -28,6 +28,7 @@ import wmt16
import mq2007
import flowers
import voc2012
import image
__all__ = [
'mnist',
@ -43,4 +44,5 @@ __all__ = [
'mq2007',
'flowers',
'voc2012',
'image',
]

@ -31,7 +31,7 @@ images per class.
import cPickle
import itertools
import numpy
import paddle.v2.dataset.common
import paddle.dataset.common
import tarfile
__all__ = ['train100', 'test100', 'train10', 'test10', 'convert']
@ -75,7 +75,7 @@ def train100():
:rtype: callable
"""
return reader_creator(
paddle.v2.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5),
paddle.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5),
'train')
@ -90,7 +90,7 @@ def test100():
:rtype: callable
"""
return reader_creator(
paddle.v2.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5),
paddle.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5),
'test')
@ -105,7 +105,7 @@ def train10():
:rtype: callable
"""
return reader_creator(
paddle.v2.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5),
paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5),
'data_batch')
@ -120,20 +120,20 @@ def test10():
:rtype: callable
"""
return reader_creator(
paddle.v2.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5),
paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5),
'test_batch')
def fetch():
paddle.v2.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5)
paddle.v2.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5)
paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5)
paddle.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5)
def convert(path):
"""
Converts dataset to recordio format
"""
paddle.v2.dataset.common.convert(path, train100(), 1000, "cifar_train100")
paddle.v2.dataset.common.convert(path, test100(), 1000, "cifar_test100")
paddle.v2.dataset.common.convert(path, train10(), 1000, "cifar_train10")
paddle.v2.dataset.common.convert(path, test10(), 1000, "cifar_test10")
paddle.dataset.common.convert(path, train100(), 1000, "cifar_train100")
paddle.dataset.common.convert(path, test100(), 1000, "cifar_test100")
paddle.dataset.common.convert(path, train10(), 1000, "cifar_train10")
paddle.dataset.common.convert(path, test10(), 1000, "cifar_test10")
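
For callers, the rename applied throughout these dataset hunks is mechanical: the package moves from the paddle.v2 namespace to the top level. A small usage sketch, assuming the usual reader-creator protocol in which train10() returns a callable that yields (image, label) records:
```
# Before this change:
#   import paddle.v2.dataset.cifar
#   reader = paddle.v2.dataset.cifar.train10()

# After this change:
import paddle.dataset.cifar

reader = paddle.dataset.cifar.train10()
for image, label in reader():
    pass  # each record is a flattened image array and its integer label
```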

@ -19,7 +19,7 @@ import errno
import shutil
import sys
import importlib
import paddle.v2.dataset
import paddle.dataset
import cPickle
import glob
import cPickle as pickle
@ -105,24 +105,24 @@ def download(url, module_name, md5sum, save_name=None):
def fetch_all():
for module_name in filter(lambda x: not x.startswith("__"),
dir(paddle.v2.dataset)):
dir(paddle.dataset)):
if "fetch" in dir(
importlib.import_module("paddle.v2.dataset.%s" % module_name)):
importlib.import_module("paddle.dataset.%s" % module_name)):
getattr(
importlib.import_module("paddle.v2.dataset.%s" % module_name),
importlib.import_module("paddle.dataset.%s" % module_name),
"fetch")()
def fetch_all_recordio(path):
for module_name in filter(lambda x: not x.startswith("__"),
dir(paddle.v2.dataset)):
dir(paddle.dataset)):
if "convert" in dir(
importlib.import_module("paddle.v2.dataset.%s" % module_name)) and \
importlib.import_module("paddle.dataset.%s" % module_name)) and \
not module_name == "common":
ds_path = os.path.join(path, module_name)
must_mkdirs(ds_path)
getattr(
importlib.import_module("paddle.v2.dataset.%s" % module_name),
importlib.import_module("paddle.dataset.%s" % module_name),
"convert")(ds_path)
@ -130,7 +130,7 @@ def split(reader, line_count, suffix="%05d.pickle", dumper=cPickle.dump):
"""
you can call the function as:
split(paddle.v2.dataset.cifar.train10(), line_count=1000,
split(paddle.dataset.cifar.train10(), line_count=1000,
suffix="imikolov-train-%05d.pickle")
the output files as:

@ -23,7 +23,7 @@ to initialize SRL model.
import tarfile
import gzip
import itertools
import paddle.v2.dataset.common
import paddle.dataset.common
__all__ = ['test, get_dict', 'get_embedding', 'convert']
@ -203,14 +203,11 @@ def get_dict():
Get the word, verb and label dictionary of Wikipedia corpus.
"""
word_dict = load_dict(
paddle.v2.dataset.common.download(WORDDICT_URL, 'conll05st',
WORDDICT_MD5))
paddle.dataset.common.download(WORDDICT_URL, 'conll05st', WORDDICT_MD5))
verb_dict = load_dict(
paddle.v2.dataset.common.download(VERBDICT_URL, 'conll05st',
VERBDICT_MD5))
paddle.dataset.common.download(VERBDICT_URL, 'conll05st', VERBDICT_MD5))
label_dict = load_label_dict(
paddle.v2.dataset.common.download(TRGDICT_URL, 'conll05st',
TRGDICT_MD5))
paddle.dataset.common.download(TRGDICT_URL, 'conll05st', TRGDICT_MD5))
return word_dict, verb_dict, label_dict
@ -218,7 +215,7 @@ def get_embedding():
"""
Get the trained word vector based on Wikipedia corpus.
"""
return paddle.v2.dataset.common.download(EMB_URL, 'conll05st', EMB_MD5)
return paddle.dataset.common.download(EMB_URL, 'conll05st', EMB_MD5)
def test():
@ -235,23 +232,23 @@ def test():
"""
word_dict, verb_dict, label_dict = get_dict()
reader = corpus_reader(
paddle.v2.dataset.common.download(DATA_URL, 'conll05st', DATA_MD5),
paddle.dataset.common.download(DATA_URL, 'conll05st', DATA_MD5),
words_name='conll05st-release/test.wsj/words/test.wsj.words.gz',
props_name='conll05st-release/test.wsj/props/test.wsj.props.gz')
return reader_creator(reader, word_dict, verb_dict, label_dict)
def fetch():
paddle.v2.dataset.common.download(WORDDICT_URL, 'conll05st', WORDDICT_MD5)
paddle.v2.dataset.common.download(VERBDICT_URL, 'conll05st', VERBDICT_MD5)
paddle.v2.dataset.common.download(TRGDICT_URL, 'conll05st', TRGDICT_MD5)
paddle.v2.dataset.common.download(EMB_URL, 'conll05st', EMB_MD5)
paddle.v2.dataset.common.download(DATA_URL, 'conll05st', DATA_MD5)
paddle.dataset.common.download(WORDDICT_URL, 'conll05st', WORDDICT_MD5)
paddle.dataset.common.download(VERBDICT_URL, 'conll05st', VERBDICT_MD5)
paddle.dataset.common.download(TRGDICT_URL, 'conll05st', TRGDICT_MD5)
paddle.dataset.common.download(EMB_URL, 'conll05st', EMB_MD5)
paddle.dataset.common.download(DATA_URL, 'conll05st', DATA_MD5)
def convert(path):
"""
Converts dataset to recordio format
"""
paddle.v2.dataset.common.convert(path, test(), 1000, "conl105_train")
paddle.v2.dataset.common.convert(path, test(), 1000, "conl105_test")
paddle.dataset.common.convert(path, test(), 1000, "conl105_train")
paddle.dataset.common.convert(path, test(), 1000, "conl105_test")

@ -34,8 +34,8 @@ import functools
from common import download
import tarfile
import scipy.io as scio
from paddle.v2.image import *
from paddle.v2.reader import *
from paddle.dataset.image import *
from paddle.reader import *
import os
import numpy as np
from multiprocessing import cpu_count

@ -20,7 +20,7 @@ of 25,000 highly polar movie reviews for training, and 25,000 for testing.
Besides, this module also provides API for building dictionary.
"""
import paddle.v2.dataset.common
import paddle.dataset.common
import collections
import tarfile
import re
@ -37,8 +37,7 @@ def tokenize(pattern):
Read files that match the given pattern. Tokenize and yield each file.
"""
with tarfile.open(paddle.v2.dataset.common.download(URL, 'imdb',
MD5)) as tarf:
with tarfile.open(paddle.dataset.common.download(URL, 'imdb', MD5)) as tarf:
# Note that we should use tarfile.next(), which does
# sequential access of member files, other than
# tarfile.extractfile, which does random access and might
@ -136,7 +135,7 @@ def word_dict():
def fetch():
paddle.v2.dataset.common.download(URL, 'imdb', MD5)
paddle.dataset.common.download(URL, 'imdb', MD5)
def convert(path):
@ -144,5 +143,5 @@ def convert(path):
Converts dataset to recordio format
"""
w = word_dict()
paddle.v2.dataset.common.convert(path, lambda: train(w), 1000, "imdb_train")
paddle.v2.dataset.common.convert(path, lambda: test(w), 1000, "imdb_test")
paddle.dataset.common.convert(path, lambda: train(w), 1000, "imdb_train")
paddle.dataset.common.convert(path, lambda: test(w), 1000, "imdb_test")

@ -18,7 +18,7 @@ This module will download dataset from
http://www.fit.vutbr.cz/~imikolov/rnnlm/ and parse training set and test set
into paddle reader creators.
"""
import paddle.v2.dataset.common
import paddle.dataset.common
import collections
import tarfile
@ -54,9 +54,9 @@ def build_dict(min_word_freq=50):
train_filename = './simple-examples/data/ptb.train.txt'
test_filename = './simple-examples/data/ptb.valid.txt'
with tarfile.open(
paddle.v2.dataset.common.download(
paddle.v2.dataset.imikolov.URL, 'imikolov',
paddle.v2.dataset.imikolov.MD5)) as tf:
paddle.dataset.common.download(paddle.dataset.imikolov.URL,
'imikolov',
paddle.dataset.imikolov.MD5)) as tf:
trainf = tf.extractfile(train_filename)
testf = tf.extractfile(test_filename)
word_freq = word_count(testf, word_count(trainf))
@ -77,9 +77,9 @@ def build_dict(min_word_freq=50):
def reader_creator(filename, word_idx, n, data_type):
def reader():
with tarfile.open(
paddle.v2.dataset.common.download(
paddle.v2.dataset.imikolov.URL, 'imikolov',
paddle.v2.dataset.imikolov.MD5)) as tf:
paddle.dataset.common.download(
paddle.dataset.imikolov.URL, 'imikolov',
paddle.dataset.imikolov.MD5)) as tf:
f = tf.extractfile(filename)
UNK = word_idx['<unk>']
@ -145,7 +145,7 @@ def test(word_idx, n, data_type=DataType.NGRAM):
def fetch():
paddle.v2.dataset.common.download(URL, "imikolov", MD5)
paddle.dataset.common.download(URL, "imikolov", MD5)
def convert(path):
@ -154,8 +154,7 @@ def convert(path):
"""
N = 5
word_dict = build_dict()
paddle.v2.dataset.common.convert(path,
train(word_dict, N), 1000,
"imikolov_train")
paddle.v2.dataset.common.convert(path,
test(word_dict, N), 1000, "imikolov_test")
paddle.dataset.common.convert(path,
train(word_dict, N), 1000, "imikolov_train")
paddle.dataset.common.convert(path,
test(word_dict, N), 1000, "imikolov_test")

@ -17,7 +17,7 @@ MNIST dataset.
This module will download dataset from http://yann.lecun.com/exdb/mnist/ and
parse training set and test set into paddle reader creators.
"""
import paddle.v2.dataset.common
import paddle.dataset.common
import subprocess
import numpy
import platform
@ -85,10 +85,10 @@ def train():
:rtype: callable
"""
return reader_creator(
paddle.v2.dataset.common.download(TRAIN_IMAGE_URL, 'mnist',
TRAIN_IMAGE_MD5),
paddle.v2.dataset.common.download(TRAIN_LABEL_URL, 'mnist',
TRAIN_LABEL_MD5), 100)
paddle.dataset.common.download(TRAIN_IMAGE_URL, 'mnist',
TRAIN_IMAGE_MD5),
paddle.dataset.common.download(TRAIN_LABEL_URL, 'mnist',
TRAIN_LABEL_MD5), 100)
def test():
@ -102,22 +102,21 @@ def test():
:rtype: callable
"""
return reader_creator(
paddle.v2.dataset.common.download(TEST_IMAGE_URL, 'mnist',
TEST_IMAGE_MD5),
paddle.v2.dataset.common.download(TEST_LABEL_URL, 'mnist',
TEST_LABEL_MD5), 100)
paddle.dataset.common.download(TEST_IMAGE_URL, 'mnist', TEST_IMAGE_MD5),
paddle.dataset.common.download(TEST_LABEL_URL, 'mnist', TEST_LABEL_MD5),
100)
def fetch():
paddle.v2.dataset.common.download(TRAIN_IMAGE_URL, 'mnist', TRAIN_IMAGE_MD5)
paddle.v2.dataset.common.download(TRAIN_LABEL_URL, 'mnist', TRAIN_LABEL_MD5)
paddle.v2.dataset.common.download(TEST_IMAGE_URL, 'mnist', TEST_IMAGE_MD5)
paddle.v2.dataset.common.download(TEST_LABEL_URL, 'mnist', TRAIN_LABEL_MD5)
paddle.dataset.common.download(TRAIN_IMAGE_URL, 'mnist', TRAIN_IMAGE_MD5)
paddle.dataset.common.download(TRAIN_LABEL_URL, 'mnist', TRAIN_LABEL_MD5)
paddle.dataset.common.download(TEST_IMAGE_URL, 'mnist', TEST_IMAGE_MD5)
paddle.dataset.common.download(TEST_LABEL_URL, 'mnist', TRAIN_LABEL_MD5)
def convert(path):
"""
Converts dataset to recordio format
"""
paddle.v2.dataset.common.convert(path, train(), 1000, "minist_train")
paddle.v2.dataset.common.convert(path, test(), 1000, "minist_test")
paddle.dataset.common.convert(path, train(), 1000, "minist_train")
paddle.dataset.common.convert(path, test(), 1000, "minist_test")

@ -23,7 +23,7 @@ set and test set into paddle reader creators.
"""
import zipfile
import paddle.v2.dataset.common
import paddle.dataset.common
import re
import random
import functools
@ -100,7 +100,7 @@ USER_INFO = None
def __initialize_meta_info__():
fn = paddle.v2.dataset.common.download(URL, "movielens", MD5)
fn = paddle.dataset.common.download(URL, "movielens", MD5)
global MOVIE_INFO
if MOVIE_INFO is None:
pattern = re.compile(r'^(.*)\((\d+)\)$')
@ -247,15 +247,15 @@ def unittest():
def fetch():
paddle.v2.dataset.common.download(URL, "movielens", MD5)
paddle.dataset.common.download(URL, "movielens", MD5)
def convert(path):
"""
Converts dataset to recordio format
"""
paddle.v2.dataset.common.convert(path, train(), 1000, "movielens_train")
paddle.v2.dataset.common.convert(path, test(), 1000, "movielens_test")
paddle.dataset.common.convert(path, train(), 1000, "movielens_train")
paddle.dataset.common.convert(path, test(), 1000, "movielens_test")
if __name__ == '__main__':

@ -26,7 +26,7 @@ from itertools import chain
import nltk
from nltk.corpus import movie_reviews
import paddle.v2.dataset.common
import paddle.dataset.common
__all__ = ['train', 'test', 'get_word_dict', 'convert']
NUM_TRAINING_INSTANCES = 1600
@ -39,13 +39,13 @@ def download_data_if_not_yet():
"""
try:
# make sure that nltk can find the data
if paddle.v2.dataset.common.DATA_HOME not in nltk.data.path:
nltk.data.path.append(paddle.v2.dataset.common.DATA_HOME)
if paddle.dataset.common.DATA_HOME not in nltk.data.path:
nltk.data.path.append(paddle.dataset.common.DATA_HOME)
movie_reviews.categories()
except LookupError:
print "Downloading movie_reviews data set, please wait....."
nltk.download(
'movie_reviews', download_dir=paddle.v2.dataset.common.DATA_HOME)
'movie_reviews', download_dir=paddle.dataset.common.DATA_HOME)
print "Download data set success....."
print "Path is " + nltk.data.find('corpora/movie_reviews').path
@ -129,13 +129,12 @@ def test():
def fetch():
nltk.download(
'movie_reviews', download_dir=paddle.v2.dataset.common.DATA_HOME)
nltk.download('movie_reviews', download_dir=paddle.dataset.common.DATA_HOME)
def convert(path):
"""
Converts dataset to recordio format
"""
paddle.v2.dataset.common.convert(path, train, 1000, "sentiment_train")
paddle.v2.dataset.common.convert(path, test, 1000, "sentiment_test")
paddle.dataset.common.convert(path, train, 1000, "sentiment_train")
paddle.dataset.common.convert(path, test, 1000, "sentiment_test")

@ -0,0 +1 @@
py_test(test_image SRCS test_image.py)

(A binary image file changed in this diff: 56 KiB before and after.)

@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.v2.dataset.cifar
import paddle.dataset.cifar
import unittest
@ -29,25 +29,25 @@ class TestCIFAR(unittest.TestCase):
def test_test10(self):
instances, max_label_value = self.check_reader(
paddle.v2.dataset.cifar.test10())
paddle.dataset.cifar.test10())
self.assertEqual(instances, 10000)
self.assertEqual(max_label_value, 9)
def test_train10(self):
instances, max_label_value = self.check_reader(
paddle.v2.dataset.cifar.train10())
paddle.dataset.cifar.train10())
self.assertEqual(instances, 50000)
self.assertEqual(max_label_value, 9)
def test_test100(self):
instances, max_label_value = self.check_reader(
paddle.v2.dataset.cifar.test100())
paddle.dataset.cifar.test100())
self.assertEqual(instances, 10000)
self.assertEqual(max_label_value, 99)
def test_train100(self):
instances, max_label_value = self.check_reader(
paddle.v2.dataset.cifar.train100())
paddle.dataset.cifar.train100())
self.assertEqual(instances, 50000)
self.assertEqual(max_label_value, 99)

Some files were not shown because too many files have changed in this diff.
