Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into my_spp_op

del_some_in_makelist
sweetsky0901 7 years ago
commit 89de58d990

1
.gitignore vendored

@ -28,3 +28,4 @@ cmake_install.cmake
paddle/.timestamp
python/paddlepaddle.egg-info/
paddle/pybind/pybind.h
python/paddle/version.py

@ -22,6 +22,8 @@ SET(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
include(system)
project(paddle CXX C Go)
message(STATUS "CXX compiler: " ${CMAKE_CXX_COMPILER} ", version: " ${CMAKE_CXX_COMPILER_VERSION})
message(STATUS "C compiler: " ${CMAKE_C_COMPILER} ", version: " ${CMAKE_C_COMPILER_VERSION})
find_package(Sphinx)
if(NOT CMAKE_CROSSCOMPILING)
@ -58,6 +60,7 @@ option(GLIDE_INSTALL "Download and install go dependencies " ON)
option(USE_NNPACK "Compile PaddlePaddle with NNPACK library" OFF)
option(WITH_DISTRIBUTE "Compile with grpc distributed support" OFF)
option(USE_EIGEN_FOR_BLAS "Use matrix multiplication in Eigen" OFF)
option(WITH_ARM_FP16 "Use half precision support on armv8.2-a cpu" OFF)
# CMAKE_BUILD_TYPE
if(NOT CMAKE_BUILD_TYPE)

@ -1,3 +1,62 @@
# v0.11.0版本
## PaddlePaddle Fluid
- PaddlePaddle发布版本v0.11.0包含一个新的特性*PaddlePaddle Fluid*. Fluid 是设计用来让用户像Pytorch和Tensorflow Eager Execution一样执行程序。在这些系统中不再有*模型*这个概念应用也不再包含一个用于描述Operator图或者一系列层的符号描述而是像通用程序那样描述训练或者预测的过程。而Fluid与PyTorch或Eager Execution的区别在于Fluid不依赖Python提供的控制流例如 if-else-then或者for而是提供了基于C++实现的控制流并暴露了对应的用with语法实现的Python接口。例如
https://github.com/PaddlePaddle/Paddle/blob/3df78ed2a98d37f7ae6725894cc7514effd5664b/python/paddle/v2/fluid/tests/test_while_op.py#L36-L44
- 在v0.11.0版本中我们提供了一个C++类`Executor`用于运行一个Fluid程序。Executor类似一个解释器。在未来的版本中我们将提升和优化Executor成为一个调试器就像GDB。并可能提供一些编译器这个编译器会读取一个上文所描述的应用然后编译成一个等价的
源代码这个源代码可以被nvcc编译成可以使用CUDA的二进制或者被icc编译成可以充分利用Intel CPU的二进制。
## 新特点
* 发布 `PaddlePaddle Fluid`
* 增加了用于模型预测的C-API。
* 用Fluid API实现了一个简单的GAN的例子。
* 增加了关于性能调优的文档。
* 为`paddle.v2.dataset`下载数据集提供了重试机制.
* C++中使用protobuf-lite替换protobuf减少了二进制的大小。
* 发布了新特性 [Elastic Deep Learning (EDL)](https://github.com/PaddlePaddle/cloud/tree/develop/doc/autoscale/experiment).
* 基于Bazel API利用cmake实现了一个的新的构建系统函数库。
* 当使用编译选项`WITH_MKL=ON`时自动下载和编译Intel® [MKLML](https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_lnx_2018.0.1.20171007.tgz) 函数库.
* [Intel® MKL-DNN on PaddlePaddle](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/design/mkldnn):
- 完成了 11个 MKL-DNN 层: Convolution, Fully connectivity, Pooling, ReLU, Tanh, ELU, Softmax, BatchNorm, AddTo, Concat, LRN。
- 完成了 3个 MKL-DNN 网络: VGG-19, ResNet-50, GoogleNet
- 基于Intel Skylake 6148 CPU的[性能测试](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/IntelOptimizedPaddle.md) : 相对于MKLML有2~3倍的训练加速。
* 增加 [softsign activation](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/activation.html#softsign)
* 增加 [dot product layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#dot-prod)
* 增加 [L2 distance layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#l2-distance)
* 增加 [sub-nested sequence layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#sub-nested-seq)
* 增加 [kmax sequence score layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#kmax-sequence-score)
* 增加 [sequence slice layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#seq-slice)
* 增加 [row convolution layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#row-conv)
* 增加移动端友好的网页
## 改进
* 使用一个Python`whl`包即可安装.
* [V2 API可以实现用户定制化评估](https://github.com/PaddlePaddle/models/tree/develop/ltr#训练过程中输出自定义评估指标)。
* 将 `PADDLE_ONLY_CPU` 改为 `PADDLE_WITH_GPU`, 因为我们会支持多种设备。
* 删除了有一些bug的BarrierStat。
* 清理和删除了paddle::Parameter中未使用的函数。
* 删除了ProtoDataProvider。
* Huber loss同时支持回归和分类。
* 为sequence pooling 层增加`stride`参数。
* v2 API自动使用cudnn batch normalization。
* 可以使用一个固定的参数名共享BN层的参数。
* 2D convolution operation支持variable-dimension input特性。
* 重构cmake中关于CUDA的部分并实现自动检测GPU架构的功能。
* 优化网页导航。
## 错误修复
* 修复ROI pooling的Bug. cc9a761
* 修复当label是dense vector是AUC变成0的问题. #5274
* 修复WarpCTC 层的Bug.
# v0.10.0版本
我们非常高兴发布了PaddlePaddle V0.10.0版,并开发了新的[Python API](http://research.baidu.com/paddlepaddles-new-api-simplifies-deep-learning-programs/)。

@ -1,3 +1,75 @@
# Release v0.11.0
## PaddlePaddle Fluid
- Release 0.11.0 includes a new feature *PaddlePaddle Fluid*. Fluid is
designed to allow users to program like PyTorch and TensorFlow Eager Execution.
In these systems, there is no longer the concept *model* and applications
do not include a symbolic description of a graph of operators nor a sequence
of layers. Instead, applications look exactly like a usual program that
describes a process of training or inference. The difference between
Fluid and PyTorch or Eager Execution is that Fluid doesn't rely on Python's
control-flow, `if-then-else` nor `for`. Instead, Fluid provides its
C++ implementations and their Python binding using the `with` statement. For an example
https://github.com/PaddlePaddle/Paddle/blob/3df78ed2a98d37f7ae6725894cc7514effd5664b/python/paddle/v2/fluid/tests/test_while_op.py#L36-L44
- In 0.11.0, we provides a C++ class `Executor` to run a Fluid program.
Executor works like an interpreter. In future version, we will improve
`Executor` into a debugger like GDB, and we might provide some compilers,
which, for example, takes an application like the above one, and outputs
an equivalent C++ source program, which can be compiled using
[`nvcc`](http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html)
to generate binaries that use CUDA, or using
[`icc`](https://software.intel.com/en-us/c-compilers) to generate binaries
that make full use of Intel CPUs.
## New Features
* Release `PaddlePaddle Fluid`.
* Add C-API for model inference
* Use fluid API to create a simple GAN demo.
* Add develop guide about performance tunning.
* Add retry when download `paddle.v2.dataset`.
* Linking protobuf-lite not protobuf in C++. Reduce the binary size.
* Feature [Elastic Deep Learning (EDL)](https://github.com/PaddlePaddle/cloud/tree/develop/doc/autoscale/experiment) released.
* A new style cmake functions for Paddle. It is based on Bazel API.
* Automatically download and compile with Intel® [MKLML](https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_lnx_2018.0.1.20171007.tgz) library as CBLAS when build `WITH_MKL=ON`.
* [Intel® MKL-DNN on PaddlePaddle](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/design/mkldnn):
- Complete 11 MKL-DNN layers: Convolution, Fully connectivity, Pooling, ReLU, Tanh, ELU, Softmax, BatchNorm, AddTo, Concat, LRN.
- Complete 3 MKL-DNN networks: VGG-19, ResNet-50, GoogleNet
- [Benchmark](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/IntelOptimizedPaddle.md) on Intel Skylake 6148 CPU: 2~3x training speedup compared with MKLML.
* Add the [`softsign` activation](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/activation.html#softsign).
* Add the [dot product layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#dot-prod).
* Add the [L2 distance layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#l2-distance).
* Add the [sub-nested sequence layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#sub-nested-seq).
* Add the [kmax sequence score layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#kmax-sequence-score).
* Add the [sequence slice layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#seq-slice).
* Add the [row convolution layer](http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/config/layer.html#row-conv)
* Add mobile friendly webpages.
## Improvements
* Build and install using a single `whl` package.
* [Custom evaluating in V2 API](https://github.com/PaddlePaddle/models/tree/develop/ltr#训练过程中输出自定义评估指标).
* Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU`, since we will support many kinds of devices.
* Remove buggy BarrierStat.
* Clean and remove unused functions in paddle::Parameter.
* Remove ProtoDataProvider.
* Huber loss supports both regression and classification.
* Add the `stride` parameter for sequence pooling layers.
* Enable v2 API use cudnn batch normalization automatically.
* The BN layer's parameter can be shared by a fixed the parameter name.
* Support variable-dimension input feature for 2D convolution operation.
* Refine cmake about CUDA to automatically detect GPU architecture.
* Improved website navigation.
## Bug Fixes
* Fix bug in ROI pooling. cc9a761
* Fix AUC is zero when label is dense vector. #5274
* Fix bug in WarpCTC layer.
# Release v0.10.0
We are glad to release version 0.10.0. In this version, we are happy to release the new

@ -2,27 +2,25 @@
Machine:
- Server
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
- Laptop
- DELL XPS15-9560-R1745: i7-7700HQ 8G 256GSSD
- i5 MacBook Pro (Retina, 13-inch, Early 2015)
- Desktop
- i7-6700k
- Server: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
- Laptop: TBD
System: CentOS release 6.3 (Final), Docker 1.12.1.
PaddlePaddle: paddlepaddle/paddle:latest (for MKLML and MKL-DNN), paddlepaddle/paddle:latest-openblas (for OpenBLAS)
- MKL-DNN tag v0.11
- MKLML 2018.0.1.20171007
- OpenBLAS v0.2.20
(TODO: will rerun after 0.11.0)
PaddlePaddle: (TODO: will rerun after 0.11.0)
- paddlepaddle/paddle:latest (for MKLML and MKL-DNN)
- MKL-DNN tag v0.11
- MKLML 2018.0.1.20171007
- paddlepaddle/paddle:latest-openblas (for OpenBLAS)
- OpenBLAS v0.2.20
On each machine, we will test and compare the performance of training on single node using MKL-DNN / MKLML / OpenBLAS respectively.
## Benchmark Model
### Server
#### Training
Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Input image size - 3 * 224 * 224, Time: images/second
@ -35,9 +33,7 @@ Input image size - 3 * 224 * 224, Time: images/second
| MKLML | 12.12 | 13.70 | 16.18 |
| MKL-DNN | 28.46 | 29.83 | 30.44 |
chart on batch size 128
TBD
<img src="figs/vgg-cpu-train.png" width="500">
- ResNet-50
@ -47,9 +43,7 @@ TBD
| MKLML | 32.52 | 31.89 | 33.12 |
| MKL-DNN | 81.69 | 82.35 | 84.08 |
chart on batch size 128
TBD
<img src="figs/resnet-cpu-train.png" width="500">
- GoogLeNet
@ -59,10 +53,35 @@ TBD
| MKLML | 128.46| 137.89| 158.63 |
| MKL-DNN     | 250.46| 264.83| 269.50 |
chart on batch size 128
TBD
<img src="figs/googlenet-cpu-train.png" width="500">
#### Inference
Test on batch size 1, 2, 4, 8, 16 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- VGG-19
| BatchSize | 1 | 2 | 4 | 8 | 16 |
|-----------|-------|-------|-------|-------|-------|
| OpenBLAS | 1.07 | 1.08 | 1.06 | 0.88 | 0.65 |
| MKLML | 5.58 | 9.80 | 15.15 | 21.21 | 28.67 |
| MKL-DNN | 75.07 | 88.64 | 82.58 | 92.29 | 96.75 |
- ResNet-50
| BatchSize | 1 | 2 | 4 | 8 | 16 |
|-----------|-------|--------|--------|--------|--------|
| OpenBLAS | 3.35 | 3.19 | 3.09 | 2.55 | 1.96 |
| MKLML | 6.33 | 12.02 | 22.88 | 40.53 | 63.09 |
| MKL-DNN | 107.83| 148.84 | 177.78 | 189.35 | 217.69 |
- GoogLeNet
| BatchSize | 1 | 2 | 4 | 8 | 16 |
|-----------|--------|--------|--------|--------|--------|
| OpenBLAS | 12.04 | 11.31 | 10.00 | 9.07 | 4.34 |
| MKLML | 22.74 | 41.56 | 81.22 | 133.47 | 210.53 |
| MKL-DNN | 175.10 | 272.92 | 450.70 | 512.00 | 600.94 |
### Laptop
TBD
### Desktop
TBD

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 20 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

@ -4,7 +4,7 @@ function clock_to_seconds() {
hours=`echo $1 | awk -F ':' '{print $1}'`
mins=`echo $1 | awk -F ':' '{print $2}'`
secs=`echo $1 | awk -F ':' '{print $3}'`
echo `bc -l <<< "$secs + $mins * 60 + $hours * 3600"`
echo `awk 'BEGIN{printf "%.2f",('$secs' + '$mins' * 60 + '$hours' * 3600)}'`
}
function infer() {
@ -58,9 +58,9 @@ function infer() {
end=`tail ${log} -n 2 | head -n 1 | awk -F ' ' '{print $2}' | xargs`
start_sec=`clock_to_seconds $start`
end_sec=`clock_to_seconds $end`
fps=`bc <<< "scale = 2; 1280 / ($end_sec - $start_sec)"`
fps=`awk 'BEGIN{printf "%.2f",(1280 / ('$end_sec' - '$start_sec'))}'`
echo "Last 1280 samples start: ${start}(${start_sec} sec), end: ${end}(${end_sec} sec;" >> ${log}
echo "FPS: $fps images/sec" >> ${log}
echo "FPS: $fps images/sec" 2>&1 | tee -a ${log}
}
if [ ! -f "train.list" ]; then

@ -17,7 +17,7 @@ if(WITH_MKLML AND MKLML_INC_DIR AND MKLML_LIB)
set(CBLAS_INC_DIR ${MKLML_INC_DIR})
set(CBLAS_LIBRARIES ${MKLML_LIB})
add_definitions(-DPADDLE_USE_MKLML)
add_definitions(-DPADDLE_WITH_MKLML)
add_definitions(-DLAPACK_FOUND)
message(STATUS "Found cblas and lapack in MKLML "

@ -24,6 +24,11 @@ if(WITH_DOUBLE)
add_definitions(-DPADDLE_TYPE_DOUBLE)
endif(WITH_DOUBLE)
if(WITH_ARM_FP16)
add_definitions(-DPADDLE_ARM_FP16)
add_definitions("-march=armv8.2-a+fp16+simd")
endif(WITH_ARM_FP16)
if(WITH_TESTING)
add_definitions(-DPADDLE_WITH_TESTING)
endif(WITH_TESTING)

@ -33,7 +33,7 @@ ExternalProject_Add(
UPDATE_COMMAND ""
CONFIGURE_COMMAND ./buildconf && ./configure --disable-shared --prefix=${CARES_INSTALL_DIR}
BUILD_IN_SOURCE 1
BUILD_COMMAND make
BUILD_COMMAND make -j8
INSTALL_COMMAND make install
)

@ -67,5 +67,5 @@ ADD_LIBRARY(mkldnn SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET mkldnn PROPERTY IMPORTED_LOCATION ${MKLDNN_LIB})
ADD_DEPENDENCIES(mkldnn ${MKLDNN_PROJECT})
MESSAGE(STATUS "MKLDNN library: ${MKLDNN_LIB}")
add_definitions(-DPADDLE_USE_MKLDNN)
add_definitions(-DPADDLE_WITH_MKLDNN)
LIST(APPEND external_project_dependencies mkldnn)

@ -114,11 +114,7 @@ INCLUDE_DIRECTORIES(${CBLAS_INC_DIR})
# linear algebra libraries for cc_library(xxx SRCS xxx.c DEPS cblas)
SET(dummyfile ${CMAKE_CURRENT_BINARY_DIR}/cblas_dummy.c)
FILE(WRITE ${dummyfile} "const char * dummy = \"${dummyfile}\";")
IF("${CBLAS_PROVIDER}" STREQUAL "MKLML")
ADD_LIBRARY(cblas SHARED ${dummyfile})
ELSE()
ADD_LIBRARY(cblas STATIC ${dummyfile})
ENDIF()
ADD_LIBRARY(cblas STATIC ${dummyfile})
TARGET_LINK_LIBRARIES(cblas ${CBLAS_LIBRARIES})
IF(NOT ${CBLAS_FOUND})

@ -7,3 +7,4 @@ API
v2/model_configs.rst
v2/data.rst
v2/run_logic.rst
v2/fluid.rst

@ -99,3 +99,10 @@ STanh
.. automodule:: paddle.v2.activation
:members: STanh
:noindex:
SoftSign
========
.. automodule:: paddle.v2.activation
:members: SoftSign
:noindex:

@ -0,0 +1,18 @@
======================
Fluid
======================
.. toctree::
:maxdepth: 1
fluid/layers.rst
fluid/data_feeder.rst
fluid/executor.rst
fluid/initializer.rst
fluid/evaluator.rst
fluid/nets.rst
fluid/optimizer.rst
fluid/param_attr.rst
fluid/profiler.rst
fluid/regularizer.rst

@ -0,0 +1,9 @@
===========
DataFeeder
===========
DataFeeder
-----------
.. automodule:: paddle.v2.fluid.data_feeder
:members: DataFeeder
:noindex:

@ -0,0 +1,9 @@
===========
Evaluator
===========
Evaluator
-----------
.. automodule:: paddle.v2.fluid.evaluator
:members: Evaluator
:noindex:

@ -0,0 +1,9 @@
===========
Executor
===========
Executor
-----------
.. automodule:: paddle.v2.fluid.executor
:members: Executor
:noindex:

@ -0,0 +1,50 @@
===========
Initializer
===========
Initializer
-----------
.. automodule:: paddle.v2.fluid.initializer
:members: Initializer
:noindex:
ConstantInitializer
-------------------
.. automodule:: paddle.v2.fluid.initializer
:members: ConstantInitializer
:noindex:
UniformInitializer
------------------
.. automodule:: paddle.v2.fluid.initializer
:members: UniformInitializer
:noindex:
NormalInitializer
-----------------
.. automodule:: paddle.v2.fluid.initializer
:members: NormalInitializer
:noindex:
XavierInitializer
-----------------
.. automodule:: paddle.v2.fluid.initializer
:members: XavierInitializer
:noindex:
MSRAInitializer
---------------
.. automodule:: paddle.v2.fluid.initializer
:members: MSRAInitializer
:noindex:

File diff suppressed because it is too large Load Diff

@ -0,0 +1,22 @@
===========
Nets
===========
simple_img_conv_pool
-----------
.. autofunction:: paddle.v2.fluid.nets.simple_img_conv_pool
:noindex:
img_conv_group
-----------
.. autofunction:: paddle.v2.fluid.nets.img_conv_group
:noindex:
sequence_conv_pool
-----------
.. autofunction:: paddle.v2.fluid.nets.sequence_conv_pool
:noindex:

@ -0,0 +1,54 @@
===========
Optimizer
===========
Optimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: Optimizer
:noindex:
SGDOptimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: SGDOptimizer
:noindex:
MomentumOptimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: MomentumOptimizer
:noindex:
AdagradOptimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: AdagradOptimizer
:noindex:
AdamOptimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: AdamOptimizer
:noindex:
AdamaxOptimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: AdamaxOptimizer
:noindex:
DecayedAdagradOptimizer
-----------
.. automodule:: paddle.v2.fluid.optimizer
:members: DecayedAdagradOptimizer
:noindex:

@ -0,0 +1,11 @@
===========
ParamAttr
===========
ParamAttr
-----------
.. automodule:: paddle.v2.fluid.param_attr
:members: ParamAttr
:noindex:

Some files were not shown because too many files have changed in this diff Show More

Loading…
Cancel
Save