Merge branch 'develop' of https://github.com/baidu/Paddle into cn_doc

9 years ago · 2a21d8b39f
parent 24cfc5ab3c 85f0e18460
commit 2a21d8b39f
387 changed files with 11013 additions and 8006 deletions
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -7,18 +7,14 @@
    hooks:
    -   id: yapf
 -   repo: https://github.com/pre-commit/pre-commit-hooks
-    sha: 4ef03c4223ad322c7adaa6c6c0efb26b57df3b71
+    sha: 7539d8bd1a00a3c1bfd34cdb606d3a6372e83469
    hooks:
    -   id: check-added-large-files
    -   id: check-merge-conflict
    -   id: check-symlinks
    -   id: detect-private-key
    -   id: end-of-file-fixer
-# TODO(yuyang): trailing whitespace has some bugs on markdown
-# files now, please not add it to pre-commit hook now
-#    -   id: trailing-whitespace
-#
+-   repo: https://github.com/PaddlePaddle/clang-format-pre-commit-hook.git
+    sha: 28c0ea8a67a3e2dbbf4822ef44e85b63a0080a29
+    hooks:
+    -   id: clang-formater
 # TODO(yuyang): debug-statements not fit for Paddle, because
 # not all of our python code is runnable. Some are used for 
 # documenation
 #    -   id: debug-statements
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
 # PaddlePaddle
-[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/baidu/Paddle)
+[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
 [![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://www.paddlepaddle.org/)
 [![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://www.paddlepaddle.org/cn/index.html)
-[![Coverage Status](https://coveralls.io/repos/github/PaddlePaddle/Paddle/badge.svg?branch=develop)](https://coveralls.io/github/baidu/Paddle?branch=develop)
+[![Coverage Status](https://coveralls.io/repos/github/PaddlePaddle/Paddle/badge.svg?branch=develop)](https://coveralls.io/github/PaddlePaddle/Paddle?branch=develop)
-[![Release](https://img.shields.io/github/release/baidu/Paddle.svg?colorB=fedcba)](https://github.com/baidu/Paddle/releases)
+[![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
 [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
@@ -17,7 +17,7 @@ developed by Baidu scientists and engineers for the purpose of applying deep
 learning to many products at Baidu.
 Our vision is to enable deep learning for everyone via PaddlePaddle.
-Please refer to our [release announcement](https://github.com/baidu/Paddle/releases) to track the latest feature of PaddlePaddle.
+Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
 ## Features
@@ -92,7 +92,7 @@ Both [English Docs](http://paddlepaddle.org/doc/) and [Chinese Docs](http://padd
 ## Ask Questions
-You are welcome to submit questions and bug reports as [Github Issues](https://github.com/baidu/paddle/issues).
+You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues).
 ## Copyright and License
 PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
--- a/doc/build/build_from_source.md
+++ b/doc/build/build_from_source.md
@@ -6,10 +6,10 @@ Installing from Sources
 * [3. Build on Ubuntu](#ubuntu)
 ## <span id="download">Download and Setup</span> 
-You can download PaddlePaddle from the [github source](https://github.com/gangliao/Paddle).
+You can download PaddlePaddle from the [github source](https://github.com/PaddlePaddle/Paddle).
 ```bash
-git clone https://github.com/baidu/Paddle paddle
+git clone https://github.com/PaddlePaddle/Paddle paddle
 cd paddle
 ```
--- a/doc_cn/build_and_install/cmake/cblas_settings.csv
+++ b/doc_cn/build_and_install/cmake/cblas_settings.csv
@@ -1,4 +1,5 @@
-MKL_ROOT,mkl的路径，在${MKL_ROOT}/include下需要包含mkl.h，在${MKL_ROOT}/lib目录下需要包含 mkl_core，mkl_sequential和mkl_intel_lp64三个库
-ATLAS_ROOT,ATLAS库的路径，在${ATLAS_ROOT}/include下需要包含cblas.h，而在${ATLAS_ROOT}/lib下需要包含cblas和atlas两个库
-OPENBLAS_ROOT,在${OPENBLAS_ROOT}/include下需要包含cblas.h，而在${OPENBLAS_ROOT}/lib下需要包含openblas库
-REFERENCE_CBLAS_ROOT,在${REFERENCE_CBLAS_ROOT}/include下需要包含cblas.h，在${REFERENCE_CBLAS_ROOT}/lib下需要包含cblas库
+编译选项,描述,注意
+MKL_ROOT,MKL的路径,${MKL_ROOT}/include下需要包含mkl.h，${MKL_ROOT}/lib目录下需要包含mkl_core，mkl_sequential和mkl_intel_lp64三个库。
+ATLAS_ROOT,ATLAS的路径,${ATLAS_ROOT}/include下需要包含cblas.h，${ATLAS_ROOT}/lib下需要包含cblas和atlas两个库。
+OPENBLAS_ROOT,OpenBLAS的路径,${OPENBLAS_ROOT}/include下需要包含cblas.h，${OPENBLAS_ROOT}/lib下需要包含openblas库。
+REFERENCE_CBLAS_ROOT,REFERENCE BLAS的路径,${REFERENCE_CBLAS_ROOT}/include下需要包含cblas.h，${REFERENCE_CBLAS_ROOT}/lib下需要包含cblas库。
--- a/doc_cn/build_and_install/cmake/compile_options.csv
+++ b/doc_cn/build_and_install/cmake/compile_options.csv
@@ -1,15 +1,14 @@
 选项,说明,默认值
-WITH_GPU,是否编译GPU支持。,是否寻找到cuda工具链
+WITH_GPU,是否支持GPU。,取决于是否寻找到CUDA工具链
 WITH_DOUBLE,是否使用双精度浮点数。,否
-WITH_DSO,是否使用运行时动态加载cuda动态库，而非静态加载cuda动态库。,是
+WITH_DSO,是否运行时动态加载CUDA动态库，而非静态加载CUDA动态库。,是
-WITH_AVX,是否编译含有AVX指令集的PaddlePaddle二进制,是
+WITH_AVX,是否编译含有AVX指令集的PaddlePaddle二进制文件,是
-WITH_PYTHON,是否内嵌python解释器。可以方便嵌入式工作。,是
+WITH_PYTHON,是否内嵌PYTHON解释器。方便今后的嵌入式移植工作。,是
 WITH_STYLE_CHECK,是否编译时进行代码风格检查,是
-WITH_RDMA,是否开启RDMA支持,否
+WITH_RDMA,是否开启RDMA,否
-WITH_GLOG,是否使用GLOG，如果不使用则会使用一个简化版的日志实现。可以方便嵌入式工作。,取决于是否寻找到GLOG
+WITH_GLOG,是否开启GLOG。如果不开启，则会使用一个简化版的日志，同时方便今后的嵌入式移植工作。,取决于是否寻找到GLOG
-WITH_GFLAGS,是否使用GFLAGS，如果不使用则会使用一个简化版的命令行参数解析。可以方便嵌入式工作。,取决于是否寻找到GFLAGS
+WITH_GFLAGS,是否使用GFLAGS。如果不开启，则会使用一个简化版的命令行参数解析器，同时方便今后的嵌入式移植工作。,取决于是否寻找到GFLAGS
-WITH_TIMER,是否开启计时功能开启计时功能会导致运行略慢，打印的日志变多。但是方便调试和benchmark,否
+WITH_TIMER,是否开启计时功能。如果开启会导致运行略慢，打印的日志变多，但是方便调试和测Benchmark,否
-WITH_TESTING,是否开启单元测试,取决于是否寻找到gtest
+WITH_TESTING,是否开启单元测试,取决于是否寻找到GTEST
-WITH_DOC,是否编译英文文档,否
+WITH_DOC,是否编译中英文文档,否
-WITH_DOC_CN,是否编译中文文档,否
-WITH_SWIG_PY,是否编译python的swig接口，python的swig接口可以方便进行预测和定制化训练,取决于是否找到swig
+WITH_SWIG_PY,是否编译PYTHON的SWIG接口，该接口可用于预测和定制化训练,取决于是否寻找到SWIG
--- a/doc_cn/build_and_install/cmake/compile_options.rst
+++ b/doc_cn/build_and_install/cmake/compile_options.rst
@@ -1,62 +1,43 @@
-设置PaddlePaddle的编译选项
-==========================
-PaddlePaddle的编译选项可以在调用cmake的时候设置。cmake是一个跨平台的编译脚本，调用
-cmake可以将cmake项目文件，生成各个平台的makefile。详细的cmake使用方法可以参考
-`cmake的官方文档 <https://cmake.org/cmake-tutorial>`_ 。
-PaddlePaddle的编译选项是可以控制PaddlePaddle生成CPU/GPU版本二进制，链接何种blas等等。所有的
-编译选项列表如下
-PaddlePaddle的编译选项
-----------------------
-bool型的编译选项
-++++++++++++++++
-设置下列编译选项时，可以在cmake的命令行设置。使用 -D命令即可。例如 
-:code:`cmake -D WITH_GPU=OFF`
-..  csv-table:: PaddlePaddle的bool型编译选项
+PaddlePaddle的编译选项
+======================
+PaddlePaddle的编译选项，包括生成CPU/GPU二进制文件、链接何种BLAS库等。用户可在调用cmake的时候设置它们，详细的cmake使用方法可以参考 `官方文档 <https://cmake.org/cmake-tutorial>`_ 。
+Bool型的编译选项
+----------------
+用户可在cmake的命令行中，通过使用 ``-D`` 命令设置该类编译选项，例如
+..  code-block:: bash
+    cmake .. -DWITH_GPU=OFF
+..  csv-table:: Bool型的编译选项
    :widths: 1, 7, 2
    :file: compile_options.csv
-blas相关的编译选项
-++++++++++++++++++
-
-PaddlePaddle可以使用 `MKL <https://software.intel.com/en-us/intel-mkl>`_ ，
-`Atlas <http://math-atlas.sourceforge.net/>`_ ,
-`OpenBlas <http://www.openblas.net/>`_ 和 
-`refference Blas <http://www.netlib.org/blas/>`_ ，任意一种cblas实现。
-通过编译时指定路径来实现引用各种blas。
-cmake编译时会首先在系统路径(/usr/lib\:/usr/local/lib)中寻找这些blas的实现。同时
-也会读取相关路径变量来进行搜索。路径变量为\:
-
-..  csv-table:: PaddlePaddle的cblas编译选项
-    :widths: 1, 9
+BLAS/CUDA/Cudnn的编译选项
+--------------------------
+BLAS
+++++
+PaddlePaddle支持以下任意一种BLAS库：`MKL <https://software.intel.com/en-us/intel-mkl>`_ ，`ATLAS <http://math-atlas.sourceforge.net/>`_ ，`OpenBlAS <http://www.openblas.net/>`_ 和 `REFERENCE BLAS <http://www.netlib.org/blas/>`_ 。
+..  csv-table:: BLAS路径相关的编译选项
+    :widths: 1, 2, 7
    :header: "编译选项", "描述"
    :file: cblas_settings.csv
-这些变量均可以使用 -D命令指定。例如 :code:`cmake -D MKL_ROOT=/opt/mkl/`。这些变
-量也可以通过调用cmake命令前通过环境变量指定。例如
-..  code-block:: bash
-    export MKL_ROOT=/opt/mkl
-    cmake
-需要注意的是，这些变量只在第一次cmake的时候有效。如果在第一次cmake之后想要重新设
-置这些变量，推荐清理( :code:`rm -rf` )掉编译目录后，再指定。
-cuda/cudnn相关的编译选项
-++++++++++++++++++++++++
-PaddlePaddle可以使用 cudnn v2之后的任何一个cudnn版本来编译运行。但需要注意的是编译和
-运行使用的cudnn尽量是同一个版本。推荐使用最新版本的cudnn v5.1。
-在cmake配置时可以使用 :code:`CUDNN_ROOT` 来配置CUDNN的安装路径。使用的命令也是 
--D，例如 :code:`cmake -D CUDNN_ROOT=/opt/cudnnv5` 。
-需要注意的是，这些变量只在第一次cmake的时候有效。如果在第一次cmake之后想要重新设
-置这些变量，推荐清理( :code:`rm -rf` )掉编译目录后，再指定。
+CUDA/Cudnn
+++++++++++
+PaddlePaddle可以使用cudnn v2之后的任何一个版本来编译运行，但尽量请保持编译和运行使用的cudnn是同一个版本。 我们推荐使用最新版本的cudnn v5.1。
+编译选项的设置
++++++++++++++
+PaddePaddle通过编译时指定路径来实现引用各种BLAS/CUDA/Cudnn库。cmake编译时，首先在系统路径(/usr/lib\:/usr/local/lib)中搜索这几个库，同时也会读取相关路径变量来进行搜索。 通过使用 ``-D`` 命令可以设置，例如 
+..  code-block:: bash
+    cmake .. -DMKL_ROOT=/opt/mkl/ -DCUDNN_ROOT=/opt/cudnnv5
+注意：这几个编译选项的设置，只在第一次cmake的时候有效。如果之后想要重新设置，推荐清理整个编译目录（``rm -rf``）后，再指定。
--- a/doc_cn/howto/how_to_write_docs/index.rst
+++ b/doc_cn/howto/how_to_write_docs/index.rst
@@ -2,32 +2,19 @@
 如何贡献/修改PaddlePaddle的文档
 ###############################
-PaddlePaddle的文档使用 `cmake`_ 驱动 `sphinx`_ 生成。公有两个文档，:code:`doc` 和 :code:`doc_cn` 。这两者会在 `cmake`_ 中进行编译，生成后的文档会存储在服务器的 :code:`doc` 和 :code:`doc_cn` 两个目录下。
+PaddlePaddle的文档包括英文文档 ``doc`` 和中文文档 ``doc_cn`` 两个部分。文档都是通过 `cmake`_ 驱动 `sphinx`_ 编译生成，生成后的文档分别存储在编译目录的 ``doc`` 和 ``doc_cn`` 两个子目录下。
 下面分几个部分介绍一下PaddlePaddle文档的贡献方法。
-如何书写PaddlePaddle的文档
-==========================
-TBD
 如何构建PaddlePaddle的文档
 ==========================
-构建PaddlePaddle文档，需要使用构建Paddle的全部环境。准备这个环境相对来说比较复杂，所以本文档提供两种方式构建PaddlePaddle的文档，即
+PaddlePaddle的文档构建有直接构建和基于Docker构建两种方式。构建PaddlePaddle文档需要准备的环境相对较复杂，所以我们推荐使用基于Docker来构建PaddlePaddle的文档。
-* 使用Docker构建PaddlePaddle的文档
-* 直接构建PaddlePaddle的文档。
-并且，我们推荐使用Docker来构建PaddlePaddle的文档。
 使用Docker构建PaddlePaddle的文档
 --------------------------------
-使用Docker构建PaddlePaddle的文档，首先要求在系统里安装好Docker工具包。安装Docker请参考 `Docker的官网 <https://docs.docker.com/>`_ 。
+使用Docker构建PaddlePaddle的文档，需要在系统里先安装好Docker工具包。Docker安装请参考 `Docker的官网 <https://docs.docker.com/>`_ 。安装好Docker之后可以使用源码目录下的脚本构建文档，即
-安装好Docker之后可以使用源码目录下的脚本构建文档，即
 ..	code-block:: bash
@@ -35,10 +22,10 @@ TBD
 	cd paddle/scripts/tools/build_docs
 	bash build_docs.sh
-执行完这个脚本后，该目录下会生成两个目录，分别是\:
+编译完成后，该目录下会生成如下两个子目录\:
-* doc 目录，英文文档地址
+* doc 英文文档目录
-* doc_cn 目录，中文文档地址
+* doc_cn 中文文档目录
 打开浏览器访问对应目录下的index.html即可访问本地文档。
@@ -52,6 +39,10 @@ TBD
 TBD
+如何书写PaddlePaddle的文档
+==========================
+TBD
 如何更新www.paddlepaddle.org文档
 ================================
--- a/paddle/api/Arguments.cpp
+++ b/paddle/api/Arguments.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "PaddleAPIPrivate.h"
--- a/paddle/api/ConfigParser.cpp
+++ b/paddle/api/ConfigParser.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "PaddleAPIPrivate.h"
 #include "paddle/trainer/Trainer.h"
@@ -44,8 +43,7 @@ TrainerConfig* TrainerConfig::createFromTrainerConfigFile(
  return retv;
 }
-TrainerConfig* TrainerConfig::createFromProtoString(
-    const std::string& str) {
+TrainerConfig* TrainerConfig::createFromProtoString(const std::string& str) {
  auto retv = new TrainerConfig();
  paddle::TrainerConfig trainerConfigProto;
  auto conf = std::make_shared<paddle::TrainerConfigHelper>(trainerConfigProto);
--- a/paddle/api/GradientMachine.cpp
+++ b/paddle/api/GradientMachine.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "PaddleAPIPrivate.h"
@@ -27,7 +26,8 @@ GradientMachine::GradientMachine() : m(new GradientMachinePrivate()) {}
 GradientMachine::~GradientMachine() { delete m; }
 GradientMachine* GradientMachine::createFromPaddleModelPtr(
-    const void* confPtr, GradientMatchineCreateMode mode,
+    const void* confPtr,
+    GradientMatchineCreateMode mode,
    const std::vector<int>& types) {
  auto& conf = *(const paddle::ModelConfig*)(confPtr);
  std::vector<ParameterType> realTypes;
@@ -44,7 +44,8 @@ GradientMachine* GradientMachine::createFromPaddleModelPtr(
 }
 GradientMachine* GradientMachine::createByConfigProtoStr(
-    const std::string& protoStr, GradientMatchineCreateMode mode,
+    const std::string& protoStr,
+    GradientMatchineCreateMode mode,
    const std::vector<int>& types) {
  paddle::ModelConfig conf;
  conf.ParseFromString(protoStr);
@@ -56,13 +57,15 @@ GradientMachine* GradientMachine::createByConfigProtoStr(
 }
 GradientMachine* GradientMachine::createByModelConfig(
-    ModelConfig* conf, GradientMatchineCreateMode mode,
+    ModelConfig* conf,
+    GradientMatchineCreateMode mode,
    const std::vector<int>& types) {
  auto confPtr = &conf->m->conf->getModelConfig();
  return GradientMachine::createFromPaddleModelPtr(confPtr, mode, types);
 }
-void GradientMachine::forward(const Arguments& inArgs, Arguments* outArgs,
+void GradientMachine::forward(const Arguments& inArgs,
+                              Arguments* outArgs,
                              PassType passType) {
  auto& in =
      m->cast<std::vector<paddle::Argument>>(inArgs.getInternalArgumentsPtr());
@@ -99,7 +102,8 @@ void GradientMachine::backward(const UpdateCallback& callback) {
 }
 void GradientMachine::forwardBackward(const Arguments& inArgs,
-                                      Arguments* outArgs, PassType passType,
+                                      Arguments* outArgs,
+                                      PassType passType,
                                      const UpdateCallback& callback) {
  auto& in =
      m->cast<std::vector<paddle::Argument>>(inArgs.getInternalArgumentsPtr());
@@ -140,8 +144,11 @@ Matrix* GradientMachine::getLayerOutput(const std::string& layerName) const
 }
 SequenceGenerator* GradientMachine::asSequenceGenerator(
-    const std::vector<std::string>& dict, size_t begin_id, size_t end_id,
-    size_t max_length, size_t beam_size) {
+    const std::vector<std::string>& dict,
+    size_t begin_id,
+    size_t end_id,
+    size_t max_length,
+    size_t beam_size) {
  SequenceGenerator* r =
      SequenceGenerator::createByGradientMachineSharedPtr(&m->machine);
  r->setDict(dict);
--- a/paddle/api/Internal.h
+++ b/paddle/api/Internal.h
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #pragma once
 #include "PaddleAPI.h"
@@ -23,7 +22,8 @@ limitations under the License. */
 template <typename T1, typename T2>
 void staticCastVector(std::vector<T2>* dest, const std::vector<T1>& src) {
  dest->resize(src.size());
-  std::transform(src.begin(), src.end(), dest->begin(), [](T1 t){
-    return static_cast<T2>(t);
-  });
+  std::transform(src.begin(),
+                 src.end(),
+                 dest->begin(),
+                 [](T1 t) { return static_cast<T2>(t); });
 }
--- a/paddle/api/Matrix.cpp
+++ b/paddle/api/Matrix.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "paddle/math/Matrix.h"
 #include "paddle/math/SparseMatrix.h"
@@ -44,17 +43,21 @@ Matrix* Matrix::createZero(size_t height, size_t width, bool useGpu) {
  return m;
 }
-Matrix* Matrix::createDense(const std::vector<float>& data, size_t height,
-                            size_t width, bool useGpu) {
+Matrix* Matrix::createDense(const std::vector<float>& data,
+                            size_t height,
+                            size_t width,
+                            bool useGpu) {
  auto m = new Matrix();
  m->m->mat = paddle::Matrix::create(height, width, useGpu);
  m->m->mat->copyFrom(data.data(), data.size());
  return m;
 }
-Matrix* Matrix::createDenseFromNumpy(float* data, int dim1, int dim2,
-                                      bool copy, bool useGpu)
-                                     throw (UnsupportError) {
+Matrix* Matrix::createDenseFromNumpy(float* data,
+                                     int dim1,
+                                     int dim2,
+                                     bool copy,
+                                     bool useGpu) throw(UnsupportError) {
  if (useGpu) {
    /// Gpu mode only supports copy=True
    if (!copy) {
@@ -66,7 +69,9 @@ Matrix* Matrix::createDenseFromNumpy(float* data, int dim1, int dim2,
  }
 }
-Matrix* Matrix::createCpuDenseFromNumpy(float* data, int dim1, int dim2,
+Matrix* Matrix::createCpuDenseFromNumpy(float* data,
+                                        int dim1,
+                                        int dim2,
                                        bool copy) {
  auto m = new Matrix();
  if (copy) {
@@ -85,12 +90,20 @@ Matrix* Matrix::createGpuDenseFromNumpy(float* data, int dim1, int dim2) {
  return m;
 }
-Matrix* Matrix::createSparse(size_t height, size_t width, size_t nnz,
-                             bool isNonVal, bool isTrans, bool useGpu) {
+Matrix* Matrix::createSparse(size_t height,
+                             size_t width,
+                             size_t nnz,
+                             bool isNonVal,
+                             bool isTrans,
+                             bool useGpu) {
  auto m = new Matrix();
  m->m->mat = paddle::Matrix::createSparseMatrix(
-      height, width, nnz, isNonVal ? paddle::NO_VALUE : paddle::FLOAT_VALUE,
-      isTrans, useGpu);
+      height,
+      width,
+      nnz,
+      isNonVal ? paddle::NO_VALUE : paddle::FLOAT_VALUE,
+      isTrans,
+      useGpu);
  return m;
 }
@@ -221,7 +234,8 @@ FloatArray Matrix::getData() const {
 }
 void Matrix::sparseCopyFrom(
-    const std::vector<int>& rows, const std::vector<int>& cols,
+    const std::vector<int>& rows,
+    const std::vector<int>& cols,
    const std::vector<float>& vals) throw(UnsupportError) {
  auto cpuSparseMat =
      std::dynamic_pointer_cast<paddle::CpuSparseMatrix>(m->mat);
@@ -240,7 +254,8 @@ void Matrix::sparseCopyFrom(
 void* Matrix::getSharedPtr() const { return &m->mat; }
-void Matrix::toNumpyMatInplace(float** view_data, int* dim1,
+void Matrix::toNumpyMatInplace(float** view_data,
+                               int* dim1,
                               int* dim2) throw(UnsupportError) {
  auto cpuMat = std::dynamic_pointer_cast<paddle::CpuMatrix>(m->mat);
  if (cpuMat) {
@@ -251,7 +266,8 @@ void Matrix::toNumpyMatInplace(float** view_data, int* dim1,
    throw UnsupportError();
  }
 }
-void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
+void Matrix::copyToNumpyMat(float** view_m_data,
+                            int* dim1,
                            int* dim2) throw(UnsupportError) {
  static_assert(sizeof(paddle::real) == sizeof(float),
                "Currently PaddleAPI only support for single "
@@ -269,8 +285,8 @@ void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
    } else if (auto gpuMat = dynamic_cast<paddle::GpuMatrix*>(m->mat.get())) {
      auto src = gpuMat->getData();
      auto dest = *view_m_data;
-      hl_memcpy_device2host(dest, src,
-                            sizeof(paddle::real) * (*dim1) * (*dim2));
+      hl_memcpy_device2host(
+          dest, src, sizeof(paddle::real) * (*dim1) * (*dim2));
    } else {
      LOG(WARNING) << "Unexpected Situation";
      throw UnsupportError();
@@ -278,7 +294,8 @@ void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
  }
 }
-void Matrix::copyFromNumpyMat(float* data, int dim1,
+void Matrix::copyFromNumpyMat(float* data,
+                              int dim1,
                              int dim2) throw(UnsupportError, RangeError) {
  if (isSparse()) {
    throw UnsupportError();
--- a/paddle/api/PaddleAPI.h
+++ b/paddle/api/PaddleAPI.h
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #pragma once
 #include <stddef.h>
@@ -112,7 +111,8 @@ public:
  /**
   * Create A Matrix with height,width, which is filled by zero.
   */
-  static Matrix* createZero(size_t height, size_t width,
+  static Matrix* createZero(size_t height,
+                            size_t width,
                            bool useGpu = isUsingGpu());
  /**
@@ -124,8 +124,11 @@ public:
   *
   * @note the default sparse type is SPARSE_CSR.
   */
-  static Matrix* createSparse(size_t height, size_t width, size_t nnz,
-                              bool isNonVal = true, bool trans = false,
+  static Matrix* createSparse(size_t height,
+                              size_t width,
+                              size_t nnz,
+                              bool isNonVal = true,
+                              bool trans = false,
                              bool useGpu = isUsingGpu());
  /**
@@ -134,13 +137,17 @@ public:
   * @param data  list of float should be passed in python.
   * @note        the value will be copy into a new matrix.
   */
-  static Matrix* createDense(const std::vector<float>& data, size_t height,
-                             size_t width, bool useGpu = isUsingGpu());
+  static Matrix* createDense(const std::vector<float>& data,
+                             size_t height,
+                             size_t width,
+                             bool useGpu = isUsingGpu());
-  static Matrix* createDenseFromNumpy(float* data, int dim1, int dim2,
+  static Matrix* createDenseFromNumpy(
+      float* data,
+      int dim1,
+      int dim2,
       bool copy = true,
-                                      bool useGpu = isUsingGpu())
-                                      throw (UnsupportError);
+      bool useGpu = isUsingGpu()) throw(UnsupportError);
  /**
   *  Create Cpu Dense Matrix from numpy matrix, dtype=float32
@@ -151,7 +158,9 @@ public:
   *  @param copy  true if copy into a new matrix, false will create
   *               matrix inplace.
   */
-  static Matrix* createCpuDenseFromNumpy(float* data, int dim1, int dim2,
+  static Matrix* createCpuDenseFromNumpy(float* data,
+                                         int dim1,
+                                         int dim2,
                                         bool copy = false);
  /// Create Gpu Dense Matrix from numpy matrix, dtype=float32
@@ -171,11 +180,13 @@ public:
   * numpy_mat = m.toNumpyMat()
   * @endcode
   */
-  void toNumpyMatInplace(float** view_data, int* dim1,
+  void toNumpyMatInplace(float** view_data,
+                         int* dim1,
                         int* dim2) throw(UnsupportError);
  /// Copy To numpy mat.
-  void copyToNumpyMat(float** view_m_data, int* dim1,
+  void copyToNumpyMat(float** view_m_data,
+                      int* dim1,
                      int* dim2) throw(UnsupportError);
  /// Copy From Numpy Mat
@@ -248,15 +259,18 @@ public:
  static Vector* create(const std::vector<float>& data,
                        bool useGpu = isUsingGpu());
-  static Vector* createVectorFromNumpy(float* data, int dim, bool copy = true,
-                                       bool useGpu = isUsingGpu())
-                                       throw (UnsupportError);
+  static Vector* createVectorFromNumpy(
+      float* data,
+      int dim,
+      bool copy = true,
+      bool useGpu = isUsingGpu()) throw(UnsupportError);
  /**
   * Create Cpu Vector from numpy array, which dtype=float32
   *
   * If copy is false, it will create vector inplace.
   */
-  static Vector* createCpuVectorFromNumpy(float* data, int dim,
+  static Vector* createCpuVectorFromNumpy(float* data,
+                                          int dim,
                                          bool copy = false);
  /// Create Gpu Vector from numpy array, which dtype=float32
@@ -312,16 +326,19 @@ public:
  static IVector* create(const std::vector<int>& data,
                         bool useGpu = isUsingGpu());
-  static IVector* createVectorFromNumpy(int* data, int dim, bool copy = true,
-                                        bool useGpu = isUsingGpu())
-                                        throw (UnsupportError);
+  static IVector* createVectorFromNumpy(
+      int* data,
+      int dim,
+      bool copy = true,
+      bool useGpu = isUsingGpu()) throw(UnsupportError);
  /**
   * Create Cpu IVector from numpy array, which dtype=int32
   *
   * If copy is false, it will create vector inplace
   */
-  static IVector* createCpuVectorFromNumpy(int* data, int dim,
+  static IVector* createCpuVectorFromNumpy(int* data,
+                                           int dim,
                                           bool copy = false);
  /**
   * Create Gpu IVector from numpy array, which dtype=int32
@@ -605,7 +622,8 @@ class ParameterTraverseCallback {
 public:
  ~ParameterTraverseCallback();
-  void apply(const std::vector<Vector*>& vecs, const ParameterConfig& config,
+  void apply(const std::vector<Vector*>& vecs,
+             const ParameterConfig& config,
             size_t sparseId);
 private:
@@ -638,7 +656,8 @@ public:
  void finishBatch();
-  void update(const std::vector<Vector*>& vecs, const ParameterConfig& conf,
+  void update(const std::vector<Vector*>& vecs,
+              const ParameterConfig& conf,
              size_t sparseId = NO_SPARSE_ID);
  std::vector<int> getParameterTypes() const;
@@ -678,7 +697,8 @@ public:
   * model config by TrainerConfig
   */
  static GradientMachine* createByModelConfig(
-      ModelConfig* conf, GradientMatchineCreateMode mode = CREATE_MODE_NORMAL,
+      ModelConfig* conf,
+      GradientMatchineCreateMode mode = CREATE_MODE_NORMAL,
      const std::vector<int>& parameterTypes = defaultParamTypes);
  /**
@@ -701,7 +721,8 @@ public:
  /**
   * Combine forward/backward
   */
-  void forwardBackward(const Arguments& inArgs, Arguments* outArgs,
+  void forwardBackward(const Arguments& inArgs,
+                       Arguments* outArgs,
                       PassType passType,
                       const UpdateCallback& callback = UpdateCallback());
@@ -722,14 +743,17 @@ public:
   */
  SequenceGenerator* asSequenceGenerator(
      const std::vector<std::string>& dict = std::vector<std::string>(),
-      size_t begin_id = 0UL, size_t end_id = 0UL, size_t max_length = 100UL,
+      size_t begin_id = 0UL,
+      size_t end_id = 0UL,
+      size_t max_length = 100UL,
      size_t beam_size = -1UL);
 private:
  GradientMachinePrivate* m;
  static GradientMachine* createFromPaddleModelPtr(
-      const void* confPtr, GradientMatchineCreateMode mode,
+      const void* confPtr,
+      GradientMatchineCreateMode mode,
      const std::vector<int>& types);
  // Not to use c++ 11 init-list, so we use static var as function default arg.
@@ -751,8 +775,8 @@ public:
  /// Create A Trainer By TrainerConfig. using paddle command line.
  static Trainer* createByCommandLine() throw(IOError);
-  static Trainer* create(TrainerConfig* optConfig, GradientMachine* gm)
-      throw(IOError);
+  static Trainer* create(TrainerConfig* optConfig,
+                         GradientMachine* gm) throw(IOError);
  /// Start training
  void startTrain();
--- a/paddle/api/Parameter.cpp
+++ b/paddle/api/Parameter.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "paddle/parameter/Parameter.h"
--- a/paddle/api/ParameterOptimizer.cpp
+++ b/paddle/api/ParameterOptimizer.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "PaddleAPIPrivate.h"
 #include "paddle/parameter/ParameterOptimizer.h"
@@ -32,11 +31,15 @@ struct ParameterTraverseCallbackPrivate {
      const paddle::ParameterOptimizer::TraverseCallback& callback)
      : callback(callback) {}
-  void apply(const std::vector<Vector*>& vecs, const ParameterConfig& conf,
+  void apply(const std::vector<Vector*>& vecs,
+             const ParameterConfig& conf,
             size_t sparseId) {
    std::vector<paddle::VectorPtr> real_vecs;
    real_vecs.resize(vecs.size());
-    std::transform(vecs.begin(), vecs.end(), real_vecs.begin(), [](Vector* v) {
+    std::transform(vecs.begin(),
+                   vecs.end(),
+                   real_vecs.begin(),
+                   [](Vector* v) {
                     if (v) {
                       return *(paddle::VectorPtr*)(v->getSharedPtr());
                     } else {
@@ -86,9 +89,11 @@ void ParameterOptimizer::startBatch(size_t numSamplesProcessed) {
 void ParameterOptimizer::finishBatch() { m->optimizer->finishBatch(); }
 void ParameterOptimizer::update(const std::vector<Vector*>& vecs,
-                                const ParameterConfig& conf, size_t sparseId) {
-  ParameterTraverseCallbackPrivate invoker([&](
-      const paddle::VectorPtr _vecs[], const paddle::ParameterConfig& config,
+                                const ParameterConfig& conf,
+                                size_t sparseId) {
+  ParameterTraverseCallbackPrivate invoker(
+      [&](const paddle::VectorPtr _vecs[],
+          const paddle::ParameterConfig& config,
          size_t sid = -1UL) { m->optimizer->update(_vecs, config, sid); });
  invoker.apply(vecs, conf, sparseId);
 }
@@ -116,8 +121,9 @@ void ParameterTraverseCallback::apply(const std::vector<Vector*>& vecs,
 ParameterTraverseCallback* ParameterOptimizer::needSpecialTraversal(
    const ParameterConfig& config) const {
-  auto& param_config = *(paddle::ParameterConfig*)const_cast<ParameterConfig&>(
-                            config).getRawPtr();
+  auto& param_config =
+      *(paddle::ParameterConfig*)const_cast<ParameterConfig&>(config)
+          .getRawPtr();
  auto callback = m->optimizer->needSpecialTraversal(param_config);
  if (callback) {
    auto retCallback = new ParameterTraverseCallback();
--- a/paddle/api/SequenceGenerator.cpp
+++ b/paddle/api/SequenceGenerator.cpp
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "paddle/gserver/gradientmachines/GradientMachine.h"
 #include "paddle/parameter/Argument.h"
@@ -42,8 +41,10 @@ struct Path {
 // position
 static void findNBest(paddle::GradientMachine* gradMachine,
                      std::vector<paddle::Argument>& inArgs,
-                      std::vector<Path>& finalPaths, size_t bos_id,
-                      size_t eos_id, size_t max_length) {
+                      std::vector<Path>& finalPaths,
+                      size_t bos_id,
+                      size_t eos_id,
+                      size_t max_length) {
  std::vector<Path> paths;
  Path emptyPath;
  paths.push_back(emptyPath);
@@ -166,7 +167,8 @@ public:
    if (id < getSize()) {
      Path& p = (*path_)[id];
      std::ostringstream sout;
-      std::transform(p.ids.begin(), p.ids.end(),
+      std::transform(p.ids.begin(),
+                     p.ids.end(),
                     std::ostream_iterator<std::string>(sout, split ? " " : ""),
                     [&](int id) { return (*dict_)[id]; });
      return sout.str();
--- a/paddle/api/Trainer.cpp
+++ b/paddle/api/Trainer.cpp
@@ -67,9 +67,8 @@ Trainer::Trainer(TrainerConfig* config, GradientMachine* gm)
  m->init(config->m->conf, /* testing= */ false, gm ? gm->m->machine : nullptr);
 }
-Trainer* Trainer::create(TrainerConfig* config, GradientMachine* gm)
-    throw(IOError)
-{
+Trainer* Trainer::create(TrainerConfig* config,
+                         GradientMachine* gm) throw(IOError) {
  auto retv = new Trainer(config, gm);
  if (retv->m->getConfig().IsInitialized()) {
    return retv;
@ -140,7 +139,9 @@ Matrix* Trainer::getLayerOutput(const std::string& layerName) {
  return Matrix::createByPaddleMatrixPtr(&m);
 }
-void Trainer::forwardOneBatch(size_t batchSize) { m->forwardOneBatch(batchSize); }
+void Trainer::forwardOneBatch(size_t batchSize) {
+  m->forwardOneBatch(batchSize);
+}
 bool TrainerPrivate::forwardOneBatch(size_t batchSize) {
  CHECK(dataProvider_) << "data_provider is not specified";
@ -156,7 +157,6 @@ bool TrainerPrivate::forwardOneBatch(size_t batchSize)  {
 void TrainerPrivate::forwardOneDataBatch(
    const std::vector<paddle::Argument>& inArgs) {
  std::vector<paddle::Argument>& outArgs = forwardOutput_;
  if (config_->getOptConfig().use_sparse_remote_updater()) {
--- a/paddle/api/Util.cpp
+++ b/paddle/api/Util.cpp
@ -37,7 +37,9 @@ FloatArray::FloatArray(const float* b, const size_t l)
 IntArray::IntArray(const int* b, const size_t l, bool f)
    : buf(b), length(l), needFree(f) {}
-IntWithFloatArray::IntWithFloatArray(const float* v, const int* i, size_t l,
+IntWithFloatArray::IntWithFloatArray(const float* v,
+                                     const int* i,
+                                     size_t l,
                                      bool f)
    : valBuf(v), idxBuf(i), length(l), needFree(f) {}
--- a/paddle/api/Vector.cpp
+++ b/paddle/api/Vector.cpp
@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #include "PaddleAPI.h"
 #include "paddle/math/Vector.h"
@ -39,7 +38,9 @@ IVector* IVector::create(const std::vector<int>& data, bool useGpu) {
  return v;
 }
-IVector* IVector::createVectorFromNumpy(int* data, int dim, bool copy,
+IVector* IVector::createVectorFromNumpy(int* data,
+                                        int dim,
+                                        bool copy,
                                         bool useGpu) throw(UnsupportError) {
  if (useGpu) {
    /// if use gpu only copy=true is supported
@ -137,8 +138,8 @@ void IVector::copyToNumpyArray(int** view_m_data, int* dim1) {
  if (auto cpuVec = dynamic_cast<paddle::CpuIVector*>(m->vec.get())) {
    std::memcpy(*view_m_data, cpuVec->getData(), sizeof(int) * (*dim1));
  } else if (auto gpuVec = dynamic_cast<paddle::GpuIVector*>(m->vec.get())) {
-    hl_memcpy_device2host(*view_m_data, gpuVec->getData(),
+    hl_memcpy_device2host(
-                          sizeof(int) * (*dim1));
+        *view_m_data, gpuVec->getData(), sizeof(int) * (*dim1));
  } else {
    LOG(INFO) << "Unexpected situation";
  }
@ -201,7 +202,9 @@ Vector* Vector::createByPaddleVectorPtr(void* ptr) {
  }
 }
-Vector* Vector::createVectorFromNumpy(float* data, int dim, bool copy,
+Vector* Vector::createVectorFromNumpy(float* data,
+                                      int dim,
+                                      bool copy,
                                       bool useGpu) throw(UnsupportError) {
  if (useGpu) {
    /// if use gpu only copy=True is supported
@ -251,8 +254,8 @@ void Vector::copyToNumpyArray(float** view_m_data, int* dim1) {
  if (auto cpuVec = dynamic_cast<paddle::CpuVector*>(m->vec.get())) {
    std::memcpy(*view_m_data, cpuVec->getData(), sizeof(float) * (*dim1));
  } else if (auto gpuVec = dynamic_cast<paddle::CpuVector*>(m->vec.get())) {
-    hl_memcpy_device2host(*view_m_data, gpuVec->getData(),
+    hl_memcpy_device2host(
-                          sizeof(float) * (*dim1));
+        *view_m_data, gpuVec->getData(), sizeof(float) * (*dim1));
  } else {
    LOG(INFO) << "Unexpected situation";
  }
--- a/paddle/cuda/include/hl_activation_functions.h
+++ b/paddle/cuda/include/hl_activation_functions.h
@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #ifndef HL_ACTIVATION_FUNCTIONS_H_
 #define HL_ACTIVATION_FUNCTIONS_H_
@ -21,11 +20,8 @@ limitations under the License. */
 /**
 * Active functions: sigmoid, relu, tanh and linear.
 */
-#define HPPL_ACTIVE_FUNCTION  {hppl::sigmoid,   \
-                               hppl::relu,      \
-                               hppl::tanh,      \
-                               hppl::linear     \
-                              }
+#define HPPL_ACTIVE_FUNCTION \
+  { hppl::sigmoid, hppl::relu, hppl::tanh, hppl::linear }
 namespace hppl {
--- a/paddle/cuda/include/hl_aggregate.h
+++ b/paddle/cuda/include/hl_aggregate.h
@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #ifndef HL_AGGREGATE_H_
 #define HL_AGGREGATE_H_
--- a/paddle/cuda/include/hl_avx_functions.h
+++ b/paddle/cuda/include/hl_avx_functions.h
@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #ifndef HL_AVX_FUNCTIONS_H_
 #define HL_AVX_FUNCTIONS_H_
--- a/paddle/cuda/include/hl_base.h
+++ b/paddle/cuda/include/hl_base.h
@ -12,8 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #ifndef HL_BASE_H_
 #define HL_BASE_H_
@ -153,7 +151,6 @@ typedef enum {
  HL_VALUE_END
 } hl_matrix_value_t;
 /**
 * @brief  HPPL matrix format.
 */
@ -163,7 +160,6 @@ typedef enum {
  HL_SPARSE_END
 } hl_matrix_format_t;
 typedef struct _hl_matrix_s *hl_matrix_s;
 /**
@ -209,7 +205,6 @@ typedef struct {
 #define HL_FLOAT_MIN 2.2250738585072014e-308
 #endif
 /**
 * The maximum input value for exp, used to avoid overflow problem.
 *
@ -217,7 +212,6 @@ typedef struct {
 */
 #define EXP_MAX_INPUT 40.0
 /**
 * @brief DIVUP(x, y) is similar to ceil(x / y).
 * @note  For CUDA, DIVUP will be used to specify
@ -244,11 +238,10 @@ extern __thread cudaStream_t default_stream;
 #define CHECK_SYNC(msg)                                               \
  if (true == g_sync_flag) {                                          \
    hl_stream_synchronize(HPPL_STREAM_DEFAULT);                       \
-    cudaError_t err                                       \
-      = (cudaError_t)hl_get_device_last_error();          \
-    CHECK_EQ(cudaSuccess, err) << "[" << msg << "] "      \
-      << "CUDA error: "                                   \
-      << hl_get_device_error_string((size_t)err);         \
+    cudaError_t err = (cudaError_t)hl_get_device_last_error();        \
+    CHECK_EQ(cudaSuccess, err)                                        \
+        << "[" << msg << "] "                                         \
+        << "CUDA error: " << hl_get_device_error_string((size_t)err); \
  }
 #endif /* __NVCC__ */
--- a/paddle/cuda/include/hl_batch_transpose.h
+++ b/paddle/cuda/include/hl_batch_transpose.h
@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 #ifndef HL_BATCH_TRANSPOSE_H_
 #define HL_BATCH_TRANSPOSE_H_
@ -31,10 +30,7 @@ limitations under the License. */
 *          order. Each batch has height * width data, which are
 *          arranged in height-first (or row-first) manner.
 */
-extern void batchTranspose(const real* input,
-                           real* output,
-                           int width,
-                           int height,
-                           int batchSize);
+extern void batchTranspose(
+    const real* input, real* output, int width, int height, int batchSize);
 #endif  // HL_BATCH_TRANSPOSE_H_
--- a/paddle/cuda/include/hl_cnn.h
+++ b/paddle/cuda/include/hl_cnn.h