Merge branch 'develop' of https://github.com/baidu/Paddle into cn_doc

avx_docs
dangqingqing 9 years ago
commit 2a21d8b39f

@ -7,18 +7,14 @@
hooks:
- id: yapf
- repo: https://github.com/pre-commit/pre-commit-hooks
sha: 4ef03c4223ad322c7adaa6c6c0efb26b57df3b71
sha: 7539d8bd1a00a3c1bfd34cdb606d3a6372e83469
hooks:
- id: check-added-large-files
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
- id: end-of-file-fixer
# TODO(yuyang): trailing whitespace has some bugs on markdown
# files now, please do not add it to the pre-commit hook now
# - id: trailing-whitespace
#
# TODO(yuyang): debug-statements not fit for Paddle, because
# not all of our python code is runnable. Some are used for
# documentation
# - id: debug-statements
- repo: https://github.com/PaddlePaddle/clang-format-pre-commit-hook.git
sha: 28c0ea8a67a3e2dbbf4822ef44e85b63a0080a29
hooks:
- id: clang-formater
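
For contributors who want to run these hooks locally before committing, a minimal sketch (assuming the `pre-commit` tool is installed from PyPI and the commands are run from the repository root):

```bash
# Install the pre-commit framework and register the hooks from
# .pre-commit-config.yaml as a git pre-commit hook.
pip install pre-commit
pre-commit install

# Run every configured hook (yapf, clang-format, etc.) against all
# files once, instead of waiting for the next commit.
pre-commit run --all-files
```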

@ -1,11 +1,11 @@
# PaddlePaddle
[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/baidu/Paddle)
[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://www.paddlepaddle.org/)
[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://www.paddlepaddle.org/cn/index.html)
[![Coverage Status](https://coveralls.io/repos/github/PaddlePaddle/Paddle/badge.svg?branch=develop)](https://coveralls.io/github/baidu/Paddle?branch=develop)
[![Release](https://img.shields.io/github/release/baidu/Paddle.svg?colorB=fedcba)](https://github.com/baidu/Paddle/releases)
[![Coverage Status](https://coveralls.io/repos/github/PaddlePaddle/Paddle/badge.svg?branch=develop)](https://coveralls.io/github/PaddlePaddle/Paddle?branch=develop)
[![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
@ -17,7 +17,7 @@ developed by Baidu scientists and engineers for the purpose of applying deep
learning to many products at Baidu.
Our vision is to enable deep learning for everyone via PaddlePaddle.
Please refer to our [release announcement](https://github.com/baidu/Paddle/releases) to track the latest features of PaddlePaddle.
Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest features of PaddlePaddle.
## Features
@ -92,7 +92,7 @@ Both [English Docs](http://paddlepaddle.org/doc/) and [Chinese Docs](http://padd
## Ask Questions
You are welcome to submit questions and bug reports as [Github Issues](https://github.com/baidu/paddle/issues).
You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues).
## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).

@ -6,10 +6,10 @@ Installing from Sources
* [3. Build on Ubuntu](#ubuntu)
## <span id="download">Download and Setup</span>
You can download PaddlePaddle from the [github source](https://github.com/gangliao/Paddle).
You can download PaddlePaddle from the [github source](https://github.com/PaddlePaddle/Paddle).
```bash
git clone https://github.com/baidu/Paddle paddle
git clone https://github.com/PaddlePaddle/Paddle paddle
cd paddle
```
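
After cloning, a typical out-of-source cmake build could look like the sketch below; the `build` directory name and the option value are illustrative assumptions rather than part of the official instructions:

```bash
# Configure in a separate build directory to keep the source tree clean.
mkdir build && cd build
cmake .. -DWITH_GPU=OFF

# Compile with 4 parallel jobs.
make -j4
```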

@ -1,4 +1,5 @@
MKL_ROOT,"path to MKL; ${MKL_ROOT}/include must contain mkl.h, and ${MKL_ROOT}/lib must contain the mkl_core, mkl_sequential and mkl_intel_lp64 libraries"
ATLAS_ROOT,"path to the ATLAS library; ${ATLAS_ROOT}/include must contain cblas.h, and ${ATLAS_ROOT}/lib must contain the cblas and atlas libraries"
OPENBLAS_ROOT,"${OPENBLAS_ROOT}/include must contain cblas.h, and ${OPENBLAS_ROOT}/lib must contain the openblas library"
REFERENCE_CBLAS_ROOT,"${REFERENCE_CBLAS_ROOT}/include must contain cblas.h, and ${REFERENCE_CBLAS_ROOT}/lib must contain the cblas library"
Compile option,Description,Notes
MKL_ROOT,path to MKL,"${MKL_ROOT}/include must contain mkl.h; ${MKL_ROOT}/lib must contain the mkl_core, mkl_sequential and mkl_intel_lp64 libraries."
ATLAS_ROOT,path to ATLAS,"${ATLAS_ROOT}/include must contain cblas.h; ${ATLAS_ROOT}/lib must contain the cblas and atlas libraries."
OPENBLAS_ROOT,path to OpenBLAS,"${OPENBLAS_ROOT}/include must contain cblas.h; ${OPENBLAS_ROOT}/lib must contain the openblas library."
REFERENCE_CBLAS_ROOT,path to REFERENCE BLAS,"${REFERENCE_CBLAS_ROOT}/include must contain cblas.h; ${REFERENCE_CBLAS_ROOT}/lib must contain the cblas library."
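
For example, to build against one specific BLAS, only the matching path variable needs to be passed to cmake; the install prefix below is a hypothetical example:

```bash
# Assumed OpenBLAS prefix; cmake then expects to find
# ${OPENBLAS_ROOT}/include/cblas.h and the openblas library
# under ${OPENBLAS_ROOT}/lib.
cmake .. -DOPENBLAS_ROOT=/opt/openblas
```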

@ -1,15 +1,14 @@
Option,Description,Default
WITH_GPU,whether to build GPU support.,whether the cuda toolchain is found
WITH_DOUBLE,whether to use double-precision floating point.,No
WITH_DSO,whether to load the cuda shared libraries dynamically at runtime rather than linking them statically.,Yes
WITH_AVX,whether to build PaddlePaddle binaries with the AVX instruction set,Yes
WITH_PYTHON,"whether to embed a python interpreter, which is convenient for embedded use.",Yes
WITH_STYLE_CHECK,whether to run code style checks at build time,Yes
WITH_RDMA,whether to enable RDMA support,No
WITH_GLOG,"whether to use GLOG; if not, a simplified logging implementation is used, which is convenient for embedded use.",depends on whether GLOG is found
WITH_GFLAGS,"whether to use GFLAGS; if not, a simplified command-line parser is used, which is convenient for embedded use.",depends on whether GFLAGS is found
WITH_TIMER,"whether to enable timing; timing makes runs slightly slower and logs more verbose, but is convenient for debugging and benchmarking",No
WITH_TESTING,whether to enable unit tests,depends on whether gtest is found
WITH_DOC,whether to build the English documentation,No
WITH_DOC_CN,whether to build the Chinese documentation,No
WITH_SWIG_PY,"whether to build the python swig interface, which is convenient for inference and customized training",depends on whether swig is found
Option,Description,Default
WITH_GPU,whether to support GPU.,depends on whether the CUDA toolchain is found
WITH_DOUBLE,whether to use double-precision floating point.,No
WITH_DSO,whether to load the CUDA shared libraries dynamically at runtime rather than linking them statically.,Yes
WITH_AVX,whether to build PaddlePaddle binaries with the AVX instruction set,Yes
WITH_PYTHON,"whether to embed a Python interpreter, which helps future embedded ports.",Yes
WITH_STYLE_CHECK,whether to run code style checks at build time,Yes
WITH_RDMA,whether to enable RDMA,No
WITH_GLOG,"whether to enable GLOG; if not, a simplified logging implementation is used, which also helps future embedded ports.",depends on whether GLOG is found
WITH_GFLAGS,"whether to use GFLAGS; if not, a simplified command-line parser is used, which also helps future embedded ports.",depends on whether GFLAGS is found
WITH_TIMER,"whether to enable timing; if enabled, runs are slightly slower and logs more verbose, but debugging and benchmarking are easier",No
WITH_TESTING,whether to enable unit tests,depends on whether GTEST is found
WITH_DOC,whether to build the English and Chinese documentation,No
WITH_SWIG_PY,"whether to build the Python SWIG interface, which can be used for inference and customized training",depends on whether SWIG is found
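
To verify which of these options a configured build actually picked up, the cmake cache can be listed after configuring; this is standard cmake behavior rather than anything Paddle-specific:

```bash
# List cached build options with their help strings, then filter
# for the WITH_* switches described above (run from the build directory).
cmake -LH .. | grep -B1 "^WITH_"
```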

@ -1,62 +1,43 @@
Setting PaddlePaddle's compile options
======================================

PaddlePaddle's compile options can be set when invoking cmake. cmake is a cross-platform build
tool: invoking cmake generates the native makefiles for each platform from the cmake project
files. For details on how to use cmake, please refer to the
`official cmake documentation <https://cmake.org/cmake-tutorial>`_ .

PaddlePaddle's compile options control whether CPU or GPU binaries are generated, which BLAS
library is linked, and so on. The full list of options is given below.

PaddlePaddle's compile options
------------------------------

Boolean compile options
+++++++++++++++++++++++

These options can be set on the cmake command line with the -D flag, for example:
:code:`cmake -D WITH_GPU=OFF`

.. csv-table:: PaddlePaddle's boolean compile options
   :widths: 1, 7, 2
   :file: compile_options.csv

BLAS-related compile options
++++++++++++++++++++++++++++

PaddlePaddle can use any one of `MKL <https://software.intel.com/en-us/intel-mkl>`_ ,
`Atlas <http://math-atlas.sourceforge.net/>`_ ,
`OpenBlas <http://www.openblas.net/>`_ or
`reference Blas <http://www.netlib.org/blas/>`_ as its cblas implementation,
selected by specifying paths at build time.

When cmake runs, it first searches the system paths (/usr/lib\:/usr/local/lib) for these BLAS
implementations, and also reads the following path variables\:

.. csv-table:: PaddlePaddle's cblas compile options
   :widths: 1, 9
   :header: "Compile option", "Description"
   :file: cblas_settings.csv

Each of these variables can be set with the -D flag, for example :code:`cmake -D MKL_ROOT=/opt/mkl/`.
They can also be set through environment variables before invoking cmake, for example:

.. code-block:: bash

   export MKL_ROOT=/opt/mkl
   cmake

Note that these variables only take effect on the first run of cmake. To change them after the
first run, it is recommended to clean the build directory ( :code:`rm -rf` ) and then set them again.

cuda/cudnn-related compile options
++++++++++++++++++++++++++++++++++

PaddlePaddle can be built and run against any cudnn version from cudnn v2 onward, but the cudnn
used for building should be the same version as the one used at runtime. We recommend the
latest version, cudnn v5.1.

At cmake configuration time, :code:`CUDNN_ROOT` sets the CUDNN install path, again with the
-D flag, for example :code:`cmake -D CUDNN_ROOT=/opt/cudnnv5`.

Note that these variables only take effect on the first run of cmake. To change them after the
first run, it is recommended to clean the build directory ( :code:`rm -rf` ) and then set them again.
PaddlePaddle's compile options
==============================

PaddlePaddle's compile options include whether to generate CPU/GPU binaries, which BLAS library
to link, and so on. Users can set them when invoking cmake; for details on how to use cmake,
please refer to the `official documentation <https://cmake.org/cmake-tutorial>`_ .

Boolean compile options
-----------------------

Users can set these options on the cmake command line with the ``-D`` flag, for example:

.. code-block:: bash

   cmake .. -DWITH_GPU=OFF

.. csv-table:: Boolean compile options
   :widths: 1, 7, 2
   :file: compile_options.csv

BLAS/CUDA/Cudnn compile options
-------------------------------

BLAS
+++++

PaddlePaddle supports any one of the following BLAS libraries:
`MKL <https://software.intel.com/en-us/intel-mkl>`_ ,
`ATLAS <http://math-atlas.sourceforge.net/>`_ ,
`OpenBLAS <http://www.openblas.net/>`_ and
`REFERENCE BLAS <http://www.netlib.org/blas/>`_ .

.. csv-table:: BLAS path-related compile options
   :widths: 1, 2, 7
   :file: cblas_settings.csv

CUDA/Cudnn
+++++++++++

PaddlePaddle can be built and run against any cudnn version from v2 onward, but please keep the
build-time and runtime cudnn versions the same. We recommend the latest version, cudnn v5.1.

Setting the compile options
+++++++++++++++++++++++++++

PaddlePaddle locates the BLAS/CUDA/Cudnn libraries through paths specified at build time. When
cmake runs, it first searches the system paths (/usr/lib\:/usr/local/lib) for these libraries,
and also reads the related path variables. They can be set with the ``-D`` flag, for example:

.. code-block:: bash

   cmake .. -DMKL_ROOT=/opt/mkl/ -DCUDNN_ROOT=/opt/cudnnv5

Note that these options only take effect on the first run of cmake. To change them later, it is
recommended to clean the whole build directory (``rm -rf``) and then set them again.
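
Putting the boolean and path options together, a full GPU configuration might look like the sketch below; the MKL and cudnn install prefixes are placeholders for wherever those libraries actually live:

.. code-block:: bash

   # One-shot configuration: enable GPU support, link against MKL,
   # and point cmake at a specific cudnn installation.
   cmake .. -DWITH_GPU=ON -DMKL_ROOT=/opt/mkl -DCUDNN_ROOT=/opt/cudnnv5

Remember that changing any of these afterwards calls for a clean build directory first, as noted above.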

@ -2,32 +2,19 @@
How to contribute to or modify PaddlePaddle's documentation
###########################################################

PaddlePaddle's documentation is generated by `sphinx`_ driven through `cmake`_. There are two
documentation sets, :code:`doc` and :code:`doc_cn`. Both are built via `cmake`_, and the
generated documentation is stored under the server's :code:`doc` and :code:`doc_cn` directories.
PaddlePaddle's documentation consists of an English part ``doc`` and a Chinese part ``doc_cn``.
Both are built by `sphinx`_ driven through `cmake`_, and the generated documentation is stored
in the ``doc`` and ``doc_cn`` subdirectories of the build directory.

The following sections describe how to contribute to PaddlePaddle's documentation.

How to write PaddlePaddle documentation
=======================================

TBD

How to build PaddlePaddle's documentation
=========================================

Building the PaddlePaddle documentation requires the full PaddlePaddle build environment.
Preparing this environment is relatively complex, so this document provides two ways to build
the documentation:

* build the documentation with Docker
* build the documentation directly.

We recommend building the documentation with Docker.
The PaddlePaddle documentation can be built either directly or with Docker. The environment
needed to build the documentation is relatively complex to prepare, so we recommend the
Docker-based build.

Building the documentation with Docker
--------------------------------------

To build the documentation with Docker, the Docker toolkit must first be installed on your
system. For installing Docker, please refer to `Docker's official site <https://docs.docker.com/>`_ .
Once Docker is installed, you can build the documentation with the script in the source tree:
Building the documentation with Docker requires the Docker toolkit to be installed on your
system first. For Docker installation, please refer to `Docker's official site <https://docs.docker.com/>`_ .
Once Docker is installed, you can build the documentation with the script in the source tree:

.. code-block:: bash

@ -35,10 +22,10 @@ TBD
   cd paddle/scripts/tools/build_docs
   bash build_docs.sh

After running this script, the following two directories are generated under this directory\:
After the build finishes, the following two subdirectories are generated under this directory\:

* doc directory, location of the English documentation
* doc_cn directory, location of the Chinese documentation
* doc, the English documentation directory
* doc_cn, the Chinese documentation directory

Open index.html in the corresponding directory in a browser to view the documentation locally.

@ -52,6 +39,10 @@ TBD

TBD

How to write PaddlePaddle documentation
=======================================

TBD

How to update the documentation on www.paddlepaddle.org
=======================================================
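
To preview the generated pages without opening the files one by one, one option is to serve the output directory over HTTP; using Python's built-in server here is only an illustration, not part of the official workflow:

.. code-block:: bash

   # Serve the built English docs at http://localhost:8000
   cd doc && python -m http.server 8000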

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "PaddleAPIPrivate.h"
@ -112,7 +111,7 @@ void Arguments::setSlotSequenceStartPositions(size_t idx,
}
void Arguments::setSlotSubSequenceStartPositions(
size_t idx, IVector *vec) throw(RangeError) {
size_t idx, IVector* vec) throw(RangeError) {
auto& a = m->getArg(idx);
auto& v = m->cast<paddle::IVector>(vec->getSharedPtr());
a.subSequenceStartPositions = std::make_shared<paddle::ICpuGpuVector>(v);

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "PaddleAPIPrivate.h"
#include "paddle/trainer/Trainer.h"
@ -44,8 +43,7 @@ TrainerConfig* TrainerConfig::createFromTrainerConfigFile(
return retv;
}
TrainerConfig* TrainerConfig::createFromProtoString(
const std::string& str) {
TrainerConfig* TrainerConfig::createFromProtoString(const std::string& str) {
auto retv = new TrainerConfig();
paddle::TrainerConfig trainerConfigProto;
auto conf = std::make_shared<paddle::TrainerConfigHelper>(trainerConfigProto);

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "PaddleAPIPrivate.h"
@ -27,7 +26,8 @@ GradientMachine::GradientMachine() : m(new GradientMachinePrivate()) {}
GradientMachine::~GradientMachine() { delete m; }
GradientMachine* GradientMachine::createFromPaddleModelPtr(
const void* confPtr, GradientMatchineCreateMode mode,
const void* confPtr,
GradientMatchineCreateMode mode,
const std::vector<int>& types) {
auto& conf = *(const paddle::ModelConfig*)(confPtr);
std::vector<ParameterType> realTypes;
@ -44,7 +44,8 @@ GradientMachine* GradientMachine::createFromPaddleModelPtr(
}
GradientMachine* GradientMachine::createByConfigProtoStr(
const std::string& protoStr, GradientMatchineCreateMode mode,
const std::string& protoStr,
GradientMatchineCreateMode mode,
const std::vector<int>& types) {
paddle::ModelConfig conf;
conf.ParseFromString(protoStr);
@ -56,13 +57,15 @@ GradientMachine* GradientMachine::createByConfigProtoStr(
}
GradientMachine* GradientMachine::createByModelConfig(
ModelConfig* conf, GradientMatchineCreateMode mode,
ModelConfig* conf,
GradientMatchineCreateMode mode,
const std::vector<int>& types) {
auto confPtr = &conf->m->conf->getModelConfig();
return GradientMachine::createFromPaddleModelPtr(confPtr, mode, types);
}
void GradientMachine::forward(const Arguments& inArgs, Arguments* outArgs,
void GradientMachine::forward(const Arguments& inArgs,
Arguments* outArgs,
PassType passType) {
auto& in =
m->cast<std::vector<paddle::Argument>>(inArgs.getInternalArgumentsPtr());
@ -99,7 +102,8 @@ void GradientMachine::backward(const UpdateCallback& callback) {
}
void GradientMachine::forwardBackward(const Arguments& inArgs,
Arguments* outArgs, PassType passType,
Arguments* outArgs,
PassType passType,
const UpdateCallback& callback) {
auto& in =
m->cast<std::vector<paddle::Argument>>(inArgs.getInternalArgumentsPtr());
@ -129,7 +133,7 @@ Parameter* GradientMachine::getParameter(size_t i) throw(RangeError) {
void GradientMachine::randParameters() { m->machine->randParameters(); }
Matrix* GradientMachine::getLayerOutput(const std::string& layerName) const
throw(UnsupportError) {
throw(UnsupportError) {
auto nn = std::dynamic_pointer_cast<paddle::NeuralNetwork>(m->machine);
if (nn) {
auto mat = nn->getLayerOutput(layerName);
@ -140,8 +144,11 @@ Matrix* GradientMachine::getLayerOutput(const std::string& layerName) const
}
SequenceGenerator* GradientMachine::asSequenceGenerator(
const std::vector<std::string>& dict, size_t begin_id, size_t end_id,
size_t max_length, size_t beam_size) {
const std::vector<std::string>& dict,
size_t begin_id,
size_t end_id,
size_t max_length,
size_t beam_size) {
SequenceGenerator* r =
SequenceGenerator::createByGradientMachineSharedPtr(&m->machine);
r->setDict(dict);

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "PaddleAPI.h"
@ -23,7 +22,8 @@ limitations under the License. */
template <typename T1, typename T2>
void staticCastVector(std::vector<T2>* dest, const std::vector<T1>& src) {
dest->resize(src.size());
std::transform(src.begin(), src.end(), dest->begin(), [](T1 t){
return static_cast<T2>(t);
});
std::transform(src.begin(),
src.end(),
dest->begin(),
[](T1 t) { return static_cast<T2>(t); });
}

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "paddle/math/Matrix.h"
#include "paddle/math/SparseMatrix.h"
@ -44,17 +43,21 @@ Matrix* Matrix::createZero(size_t height, size_t width, bool useGpu) {
return m;
}
Matrix* Matrix::createDense(const std::vector<float>& data, size_t height,
size_t width, bool useGpu) {
Matrix* Matrix::createDense(const std::vector<float>& data,
size_t height,
size_t width,
bool useGpu) {
auto m = new Matrix();
m->m->mat = paddle::Matrix::create(height, width, useGpu);
m->m->mat->copyFrom(data.data(), data.size());
return m;
}
Matrix* Matrix::createDenseFromNumpy(float* data, int dim1, int dim2,
bool copy, bool useGpu)
throw (UnsupportError) {
Matrix* Matrix::createDenseFromNumpy(float* data,
int dim1,
int dim2,
bool copy,
bool useGpu) throw(UnsupportError) {
if (useGpu) {
/// Gpu mode only supports copy=True
if (!copy) {
@ -66,7 +69,9 @@ Matrix* Matrix::createDenseFromNumpy(float* data, int dim1, int dim2,
}
}
Matrix* Matrix::createCpuDenseFromNumpy(float* data, int dim1, int dim2,
Matrix* Matrix::createCpuDenseFromNumpy(float* data,
int dim1,
int dim2,
bool copy) {
auto m = new Matrix();
if (copy) {
@ -85,12 +90,20 @@ Matrix* Matrix::createGpuDenseFromNumpy(float* data, int dim1, int dim2) {
return m;
}
Matrix* Matrix::createSparse(size_t height, size_t width, size_t nnz,
bool isNonVal, bool isTrans, bool useGpu) {
Matrix* Matrix::createSparse(size_t height,
size_t width,
size_t nnz,
bool isNonVal,
bool isTrans,
bool useGpu) {
auto m = new Matrix();
m->m->mat = paddle::Matrix::createSparseMatrix(
height, width, nnz, isNonVal ? paddle::NO_VALUE : paddle::FLOAT_VALUE,
isTrans, useGpu);
height,
width,
nnz,
isNonVal ? paddle::NO_VALUE : paddle::FLOAT_VALUE,
isTrans,
useGpu);
return m;
}
@ -221,7 +234,8 @@ FloatArray Matrix::getData() const {
}
void Matrix::sparseCopyFrom(
const std::vector<int>& rows, const std::vector<int>& cols,
const std::vector<int>& rows,
const std::vector<int>& cols,
const std::vector<float>& vals) throw(UnsupportError) {
auto cpuSparseMat =
std::dynamic_pointer_cast<paddle::CpuSparseMatrix>(m->mat);
@ -240,7 +254,8 @@ void Matrix::sparseCopyFrom(
void* Matrix::getSharedPtr() const { return &m->mat; }
void Matrix::toNumpyMatInplace(float** view_data, int* dim1,
void Matrix::toNumpyMatInplace(float** view_data,
int* dim1,
int* dim2) throw(UnsupportError) {
auto cpuMat = std::dynamic_pointer_cast<paddle::CpuMatrix>(m->mat);
if (cpuMat) {
@ -251,7 +266,8 @@ void Matrix::toNumpyMatInplace(float** view_data, int* dim1,
throw UnsupportError();
}
}
void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
void Matrix::copyToNumpyMat(float** view_m_data,
int* dim1,
int* dim2) throw(UnsupportError) {
static_assert(sizeof(paddle::real) == sizeof(float),
"Currently PaddleAPI only support for single "
@ -269,8 +285,8 @@ void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
} else if (auto gpuMat = dynamic_cast<paddle::GpuMatrix*>(m->mat.get())) {
auto src = gpuMat->getData();
auto dest = *view_m_data;
hl_memcpy_device2host(dest, src,
sizeof(paddle::real) * (*dim1) * (*dim2));
hl_memcpy_device2host(
dest, src, sizeof(paddle::real) * (*dim1) * (*dim2));
} else {
LOG(WARNING) << "Unexpected Situation";
throw UnsupportError();
@ -278,7 +294,8 @@ void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
}
}
void Matrix::copyFromNumpyMat(float* data, int dim1,
void Matrix::copyFromNumpyMat(float* data,
int dim1,
int dim2) throw(UnsupportError, RangeError) {
if (isSparse()) {
throw UnsupportError();

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <stddef.h>
@ -61,8 +60,8 @@ class RangeError {};
/// Not support Error, such as access GPU memory directly, etc.
class UnsupportError : public std::runtime_error {
public:
UnsupportError() : std::runtime_error(" ") {};
UnsupportError(const std::string& message) : std::runtime_error(message) {};
UnsupportError() : std::runtime_error(" "){};
UnsupportError(const std::string& message) : std::runtime_error(message){};
};
/// This type will map to python's list of float.
@ -112,7 +111,8 @@ public:
/**
* Create A Matrix with height,width, which is filled by zero.
*/
static Matrix* createZero(size_t height, size_t width,
static Matrix* createZero(size_t height,
size_t width,
bool useGpu = isUsingGpu());
/**
@ -124,8 +124,11 @@ public:
*
* @note the default sparse type is SPARSE_CSR.
*/
static Matrix* createSparse(size_t height, size_t width, size_t nnz,
bool isNonVal = true, bool trans = false,
static Matrix* createSparse(size_t height,
size_t width,
size_t nnz,
bool isNonVal = true,
bool trans = false,
bool useGpu = isUsingGpu());
/**
@ -134,13 +137,17 @@ public:
* @param data list of float should be passed in python.
* @note the value will be copy into a new matrix.
*/
static Matrix* createDense(const std::vector<float>& data, size_t height,
size_t width, bool useGpu = isUsingGpu());
static Matrix* createDenseFromNumpy(float* data, int dim1, int dim2,
bool copy = true,
bool useGpu = isUsingGpu())
throw (UnsupportError);
static Matrix* createDense(const std::vector<float>& data,
size_t height,
size_t width,
bool useGpu = isUsingGpu());
static Matrix* createDenseFromNumpy(
float* data,
int dim1,
int dim2,
bool copy = true,
bool useGpu = isUsingGpu()) throw(UnsupportError);
/**
* Create Cpu Dense Matrix from numpy matrix, dtype=float32
@ -151,7 +158,9 @@ public:
* @param copy true if copy into a new matrix, false will create
* matrix inplace.
*/
static Matrix* createCpuDenseFromNumpy(float* data, int dim1, int dim2,
static Matrix* createCpuDenseFromNumpy(float* data,
int dim1,
int dim2,
bool copy = false);
/// Create Gpu Dense Matrix from numpy matrix, dtype=float32
@ -171,11 +180,13 @@ public:
* numpy_mat = m.toNumpyMat()
* @endcode
*/
void toNumpyMatInplace(float** view_data, int* dim1,
void toNumpyMatInplace(float** view_data,
int* dim1,
int* dim2) throw(UnsupportError);
/// Copy To numpy mat.
void copyToNumpyMat(float** view_m_data, int* dim1,
void copyToNumpyMat(float** view_m_data,
int* dim1,
int* dim2) throw(UnsupportError);
/// Copy From Numpy Mat
@ -248,15 +259,18 @@ public:
static Vector* create(const std::vector<float>& data,
bool useGpu = isUsingGpu());
static Vector* createVectorFromNumpy(float* data, int dim, bool copy = true,
bool useGpu = isUsingGpu())
throw (UnsupportError);
static Vector* createVectorFromNumpy(
float* data,
int dim,
bool copy = true,
bool useGpu = isUsingGpu()) throw(UnsupportError);
/**
* Create Cpu Vector from numpy array, which dtype=float32
*
* If copy is false, it will create vector inplace.
*/
static Vector* createCpuVectorFromNumpy(float* data, int dim,
static Vector* createCpuVectorFromNumpy(float* data,
int dim,
bool copy = false);
/// Create Gpu Vector from numpy array, which dtype=float32
@ -312,16 +326,19 @@ public:
static IVector* create(const std::vector<int>& data,
bool useGpu = isUsingGpu());
static IVector* createVectorFromNumpy(int* data, int dim, bool copy = true,
bool useGpu = isUsingGpu())
throw (UnsupportError);
static IVector* createVectorFromNumpy(
int* data,
int dim,
bool copy = true,
bool useGpu = isUsingGpu()) throw(UnsupportError);
/**
* Create Cpu IVector from numpy array, which dtype=int32
*
* If copy is false, it will create vector inplace
*/
static IVector* createCpuVectorFromNumpy(int* data, int dim,
static IVector* createCpuVectorFromNumpy(int* data,
int dim,
bool copy = false);
/**
* Create Gpu IVector from numpy array, which dtype=int32
@ -605,7 +622,8 @@ class ParameterTraverseCallback {
public:
~ParameterTraverseCallback();
void apply(const std::vector<Vector*>& vecs, const ParameterConfig& config,
void apply(const std::vector<Vector*>& vecs,
const ParameterConfig& config,
size_t sparseId);
private:
@ -638,7 +656,8 @@ public:
void finishBatch();
void update(const std::vector<Vector*>& vecs, const ParameterConfig& conf,
void update(const std::vector<Vector*>& vecs,
const ParameterConfig& conf,
size_t sparseId = NO_SPARSE_ID);
std::vector<int> getParameterTypes() const;
@ -678,7 +697,8 @@ public:
* model config by TrainerConfig
*/
static GradientMachine* createByModelConfig(
ModelConfig* conf, GradientMatchineCreateMode mode = CREATE_MODE_NORMAL,
ModelConfig* conf,
GradientMatchineCreateMode mode = CREATE_MODE_NORMAL,
const std::vector<int>& parameterTypes = defaultParamTypes);
/**
@ -701,7 +721,8 @@ public:
/**
* Combine forward/backward
*/
void forwardBackward(const Arguments& inArgs, Arguments* outArgs,
void forwardBackward(const Arguments& inArgs,
Arguments* outArgs,
PassType passType,
const UpdateCallback& callback = UpdateCallback());
@ -722,14 +743,17 @@ public:
*/
SequenceGenerator* asSequenceGenerator(
const std::vector<std::string>& dict = std::vector<std::string>(),
size_t begin_id = 0UL, size_t end_id = 0UL, size_t max_length = 100UL,
size_t begin_id = 0UL,
size_t end_id = 0UL,
size_t max_length = 100UL,
size_t beam_size = -1UL);
private:
GradientMachinePrivate* m;
static GradientMachine* createFromPaddleModelPtr(
const void* confPtr, GradientMatchineCreateMode mode,
const void* confPtr,
GradientMatchineCreateMode mode,
const std::vector<int>& types);
// Not to use c++ 11 init-list, so we use static var as function default arg.
@ -751,8 +775,8 @@ public:
/// Create A Trainer By TrainerConfig. using paddle command line.
static Trainer* createByCommandLine() throw(IOError);
static Trainer* create(TrainerConfig* optConfig, GradientMachine* gm)
throw(IOError);
static Trainer* create(TrainerConfig* optConfig,
GradientMachine* gm) throw(IOError);
/// Start training
void startTrain();

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "paddle/parameter/Parameter.h"

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "PaddleAPIPrivate.h"
#include "paddle/parameter/ParameterOptimizer.h"
@ -32,17 +31,21 @@ struct ParameterTraverseCallbackPrivate {
const paddle::ParameterOptimizer::TraverseCallback& callback)
: callback(callback) {}
void apply(const std::vector<Vector*>& vecs, const ParameterConfig& conf,
void apply(const std::vector<Vector*>& vecs,
const ParameterConfig& conf,
size_t sparseId) {
std::vector<paddle::VectorPtr> real_vecs;
real_vecs.resize(vecs.size());
std::transform(vecs.begin(), vecs.end(), real_vecs.begin(), [](Vector* v) {
if (v) {
return *(paddle::VectorPtr*)(v->getSharedPtr());
} else {
return paddle::VectorPtr();
}
});
std::transform(vecs.begin(),
vecs.end(),
real_vecs.begin(),
[](Vector* v) {
if (v) {
return *(paddle::VectorPtr*)(v->getSharedPtr());
} else {
return paddle::VectorPtr();
}
});
paddle::ParameterConfig& real_conf =
*(paddle::ParameterConfig*)(const_cast<ParameterConfig&>(conf)
@ -86,10 +89,12 @@ void ParameterOptimizer::startBatch(size_t numSamplesProcessed) {
void ParameterOptimizer::finishBatch() { m->optimizer->finishBatch(); }
void ParameterOptimizer::update(const std::vector<Vector*>& vecs,
const ParameterConfig& conf, size_t sparseId) {
ParameterTraverseCallbackPrivate invoker([&](
const paddle::VectorPtr _vecs[], const paddle::ParameterConfig& config,
size_t sid = -1UL) { m->optimizer->update(_vecs, config, sid); });
const ParameterConfig& conf,
size_t sparseId) {
ParameterTraverseCallbackPrivate invoker(
[&](const paddle::VectorPtr _vecs[],
const paddle::ParameterConfig& config,
size_t sid = -1UL) { m->optimizer->update(_vecs, config, sid); });
invoker.apply(vecs, conf, sparseId);
}
@ -116,8 +121,9 @@ void ParameterTraverseCallback::apply(const std::vector<Vector*>& vecs,
ParameterTraverseCallback* ParameterOptimizer::needSpecialTraversal(
const ParameterConfig& config) const {
auto& param_config = *(paddle::ParameterConfig*)const_cast<ParameterConfig&>(
config).getRawPtr();
auto& param_config =
*(paddle::ParameterConfig*)const_cast<ParameterConfig&>(config)
.getRawPtr();
auto callback = m->optimizer->needSpecialTraversal(param_config);
if (callback) {
auto retCallback = new ParameterTraverseCallback();

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "paddle/gserver/gradientmachines/GradientMachine.h"
#include "paddle/parameter/Argument.h"
@ -42,8 +41,10 @@ struct Path {
// position
static void findNBest(paddle::GradientMachine* gradMachine,
std::vector<paddle::Argument>& inArgs,
std::vector<Path>& finalPaths, size_t bos_id,
size_t eos_id, size_t max_length) {
std::vector<Path>& finalPaths,
size_t bos_id,
size_t eos_id,
size_t max_length) {
std::vector<Path> paths;
Path emptyPath;
paths.push_back(emptyPath);
@ -166,7 +167,8 @@ public:
if (id < getSize()) {
Path& p = (*path_)[id];
std::ostringstream sout;
std::transform(p.ids.begin(), p.ids.end(),
std::transform(p.ids.begin(),
p.ids.end(),
std::ostream_iterator<std::string>(sout, split ? " " : ""),
[&](int id) { return (*dict_)[id]; });
return sout.str();

@ -64,12 +64,11 @@ Trainer* Trainer::createByCommandLine() throw(IOError) {
Trainer::Trainer(TrainerConfig* config, GradientMachine* gm)
: m(new TrainerPrivate()) {
m->init(config->m->conf, /* testing= */false, gm ? gm->m->machine : nullptr);
m->init(config->m->conf, /* testing= */ false, gm ? gm->m->machine : nullptr);
}
Trainer* Trainer::create(TrainerConfig* config, GradientMachine* gm)
throw(IOError)
{
Trainer* Trainer::create(TrainerConfig* config,
GradientMachine* gm) throw(IOError) {
auto retv = new Trainer(config, gm);
if (retv->m->getConfig().IsInitialized()) {
return retv;
@ -134,15 +133,17 @@ void Trainer::finishTestPeriod() { m->finishTestPeriod(); }
Matrix* Trainer::getLayerOutput(const std::string& layerName) {
auto nn = std::dynamic_pointer_cast<paddle::NeuralNetwork>(
this->m->getGradientMachine());
this->m->getGradientMachine());
CHECK(nn) << "trainerInternal_.getGradientMachine() is not NeuralNetwork";
auto m = nn->getLayerOutput(layerName);
return Matrix::createByPaddleMatrixPtr(&m);
}
void Trainer::forwardOneBatch(size_t batchSize) { m->forwardOneBatch(batchSize); }
void Trainer::forwardOneBatch(size_t batchSize) {
m->forwardOneBatch(batchSize);
}
bool TrainerPrivate::forwardOneBatch(size_t batchSize) {
bool TrainerPrivate::forwardOneBatch(size_t batchSize) {
CHECK(dataProvider_) << "data_provider is not specified";
paddle::DataBatch dataBatch;
int num = dataProvider_->getNextBatch(batchSize, &dataBatch);
@ -156,7 +157,6 @@ bool TrainerPrivate::forwardOneBatch(size_t batchSize) {
void TrainerPrivate::forwardOneDataBatch(
const std::vector<paddle::Argument>& inArgs) {
std::vector<paddle::Argument>& outArgs = forwardOutput_;
if (config_->getOptConfig().use_sparse_remote_updater()) {

@ -37,13 +37,15 @@ FloatArray::FloatArray(const float* b, const size_t l)
IntArray::IntArray(const int* b, const size_t l, bool f)
: buf(b), length(l), needFree(f) {}
IntWithFloatArray::IntWithFloatArray(const float* v, const int* i, size_t l,
IntWithFloatArray::IntWithFloatArray(const float* v,
const int* i,
size_t l,
bool f)
: valBuf(v), idxBuf(i), length(l), needFree(f) {}
bool isUsingGpu() {return FLAGS_use_gpu;}
bool isUsingGpu() { return FLAGS_use_gpu; }
void setUseGpu(bool useGpu) {FLAGS_use_gpu = useGpu;}
void setUseGpu(bool useGpu) { FLAGS_use_gpu = useGpu; }
bool isGpuVersion() {
#ifdef PADDLE_ONLY_CPU

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "PaddleAPI.h"
#include "paddle/math/Vector.h"
@ -39,8 +38,10 @@ IVector* IVector::create(const std::vector<int>& data, bool useGpu) {
return v;
}
IVector* IVector::createVectorFromNumpy(int* data, int dim, bool copy,
bool useGpu) throw (UnsupportError){
IVector* IVector::createVectorFromNumpy(int* data,
int dim,
bool copy,
bool useGpu) throw(UnsupportError) {
if (useGpu) {
/// if use gpu only copy=true is supported
if (!copy) {
@ -137,8 +138,8 @@ void IVector::copyToNumpyArray(int** view_m_data, int* dim1) {
if (auto cpuVec = dynamic_cast<paddle::CpuIVector*>(m->vec.get())) {
std::memcpy(*view_m_data, cpuVec->getData(), sizeof(int) * (*dim1));
} else if (auto gpuVec = dynamic_cast<paddle::GpuIVector*>(m->vec.get())) {
hl_memcpy_device2host(*view_m_data, gpuVec->getData(),
sizeof(int) * (*dim1));
hl_memcpy_device2host(
*view_m_data, gpuVec->getData(), sizeof(int) * (*dim1));
} else {
LOG(INFO) << "Unexpected situation";
}
@ -201,8 +202,10 @@ Vector* Vector::createByPaddleVectorPtr(void* ptr) {
}
}
Vector* Vector::createVectorFromNumpy(float* data, int dim, bool copy,
bool useGpu) throw (UnsupportError){
Vector* Vector::createVectorFromNumpy(float* data,
int dim,
bool copy,
bool useGpu) throw(UnsupportError) {
if (useGpu) {
/// if use gpu only copy=True is supported
if (!copy) {
@ -251,8 +254,8 @@ void Vector::copyToNumpyArray(float** view_m_data, int* dim1) {
if (auto cpuVec = dynamic_cast<paddle::CpuVector*>(m->vec.get())) {
std::memcpy(*view_m_data, cpuVec->getData(), sizeof(float) * (*dim1));
} else if (auto gpuVec = dynamic_cast<paddle::CpuVector*>(m->vec.get())) {
hl_memcpy_device2host(*view_m_data, gpuVec->getData(),
sizeof(float) * (*dim1));
hl_memcpy_device2host(
*view_m_data, gpuVec->getData(), sizeof(float) * (*dim1));
} else {
LOG(INFO) << "Unexpected situation";
}

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef HL_ACTIVATION_FUNCTIONS_H_
#define HL_ACTIVATION_FUNCTIONS_H_
@ -21,11 +20,8 @@ limitations under the License. */
/**
* Active functions: sigmoid, relu, tanh and linear.
*/
#define HPPL_ACTIVE_FUNCTION {hppl::sigmoid, \
hppl::relu, \
hppl::tanh, \
hppl::linear \
}
#define HPPL_ACTIVE_FUNCTION \
{ hppl::sigmoid, hppl::relu, hppl::tanh, hppl::linear }
namespace hppl {
@ -42,18 +38,18 @@ public:
#ifdef __NVCC__
namespace gpu {
static __device__ Active<real>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static __device__ Active<real>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static __device__ Active<real>::backward backward[] = HPPL_ACTIVE_FUNCTION;
}
#else
namespace cpu {
static Active<real>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static Active<real>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static Active<real>::backward backward[] = HPPL_ACTIVE_FUNCTION;
}
#ifdef __AVX__
namespace avx {
static Active<__m256>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static Active<__m256>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static Active<__m256>::backward backward[] = HPPL_ACTIVE_FUNCTION;
}
#endif

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef HL_AGGREGATE_H_
#define HL_AGGREGATE_H_

@ -12,22 +12,21 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef HL_AVX_FUNCTIONS_H_
#define HL_AVX_FUNCTIONS_H_
#include <immintrin.h>
namespace hppl {
__m256 relu(const __m256 a);
__m256 sigmoid(const __m256 a);
__m256 tanh(const __m256 a);
__m256 linear(const __m256 a);
__m256 relu(const __m256 a, const __m256 b);
__m256 sigmoid(const __m256 a, const __m256 b);
__m256 tanh(const __m256 a, const __m256 b);
__m256 linear(const __m256 a, const __m256 b);
__m256 relu(const __m256 a);
__m256 sigmoid(const __m256 a);
__m256 tanh(const __m256 a);
__m256 linear(const __m256 a);
__m256 relu(const __m256 a, const __m256 b);
__m256 sigmoid(const __m256 a, const __m256 b);
__m256 tanh(const __m256 a, const __m256 b);
__m256 linear(const __m256 a, const __m256 b);
} // namespace hppl
#endif // HL_AVX_FUNCTIONS_H_

@ -12,8 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef HL_BASE_H_
#define HL_BASE_H_
@ -33,36 +31,36 @@ limitations under the License. */
* HPPL_STREAM_DEFAULT is HPPL default stream.
*/
typedef enum {
HPPL_STREAM_DEFAULT = 0, /* Thread Default Stream*/
HPPL_STREAM_1 = 1,
HPPL_STREAM_2 = 2,
HPPL_STREAM_3 = 3,
HPPL_STREAM_4 = 4,
HPPL_THREAD_STREAM_1 = 5,
HPPL_THREAD_STREAM_2 = 6,
HPPL_THREAD_STREAM_3 = 7,
HPPL_THREAD_STREAM_4 = 8,
HPPL_STREAM_END
HPPL_STREAM_DEFAULT = 0, /* Thread Default Stream*/
HPPL_STREAM_1 = 1,
HPPL_STREAM_2 = 2,
HPPL_STREAM_3 = 3,
HPPL_STREAM_4 = 4,
HPPL_THREAD_STREAM_1 = 5,
HPPL_THREAD_STREAM_2 = 6,
HPPL_THREAD_STREAM_3 = 7,
HPPL_THREAD_STREAM_4 = 8,
HPPL_STREAM_END
} hl_stream_t;
/**
* @brief HPPL activation mode.
*/
typedef enum {
HL_ACTIVATION_SIGMOID = 0,
HL_ACTIVATION_RELU = 1,
HL_ACTIVATION_TANH = 2,
HL_ACTIVATION_LINEAR = 3,
HL_ACTIVATION_END
HL_ACTIVATION_SIGMOID = 0,
HL_ACTIVATION_RELU = 1,
HL_ACTIVATION_TANH = 2,
HL_ACTIVATION_LINEAR = 3,
HL_ACTIVATION_END
} hl_activation_mode_t;
/**
* @brief Transpose type.
*/
typedef enum {
HPPL_OP_N = 0, /* non-transpose */
HPPL_OP_T = 1, /* transpose */
HPPL_OP_END
HPPL_OP_N = 0, /* non-transpose */
HPPL_OP_T = 1, /* transpose */
HPPL_OP_END
} hl_trans_op_t;
/**
@ -148,23 +146,21 @@ typedef struct {
* @brief Sparse matrix value type.
*/
typedef enum {
HL_NO_VALUE = 0, /* matrix values only 0 or 1 */
HL_FLOAT_VALUE = 1,
HL_VALUE_END
HL_NO_VALUE = 0, /* matrix values only 0 or 1 */
HL_FLOAT_VALUE = 1,
HL_VALUE_END
} hl_matrix_value_t;
/**
* @brief HPPL matrix format.
*/
typedef enum {
HL_SPARSE_CSR = 0,
HL_SPARSE_CSC = 1,
HL_SPARSE_END
HL_SPARSE_CSR = 0,
HL_SPARSE_CSC = 1,
HL_SPARSE_END
} hl_matrix_format_t;
typedef struct _hl_matrix_s * hl_matrix_s;
typedef struct _hl_matrix_s *hl_matrix_s;
/**
* @brief HPPL sparse matrix.
@ -177,12 +173,12 @@ typedef struct _hl_matrix_s * hl_matrix_s;
* @param nnz nonzero values of sparse matrix.
*/
typedef struct {
hl_matrix_s matrix;
hl_matrix_format_t format;
hl_matrix_value_t type;
int rows;
int cols;
size_t nnz;
hl_matrix_s matrix;
hl_matrix_format_t format;
hl_matrix_value_t type;
int rows;
int cols;
size_t nnz;
} _hl_sparse_matrix_s, *hl_sparse_matrix_s;
#ifndef PADDLE_TYPE_DOUBLE
@ -195,7 +191,7 @@ typedef struct {
*
* HL_FLOAT_MIN: 1.17549435e-38F
*/
#define HL_FLOAT_MAX 3.40282347e+38F
#define HL_FLOAT_MAX 3.40282347e+38F
/**
* if real == double
*
@ -203,20 +199,18 @@ typedef struct {
*
* HL_FLOAT_MIN: 2.2250738585072014e-308
*/
#define HL_FLOAT_MIN 1.17549435e-38F
#define HL_FLOAT_MIN 1.17549435e-38F
#else
#define HL_FLOAT_MAX 1.7976931348623157e+308
#define HL_FLOAT_MIN 2.2250738585072014e-308
#define HL_FLOAT_MAX 1.7976931348623157e+308
#define HL_FLOAT_MIN 2.2250738585072014e-308
#endif
/**
* The maximum input value for exp, used to avoid overflow problem.
*
* Currently only used for tanh function.
*/
#define EXP_MAX_INPUT 40.0
#define EXP_MAX_INPUT 40.0
/**
* @brief DIVUP(x, y) is similar to ceil(x / y).
@ -224,7 +218,7 @@ typedef struct {
* the size of blockDim.
*/
#ifndef DIVUP
#define DIVUP(x, y) (((x) + (y) - 1) / (y))
#define DIVUP(x, y) (((x) + (y)-1) / (y))
#endif
#ifdef __NVCC__
@ -233,7 +227,7 @@ typedef struct {
#include "hl_cuda.h"
#include "cuda_runtime.h"
extern __thread bool g_sync_flag;
extern __thread bool g_sync_flag;
extern __thread cudaStream_t default_stream;
#define STREAM_DEFAULT default_stream
@ -241,16 +235,15 @@ extern __thread cudaStream_t default_stream;
* @brief Check cuda kernel execution.
* @param msg error string
*/
#define CHECK_SYNC(msg) \
if (true == g_sync_flag) { \
hl_stream_synchronize(HPPL_STREAM_DEFAULT); \
cudaError_t err \
= (cudaError_t)hl_get_device_last_error(); \
CHECK_EQ(cudaSuccess, err) << "[" << msg << "] " \
<< "CUDA error: " \
<< hl_get_device_error_string((size_t)err); \
#define CHECK_SYNC(msg) \
if (true == g_sync_flag) { \
hl_stream_synchronize(HPPL_STREAM_DEFAULT); \
cudaError_t err = (cudaError_t)hl_get_device_last_error(); \
CHECK_EQ(cudaSuccess, err) \
<< "[" << msg << "] " \
<< "CUDA error: " << hl_get_device_error_string((size_t)err); \
}
#endif /* __NVCC__ */
#endif /* __NVCC__ */
#endif /* HL_BASE_H_ */
#endif /* HL_BASE_H_ */

@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef HL_BATCH_TRANSPOSE_H_
#define HL_BATCH_TRANSPOSE_H_
@ -31,10 +30,7 @@ limitations under the License. */
* order. Each batch has height * width data, which are
* arranged in height-first (or row-first) manner.
*/
extern void batchTranspose(const real* input,
real* output,
int width,
int height,
int batchSize);
extern void batchTranspose(
const real* input, real* output, int width, int height, int batchSize);
#endif // HL_BATCH_TRANSPOSE_H_
