Merge branch 'develop' of into lstm_fix

dangqingqing 7 years ago
commit c0005d5862

@ -1 +1,157 @@
./doc/howto/dev/
# Contribute Code
We sincerely appreciate your contribution. This document explains our workflow and work style.
## Workflow
PaddlePaddle uses this [Git branching model]( The following steps guide usual contributions.
1. Fork
Our development community has been growing fast; it doesn't make sense for everyone to write into the official repo. So, please file Pull Requests from your fork. To make a fork, just head over to the GitHub page and click the ["Fork" button](
1. Clone
To make a copy of your fork on your local computer, please run
git clone
cd paddle
1. Create the local feature branch
For daily work like adding a new feature or fixing a bug, please open your feature branch before coding:
git checkout -b my-cool-stuff
1. Commit
Before issuing your first `git commit` command, please install [`pre-commit`]( by running the following commands:
pip install pre-commit
pre-commit install
Our pre-commit configuration requires clang-format 3.8 for auto-formatting C/C++ code and yapf for Python.
Once installed, `pre-commit` checks the style of code and documentation in every commit. You will see something like the following when you run `git commit`:
➜ git commit
CRLF end-lines remover...............................(no files to check)Skipped
yapf.................................................(no files to check)Skipped
Check for added large files..............................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Detect Private Key...................................(no files to check)Skipped
Fix End of Files.....................................(no files to check)Skipped
clang-formater.......................................(no files to check)Skipped
[my-cool-stuff c703c041] add test file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 233
1. Build and test
Users can build PaddlePaddle natively on Linux and Mac OS X. But to unify the building environment and to make it easy for debugging, the recommended way is [using Docker](
1. Keep pulling
An experienced Git user pulls from the official repo often, daily or even hourly, so they notice conflicts with others' work early, and smaller conflicts are easier to resolve.
git remote add upstream
git pull upstream develop
1. Push and file a pull request
You can "push" your local work into your forked repo:
git push origin my-cool-stuff
The push allows you to create a pull request, requesting owners of this [official repo]( to pull your change into the official one.
To create a pull request, please follow [these steps](
If your change is for fixing an issue, please write ["Fixes <issue-URL>"]( in the description section of your pull request. GitHub will close the issue when the owners merge your pull request.
Please remember to specify some reviewers for your pull request. If you don't know who the right reviewers are, please follow GitHub's recommendation.
1. Delete local and remote branches
To keep your local workspace and your fork clean, you might want to remove merged branches:
git push origin :my-cool-stuff
git checkout develop
git pull upstream develop
git branch -d my-cool-stuff
### Code Review
- Please feel free to ping your reviewers by sending them the URL of your pull request via IM or email. Please do this after your pull request passes the CI.
- Please answer every comment from your reviewers. If you follow the comment, please write "Done"; otherwise, please give a reason.
- If you don't want your reviewers to get overwhelmed by email notifications, you might reply to their comments [in a batch](
- Reduce unnecessary commits. Some developers commit often. It is recommended to fold a sequence of small changes into one commit by running `git commit --amend` instead of `git commit`.
## Coding Standard
### Code Style
Our C/C++ code follows the [Google style guide](
Our Python code follows the [PEP8 style guide](
Our build process helps to check the code style. In [``](, the entry point of our [builder Docker image](, the CMake argument `WITH_STYLE_CHECK` is set to `ON` by default. This flag is on
Please install pre-commit, which automatically reformats the changes to C/C++ and Python code whenever we run `git commit`. To check the whole codebase, we can run the command `pre-commit run -a`, as in the [`` file](, which is invoked by [our Travis CI configuration](
### Unit Tests
Please remember to add related unit tests.
- For C/C++ code, please follow [`google-test` Primer](
- For Python code, please use [Python's standard `unittest` package](
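As a minimal sketch of what a Python unit test built on the standard `unittest` package can look like, the example below tests a small helper. The helper `clip` and the test names are hypothetical and only serve as an illustration; they are not part of PaddlePaddle.

```python
import unittest


def clip(value, low, high):
    """Hypothetical helper under test: clamp value into [low, high]."""
    return max(low, min(value, high))


class TestClip(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clip(5, 0, 10), 5)

    def test_value_outside_range_is_clamped(self):
        self.assertEqual(clip(-3, 0, 10), 0)
        self.assertEqual(clip(42, 0, 10), 10)
```

Saved to a file, a test like this can be run with `python -m unittest <file>`.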
### Writing Logs
We use [glog]( for logging in our C/C++ code.
For general information, please use `LOG`. For debug information, please use [`VLOG`]( The reason is explained [here](
`VLOG` requires a *verbose level* parameter. For example:
VLOG(3) << "Operator FC is taking " << num_inputs << " inputs.";
When we run a PaddlePaddle application or test, we can specify a verbose threshold. For example:
GLOG_vmodule=buddy_allocator=2 \
GLOG_v=10 \
python \
This enables VLOG messages at levels 0 to 2 from `buddy_allocator.{h,cc}`, because a `GLOG_vmodule` entry overrides `GLOG_v` for the matching files, and at levels 0 to 10 from all other files, so you will see the example VLOG message above, which is at level 3, as long as it is not emitted from `buddy_allocator.{h,cc}`. This suggests that we output overall messages at lower verbose levels, so they are displayed more often. When coding in C++, please follow this verbose level convention:
- verbose level 1: [framework](
- verbose level 3: [operators](
- verbose level 5: [memory](, [platform](
- verbose level 7: [math](
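To make the interplay of `GLOG_v` and `GLOG_vmodule` concrete, here is a simplified Python model of glog's documented rule that a matching `vmodule` entry overrides the global `v` threshold. It is a sketch, not glog's implementation: the function names are made up for this illustration, and details such as glog's handling of `-inl` file suffixes are ignored.

```python
import fnmatch
import os


def parse_vmodule(spec):
    """Parse a GLOG_vmodule string such as "buddy_allocator=2,foo*=5"."""
    rules = []
    for item in filter(None, spec.split(",")):
        pattern, _, level = item.partition("=")
        rules.append((pattern, int(level)))
    return rules


def vlog_is_on(source_file, level, v=0, vmodule=""):
    """Return True if VLOG(level) in source_file would be printed.

    The module name is the file's base name without its extension. The
    first matching vmodule pattern replaces the global threshold v.
    """
    module = os.path.splitext(os.path.basename(source_file))[0]
    for pattern, module_level in parse_vmodule(vmodule):
        if fnmatch.fnmatchcase(module, pattern):
            return level <= module_level
    return level <= v
```

With `v=10` and `vmodule="buddy_allocator=2"`, this model reports a level-3 message from `buddy_allocator.cc` as hidden and a level-3 message from any other file as shown.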

@ -1,39 +1,36 @@
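# Build the PaddlePaddle Library for Raspberry Pi # Build the PaddlePaddle Library for Raspberry Pi
Raspberry Pi users can log in to the Raspberry Pi system via ssh or similar and build the PaddlePaddle library for Raspberry Pi directly, as described in the [Build PaddlePaddle from source]( documentation. There are usually two ways to build a version for Raspberry Pi:
Users can also cross-compile on a development platform they are familiar with. This document uses the Linux x86-64 platform as an example to describe the method and steps for cross-compiling PaddlePaddle for Raspberry Pi. 1. Log in to the Raspberry Pi system via ssh or similar and build there. For the required development tools and third-party libraries, see [`/Dockerfile`](
## Prepare the Cross-Compiling Environment 1. The other way is cross-compiling. This document describes the method and steps for cross-compiling PaddlePaddle for Raspberry Pi on Linux/x64.
To cross-compile PaddlePaddle from source, users need to prepare the cross-compiling environment in advance. Users can download the C/C++ cross-compiling toolchain for Raspberry Pi from [github]( themselves, or obtain it with the following command: ## Install the Cross-Compiler
Clone the following GitHub repo: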
```bash ```bash
git clone git clone
``` ```
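This GitHub repository contains several pre-built compilation tools for different platforms. If the host is a Linux x86-64 environment, use the tools under `arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64`; the compiler used is arm-linux-gnueabihf-gcc 4.8.3. You can then find the cross-compiler arm-linux-gnueabihf-gcc 4.8.3 in the `./tools/tree/master/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64` directory. Running this toolchain requires a Linux x64 machine and glibc version 2.14 or later.
## Configure Cross-Compiling Arguments ## Configure Cross-Compiling Arguments
CMake supports cross-compiling; see [cmake-toolchains](. To simplify the CMake configuration, PaddlePaddle provides a toolchain configuration file for cross-compiling, [cmake/cross_compiling/raspberry_pi.cmake](, which supplies some default compiler and compile-argument settings. CMake [supports cross-compiling](. The configuration for PaddlePaddle for Raspberry Pi is in [cmake/cross_compiling/raspberry_pi.cmake](.
When cross-compiling the Raspberry Pi version of the PaddlePaddle library, some arguments must be set: When cross-compiling the Raspberry Pi version of the PaddlePaddle library, some arguments must be set:
- `CMAKE_SYSTEM_NAME`: the target platform of the CMake build; must be set to `RPi`. Only after `CMAKE_SYSTEM_NAME=RPi` is set does PaddlePaddle's CMake system treat the build as a cross-compile for Raspberry Pi and automatically build the host version of the protoc executable, the target version of the protobuf library, and the target version of the OpenBLAS library. - `CMAKE_SYSTEM_NAME`: the target platform of the CMake build; must be set to `RPi`. Only after `CMAKE_SYSTEM_NAME=RPi` is set does PaddlePaddle's CMake system treat the build as a cross-compile for Raspberry Pi and automatically build the host version of the protoc executable, the target version of the protobuf library, and the target version of the OpenBLAS library.
Optional arguments for the Raspberry Pi platform:
- `RPI_TOOLCHAIN`: the absolute path of the toolchain, or its path relative to the build directory. PaddlePaddle's CMake system uses this value to set the cross-compilers automatically; otherwise, users need to set them manually when running cmake. No default value. - `RPI_TOOLCHAIN`: the absolute path of the toolchain, or its path relative to the build directory. PaddlePaddle's CMake system uses this value to set the cross-compilers automatically; otherwise, users need to set them manually when running cmake. No default value.
- `RPI_ARM_NEON`: whether to use NEON instructions. Currently it must be set to `ON`; the default is `ON`.
Other arguments: - `RPI_ARM_NEON`: whether to use NEON instructions. Currently it must be set to `ON`; the default is `ON`.
- `HOST_C/CXX_COMPILER`: the C/C++ compiler of the host. It is needed when building the host version of the protoc executable and the target version of the OpenBLAS library. It defaults to the value of the environment variable `CC`; if `CC` is not set, it is set to the `cc` compiler. - `HOST_C/CXX_COMPILER`: the C/C++ compiler of the host. It is needed when building the host version of the protoc executable and the target version of the OpenBLAS library. It defaults to the value of the environment variable `CC`; if `CC` is not set, it is set to the `cc` compiler.
The cmake arguments are as follows: A commonly used CMake configuration is as follows: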
``` ```
@ -47,7 +44,9 @@ cmake -DCMAKE_SYSTEM_NAME=RPi \
.. ..
``` ```
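Users can also set other build arguments as needed. For example, to minimize the size of the generated library, set `CMAKE_BUILD_TYPE` to `MinSizeRel`; for the fastest execution speed, set `CMAKE_BUILD_TYPE` to `Release`. You can also influence the PaddlePaddle build by manually setting `CMAKE_C/CXX_FLAGS_MINSIZEREL/RELEASE`. Here, `WITH_C_API=ON` means that the inference library should be built.
## Build and Install ## Build and Install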
@ -60,6 +59,4 @@ make install
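Note: if you have previously built the PaddlePaddle library for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf` to make sure that all third-party dependencies and the PaddlePaddle code are rebuilt for the new CMake configuration. Note: if you have previously built the PaddlePaddle library for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf` to make sure that all third-party dependencies and the PaddlePaddle code are rebuilt for the new CMake configuration.
After the install command finishes, because `WITH_C_API` was set to `ON` in the cmake configuration of the previous step, the `your/path/to/install` directory contains the `include` and `lib` directories: `include` holds the C-API header files, and `lib` holds a Raspberry Pi version of the library. After the install command finishes, the `your/path/to/install` directory contains the `include` and `lib` directories: `include` holds the C-API header files, and `lib` holds a Raspberry Pi version of the library.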

@ -0,0 +1,62 @@
# Build PaddlePaddle for Raspberry Pi
You may use either of the following two approaches to build the inference library of PaddlePaddle for Raspberry Pi:
1. Build using SSH: Log in to a Raspberry Pi using SSH and build the library. The required development tools and third-party dependencies are listed here: [`/Dockerfile`](
1. Cross-compile: This article describes in detail how to cross-compile PaddlePaddle for Raspberry Pi on a Linux/x64 machine.
## The Cross-Compiling Toolchain
Step 1. Clone the GitHub repo by running the following command.
git clone
Step 2. Use the pre-built cross-compiler found in `./tools/tree/master/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64`. To run it on a Linux computer, glibc version >= 2.14 is needed.
## CMake Arguments
CMake supports [cross-compiling]( All CMake configuration arguments required for the cross-compilation for Raspberry Pi can be found in [`cmake/cross_compiling/raspberry_pi.cmake`](
Some important arguments that need to be set:
- `CMAKE_SYSTEM_NAME`: The target platform. Must be `RPi`.
- `RPI_TOOLCHAIN`: The absolute path of the cross-compiling toolchain.
- `RPI_ARM_NEON`: Use ARM NEON Intrinsics. This argument is required and defaults to `ON`.
- `HOST_C/CXX_COMPILER`: The C/C++ compiler for the host. It is used to build tools that run on the host, for example, protoc.
A commonly-used CMake configuration is as follows:
-DRPI_TOOLCHAIN=your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
To build the inference library, please set the argument `WITH_C_API` to `ON`: `WITH_C_API=ON`.
You can add more arguments. For example, to minimize the size of the generated inference library, you may use `CMAKE_BUILD_TYPE=MinSizeRel`. For performance optimization, you may use `CMAKE_BUILD_TYPE=Release`.
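The arguments discussed above can be assembled programmatically. The following is a minimal sketch, assuming the build is driven from a Python script; the function name `cmake_command` and the default source directory `..` (that is, running from a `build` subdirectory) are assumptions made for this illustration.

```python
def cmake_command(toolchain, install_prefix, build_type=None, source_dir=".."):
    """Assemble a cmake command line for cross-compiling to Raspberry Pi.

    Only arguments named in this document are used. source_dir defaults
    to "..", which assumes cmake is run from a build subdirectory.
    """
    args = [
        ("CMAKE_SYSTEM_NAME", "RPi"),
        ("RPI_TOOLCHAIN", toolchain),
        ("CMAKE_INSTALL_PREFIX", install_prefix),
        ("WITH_C_API", "ON"),
    ]
    if build_type is not None:
        # For example "MinSizeRel" for a small library or "Release" for speed.
        args.append(("CMAKE_BUILD_TYPE", build_type))
    return ["cmake"] + ["-D{}={}".format(k, v) for k, v in args] + [source_dir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the build directory; returning a list rather than a string avoids shell quoting problems with paths.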
## Build and Install
The following commands build the inference library of PaddlePaddle for Raspberry Pi and third-party dependencies.
make install
The intermediate files will be stored in `build`. Third-party libraries will be located in `build/third_party`. If you have already built it for other platforms like Android or iOS, you may want to clear these directories by running the command: `rm -rf build`.
The inference library will be in `your/path/to/install/lib`, with related header files in `your/path/to/install/include`.

@ -1,219 +0,0 @@
# Contribute Code
We sincerely appreciate your contributions. You can use fork and pull request
workflow to merge your code.
## Code Requirements
- Your code comments must be fully documented in
[Doxygen]( style.
- Make sure the compiler option `WITH_STYLE_CHECK` is on and the compiler
passes the code style check.
- All code must have unit tests.
- All unit tests must pass.
The following tutorial guides you through submitting your contribution.
## [Creating a Fork](
Just head over to the GitHub page and click the "Fork" button.
It's just that simple.
## Clone
Clone remote repository.
➜ git clone
➜ cd Paddle
## Create a local branch
Paddle is currently using [Git-flow branching model](
All feature and bug fix development work should be done on a new branch, generally created from the `develop` branch.
➜ git checkout -b my-cool-stuff
Before the checkout, keep the current branch directory clean; otherwise, untracked files will be carried over to the new branch. You can inspect them with `git status`.
## Using `pre-commit` hook
Paddle developers use the [pre-commit]( tool to manage git
pre-commit hooks. It helps us format source code (C++, Python) and check some
basic things before each commit (only one EOL per file, no huge files added
to git). The `pre-commit` tests are now part of the unit tests in Travis-CI; a
PR that does not pass the hooks cannot be merged into Paddle.
To use [pre-commit](, install it with
`pip install pre-commit`. Paddle currently uses `clang-format` to format
C/C++ sources, so please make sure clang-format 3.8+ is installed.
Install and run it as follows:
➜ pip install pre-commit
➜ pre-commit install
When you commit your code, the pre-commit hook checks the local changes for
anything that is not suitable to commit.
## Start to develop
In this tutorial, I deleted a line in and created a new file.
We can use `git status` to inspect the changes in the current directory and `git diff` to see the differences.
➜ git status
On branch test
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
Untracked files:
(use "git add <file>..." to include in what will be committed)
no changes added to commit (use "git add" and/or "git commit -a")
## Build and Test
We package PaddlePaddle's build environment into a Docker image, the develop image, named `paddle:dev`. It contains all the build tools that PaddlePaddle needs.
If you want to build the develop image, just run:
➜ docker build -t paddle:dev .
Then we can use the develop image to build PaddlePaddle source. For example:
➜ docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=ON" paddle:dev
The above command compiles PaddlePaddle and creates a Dockerfile for building the production image. All the generated files are in the build directory. "WITH_GPU" controls whether the generated production image supports GPU. "WITH_AVX" controls whether the generated production image supports AVX. "WITH_TEST" controls whether the unit tests are built.
Then we can generate the production image by copying the compiled PaddlePaddle program into the image by
➜ docker build -t paddle:prod -f build/Dockerfile .
Finally, run the unit tests:
➜ docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest"
For more details, you can read [this doc](
## Commit
Next, we discard the changes to the file and then commit our changes with the following commands:
➜ git checkout --
➜ git status
On branch test
Untracked files:
(use "git add <file>..." to include in what will be committed)
nothing added to commit but untracked files present (use "git add" to track)
➜ git add test
Each `git commit` should include a description so that others know
what changed in these files.
➜ git commit
CRLF end-lines remover...............................(no files to check)Skipped
yapf.................................................(no files to check)Skipped
Check for added large files..............................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Detect Private Key...................................(no files to check)Skipped
Fix End of Files.....................................(no files to check)Skipped
clang-formater.......................................(no files to check)Skipped
[my-cool-stuff c703c041] add test file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 233
## Keeping Fork Up to Date
Before filing your pull request, you should sync your code with the latest PaddlePaddle.
To do this, first add a remote:
➜ git remote add upstream
➜ git remote
Update your fork with the latest upstream changes:
➜ git fetch upstream
➜ git pull upstream develop
Now, your local master branch is up-to-date with everything modified upstream.
## Push to GitHub
# push to your repository in Github
➜ git push origin my-cool-stuff
## Create an issue and a Pull Request
Create an Issue to describe the problem and record its number.
Go to the page for your fork on GitHub, select your development branch,
and click the `New pull request`.
<img width="295" alt="screen shot 2017-04-26 at 9 09 28 pm" src="">
Then select the target branch:
<img width="750" alt="screen shot 2017-04-26 at 9 11 52 pm" src="">
We can add `resolve #Issue number` to the PR description to close the issue automatically after the PR is merged. More details in <>.
Then wait for the review. If changes are needed, follow the steps above to update the corresponding origin branch.
## Delete origin branch
After the PR is merged into the main repository, we can delete the remote branch on the PR page.
<img width="775" alt="screen shot 2017-04-26 at 9 18 24 pm" src="">
Or just run:
➜ git push origin :my-cool-stuff
## Delete local branch
Finally, we delete local branch:
➜ git checkout develop
# delete my-cool-stuff branch
➜ git branch -D my-cool-stuff

@ -0,0 +1 @@

@ -21,7 +21,6 @@
dev/build_cn.rst dev/build_cn.rst
dev/write_docs_cn.rst dev/write_docs_cn.rst
Model Configuration Model Configuration
------------------- -------------------

@ -19,7 +19,7 @@
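* [Launching Cluster Job](#启动集群作业-1) * [Launching Cluster Job](#启动集群作业-1)
* [Submit Training Jobs on a Kubernetes Cluster](#在kubernetes集群中提交训练作业) * [Submit Training Jobs on a Kubernetes Cluster](#在kubernetes集群中提交训练作业)
# Overview ## Overview
This article describes how to use PaddlePaddle for distributed training on different cluster frameworks. The distributed training architecture is shown in the figure below: This article describes how to use PaddlePaddle for distributed training on different cluster frameworks. The distributed training architecture is shown in the figure below: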
<img src="" width="500"> <img src="" width="500">
@ -32,7 +32,7 @@
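When training a neural network with synchronous SGD, PaddlePaddle uses a synchronization barrier so that gradient submission and parameter updates happen in order. With asynchronous SGD, parameters are updated without waiting for all trainers to submit their gradients, which greatly increases parallelism: parameter servers do not depend on each other and receive gradients and update parameters in parallel; a parameter server does not wait for all trainers to submit gradients before starting the next step; and trainers do not depend on each other and train the model in parallel. As a result, although asynchronous SGD increases the parallelism of parameter updates, it cannot guarantee that parameters are updated synchronously: at any moment, the parameters stored on one parameter server may be newer than those on another, so gradients are noisier than with synchronous SGD. When training a neural network with synchronous SGD, PaddlePaddle uses a synchronization barrier so that gradient submission and parameter updates happen in order. With asynchronous SGD, parameters are updated without waiting for all trainers to submit their gradients, which greatly increases parallelism: parameter servers do not depend on each other and receive gradients and update parameters in parallel; a parameter server does not wait for all trainers to submit gradients before starting the next step; and trainers do not depend on each other and train the model in parallel. As a result, although asynchronous SGD increases the parallelism of parameter updates, it cannot guarantee that parameters are updated synchronously: at any moment, the parameters stored on one parameter server may be newer than those on another, so gradients are noisier than with synchronous SGD.
# Preparations ## Preparations
1. Prepare your computing cluster. A cluster usually consists of a few to a few thousand Linux servers. The servers are connected over a local area network (LAN), and each server has an IP address that is unique within the cluster, or a hostname that can be resolved by DNS. Each computer in the cluster is usually called a "node". 1. Prepare your computing cluster. A cluster usually consists of a few to a few thousand Linux servers. The servers are connected over a local area network (LAN), and each server has an IP address that is unique within the cluster, or a hostname that can be resolved by DNS. Each computer in the cluster is usually called a "node".
1. PaddlePaddle must be installed on all nodes of the cluster. To enable GPUs, you also need to install the corresponding GPU driver and CUDA on the nodes. For installing PaddlePaddle, see the installation options in [build_and_install](. We recommend the [Docker]( installation method for installing PaddlePaddle quickly. 1. PaddlePaddle must be installed on all nodes of the cluster. To enable GPUs, you also need to install the corresponding GPU driver and CUDA on the nodes. For installing PaddlePaddle, see the installation options in [build_and_install](. We recommend the [Docker]( installation method for installing PaddlePaddle quickly.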
@ -51,8 +51,8 @@ PaddlePaddle 0.10.0, compiled with
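The following uses the code in `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training with the PaddlePaddle v2 API. The following uses the code in `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training with the PaddlePaddle v2 API.
# Command-Line Arguments ## Command-Line Arguments
## Starting the Parameter Server ### Starting the Parameter Server
Run the following command to start a parameter server that waits to exchange data with trainers: Run the following command to start a parameter server that waits to exchange data with trainers: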
```bash ```bash
$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 $ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
@ -70,7 +70,7 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num
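| ports_num_for_sparse | required | 1 | number of ports used for sparse parameter communication | | ports_num_for_sparse | required | 1 | number of ports used for sparse parameter communication |
| num_gradient_servers | required | 1 | total number of pservers in the current training job | | num_gradient_servers | required | 1 | total number of pservers in the current training job |
## Starting the Trainer ### Starting the Trainer
Run the following command to start a trainer program written in Python. The file name is arbitrary, such as Run the following command to start a trainer program written in Python. The file name is arbitrary, such as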
```bash ```bash
$ python $ python
@ -117,7 +117,7 @@ paddle.init(
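| pservers | required | | IP list of the pservers started for the current training job; separate multiple IPs with "," | | pservers | required | | IP list of the pservers started for the current training job; separate multiple IPs with "," |
## Prepare the Dataset ### Prepare the Dataset
Refer to the sample data preparation script []( to prepare the training and validation datasets. We use the paddle.dataset.imikolov dataset and, according to the parallelism of the distributed training job (the number of trainer nodes), set `SPLIT_COUNT` at the beginning of `` to split the data into multiple parts. Refer to the sample data preparation script []( to prepare the training and validation datasets. We use the paddle.dataset.imikolov dataset and, according to the parallelism of the distributed training job (the number of trainer nodes), set `SPLIT_COUNT` at the beginning of `` to split the data into multiple parts.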
@ -149,7 +149,7 @@ test.txt-00002
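Training data formats and the `reader()` of the training program differ greatly between training jobs, so developers need to split the training data and write `reader()` according to the actual scenario of their own job. Training data formats and the `reader()` of the training program differ greatly between training jobs, so developers need to split the training data and write `reader()` according to the actual scenario of their own job.
## Prepare the Training Program ### Prepare the Training Program
For every training job, we create a workspace on each node that contains the user's training program, its dependencies, and the mounted or downloaded training data shards. For every training job, we create a workspace on each node that contains the user's training program, its dependencies, and the mounted or downloaded training data shards.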
@ -184,7 +184,7 @@ test.txt-00002
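- `train_data_dir`: the directory containing the training data; it can be mounted from distributed storage or downloaded locally before the job starts. - `train_data_dir`: the directory containing the training data; it can be mounted from distributed storage or downloaded locally before the job starts.
- `test_data_dir`: the directory containing the test dataset. - `test_data_dir`: the directory containing the test dataset.
# Use Distributed Computing Platforms or Tools ## Use Distributed Computing Platforms or Tools
PaddlePaddle can use several distributed computing platforms to build distributed computing jobs, including: PaddlePaddle can use several distributed computing platforms to build distributed computing jobs, including:
- [Kubernetes]( Google's open-source container cluster scheduling framework, a complete cluster solution that supports large-scale production clusters. - [Kubernetes]( Google's open-source container cluster scheduling framework, a complete cluster solution that supports large-scale production clusters.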
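@ -195,12 +195,12 @@ PaddlePaddle can use several distributed computing platforms to build distributed computing jobs
When training on a distributed computing platform, once the job is scheduled onto the cluster, the platform usually provides the parameters the job needs through an API or environment variables, such as the node ID, IP, and number of job nodes. When training on a distributed computing platform, once the job is scheduled onto the cluster, the platform usually provides the parameters the job needs through an API or environment variables, such as the node ID, IP, and number of job nodes.
## Cluster Training Using Fabric ### Cluster Training Using Fabric
### Prepare a Linux Cluster #### Prepare a Linux Cluster
In the `paddle/scripts/cluster_train_v2/fabric/docker_cluster` directory, run `kubectl -f ssh_servers.yaml` to start a test cluster, and use `kubectl get po -o wide` to get the IP addresses of these nodes. In the `paddle/scripts/cluster_train_v2/fabric/docker_cluster` directory, run `kubectl -f ssh_servers.yaml` to start a test cluster, and use `kubectl get po -o wide` to get the IP addresses of these nodes.
### Launching Cluster Job #### Launching Cluster Job
`` provides automation scripts to start all PaddlePaddle cluster processes on different nodes. By default, all command-line options can be set as `` command options, and `` transparently and automatically applies them to the underlying PaddlePaddle processes. `` provides automation scripts to start all PaddlePaddle cluster processes on different nodes. By default, all command-line options can be set as `` command options, and `` transparently and automatically applies them to the underlying PaddlePaddle processes.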
@ -216,10 +216,10 @@ sh
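The cluster job will start in a few seconds. The cluster job will start in a few seconds.
### Kill Cluster Job #### Kill Cluster Job
`` can catch the `Ctrl + C` SIGINT signal and automatically kill all the processes it launched. Just interrupt the `` task to kill the cluster job. If the program crashes, you can also kill the job manually. `` can catch the `Ctrl + C` SIGINT signal and automatically kill all the processes it launched. Just interrupt the `` task to kill the cluster job. If the program crashes, you can also kill the job manually.
### Check Cluster Training Result #### Check Cluster Training Result
For details, check the logs in $workspace/log; every node has the same log structure. For details, check the logs in $workspace/log; every node has the same log structure.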
`paddle_trainer.INFO` `paddle_trainer.INFO`
@ -234,13 +234,13 @@ sh
`train.log` `train.log`
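It provides the stderr and stdout of the training process. Check the error log if training fails. It provides the stderr and stdout of the training process. Check the error log if training fails.
### Check Model Output #### Check Model Output
After the run finishes, the model files are written to the `output` directory on node 0. After the run finishes, the model files are written to the `output` directory on node 0.
`nodefile` in the workspace indicates the node ID of the current cluster job. `nodefile` in the workspace indicates the node ID of the current cluster job.
## Submit Training Jobs on an OpenMPI Cluster ### Submit Training Jobs on an OpenMPI Cluster
### Prepare an OpenMPI Cluster #### Prepare an OpenMPI Cluster
Run the following command to start a 3-node OpenMPI cluster and one "head" node: Run the following command to start a 3-node OpenMPI cluster and one "head" node: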
@ -252,7 +252,7 @@ kubectl create -f mpi-nodes.yaml
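Then you can log in to every OpenMPI node from the head node via ssh without a password. Then you can log in to every OpenMPI node from the head node via ssh without a password.
### Launching Cluster Job #### Launching Cluster Job
Follow the steps below to submit a paddle training job on the OpenMPI cluster: Follow the steps below to submit a paddle training job on the OpenMPI cluster: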
@ -280,6 +280,6 @@ scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial
mpirun -hostfile machines -n 3 /home/tutorial/ mpirun -hostfile machines -n 3 /home/tutorial/
``` ```
## Submit Training Jobs on a Kubernetes Cluster ### Submit Training Jobs on a Kubernetes Cluster
For usage, see [here](../k8s/. For usage, see [here](../k8s/.

@ -19,7 +19,7 @@
* [Launching Cluster Job](#launching-cluster-job-1) * [Launching Cluster Job](#launching-cluster-job-1)
* [Cluster Training Using Kubernetes](#cluster-training-using-kubernetes) * [Cluster Training Using Kubernetes](#cluster-training-using-kubernetes)
# Introduction ## Introduction
In this article, we'll explain how to run distributed training jobs with PaddlePaddle on different types of clusters. The diagram below shows the main architecture of a distributed training job: In this article, we'll explain how to run distributed training jobs with PaddlePaddle on different types of clusters. The diagram below shows the main architecture of a distributed training job:
@ -33,7 +33,7 @@ PaddlePaddle can support both synchronize stochastic gradient descent (SGD) and
When training with synchronous SGD, PaddlePaddle uses an internal "synchronization barrier" that keeps gradient updates and parameter downloads in strict order. Asynchronous SGD, on the other hand, does not wait for all trainers to finish uploading at each step, which increases the parallelism of distributed training: parameter servers do not depend on each other and optimize parameters concurrently, and because parameter servers do not wait for trainers, trainers also work concurrently. However, asynchronous SGD introduces more randomness and noise in the gradients. When training with synchronous SGD, PaddlePaddle uses an internal "synchronization barrier" that keeps gradient updates and parameter downloads in strict order. Asynchronous SGD, on the other hand, does not wait for all trainers to finish uploading at each step, which increases the parallelism of distributed training: parameter servers do not depend on each other and optimize parameters concurrently, and because parameter servers do not wait for trainers, trainers also work concurrently. However, asynchronous SGD introduces more randomness and noise in the gradients.
# Preparations ## Preparations
1. Prepare your computer cluster. It's normally a bunch of Linux servers connected by LAN. Each server will be assigned a unique IP address. The computers in the cluster can be called "nodes". 1. Prepare your computer cluster. It's normally a bunch of Linux servers connected by LAN. Each server will be assigned a unique IP address. The computers in the cluster can be called "nodes".
2. Install PaddlePaddle on every node. If you are going to take advantage of GPU cards, you'll also need to install proper driver and CUDA libraries. To install PaddlePaddle please read [this build and install]( document. We strongly recommend using [Docker installation]( 2. Install PaddlePaddle on every node. If you are going to take advantage of GPU cards, you'll also need to install proper driver and CUDA libraries. To install PaddlePaddle please read [this build and install]( document. We strongly recommend using [Docker installation](
@ -52,9 +52,9 @@ PaddlePaddle 0.10.0rc, compiled with
We'll take `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training using PaddlePaddle v2 API. We'll take `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training using PaddlePaddle v2 API.
# Command-line arguments ## Command-line arguments
## Starting parameter server ### Starting parameter server
Type the below command to start a parameter server which will wait for trainers to connect: Type the below command to start a parameter server which will wait for trainers to connect:
@ -74,7 +74,7 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num
| ports_num_for_sparse | required | 1 | number of ports which serves sparse parameter update | | ports_num_for_sparse | required | 1 | number of ports which serves sparse parameter update |
| num_gradient_servers | required | 1 | total number of gradient servers | | num_gradient_servers | required | 1 | total number of gradient servers |
## Starting trainer ### Starting trainer
Type the command below to start the trainer (name the file whatever you want, like "") Type the command below to start the trainer (name the file whatever you want, like "")
```bash ```bash
@ -122,7 +122,7 @@ paddle.init(
| trainer_id | required | 0 | ID for every trainer, start from 0 | | trainer_id | required | 0 | ID for every trainer, start from 0 |
| pservers | required | | list of IPs of parameter servers, separated by "," | | pservers | required | | list of IPs of parameter servers, separated by "," |
## Prepare Training Dataset ### Prepare Training Dataset
Here's some example code, [](, which downloads the public `imikolov` dataset and splits it into multiple files according to job parallelism (trainer count). Modify `SPLIT_COUNT` at the beginning of `` to change the number of output files. Here's some example code, [](, which downloads the public `imikolov` dataset and splits it into multiple files according to job parallelism (trainer count). Modify `SPLIT_COUNT` at the beginning of `` to change the number of output files.
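The splitting step can be pictured with a short sketch. This is not the example script referenced above: the function name `split_lines` and the round-robin distribution are assumptions made for illustration, and only the `<prefix>-<5-digit index>` file naming follows the file names shown elsewhere in this document (for example `train.txt-00002`).

```python
import os


def split_lines(lines, split_count, prefix, out_dir="."):
    """Distribute lines round-robin into split_count files.

    Files are named "<prefix>-00000", "<prefix>-00001", and so on.
    Round-robin is an illustrative choice, not necessarily what the
    referenced script does.
    """
    paths = [
        os.path.join(out_dir, "%s-%05d" % (prefix, i)) for i in range(split_count)
    ]
    files = [open(path, "w") for path in paths]
    try:
        for index, line in enumerate(lines):
            files[index % split_count].write(line.rstrip("\n") + "\n")
    finally:
        for f in files:
            f.close()
    return paths
```

Each trainer can then read the file whose index matches its own trainer ID, so that no two trainers process the same shard.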
@ -155,7 +155,7 @@ When job started, every trainer needs to get it's own part of data. In some dist
Different training jobs may have different data format and `reader()` function, developers may need to write different data prepare scripts and `reader()` functions for their job. Different training jobs may have different data format and `reader()` function, developers may need to write different data prepare scripts and `reader()` functions for their job.
## Prepare Training program ### Prepare Training program
We'll create a *workspace* directory on each node, storing your training program, dependencies, mounted or downloaded dataset directory. We'll create a *workspace* directory on each node, storing your training program, dependencies, mounted or downloaded dataset directory.
@ -191,7 +191,7 @@ Your workspace may looks like:
- `train_data_dir`: contains the training data. Mount it from a storage service or copy the training data here. - `train_data_dir`: contains the training data. Mount it from a storage service or copy the training data here.
- `test_data_dir`: containing testing data. - `test_data_dir`: containing testing data.
# Use cluster platforms or cluster management tools ## Use cluster platforms or cluster management tools
PaddlePaddle supports running jobs on several platforms including: PaddlePaddle supports running jobs on several platforms including:
- [Kubernetes]( open-source system for automating deployment, scaling, and management of containerized applications from Google. - [Kubernetes]( open-source system for automating deployment, scaling, and management of containerized applications from Google.
@ -202,13 +202,13 @@ We'll introduce cluster job management on these platforms. The examples can be f
When a job is dispatched to different nodes, these cluster platforms provide the training processes with an API or environment variables, such as the node ID, IP, or total number of nodes. When a job is dispatched to different nodes, these cluster platforms provide the training processes with an API or environment variables, such as the node ID, IP, or total number of nodes.
## Cluster Training Using Fabric ### Cluster Training Using Fabric
### Prepare a Linux cluster #### Prepare a Linux cluster
Running `kubectl -f ssh_servers.yaml` under the directory `paddle/scripts/cluster_train_v2/fabric/docker_cluster` launches a demo cluster. Run `kubectl get po -o wide` to get the IP addresses of these nodes. Running `kubectl -f ssh_servers.yaml` under the directory `paddle/scripts/cluster_train_v2/fabric/docker_cluster` launches a demo cluster. Run `kubectl get po -o wide` to get the IP addresses of these nodes.
### Launching Cluster Job #### Launching Cluster Job
`` provides automation scripts to start all PaddlePaddle cluster processes on different nodes. By default, all command line options can be set as `` command options, and `` will transparently and automatically pass these options to the lower-level PaddlePaddle processes. `` provides automation scripts to start all PaddlePaddle cluster processes on different nodes. By default, all command line options can be set as `` command options, and `` will transparently and automatically pass these options to the lower-level PaddlePaddle processes.
`` provides two distinct command options for easy job launching. `` provides two distinct command options for easy job launching.
@ -224,10 +224,10 @@ sh
The cluster Job will start in several seconds. The cluster Job will start in several seconds.
### Kill Cluster Job #### Kill Cluster Job
`` can capture the `Ctrl + C` SIGINT signal and automatically kill all processes it launched, so just stop `` to kill the cluster job. You should kill the job manually if the program crashes. `` can capture the `Ctrl + C` SIGINT signal and automatically kill all processes it launched, so just stop `` to kill the cluster job. You should kill the job manually if the program crashes.
### Check Cluster Training Result #### Check Cluster Training Result
Check the logs in $workspace/log for details; each node has the same log structure. Check the logs in $workspace/log for details; each node has the same log structure.
`paddle_trainer.INFO` `paddle_trainer.INFO`
@ -242,13 +242,13 @@ It provides stderr and stdout of parameter server process. Check error log if tr
`train.log` `train.log`
It provides stderr and stdout of trainer process. Check error log if training crashes. It provides stderr and stdout of trainer process. Check error log if training crashes.
### Check Model Output #### Check Model Output
After a pass finishes, model files are written to the `output` directory on node 0. After a pass finishes, model files are written to the `output` directory on node 0.
`nodefile` in the workspace indicates the node ID of the current cluster job. `nodefile` in the workspace indicates the node ID of the current cluster job.
## Cluster Training Using OpenMPI ### Cluster Training Using OpenMPI
### Prepare an OpenMPI cluster #### Prepare an OpenMPI cluster
Run the following command to start a 3-node MPI cluster and one "head" node. Run the following command to start a 3-node MPI cluster and one "head" node.
@ -260,7 +260,7 @@ kubectl create -f mpi-nodes.yaml
Then you can log in to every OpenMPI node using ssh without entering a password. Then you can log in to every OpenMPI node using ssh without entering a password.
### Launching Cluster Job #### Launching Cluster Job
Follow these steps to launch a PaddlePaddle training job in an OpenMPI cluster: Follow these steps to launch a PaddlePaddle training job in an OpenMPI cluster:
@ -288,6 +288,6 @@ scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial
mpirun -hostfile machines -n 3 /home/tutorial/ mpirun -hostfile machines -n 3 /home/tutorial/
``` ```
## Cluster Training Using Kubernetes ### Cluster Training Using Kubernetes
The details can be found [here](../k8s/ The details can be found [here](../k8s/

go/.gitignore vendored

@ -1,2 +1,3 @@
vendor/ vendor/
.glide/ .glide/

go/glide.lock generated

@ -1,5 +1,5 @@
hash: 51d9e2e46d7fd9173ff11ecada40f7b7728756be18d5e2f032535f66465e6e15 hash: 107c058cf5c9163a75d40eef2273a793c36112683c25d72aa8288827fdde3a19
updated: 2017-10-24T15:04:09.987751592-07:00 updated: 2017-10-30T03:46:19.137696069Z
imports: imports:
- name: - name:
version: bae2f1293d092fd8167939d5108d1b025eaef9de version: bae2f1293d092fd8167939d5108d1b025eaef9de

@ -30,3 +30,4 @@ import:
version: v2.13 version: v2.13
- package: - package:
version: v1.6.0 version: v1.6.0
- package:

@ -0,0 +1,4 @@
# Ignore everything in this directory
# Except this file

@ -13,5 +13,5 @@
# limitations under the License. # limitations under the License.
# #
go_test(pserver_test DEPS paddle_go_optimizer) go_test(pserver_test DEPS paddle_go_optimizer gen_proto_go)
endif() endif()

@ -17,6 +17,7 @@ package pserver
import ( import (
"bufio" "bufio"
"bytes" "bytes"
"encoding/gob" "encoding/gob"
"encoding/json" "encoding/json"
"errors" "errors"
@ -26,11 +27,15 @@ import (
"os" "os"
"path" "path"
"strconv" "strconv"
"sync" "sync"
"time" "time"
uuid "" uuid ""
pb ""
log "" log ""
) )
@ -65,6 +70,46 @@ type Parameter struct {
Content []byte Content []byte
} }
func float32ToString(b []byte) string {
	f := make([]float32, len(b)/4)
	buf := bytes.NewReader(b)
	err := binary.Read(buf, binary.LittleEndian, &f)
	if err != nil {
		return ""
	}
	return fmt.Sprintf("%v", f)
}

func float32ByteToString(c []byte) string {
	var a []byte
	var b []byte
	if len(c) <= 80 {
		a = c
	} else {
		a = c[0:40]
		b = c[len(c)-40:]
	}

	var s string
	s = float32ToString(a)

	if b == nil {
		return s
	}

	s = strings.Replace(s, "]", "", -1) + "..." + strings.Replace(float32ToString(b), "[", "", -1)
	return s
}

func (p Parameter) String() string {
	if p.ElementType != Float32 {
		return fmt.Sprintf("name:%v ElementType:%v",
			p.Name, p.ElementType)
	}

	return float32ByteToString(p.Content)
}
// ParameterWithConfig contains the parameter and the configuration. // ParameterWithConfig contains the parameter and the configuration.
type ParameterWithConfig struct { type ParameterWithConfig struct {
Param Parameter Param Parameter
@ -189,7 +234,9 @@ func (s *Service) InitParam(paramWithConfigs ParameterWithConfig, _ *int) error
default: default:
} }
// TODO(helin): parse parameter config c := &pb.OptimizerConfig{}
proto.Unmarshal(paramWithConfigs.Config, c)
log.Debug(fmt.Sprintf("OptimizerConfig:%v", c))
defer defer
@ -239,7 +286,8 @@ func (s *Service) SendGrad(g Gradient, _ *int) error {
select { select {
case <-s.initialized: case <-s.initialized:
default: default:
log.Warn("received gradient before initialization.", "name", g.Name, "size", len(g.Content), "type", g.ElementType) log.Warn("received gradient before initialization.",
"name", g.Name, "size", len(g.Content), "type", g.ElementType)
return errors.New(Uninitialized) return errors.New(Uninitialized)
} }
@ -248,10 +296,14 @@ func (s *Service) SendGrad(g Gradient, _ *int) error {
o, ok := s.optMap[g.Name] o, ok := s.optMap[g.Name]
if !ok { if !ok {
log.Warn("received gradient but can't find name.",
"name", g.Name, "size", len(g.Content), "type", g.ElementType)
return fmt.Errorf("parameter: %s does not exist", g.Name) return fmt.Errorf("parameter: %s does not exist", g.Name)
} }
log.Info("received gradient from trainer, updating gradient.", "name", g.Name, "size", len(g.Content), "type", g.ElementType) log.Debug(Parameter(g).String())
log.Info("received gradient from trainer, updating gradient.",
"name", g.Name, "size", len(g.Content), "type", g.ElementType)
return o.UpdateParameter(g) return o.UpdateParameter(g)
} }
@ -277,7 +329,7 @@ func (s *Service) GetParam(name string, parameter *Parameter) error {
parameter.Name = name parameter.Name = name
parameter.ElementType = opt.elementType parameter.ElementType = opt.elementType
parameter.Content = opt.GetWeights() parameter.Content = opt.GetWeights()
log.Info("sending parameter to the trainer", "name", parameter.Name, "size", len(parameter.Content), "type", parameter.ElementType) log.Info("sending parameter to the trainer", "name", parameter.Name, "size", len(parameter.Content), "type", parameter.ElementType)
return nil return nil
} }

@ -15,6 +15,7 @@
package pserver_test package pserver_test
import ( import (
"io/ioutil" "io/ioutil"
"reflect" "reflect"
"sync" "sync"
@ -178,3 +179,33 @@ func TestBlockUntilInitialized(t *testing.T) {
wg.Wait() wg.Wait()
} }
func TestGradientString(t *testing.T) {
	g := pserver.Parameter{}
	g.ElementType = pserver.Float32
	g.Content = []byte{0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40, 0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40}
	if g.String() != "[3.3702806e+12 2.142699 3.3702806e+12 2.142699]" {
		t.Fatal("get float data error!")
	}

	g.Content = []byte{0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
		0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40}
	if g.String() != "[3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699...3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699]" {
		t.Fatal("get float data error!", g.String())
	}
}
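The expected strings in this test follow from reading the bytes as little-endian float32 values and, for content longer than 80 bytes, keeping only the first and last 40 bytes. The Python sketch below is an independent cross-check of that decoding, not the Go implementation; the helper names are made up for illustration, and it returns lists of numbers rather than reproducing Go's `%v` formatting.

```python
import struct


def decode_float32(content):
    """Interpret bytes as consecutive little-endian float32 values."""
    count = len(content) // 4  # trailing bytes that do not fill a float are ignored
    return list(struct.unpack("<%df" % count, content[:count * 4]))


def summarize_float32(content, edge_bytes=40):
    """Return (head, tail); tail is None when the content is shown in full."""
    if len(content) <= 2 * edge_bytes:
        return decode_float32(content), None
    return decode_float32(content[:edge_bytes]), decode_float32(content[-edge_bytes:])


# The 8-byte pattern repeated throughout TestGradientString.
PATTERN = bytes([0x18, 0x2D, 0x44, 0x54, 0xFB, 0x21, 0x09, 0x40])
```

With `PATTERN * 2` (16 bytes) everything is decoded, giving four values that alternate between roughly 3.3702806e+12 and 2.142699. With `PATTERN * 16` (128 bytes) only ten values from each end are kept, matching the ten-plus-ten layout around `...` in the second expected string.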

@ -1,24 +1,29 @@
# gserver package unittests # gserver package unittests
if(NOT MOBILE_INFERENCE) add_simple_unittest(test_LinearChainCRF)
################### test_ProtoDataProvider ############ add_simple_unittest(test_MultinomialSampler)
add_unittest_without_exec(test_ProtoDataProvider add_simple_unittest(test_RecurrentLayer)
# test_ProtoDataProvider will mkdir as same name,
# so if WORKING_DIRECTORY is default directory, then
# mkdir will get error.
add_test(NAME test_ProtoDataProvider
################# test_LayerGrad ####################### function(gserver_test TARGET)
add_unittest_without_exec(test_LayerGrad add_unittest_without_exec(${TARGET}
test_LayerGrad.cpp ${TARGET}.cpp
LayerGradUtil.cpp) LayerGradUtil.cpp)
add_test(NAME test_LayerGrad add_test(NAME ${TARGET}
########## test_Mkldnn layers and activations ########## ########## test_Mkldnn layers and activations ##########
@ -32,89 +37,6 @@ if(WITH_MKLDNN)
endif() endif()
################ test_CRFLayerGrad ####################
add_test(NAME test_CRFLayerGrad
COMMAND test_CRFLayerGrad)
################ test_CrossEntropyOverBeam ####################
add_test(NAME test_CrossEntropyOverBeam
COMMAND test_CrossEntropyOverBeam)
################ test_SeqSliceLayerGrad ####################
add_test(NAME test_SeqSliceLayerGrad
COMMAND test_SeqSliceLayerGrad)
add_test(NAME test_ActivationGrad
COMMAND test_ActivationGrad)
################# test_ConvTrans #######################
add_test(NAME test_ConvTrans
COMMAND test_ConvTrans)
################# test_PriorBox #######################
add_test(NAME test_PriorBox
COMMAND test_PriorBox)
################# test_DetectionOutput #######################
add_test(NAME test_DetectionOutput
COMMAND test_DetectionOutput)
################# test_ConvUnify #######################
add_test(NAME test_ConvUnify
COMMAND test_ConvUnify)
################# test_BatchNorm #######################
add_test(NAME test_BatchNorm
COMMAND test_BatchNorm)
################# test_KmaxSeqScore #######################
add_test(NAME test_KmaxSeqScore
COMMAND test_KmaxSeqScore)
################## test_Evaluator #######################
################ test_LinearChainCRF ####################
############## test_MultinomialSampler ###################
############## test_PyDataProvider ######################## ############## test_PyDataProvider ########################
add_unittest_without_exec(test_PyDataProvider add_unittest_without_exec(test_PyDataProvider
@ -125,9 +47,6 @@ if(WITH_PYTHON)
endif() endif()
############### test_RecurrentLayer #######################
############### test_WarpCTCLayer ####################### ############### test_WarpCTCLayer #######################
add_unittest_without_exec(test_WarpCTCLayer add_unittest_without_exec(test_WarpCTCLayer
@ -139,19 +58,33 @@ if(NOT WITH_DOUBLE)
endif() endif()
################### test_ProtoDataProvider ############
# test_ProtoDataProvider will mkdir as same name,
# so if WORKING_DIRECTORY is default directory, then
# mkdir will get error.
add_test(NAME test_ProtoDataProvider
################## test_Evaluator #######################
############### test_RecurrentGradientMachine ############### ############### test_RecurrentGradientMachine ###############
# TODO(yuyang18): There is some bug in test_RecurrentGradientMachine # TODO(yuyang18): There is some bug in test_RecurrentGradientMachine
# I will fix it. # I will fix it.
add_unittest_without_exec(test_RecurrentGradientMachine add_unittest_without_exec(test_RecurrentGradientMachine
test_RecurrentGradientMachine.cpp) test_RecurrentGradientMachine.cpp)
add_test(NAME test_RecurrentGradientMachine add_test(NAME test_RecurrentGradientMachine
${PADDLE_SOURCE_DIR}/python:${PADDLE_SOURCE_DIR}/paddle/gserver/tests ${PADDLE_SOURCE_DIR}/python:${PADDLE_SOURCE_DIR}/paddle/gserver/tests
${CMAKE_CURRENT_BINARY_DIR}/test_RecurrentGradientMachine ${CMAKE_CURRENT_BINARY_DIR}/test_RecurrentGradientMachine
if(NOT MOBILE_INFERENCE) ############### test_NetworkCompare ###############
add_unittest_without_exec(test_NetworkCompare add_unittest_without_exec(test_NetworkCompare
test_NetworkCompare.cpp) test_NetworkCompare.cpp)

@ -0,0 +1,127 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
See the License for the specific language governing permissions and
limitations under the License. */
#include <gtest/gtest.h>
#include <string>
#include <vector>
#include "LayerGradUtil.h"
#include "paddle/testing/TestUtil.h"
using namespace paddle; // NOLINT
using namespace std; // NOLINT
// Do one forward pass of expand layer and check to see if its output
// matches the given result. (Only the CPU is tested currently.)
void doOneExpandTest(string trans_type,
bool hasSubseq,
bool useGpu,
Argument& input1,
Argument& input2,
Argument& result) {
FLAGS_use_gpu = false;
// Setting up the expand layer
TestConfig config;
auto inputType1 =
trans_type == "non-seq" ? INPUT_DENSE_DIM_DATA : INPUT_SEQUENCE_DATA;
config.inputDefs.push_back({inputType1, "layer0", 1, 0});
auto inputType2 =
config.inputDefs.push_back({inputType2, "layer1", 1, 0});
// data layer initialize
std::vector<DataLayerPtr> dataLayers;
LayerMap layerMap;
vector<Argument> datas;
config, &dataLayers, &datas, &layerMap, "expand", 1, false, useGpu);
dataLayers[0]->getOutput() = input1;
dataLayers[1]->getOutput() = input2;
// test layer initialize
std::vector<ParameterPtr> parameters;
LayerPtr expandLayer;
initTestLayer(config, &layerMap, &parameters, &expandLayer);
checkMatrixEqual(expandLayer->getOutputValue(), result.value);
TEST(Layer, ExpandLayerFwd) {
bool useGpu = false;
// Assume batch_size =3 in all cases.
// CPU case 1. non-seq expand to seq
// input1 = 1,2,3
// input2 = [4,5],[6],[7,8,9]
// result = [1,1],[2],[3,3,3]
Argument input1, input2, result;
input1.value = Matrix::create(3, 1, false, useGpu);
real input1Data[] = {1, 2, 3};
input2.value = Matrix::create(6, 1, false, useGpu);
real input2Data[] = {4, 5, 6, 7, 8, 9};
input2.sequenceStartPositions = ICpuGpuVector::create(4, useGpu);
int input2Seq[] = {0, 2, 3, 6};
input2.sequenceStartPositions->copyFrom(input2Seq, 4, useGpu);
result.value = Matrix::create(6, 1, false, useGpu);
real resultData[] = {1, 1, 2, 3, 3, 3};
doOneExpandTest("non-seq", false, useGpu, input1, input2, result);
// CPU case 2. non-seq expand to sub-seq
// NOTE: input1.batch_size == input2.sequencelength in this case.
// i.e, input1 expands by input2.sequence
// input1 = 1,2,3
// input2 = [[4,5]],[[6]],[[7],[8,9]]
// result = [[1,1]],[[2]],[[3],[3,3]]
input2.subSequenceStartPositions = ICpuGpuVector::create(5, useGpu);
int input2SubSeq[] = {0, 2, 3, 4, 6};
input2.subSequenceStartPositions->copyFrom(input2SubSeq, 5, useGpu);
doOneExpandTest("non-seq", true, useGpu, input1, input2, result);
// CPU case 3. seq expand to sub-seq
// input1 = [1,2],[3],[4]
// input2 = [[4,5]],[[6]],[[7],[8,9]]
// result = [[1,1]],[[2]],[[3],[4,4]]
Matrix::resizeOrCreate(input1.value, 4, 1, false, useGpu);
real input1Data_case3[] = {1, 2, 3, 4};
input1.sequenceStartPositions = ICpuGpuVector::create(4, useGpu);
int input1Seq[] = {0, 2, 3, 4};
input1.sequenceStartPositions->copyFrom(input1Seq, 4, useGpu);
real resultData_case3[] = {1, 1, 2, 3, 4, 4};
doOneExpandTest("seq", true, useGpu, input1, input2, result);
int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv);
initMain(argc, argv);
return RUN_ALL_TESTS();
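Case 1 of this test (non-seq expand to seq) can be pictured as repeating each input value once per element of the matching sequence. The Python sketch below illustrates only that case under that reading; `expand_to_sequences` is a hypothetical helper, not part of the Paddle test utilities.

```python
def expand_to_sequences(values, seq_starts):
    """Repeat values[i] once for every element of sequence i.

    seq_starts holds the start offset of each sequence plus a final end
    offset, like sequenceStartPositions in the test above.
    """
    if len(seq_starts) != len(values) + 1:
        raise ValueError("need exactly one sequence per input value")
    expanded = []
    for i, value in enumerate(values):
        length = seq_starts[i + 1] - seq_starts[i]
        expanded.extend([value] * length)
    return expanded
```

For `input1 = 1,2,3` and start positions `{0, 2, 3, 6}`, this yields `[1, 1, 2, 3, 3, 3]`, the same data as `resultData` in the test.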

@ -22,22 +22,35 @@ class AccuracyOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override { void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Inference"), PADDLE_ENFORCE(ctx->HasInput("Out"),
"Input(Inference) of AccuracyOp should not be null."); "Input (Out) of accuracy op should not be null.");
"Input (Indices) of accuracy op should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"),
"Input(Label) of AccuracyOp should not be null."); "Input (Label) of accuracy op should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Accuracy"), PADDLE_ENFORCE(ctx->HasOutput("Accuracy"),
"Output(Accuracy) of AccuracyOp should not be null."); "Output (Accuracy) of AccuracyOp should not be null.");
auto inference_dim = ctx->GetInputDim("Inference"); auto inference_dim = ctx->GetInputDim("Out");
auto label_dim = ctx->GetInputDim("Label"); auto label_dim = ctx->GetInputDim("Label");
// Assume indices has the same shape as inference, because
// it's the output of topk.
PADDLE_ENFORCE_EQ(label_dim.size(), 1, "label must be a vector"); PADDLE_ENFORCE_EQ(label_dim.size(), 2, "label's rank must be 2.");
PADDLE_ENFORCE_EQ(label_dim[1], 1, "label's second dimension must be 1");
PADDLE_ENFORCE_EQ(inference_dim[0], label_dim[0], PADDLE_ENFORCE_EQ(inference_dim[0], label_dim[0],
"inference size must be the same as label size"); "the inference tensor's num_rows must be"
" the same as label.");
ctx->SetOutputDim("Accuracy", {1}); ctx->SetOutputDim("Accuracy", {1});
ctx->ShareLoD("Inference", /*->*/ "Accuracy"); ctx->ShareLoD("Out", /*->*/ "Accuracy");
// IndicateDataType
framework::DataType IndicateDataType(
const framework::ExecutionContext &ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("Out")->type());
} }
}; };
@ -47,7 +60,8 @@ class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker {
framework::OpAttrChecker *op_checker) framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
// TODO(typhoonzero): support both inference value and indices. // TODO(typhoonzero): support both inference value and indices.
AddInput("Inference", "topk(indices) the network output"); AddInput("Out", "topk (inferences) the network output");
AddInput("Indices", "topk (indices) the network output");
AddInput("Label", "Label of the training data"); AddInput("Label", "Label of the training data");
// TODO(typhoonzero): AddInput("Weight", ... // TODO(typhoonzero): AddInput("Weight", ...
AddOutput("Accuracy", "The accuracy of current batch"); AddOutput("Accuracy", "The accuracy of current batch");
@ -58,7 +72,7 @@ The accuracy is:
.. math:: .. math::
accuracy = \\frac{NumOfCorrectPredicts}{NumOfAllSamples}) accuracy = \\frac{NumOfCorrectPredicts}{NumOfAllSamples})
Both the input `Inference` and `Label` can carry the LoD (Level of Details) Both the input `Out` and `Label` can carry the LoD (Level of Details)
information, or not. But the output only shares the LoD with input `Inference`. information, or not. But the output only shares the LoD with input `Inference`.
)DOC"); )DOC");
} }
@ -68,7 +82,10 @@ information, or not. But the output only shares the LoD with input `Inference`.
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker); REGISTER_OPERATOR(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker,
REGISTER_OP_CPU_KERNEL( paddle::framework::EmptyGradOpMaker);
accuracy, ops::AccuracyKernel<paddle::platform::CPUPlace, int>, // FIXME(typhoonzero): types of T is for inference data.
ops::AccuracyKernel<paddle::platform::CPUPlace, int64_t>); // label data is always int.
ops::AccuracyKernel<paddle::platform::CPUPlace, float>,
ops::AccuracyKernel<paddle::platform::CPUPlace, double>);

@ -21,9 +21,10 @@ namespace paddle {
namespace operators { namespace operators {
using platform::PADDLE_CUDA_NUM_THREADS; using platform::PADDLE_CUDA_NUM_THREADS;
template <typename T, int BlockSize> template <int BlockSize>
__global__ void AccuracyCudaKernel(const int N, const int D, const T* Xdata, __global__ void AccuracyCudaKernel(const int N, const int D,
const T* labeldata, float* accuracy) { const int64_t* Xdata,
const int64_t* labeldata, float* accuracy) {
int count = 0; int count = 0;
__shared__ int total[BlockSize]; __shared__ int total[BlockSize];
@ -52,13 +53,14 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace."); "It must use GPUPlace.");
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Out");
auto* indices = ctx.Input<Tensor>("Indices");
auto* label = ctx.Input<Tensor>("Label"); auto* label = ctx.Input<Tensor>("Label");
auto* accuracy = ctx.Output<Tensor>("Accuracy"); auto* accuracy = ctx.Output<Tensor>("Accuracy");
// FIXME(typhoonzero): only support indices currently // FIXME(typhoonzero): only support indices currently
// if add support for output values, how to detect the data type? // if add support for output values, how to detect the data type?
const T* inference_data = inference->data<T>(); const int64_t* indices_data = indices->data<int64_t>();
const T* label_data = label->data<T>(); const int64_t* label_data = label->data<int64_t>();
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace()); float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
size_t num_samples = inference->dims()[0]; size_t num_samples = inference->dims()[0];
@ -69,11 +71,11 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
return; return;
} }
AccuracyCudaKernel<T, PADDLE_CUDA_NUM_THREADS><<< AccuracyCudaKernel<PADDLE_CUDA_NUM_THREADS><<<
reinterpret_cast<const platform::CUDADeviceContext&>( reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
.stream()>>>(num_samples, infer_width, inference_data, label_data, .stream()>>>(num_samples, infer_width, indices_data, label_data,
accuracy_data); accuracy_data);
} }
}; };
@ -81,5 +83,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP_GPU_KERNEL(accuracy, paddle::operators::AccuracyOpCUDAKernel<int>, // FIXME(typhoonzero): types of T is for inference data.
paddle::operators::AccuracyOpCUDAKernel<int64_t>); // label data is always int
REGISTER_OP_GPU_KERNEL(accuracy, paddle::operators::AccuracyOpCUDAKernel<float>,

@ -38,14 +38,15 @@ template <typename Place, typename T>
class AccuracyKernel : public framework::OpKernel<T> { class AccuracyKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Out");
auto* indices = ctx.Input<Tensor>("Indices");
auto* label = ctx.Input<Tensor>("Label"); auto* label = ctx.Input<Tensor>("Label");
auto* accuracy = ctx.Output<Tensor>("Accuracy"); auto* accuracy = ctx.Output<Tensor>("Accuracy");
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace()); float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
const T* inference_data = inference->data<T>(); const int64_t* indices_data = indices->data<int64_t>();
const T* label_data = label->data<T>(); const int64_t* label_data = label->data<int64_t>();
size_t num_samples = inference->dims()[0]; size_t num_samples = inference->dims()[0];
size_t class_dim = inference->dims()[1]; size_t class_dim = inference->dims()[1];
@ -60,7 +61,7 @@ class AccuracyKernel : public framework::OpKernel<T> {
for (size_t i = 0; i < num_samples; ++i) { for (size_t i = 0; i < num_samples; ++i) {
PADDLE_ENFORCE_GE(label_data[i], 0, "label must >= 0"); PADDLE_ENFORCE_GE(label_data[i], 0, "label must >= 0");
for (size_t j = 0; j < class_dim; ++j) { for (size_t j = 0; j < class_dim; ++j) {
if (inference_data[i * class_dim + j] == label_data[i]) { if (indices_data[i * class_dim + j] == label_data[i]) {
++num_correct; ++num_correct;
break; break;
} }
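The loop in this kernel counts a sample as correct when its label appears anywhere in that sample's row of top-k indices, then divides by the number of samples. A minimal Python sketch of the same idea follows; it is a reference illustration rather than the kernel itself, and returning 0.0 for an empty batch is an assumption, since the diff does not show that branch.

```python
def topk_accuracy(indices, labels):
    """Fraction of samples whose label is among that sample's top-k indices."""
    if len(indices) != len(labels):
        raise ValueError("indices and labels must have the same number of rows")
    if not labels:
        return 0.0  # assumption: the empty-batch result is not shown in the diff
    num_correct = 0
    for row, label in zip(indices, labels):
        if label < 0:
            raise ValueError("label must >= 0")
        if label in row:  # same effect as scanning the row and breaking on a match
            num_correct += 1
    return num_correct / len(labels)
```

For example, with top-2 indices `[[2, 0], [1, 3], [0, 1]]` and labels `[0, 2, 1]`, the first and third samples match, so the accuracy is 2/3.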

@ -547,6 +547,7 @@ struct ELUGradFunctor : public BaseActivationFunctor<T> {
} }
}; };
// FIXME(qijun)
template <typename T> template <typename T>
struct PowFunctor : public BaseActivationFunctor<T> { struct PowFunctor : public BaseActivationFunctor<T> {
float factor; float factor;

@ -23,18 +23,26 @@ class AucOp : public framework::OperatorWithKernel {
protected: protected:
void InferShape(framework::InferShapeContext *ctx) const override { void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Inference"), PADDLE_ENFORCE(ctx->HasInput("Out"), "Input of Out must be initialized.");
"Input of Inference must be initialized."); PADDLE_ENFORCE(ctx->HasInput("Indices"),
"Input of Indices must be initialized.");
PADDLE_ENFORCE(ctx->HasInput("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"),
"Input of Label must be initialized."); "Input of Label must be initialized.");
auto inference_dim = ctx->GetInputDim("Inference"); auto inference_height = ctx->GetInputDim("Out")[0];
auto label_dim = ctx->GetInputDim("Label"); auto label_height = ctx->GetInputDim("Label")[0];
PADDLE_ENFORCE_EQ(inference_dim, label_dim, PADDLE_ENFORCE_EQ(inference_height, label_height,
"inference and label should have same shape"); "Out and Label should have same height.");
ctx->SetOutputDim("AUC", {1}); ctx->SetOutputDim("AUC", {1});
ctx->ShareLoD("Inference", /*->*/ "AUC"); ctx->ShareLoD("Out", /*->*/ "AUC");
// IndicateDataType
framework::DataType IndicateDataType(
const framework::ExecutionContext &ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("Out")->type());
} }
}; };
@ -42,12 +50,18 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AucOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) AucOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Inference", AddInput("Out",
"A floating point tensor of arbitrary shape and whose values" "A floating point 2D tensor, values are in the range [0, 1]."
"are in the range [0, 1]."); "Each row is descend sorted. This input should be the"
"output of topk."
"Typically, this tensor indicates the probability of each label");
"An int 2D tensor, indicating the indices of original"
"tensor before sort. Typically, this tensor indicates which label"
"the probability stands for.");
AddInput("Label", AddInput("Label",
"A tensor whose shape matches " "A 2D int tensor indicating the label of the training data."
"Inference. Will be cast to bool."); "The height is batch size and width is always 1.");
// TODO(typhoonzero): support weight input // TODO(typhoonzero): support weight input
AddOutput("AUC", AddOutput("AUC",
"A scalar representing the " "A scalar representing the "

@ -29,7 +29,7 @@ template <typename Place, typename T>
class AucKernel : public framework::OpKernel<T> { class AucKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Out");
auto* label = ctx.Input<Tensor>("Label"); auto* label = ctx.Input<Tensor>("Label");
auto* auc = ctx.Output<Tensor>("AUC"); auto* auc = ctx.Output<Tensor>("AUC");
@ -46,18 +46,11 @@ class AucKernel : public framework::OpKernel<T> {
thresholds_list[0] = 0.0f - kEpsilon; thresholds_list[0] = 0.0f - kEpsilon;
thresholds_list[num_thresholds - 1] = 1.0f + kEpsilon; thresholds_list[num_thresholds - 1] = 1.0f + kEpsilon;
size_t num_samples = inference->numel(); size_t batch_size = inference->dims()[0];
size_t inference_width = inference->dims()[1];
const T* inference_data = inference->data<T>(); const T* inference_data = inference->data<T>();
Tensor label_casted; const int64_t* label_data = label->data<int64_t>();
bool* label_casted_data = label_casted.mutable_data<bool>(ctx.GetPlace());
const int* label_data = label->data<int>();
// cast label_data to bool
for (size_t i = 0; i < num_samples; i++) {
label_casted_data[i] = static_cast<bool>(label_data[i]);
// Create local tensor for storing the curve: TP, FN, TN, FP // Create local tensor for storing the curve: TP, FN, TN, FP
// TODO(typhoonzero): use eigen op to calculate these values. // TODO(typhoonzero): use eigen op to calculate these values.
@ -68,23 +61,27 @@ class AucKernel : public framework::OpKernel<T> {
true_negative.Resize({num_thresholds}); true_negative.Resize({num_thresholds});
false_positive.Resize({num_thresholds}); false_positive.Resize({num_thresholds});
int* tp_data = true_positive.mutable_data<int>(ctx.GetPlace()); int64_t* tp_data = true_positive.mutable_data<int64_t>(ctx.GetPlace());
int* fn_data = false_negative.mutable_data<int>(ctx.GetPlace()); int64_t* fn_data = false_negative.mutable_data<int64_t>(ctx.GetPlace());
int* tn_data = true_negative.mutable_data<int>(ctx.GetPlace()); int64_t* tn_data = true_negative.mutable_data<int64_t>(ctx.GetPlace());
int* fp_data = false_positive.mutable_data<int>(ctx.GetPlace()); int64_t* fp_data = false_positive.mutable_data<int64_t>(ctx.GetPlace());
for (int idx_thresh = 0; idx_thresh < num_thresholds; idx_thresh++) { for (int idx_thresh = 0; idx_thresh < num_thresholds; idx_thresh++) {
// calculate TP, FN, TN, FP for current thresh // calculate TP, FN, TN, FP for current thresh
int tp = 0, fn = 0, tn = 0, fp = 0; int64_t tp = 0, fn = 0, tn = 0, fp = 0;
for (size_t i = 0; i < num_samples; i++) { for (size_t i = 0; i < batch_size; i++) {
if (label_casted_data[i]) { // NOTE: label_data used as bool, labels >0 will be treated as true.
if (inference_data[i] >= (thresholds_list[idx_thresh])) { if (label_data[i]) {
// use first(max) data in each row
if (inference_data[i * inference_width] >=
(thresholds_list[idx_thresh])) {
tp++; tp++;
} else { } else {
fn++; fn++;
} }
} else { } else {
if (inference_data[i] >= (thresholds_list[idx_thresh])) { if (inference_data[i * inference_width] >=
(thresholds_list[idx_thresh])) {
fp++; fp++;
} else { } else {
tn++; tn++;
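The updated loop compares only the first (largest) value in each row of `Out` against the threshold and treats a nonzero label as positive. The Python sketch below mirrors that counting step for a single threshold. It is illustrative only: the function name is hypothetical, and the way the counts are later combined into the AUC value is not shown in this diff, so it is left out.

```python
def confusion_counts(scores, labels, threshold):
    """Count (TP, FN, TN, FP) at one threshold.

    scores is a list of rows sorted in descending order (top-k output);
    only the first entry of each row is compared with the threshold.
    """
    tp = fn = tn = fp = 0
    for row, label in zip(scores, labels):
        predicted_positive = row[0] >= threshold
        if label:  # any nonzero label counts as positive
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp
```

With rows `[[0.9, 0.1], [0.4, 0.3], [0.8, 0.2], [0.2, 0.1]]`, labels `[1, 0, 0, 1]` and a threshold of 0.5, each of the four counters ends up at 1.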

File diff suppressed because it is too large.

@ -48,7 +48,7 @@ class SeqExpandKernel : public framework::OpKernel<T> {
x_t(x_data, 1, element_len); x_t(x_data, 1, element_len);
Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor, Eigen::DenseIndex>> Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor, Eigen::DenseIndex>>
out_t(out_data, scale, element_len); out_t(out_data, scale, element_len);
Eigen::array<int, 2> cast({scale, 1}); Eigen::array<int, 2> cast({{scale, 1}});
out_t.device(place) = x_t.broadcast(cast); out_t.device(place) = x_t.broadcast(cast);
x_data += element_len; x_data += element_len;
out_data += element_len * scale; out_data += element_len * scale;

Some files were not shown because too many files have changed in this diff.
