diff --git a/doc/api/v2/config/layer.rst b/doc/api/v2/config/layer.rst
index 1329b77bb4..c94627a728 100644
--- a/doc/api/v2/config/layer.rst
+++ b/doc/api/v2/config/layer.rst
@@ -434,9 +434,9 @@ lambda_cost
 .. autoclass:: paddle.v2.layer.lambda_cost
     :noindex:
 
-mse_cost
+square_error_cost
 --------
-.. autoclass:: paddle.v2.layer.mse_cost
+.. autoclass:: paddle.v2.layer.square_error_cost
     :noindex:
 
 rank_cost
diff --git a/doc/design/functions_operators_layers.md b/doc/design/functions_operators_layers.md
new file mode 100644
index 0000000000..7a2e8fd0ac
--- /dev/null
+++ b/doc/design/functions_operators_layers.md
@@ -0,0 +1,99 @@
+# Design Doc: Functions, Operators, and Layers
+
+In a DL system, we can compose one or more fine-grained operators into a coarse-grained one. For example, the FC layer can be composed of a multiplication operator and an add operator.
+
+Historically, some fine-grained operations are known as operators, and some coarse-level ones are known as layers. But we need a well-defined separation.
+
+In general, operators are the very fine-grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions:
+
+```c++
+template <typename T> T add(T x, T y) { return x + y; }
+template <typename T> T mul(T x, T y) { return x * y; }
+```
+
+Then we can wrap them into operators, which are C++ classes that can be created from Python bindings by name. A C macro can do this. For example, the following macro invocation
+
+```c++
+#define MAKE_FUNCTION_OPERATOR(mul);
+```
+
+generates
+
+```c++
+template <typename T> class mulOp : public OperatorBase {...};
+REGISTER_OP(mulOp<float32>, "mul");
+```
+
+so that in Python we can create the operator mul by:
+
+```python
+X1 = Var()
+X2 = Var()
+Y = Var()
+paddle.cpp.create_operator("mul", input=[X1, X2], output=Y)
+```
+
+At the same time, we can compose a coarse-level C++ operator class by composing the functions `mul` and `add`:
+
+```c++
+template <typename T>
+class FCOp : public OperatorBase {
+ public:
+  void Run(...) {
+    add(mul(Input<T>("X"), Input<T>("W")), Input<T>("b"));
+  }
+};
+REGISTER_OP(FCOp, "fc");
+```
+
+We need to support such composition in Python as well. To do so, we need a higher-level Python wrapping of operator creation than `paddle.cpp.create_operator`. This higher-level operator API should be compatible with the layer API.
+
+Let's explain using an example. Suppose that we are going to compose the FC using mul and add in Python. We'd like to have Python functions `mul` and `add` defined in module `operator`:
+
+```python
+def operator.mul(X1, X2):
+    O = Var()
+    paddle.cpp.create_operator("mul", input=[X1, X2], output=O)
+    return O
+
+def operator.add(X1, X2):
+    O = Var()
+    paddle.cpp.create_operator("add", input=[X1, X2], output=O)
+    return O
+```
+
+The above code snippets are automatically generated. Given them, users can define
+
+```python
+def layer.fc(X):
+    W = Var()
+    b = Var()
+    return operator.add(operator.mul(X, W), b)
+```
+
+If we didn't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be complicated:
+
+```python
+def layer.fc(X):
+    W = Var()
+    b = Var()
+    O1 = Var()
+    paddle.cpp.create_operator("mul", input=[X, W], output=O1)
+    O2 = Var()
+    paddle.cpp.create_operator("add", input=[O1, b], output=O2)
+    return O2
+```
+
+We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`.
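+
+As a rough, runnable illustration of this split (everything below is a hypothetical, self-contained sketch; `create_operator` merely stands in for `paddle.cpp.create_operator`, and strings stand in for Vars), the generated operator wrappers and a Python layer could relate like this:
+
+```python
+def create_operator(op_type, input, output):
+    # Stand-in for paddle.cpp.create_operator; here we only trace the call.
+    print("create op '%s': %s -> %s" % (op_type, input, output))
+    return output
+
+def make_operator_wrapper(op_type):
+    # Produces an operator.mul / operator.add style thin wrapper.
+    def wrapper(*inputs):
+        out = "%s_out" % op_type  # stand-in for O = Var()
+        return create_operator(op_type, input=list(inputs), output=out)
+    return wrapper
+
+mul = make_operator_wrapper("mul")
+add = make_operator_wrapper("add")
+
+def fc(X):
+    # A "layer": a pure-Python composition of generated operator wrappers.
+    return add(mul(X, "W"), "b")  # "W" and "b" stand in for parameter Vars
+
+fc("X")  # traces a mul op followed by an add op
+```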
+
+So we have the following concepts in the above illustrative example:
+
+```
+| C++ functions/functors | mul          | add          |             |          |
+| C++ operator class     | mulOp        | addOp        | FCOp        |          |
+| Python binding         | operator.mul | operator.add | operator.fc |          |
+| Python function        |              |              |             | layer.fc |
+```
+
+This is how we differentiate layers and operators in PaddlePaddle:
+
+- those defined in C++ that have a lightweight Python wrapper in module `operators` are operators; whereas
+- those that don't have a C++ implementation but a Python implementation composing C++ operators are known as layers.
diff --git a/doc/design/graph.md b/doc/design/graph.md
new file mode 100644
index 0000000000..87f696f90f
--- /dev/null
+++ b/doc/design/graph.md
@@ -0,0 +1,51 @@
+# Design Doc: Computations as Graphs
+
+A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation: in particular, a graph of operators and variables, instead of sequences of layers as before.
+
+This document explains the construction of such a graph in three steps:
+
+- construct the forward part
+- construct the backward part
+- construct the optimization part
+
+Let us take the problem of image classification as a simple example. The application program that trains the model looks like:
+
+```python
+x = layer.data("images")
+l = layer.data("label")
+y = layer.fc(x)
+cost = layer.mse(y, l)
+optimize(cost)
+train(cost, reader=mnist.train())
+```
+
+### Forward Part
+
+The first four lines of the above program build the forward part of the graph.
+
+![](images/graph_construction_example_forward_only.png)
+
+In particular, the first line `x = layer.data("images")` creates a variable x and a Feed operator that copies a column from the minibatch to x. `y = layer.fc(x)` creates not only the FC operator and the output variable y, but also two parameters, W and b.
+
+In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc`. These protobuf messages are saved in a `BlockDesc` protobuf message.
+
+### Backward Part
+
+The fifth line `optimize(cost)` calls two functions, `ConstructBackwardGraph` and `ConstructOptimizationGraph`.
+
+`ConstructBackwardGraph` traverses the forward graph in the `BlockDesc` protobuf message and builds the backward part.
+
+![](images/graph_construction_example_forward_backward.png)
+
+According to the chain rule of gradient computation, `ConstructBackwardGraph` would
+
+1. create a gradient operator G for each operator F,
+1. make all inputs, outputs, and outputs' gradients of F inputs of G,
+1. create gradients for all inputs of F, except for those that don't have gradients, like x and l, and
+1. make all these gradients outputs of G.
+
+### Optimization Part
+
+For each parameter, like W and b created by `layer.fc` and marked as double circles in the above graphs, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient. This results in the complete graph:
+
+![](images/graph_construction_example_all.png)
diff --git a/doc/design/if_else_op.md b/doc/design/if_else_op.md
new file mode 100644
index 0000000000..7370c2a24f
--- /dev/null
+++ b/doc/design/if_else_op.md
@@ -0,0 +1,59 @@
+IfOp should have only one branch. An IfOp operator takes a `cond` variable whose value must be a vector of N boolean elements. Its return value has M (M<=N) instances, each corresponding to a true element in `cond`.
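+
+To make the gather semantics concrete before the proposed API below, here is a minimal NumPy sketch of the intended behavior (illustrative only; this is not the Paddle API):
+
+```python
+import numpy as np
+
+x = np.arange(6).reshape(3, 2)        # N = 3 input instances
+cond = np.array([True, False, True])  # one boolean per instance
+out = x[cond]                         # M = 2 instances: rows 0 and 2 of x
+```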
+
+```python
+import paddle as pd
+
+x = var()
+y = var()
+cond = var()
+
+b = pd.create_ifop(inputs=[x], output_num=1)
+with b.true_block():
+    x = b.inputs(0)
+    z = operator.add(x, y)
+    b.set_output(0, operator.softmax(z))
+
+out = b(cond)
+```
+
+If we want the output to still have N instances, we can use IfElseOp with a default value, whose minibatch size must be N:
+
+```python
+import paddle as pd
+
+x = var()
+y = var()
+cond = var()
+default_value = var()
+b = pd.create_ifelseop(inputs=[x], output_num=1)
+with b.true_block():
+    x = b.inputs(0)
+    z = operator.add(x, y)
+    b.set_output(0, operator.softmax(z))
+
+with b.false_block():
+    x = b.inputs(0)
+    z = layer.fc(x)
+    b.set_output(0, operator.softmax(z))
+
+out = b(cond)
+```
+
+If only true_block is set in an IfElseOp, we can have a default value for false as:
+```python
+import paddle as pd
+
+x = var()
+y = var()
+cond = var()
+default_value = var()
+b = pd.create_ifelseop(inputs=[x], output_num=1, default_value=default_value)
+
+with b.true_block():
+    x = b.inputs(0)
+    z = operator.add(x, y)
+    b.set_output(0, operator.softmax(z))
+
+out = b(cond)
+```
+where `default_value` is a list of vars used for the instances where `cond` is False.
diff --git a/doc/design/images/graph_construction_example.bash b/doc/design/images/graph_construction_example.bash
new file mode 100755
index 0000000000..35e6997abd
--- /dev/null
+++ b/doc/design/images/graph_construction_example.bash
@@ -0,0 +1,11 @@
+cat ./graph_construction_example.dot | \
+  sed 's/color=red/color=red, style=invis/g' | \
+  sed 's/color=green/color=green, style=invis/g' | \
+  dot -Tpng > graph_construction_example_forward_only.png
+
+cat ./graph_construction_example.dot | \
+  sed 's/color=green/color=green, style=invis/g' | \
+  dot -Tpng > graph_construction_example_forward_backward.png
+
+cat ./graph_construction_example.dot | \
+  dot -Tpng > graph_construction_example_all.png
diff --git a/doc/design/images/graph_construction_example.dot b/doc/design/images/graph_construction_example.dot
new file mode 100644
index 0000000000..bedb6de011
--- /dev/null
+++ b/doc/design/images/graph_construction_example.dot
@@ -0,0 +1,65 @@
+digraph ImageClassificationGraph {
+  ///////// The forward part /////////
+  FeedX [label="Feed", color=blue, shape=box];
+  FeedY [label="Feed", color=blue, shape=box];
+  FC [label="FC", color=blue, shape=box];
+  MSE [label="MSE", color=blue, shape=box];
+
+  x [label="x", color=blue, shape=oval];
+  l [label="l", color=blue, shape=oval];
+  y [label="y", color=blue, shape=oval];
+  W [label="W", color=blue, shape=doublecircle];
+  b [label="b", color=blue, shape=doublecircle];
+  cost [label="cost", color=blue, shape=oval];
+
+  FeedX -> x -> FC -> y -> MSE -> cost [color=blue];
+  FeedY -> l [color=blue];
+  W -> FC [color=blue];
+  b -> FC [color=blue];
+  l -> MSE [color=blue];
+
+  ////////// The backward part /////////
+  MSE_Grad [label="MSE_grad", color=red, shape=box];
+  FC_Grad [label="FC_grad", color=red, shape=box];
+
+  d_cost [label="d cost", color=red, shape=oval];
+  d_y [label="d y", color=red, shape=oval];
+  d_b [label="d b", color=red, shape=oval];
+  d_W [label="d W", color=red, shape=oval];
+
+  cost -> MSE_Grad [color=red];
+  d_cost -> MSE_Grad [color=red];
+  x -> MSE_Grad [color=red];
+  l -> MSE_Grad [color=red];
+  y -> MSE_Grad -> d_y [color=red];
+
+  x -> FC_Grad [color=red];
+  y -> FC_Grad [color=red];
+  d_y -> FC_Grad [color=red];
+  W -> FC_Grad -> d_W [color=red];
+  b -> FC_Grad -> d_b [color=red];
+
+  ////////// The optimization part //////////
+
+  OPT_W [label="SGD", color=green, shape=box];
+ OPT_b [label="SGD", color=green, shape=box]; + + W -> OPT_W [color=green]; + b -> OPT_b [color=green]; + d_W -> OPT_W -> W [color=green]; + d_b -> OPT_b -> b [color=green]; + + ////////// Groupings ////////// + + subgraph clusterMSE { + style=invis; + MSE; + MSE_Grad; + } + + subgraph clusterFC { + style=invis; + FC; + FC_Grad; + } +} diff --git a/doc/design/images/graph_construction_example_all.png b/doc/design/images/graph_construction_example_all.png new file mode 100644 index 0000000000..18d8330b60 Binary files /dev/null and b/doc/design/images/graph_construction_example_all.png differ diff --git a/doc/design/images/graph_construction_example_forward_backward.png b/doc/design/images/graph_construction_example_forward_backward.png new file mode 100644 index 0000000000..61c3a02a04 Binary files /dev/null and b/doc/design/images/graph_construction_example_forward_backward.png differ diff --git a/doc/design/images/graph_construction_example_forward_only.png b/doc/design/images/graph_construction_example_forward_only.png new file mode 100644 index 0000000000..14805df11f Binary files /dev/null and b/doc/design/images/graph_construction_example_forward_only.png differ diff --git a/doc/getstarted/basic_usage/index_cn.rst b/doc/getstarted/basic_usage/index_cn.rst index 428f58830e..b473944fc7 100644 --- a/doc/getstarted/basic_usage/index_cn.rst +++ b/doc/getstarted/basic_usage/index_cn.rst @@ -55,7 +55,7 @@ PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍 # 线性计算网络层: ȳ = wx + b ȳ = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) # 计算误差函数,即 ȳ 和真实 y 之间的距离 - cost = mse_cost(input= ȳ, label=y) + cost = square_error_cost(input= ȳ, label=y) outputs(cost) @@ -69,7 +69,7 @@ PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍 - **数据层**:数据层 `data_layer` 是神经网络的入口,它读入数据并将它们传输到接下来的网络层。这里数据层有两个,分别对应于变量 `x` 和 `y`。 - **全连接层**:全连接层 `fc_layer` 是基础的计算单元,这里利用它建模变量之间的线性关系。计算单元是神经网络的核心,PaddlePaddle支持大量的计算单元和任意深度的网络连接,从而可以拟合任意的函数来学习复杂的数据关系。 - - **回归误差代价层**:回归误差代价层 `mse_cost` 是众多误差代价函数层的一种,它们在训练过程作为网络的出口,用来计算模型的误差,是模型参数优化的目标函数。 + - **回归误差代价层**:回归误差代价层 `square_error_cost` 是众多误差代价函数层的一种,它们在训练过程作为网络的出口,用来计算模型的误差,是模型参数优化的目标函数。 定义了网络结构并保存为 `trainer_config.py` 之后,运行以下训练命令: diff --git a/doc/getstarted/basic_usage/index_en.rst b/doc/getstarted/basic_usage/index_en.rst index 6775da20c2..2cc438ebbe 100644 --- a/doc/getstarted/basic_usage/index_en.rst +++ b/doc/getstarted/basic_usage/index_en.rst @@ -49,7 +49,7 @@ To recover this relationship between ``X`` and ``Y``, we use a neural network wi x = data_layer(name='x', size=1) y = data_layer(name='y', size=1) y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) - cost = mse_cost(input=y_predict, label=y) + cost = square_error_cost(input=y_predict, label=y) outputs(cost) Some of the most fundamental usages of PaddlePaddle are demonstrated: diff --git a/doc/getstarted/concepts/src/train.py b/doc/getstarted/concepts/src/train.py index 7e604f23de..8aceb23406 100644 --- a/doc/getstarted/concepts/src/train.py +++ b/doc/getstarted/concepts/src/train.py @@ -8,7 +8,7 @@ paddle.init(use_gpu=False) x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(2)) y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear()) y = paddle.layer.data(name='y', type=paddle.data_type.dense_vector(1)) -cost = paddle.layer.mse_cost(input=y_predict, label=y) +cost = paddle.layer.square_error_cost(input=y_predict, label=y) # create parameters parameters = 
paddle.parameters.create(cost) diff --git a/doc/getstarted/concepts/use_concepts_cn.rst b/doc/getstarted/concepts/use_concepts_cn.rst index f15b11bd78..c243083794 100644 --- a/doc/getstarted/concepts/use_concepts_cn.rst +++ b/doc/getstarted/concepts/use_concepts_cn.rst @@ -81,9 +81,9 @@ PaddlePaddle支持不同类型的输入数据,主要包括四种类型,和 .. code-block:: bash y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear()) - cost = paddle.layer.mse_cost(input=y_predict, label=y) + cost = paddle.layer.square_error_cost(input=y_predict, label=y) -其中,x与y为之前描述的输入层;而y_predict是接收x作为输入,接上一个全连接层;cost接收y_predict与y作为输入,接上均方误差层。 +其中,x与y为之前描述的输入层;而y_predict是接收x作为输入,接上一个全连接层;cost接收y_predict与y作为输入,接上平方误差层。 最后一层cost中记录了神经网络的所有拓扑结构,通过组合不同的layer,我们即可完成神经网络的搭建。 @@ -147,4 +147,4 @@ PaddlePaddle支持不同类型的输入数据,主要包括四种类型,和 .. literalinclude:: src/train.py :linenos: -有关线性回归的实际应用,可以参考PaddlePaddle book的 `第一章节 `_。 \ No newline at end of file +有关线性回归的实际应用,可以参考PaddlePaddle book的 `第一章节 `_。 diff --git a/doc/howto/dev/new_op_cn.md b/doc/howto/dev/new_op_cn.md index 7f8da2da5a..58665e9f2b 100644 --- a/doc/howto/dev/new_op_cn.md +++ b/doc/howto/dev/new_op_cn.md @@ -23,17 +23,20 @@ - `framework::OperatorWithKernel`:继承自OperatorBase,Op有计算函数,称作有Kernel。 - `class OpProtoAndCheckerMaker`:描述该Op的输入、输出、属性、注释,主要用于Python API接口生成 -依据是否包含kernel,将Op分为两种:包含Kernel的Op和不包含kernel的Op,前者Op的定义继承自`OperatorBase`,后者继承自`OperatorWithKernel`。本教程主要介绍带Kernel的Op如何写,简单总结Op需要包含的内容如下: +依据是否包含kernel,可以将Op分为两种:包含Kernel的Op和不包含kernel的Op,前者Op的定义继承自`OperatorBase`,后者继承自`OperatorWithKernel`。本教程主要介绍带Kernel的Op如何写,简单总结Op需要包含的内容如下: - - 内容 | 定义位置 --------------- | :---------------------- + + 内容 | 定义位置 +-------------- | :---------------------- OpProtoMake定义 | `.cc`文件,Backward Op不需要定义OpProtoMake -Op定义 | `.cc`文件 -Kernel实现 | CPU、GPU共享Kernel在`.h`文件,否则,CPU可以在`.cc`文件,GPU可在`.cu`文件。 -注册Op | Op注册在`.cc`文件;Kernel注册CPU在`.cc`文件,GPU在`.cu`文件 - - +Op定义 | `.cc`文件 +Kernel实现 | CPU、GPU共享Kernel实现在`.h`文件中,否则,CPU 实现在`.cc`文件中,GPU 实现在`.cu`文件中。 +注册Op | Op注册实现在`.cc`文件;Kernel注册CPU实现在`.cc`文件中,GPU实现在`.cu`文件中 + + +实现新的op都添加至目录[paddle/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators)下,文件命名以`*_op.h`(如有) 、 `*_op.cc` 、`*_op.cu`(如有)结尾。 + + 下面以矩阵乘操作,即[MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc)为例来介绍如何写带Kernel的Operator。 @@ -42,9 +45,11 @@ Kernel实现 | CPU、GPU共享Kernel在`.h`文件,否则,CPU可以在` ### 1. 
定义ProtoMaker类 -矩阵乘的公式:$Out = X * Y$, 可见该计算由两个输入,一个输出组成。首先定义`ProtoMaker`来描述该Op的输入、输出及注释: - -``` +矩阵乘法的公式:$Out = X * Y$, 可见该计算由两个输入,一个输出组成。 + +首先定义`ProtoMaker`来描述该Op的输入、输出,并添加注释: + +```cpp class MulOpMaker : public framework::OpProtoAndCheckerMaker { public: MulOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) @@ -59,20 +64,20 @@ The equation is: Out = X * Y } }; ``` - -[`MulOpMaker`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43)继承自`framework::OpProtoAndCheckerMaker`,构造函数包括2个: + +[`MulOpMaker`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43)继承自`framework::OpProtoAndCheckerMaker`,构造函数含有2个参数: - `framework::OpProto` : 前者存储Op的输入输出和参数属性,将用于Python API接口的生成。 - `framework::OpAttrChecker` :后者用于检查参数属性的合法性。 - -构造函数里通过`AddInput`添加输入参数,通过`AddOutput`添加输出参数,通过`AddComment`添加该Op的注释,这些函数会将对应内容添加到`OpProto`中。 -在`MulOp`中添加两个输入`X`和`Y`,添加了一个输出`Out`,并解释了各自含义,该命名尽可能的规范。 +构造函数里通过`AddInput`添加输入参数,通过`AddOutput`添加输出参数,通过`AddComment`添加Op的注释。这些函数会将对应内容添加到`OpProto`中。 - -再举个[`ScaleOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)的例子: - -``` +上面的代码在`MulOp`中添加两个输入`X`和`Y`,添加了一个输出`Out`,并解释了各自含义,命名请遵守命名规范。 + + +再以[`ScaleOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)为例: + +```cpp template class ScaleOpMaker : public framework::OpProtoAndCheckerMaker { public: @@ -87,17 +92,19 @@ The equation is: Out = scale*X } }; ``` - - 在这个例子里,两处不同: - - - `AddInput("X","...").NotInGradient()` : 表示`X`这个输入不参与`ScaleOp`对应的梯度Op计算之中。 - - `AddAttr("scale", "...").SetDefault(1.0);` : 增加`scale`系数,作为参数属性,并且设置默认值为1.0。 - + +这个例子有两处不同: + +- `AddInput("X","...").NotInGradient()` : 表示`X`这个输入不参与`ScaleOp`对应的梯度Op计算之中,如果Op的某个输入不参与反向梯度的计算,请显示地调用`.NotInGradient()`进行设置。 + +- `AddAttr("scale", "...").SetDefault(1.0);` : 增加`scale`系数,作为参数属性,并且设置默认值为1.0。 + ### 2. 定义Operator类 +下面的点实现了MulOp的定义: -```c++ +```cpp class MulOp : public framework::OperatorWithKernel { public: using framework::OperatorWithKernel::OperatorWithKernel; @@ -121,33 +128,46 @@ class MulOp : public framework::OperatorWithKernel { ``` [`MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L22)继承自`OperatorWithKernel`。`public`成员: - -```c++ + +```cpp using framework::OperatorWithKernel::OperatorWithKernel; ``` 这句表示使用基类`OperatorWithKernel`的构造函数,也可写成: - -```c++ + +```cpp MulOp(const std::string &type, const framework::VariableNameMap &inputs, const framework::VariableNameMap &outputs, const framework::AttributeMap &attrs) : OperatorWithKernel(type, inputs, outputs, attrs) {} -``` - +``` + 还需要重写`InferShape`接口。`InferShape`为const函数,不能修改Op的成员变量,参数为`const framework::InferShapeContext &ctx`,通过该参数可获取到输入输出以及属性。它的功能是: - 1). 做检查, 尽早报错:检查输入数据维度、类型等是否合法。 - 2). 设置输出Tensor的形状。 -通常`OpProtoMaker`和`Op`类的定义写在`.cc`文件中,和要讲到的注册函数一起放在`.cc`中 +通常`OpProtoMaker`和`Op`类的定义写在`.cc`文件中,和下面将要介绍的注册函数一起放在`.cc`中 ### 3. 
定义OpKernel类 -```C++ -template -class MulKernel : public framework::OpKernel { - public: +`MulKernel`继承自`framework::OpKernel`,带有下面两个模板参数: + +- `typename Place`: 表示设备类型,不同设备(CPU、GPU)共享同一个Kernel时,需加该模板参数,不共享则不加,一个不共享的例子是[`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43)。 + +- `typename T` : 表示数据类型,如`float`, `double`等。 + +需要为`MulKernel`类重写`Compute`接口。 +- `Compute`接受一个输入参数:`const framework::ExecutionContext& context`。 +- 与`InferShapeContext`相比,`ExecutionContext`增加了设备类型,同样可获取到输入输出和属性参数。 +- `Compute`函数里实现`OpKernel`的具体计算逻辑。 + +下面是 `MulKernel` `Compute`的实现: + + ```cpp + template + class MulKernel : public framework::OpKernel { + public: void Compute(const framework::ExecutionContext& context) const override { auto* X = context.Input("X"); auto* Y = context.Input("Y"); @@ -157,168 +177,197 @@ class MulKernel : public framework::OpKernel { const_cast(context.device_context_); math::matmul(*X, false, *Y, false, 1, Z, 0, device_context); } -}; -``` + }; + ``` + +需要注意:**不同设备(CPU、GPU)共享一个Op定义,是否则共享同一个`OpKernel`,取决于`Compute`调用的函数是否支持不同设备。** + +`MulOp`的CPU、GPU实现共享同一个`Kernel`。`OpKernel`不共享的例子可以参考:[`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43)。 + +为了使`OpKernel`的计算过程书写更加简单,并且CPU、GPU的代码可以复用,我们通常借助 Eigen unsupported Tensor模块来实现`Compute`接口。关于在PaddlePaddle中如何使用Eigen库,请参考[使用文档](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md)。 + + +到此,前向Op实现完成。接下来,需要在`.cc`文件中注册该op和kernel。 +反向Op类的定义,反向OpKernel的定义与前向Op类似,这里不再赘述。**但需注意反向Op没有`ProtoMaker`**。 -`MulKernel`继承自`framework::OpKernel`,带有模板参数: - - - `typename Place`: 表示设备类型,不同设备(CPU、GPU)共享同一个Kernel时,需加该模板参数,不共享则不加,一个不共享的例子是[`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43)。 - - - `typename T` : 表示数据类型,如`float`, `double`等。 - -`MulKernel`需要重写`Compute`接口,该接口参数为`const framework::ExecutionContext& context`, `ExecutionContext`相比`InferShapeContext`增加了设备类型,同样可获取到输入输出和属性参数,`Compute`函数里写具体实现时。 - -注意,不同设备(CPU、GPU)共享一个Op定义,是否则共享同一个`OpKernel`,取决于`Compute`调用的函数是否支持不同设备。`MulOp`的CPU、GPU实现共享同一个`Kernel`,`OpKernel`不共享的例子可以参考[`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43)。 - -到此前向Op实现完成,需要在`.cc`文件中注册该op和kernel。反向Op类的定义和Kernel定义与前向Op类似,这里不再重复。但注意,反向Op没有`ProtoMaker`。 - ### 4. 
注册Operator -在`.cc`文件中注册前向、反向Op类,注册CPU Kernel。 +- 在`.cc`文件中注册前向、反向Op类,注册CPU Kernel。 -```c++ -namespace ops = paddle::operators; -REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad); -REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel); -REGISTER_OP_CPU_KERNEL(mul_grad, - ops::MulGradKernel); -``` - - - `REGISTER_OP` : 注册`ops::MulOp`类,类型名为`mul`,该类的`ProtoMaker`为`ops::MulOpMaker`,注册`ops::MulOpGrad`,类型名为`mul_grad`, - - `REGISTER_OP_WITHOUT_GRADIENT` : 用于注册没有反向的Op。 - - `REGISTER_OP_CPU_KERNEL` :注册`ops::MulKernel`类,并特化模板参数为`paddle::platform::CPUPlace`和`float`类型,同理,注册`ops::MulKernel`类。 - -在 `.cu`文件中注册GPU Kernel。 - -```c++ -namespace ops = paddle::operators; -REGISTER_OP_GPU_KERNEL(mul, ops::MulKernel); -REGISTER_OP_GPU_KERNEL(mul_grad, - ops::MulGradKernel); -``` + ```cpp + namespace ops = paddle::operators; + REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad); + REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel); + REGISTER_OP_CPU_KERNEL(mul_grad, + ops::MulGradKernel); + ``` + + 在上面的代码中: + + - `REGISTER_OP` : 注册`ops::MulOp`类,类型名为`mul`,该类的`ProtoMaker`为`ops::MulOpMaker`,注册`ops::MulOpGrad`,类型名为`mul_grad`。 + - `REGISTER_OP_WITHOUT_GRADIENT` : 用于注册没有反向的Op。 + - `REGISTER_OP_CPU_KERNEL` :注册`ops::MulKernel`类,并特化模板参数为`paddle::platform::CPUPlace`和`float`类型,同理,注册`ops::MulKernel`类。 + + +- 在 `.cu`文件中注册GPU Kernel。 + - 请注意,如果GPU Kernel的实现基于Eigen unsupported模块,那么在 `.cu`的开始请加上宏定义 `#define EIGEN_USE_GPU`,代码示例如下: + + ```cpp + // if use Eigen unsupported module before include head files + #define EIGEN_USE_GPU + + namespace ops = paddle::operators; + REGISTER_OP_GPU_KERNEL(mul, ops::MulKernel); + REGISTER_OP_GPU_KERNEL(mul_grad, + ops::MulGradKernel); + ``` ### 5. 编译 -在[paddle/operators/CMakeLists.txt](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/CMakeLists.txt)文件中添加编译。 - -``` -op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function) -``` - -下面命令可以编译: - -``` -make mul_op -``` +- 简单**无特殊依赖**的OP无需修改CMakeList.txt文件。[paddle/operators/CMakeLists.txt](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/CMakeLists.txt) 会自动将 `paddle/operators` 目录下新增的 `*_op.cc` 文件加入编译。 +- 较为复杂、**有额外依赖** 的operator仍需要修改[paddle/operators/CMakeLists.txt](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/CMakeLists.txt)。如,`mul_op` 依赖 `math_function`,需要在`CMakeLists.txt`中添加如下内容: + + ``` + op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function) + + ``` + +- 运行下面命令可以进行编译: + + ``` + make mul_op + ``` ## 绑定Python -- 绑定Python - - 在 [`paddle/pybind/pybind.cc -`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc)文件中添加该类: +- 绑定Python + + 在 [`paddle/pybind/pybind.cc +`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc) 使用`USE_OP`告知编译器需要链接的Op,具体解释参考[代码注释](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h#L81)。 ``` USE_OP(mul); ``` 如果只实现了CPU版本,则使用`USE_CPU_ONLY_OP`: - + ``` USE_CPU_ONLY_OP(gather); ``` - - 使用`USE_OP`告知编译器需要链接该Op的目标文件,具体解释参考[代码注释](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h#L81)。 - - + + 如果OP不带Kernel,则使用`USE_NO_KENREL_OP`: + + ``` + USE_NO_KENREL_OP(recurrent); + ``` + + - 生成库 - 在 [`paddle/pybind/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/CMakeLists.txt)文件添加类到`DEPS`中,使得该Op可以链接到生成的lib库中。 - - ``` - if(WITH_PYTHON) - cc_library(paddle_pybind SHARED - SRCS pybind.cc - DEPS pybind python backward - mul_op - minus_op) - endif(WITH_PYTHON) - ``` + 无需修改 
[`paddle/pybind/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/CMakeLists.txt)文件,`paddle/operators` 目录下新增的 `*_op.cc` 文件会被自动添加链接到生成的lib库中。 ## 实现单元测试 -单测包括对比前向Op不同设备(CPU、GPU)的实现、对比反向OP不同设备(CPU、GPU)的实现、反向Op的梯度测试。下面介绍介绍[`MulOp`的单测](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py)。 +单测包括对比前向Op不同设备(CPU、GPU)的实现、对比反向OP不同设备(CPU、GPU)的实现、反向Op的梯度测试。下面介绍介绍[`MulOp`的单元测试](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py)。 -### 前向Operator单测 +### 前向Operator单元测试 -前向Op单测继承自`unittest.TestCase`,并定义元类`__metaclass__ = OpTestMeta`,具体单测流程在`OpTestMeta`里完成。需在`setUp`函数定义输入输出和属性参数,以及Python对比的输出值。 +前向Op单元测试继承自`unittest.TestCase`,并定义元类`__metaclass__ = OpTestMeta`。各项更加具体的单元测试在`OpTestMeta`里完成。测试前向Operator,需要: -``` -import unittest -import numpy as np -from gradient_checker import GradientChecker, create_op -from op_test_util import OpTestMeta +1. 在`setUp`函数定义输入、输出,以及相关的属性参数。 +2. 生成随机的输入数据。 +3. 在Python脚本中实现与前向operator相同的计算逻辑,得到输出值,与operator前向计算的输出进行对比。 + + + ```python + import unittest + import numpy as np + from gradient_checker import GradientChecker, create_op + from op_test_util import OpTestMeta -class TestMulOp(unittest.TestCase): - __metaclass__ = OpTestMeta + class TestMulOp(unittest.TestCase): + __metaclass__ = OpTestMeta + def setUp(self): + self.type = "mul" + self.inputs = { + 'X': np.random.random((32, 84)).astype("float32"), + 'Y': np.random.random((84, 100)).astype("float32") + } + self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} + ``` + +上面的代码首先导入依赖的包,下面是对`setUp`函数中操作的重要变量的详细解释: + +- `self.type = "mul" ` : 定义类型,与operator注册时注册的类型一致。 +- `self.inputs` : 定义输入,类型为`numpy.array`,并初始化。 +- `self.outputs` : 定义输出,并在Python脚本中完成与operator同样的计算逻辑,返回Python端的计算结果。 + + +### 反向Operator单元测试 + +反向Op单元测试继承自`GradientChecker`,而`GradientChecker`继承自`unittest.TestCase`,因此,**反向单元测试函数需要以`test_`开头**。 + +```python +class TestMulGradOp(GradientChecker): def setUp(self): - self.type = "mul" + self.op = create_op("mul") self.inputs = { 'X': np.random.random((32, 84)).astype("float32"), 'Y': np.random.random((84, 100)).astype("float32") } - self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} -``` - 首先需要`import`必要的包,下面详细解释其他值: - - - `self.type = "mul" ` : 定义类型,和注册的类型一致。 - - `self.inputs` : 定义输入,类型为Numpy.array,并初始化。 - - `self.outputs` : 定义输出,并得到Python结算结果。 - - -### 反向Operator单测 - -反向Op单测继承自`GradientChecker`,而`GradientChecker`集成自`unittest.TestCase`,所以反向单测函数需要`test_`开头。 - - ``` - class MulGradOpTest(GradientChecker): - def test_mul(self): - op = create_op("mul") - inputs = { - 'X': np.random.random((32, 84)).astype("float32"), - 'Y': np.random.random((84, 100)).astype("float32") - } - self.compare_grad(op, inputs) + + def test_cpu_gpu_compare(self): + self.compare_grad(self.op, self.inputs) + + def test_normal(self): # mul op will enlarge the relative error self.check_grad( - op, inputs, set(["X", "Y"]), "Out", max_relative_error=0.5) - ``` + self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5) + + def test_ignore_x(self): + self.check_grad( + self.op, + self.inputs, ["Y"], + "Out", + max_relative_error=0.5, + no_grad_set={"X"}) + + def test_ignore_y(self): + self.check_grad( + self.op, + self.inputs, ["X"], + "Out", + max_relative_error=0.5, + no_grad_set={"Y"}) +``` + +下面解释代码中一些关键的地方: - - 调用`create_op("mul")`创建反向Op对应的前向Op。 - - 定义输入`inputs`。 - - 调用`compare_grad`函数对比CPU、GPU计算结果。 - - 调用`check_grad`检查梯度稳定性,这里采用数值法检测梯度正确性。 - - 第一个参数`op` : 前向op。 - - 第二个参数`inputs` : 
输入词典,词典的Key和`ProtoMaker`定义保持一致。 - - 第三个参数`set(["X", "Y"])` : 指定对输入变量`X`、`Y`做梯度检测。 - - 第四个参数`"Out"` : 指定前向网络最终的输出目标变量`Out` +- 调用`create_op("mul")`创建反向Op对应的前向Op。 +- 调用`compare_grad`函数对比CPU、GPU计算结果。 +- `test_normal`中调用`check_grad`使用数值法检测梯度正确性和稳定性。 + - 第一个参数`self.op` : 前向Op。 + - 第二个参数`self.inputs` : 输入词典,词典的Key和`ProtoMaker`定义保持一致。 + - 第三个参数`["X", "Y"]` : 指定对输入变量`X`、`Y`做梯度检测。 + - 第四个参数`"Out"` : 指定前向网络最终的输出目标变量`Out` +- `test_ignore_x`和`test_ignore_y`分支用来测试只需要计算一个输入梯度的情况。 -### 编译和执行 +### 编译和执行单元测试 -单测完成之后,在[`python/paddle/v2/framework/tests/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/CMakeLists.txt)里添加编译: +单元测试编写完成之后,在[`python/paddle/v2/framework/tests/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/CMakeLists.txt)中添加以下内容,将单元测试加入工程: ``` py_test(test_mul_op SRCS test_mul_op.py) ``` -编译时需要打开`WITH_TESTING`, 即 `cmake paddle_dir -DWITH_TESTING=ON`,编译成功之后执行单测命令为: +请注意,**不同于Op的编译测试,运行单元测试测时需要编译整个工程**,并且编译时需要打开`WITH_TESTING`, 即`cmake paddle_dir -DWITH_TESTING=ON`。编译成功后,执行下面的命令来运行单元测试: -``` +```bash make test ARGS="-R test_mul_op -V" ``` + 或者: -``` +```bash ctest -R test_mul_op ``` diff --git a/doc/howto/dev/use_eigen_cn.md b/doc/howto/dev/use_eigen_cn.md new file mode 100644 index 0000000000..1367323b71 --- /dev/null +++ b/doc/howto/dev/use_eigen_cn.md @@ -0,0 +1,146 @@ +## 在Paddle中如何使用Eigen + +神经网络本质上是一个计算图,计算需要的数据存放在`Tensor`中,而计算过程是由`Operartor`来描述的。在执行时,`Operator`调用对应`OpKernel`中的`Compute`接口,实现对`Tensor`的操作。 + + +### Eigen Tensor模块 + +Eigen Tensor模块对element-wise计算提供了强大的支持,并且书写一份代码,可以同时在CPU、GPU执行。但Eigen Tensor是一个正在开发中的模块,因此可能测试不够完备,文档较少。 + +关于Eigen Tensor模块的详细介绍请参考[文档1](https://github.com/RLovelett/eigen/blob/master/unsupported/Eigen/CXX11/src/Tensor/README.md) 和[文档2](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md) + + +### paddle::framework::Tensor + +Paddle Tensor定义在framework目录下,其主要接口如下: + +```cpp +class Tensor { + public: + /*! Return a pointer to mutable memory block. */ + template + inline T* data(); + + /** + * @brief Return a pointer to mutable memory block. + * @note If not exist, then allocation. + */ + template + inline T* mutable_data(platform::Place place); + + /** + * @brief Return a pointer to mutable memory block. + * + * @param[in] dims The dimensions of the memory block. + * @param[in] place The place of the memory block. + * + * @note If not exist, then allocation. + */ + template + inline T* mutable_data(DDim dims, platform::Place place); + + /*! Resize the dimensions of the memory block. */ + inline Tensor& Resize(const DDim& dims); + + /*! Return the dimensions of the memory block. */ + inline const DDim& dims() const; + + private: + /*! holds the memory block if allocated. */ + std::shared_ptr holder_; + + /*! points to dimensions of memory block. 
*/ + DDim dim_; +}; +``` + +`Placeholder`的作用是延迟分配内存,即我们可以先定义一个Tensor,然后使用Resize接口设置Tensor的大小,最后再调用mutable_data接口分配实际的内存。 + +```cpp +paddle::framework::Tensor t; +paddle::platform::CPUPlace place; +// set size first +t.Resize({2, 3}); +// allocate memory on CPU later +t.mutable_data(place); +``` + +### paddle::framework::Tensor使用样例 +下面以AddOp为例说明Tensor的使用过程: + +- InferShape + +在运行神经网络计算图时,我们先调用每个`Operator`的`InferShape`接口,根据输入Tensor的大小来设置输出Tensor的大小,`Resize`接口会被调用。 + +```cpp +void InferShape(const framework::InferShapeContext &ctx) const override { + PADDLE_ENFORCE_EQ(ctx.Input("X")->dims(), + ctx.Input("Y")->dims(), + "Two input of Add Op's dimension must be same."); + ctx.Output("Out")->Resize(ctx.Input("X")->dims()); +} +``` + + +- Run + +`Operator`的`Run`接口最终会调用对应`OpKernel`的`Compute`接口,在这时真正的分配内存,`mutable_data`接口会被调用。 + +```cpp +void Compute(const framework::ExecutionContext& context) const override { + auto* input0 = context.Input("X"); + auto* input1 = context.Input("Y"); + auto* output = context.Output("Out"); + + output->mutable_data(context.GetPlace()); + + auto x = EigenVector::Flatten(*input0); + auto y = EigenVector::Flatten(*input1); + auto z = EigenVector::Flatten(*output); + + auto place = context.GetEigenDevice(); + + z.device(place) = x + y; +} +``` + + +### paddle::framework::Tensor到EigenTensor的转换 + +如上一小节所示,在具体的计算中,我们需要先把输入Tensor和输出Tensor转换为Eigen支持的格式。我们在[eigen.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen.h)中提供了一些全局函数用来实现paddle::framework::Tensor到EigenTensor/EigenMatrix/EigenVector/EigenScalar的转换。 + +以EigenTensor为例,做一个介绍 + +```cpp +Tensor t; +float* p = t.mutable_data(make_ddim({1, 2, 3}), platform::CPUPlace()); +for (int i = 0; i < 1 * 2 * 3; i++) { + p[i] = static_cast(i); +} + +EigenTensor::Type et = EigenTensor::From(t); +``` + +From是EigenTensor模板提供的一个接口,可以实现从paddle::framework::Tensor到对EigenTensor的转换。由于Tensor的rank是模板参数,因此在转换时需要显示的指定。 + +在Eigen中,不同rank的Tensor是不同类型,Vector是rank为1的Tensor。需要额外注意的是,EigenVector::From方法是把paddle中的一维Tensor转为Eigen的一维Tensor,在这里用EigenVector来表示;而EigenVector::Flatten方法是把paddle中的一个Tensor进行reshape操作,压扁成为Eigen的一维Tensor,类型仍然为EigenVector。 + +更多的转换方法请参考eigen_test.cc中的[单元测试](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen_test.cc)。 + + + +### 实现计算 + +当需要完成计算时,我们需要等式左边的EigenTensor调用device接口。在这里需要注意的是,这里的EigenTensor之间的运算只是改变了原有Tensor中的数据,而不会改变原有Tensor的shape信息。 + +```cpp +auto x = EigenVector::Flatten(*input0); +auto y = EigenVector::Flatten(*input1); +auto z = EigenVector::Flatten(*output); +auto place = context.GetEigenDevice(); +z.device(place) = x + y; +``` + +在这段代码中,input0/input1/output可以是任意维度的Tensor。我们调用了EigenVector的Flatten接口,把任意维度的Tensor转为了一维的EigenVector。而在计算结束之后,input0/input1/output的原有shape信息不变。如果想改变原有Tensor的shape信息,可以调用Resize接口进行改变。 + +由于Eigen Tensor模块的文档较少,我们可以参考TensorFlow的[kernels](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels)模块下的相关`OpKernel`的计算代码。 diff --git a/doc/howto/usage/k8s/k8s_distributed_cn.md b/doc/howto/usage/k8s/k8s_distributed_cn.md index 3121b3f59d..a9bebf0955 100644 --- a/doc/howto/usage/k8s/k8s_distributed_cn.md +++ b/doc/howto/usage/k8s/k8s_distributed_cn.md @@ -213,7 +213,7 @@ I1116 09:10:17.123440 50 Util.cpp:130] Calling runInitFunctions I1116 09:10:17.123764 50 Util.cpp:143] Call runInitFunctions done. [WARNING 2016-11-16 09:10:17,227 default_decorators.py:40] please use keyword arguments in paddle config. 
[INFO 2016-11-16 09:10:17,239 networks.py:1282] The input order is [movie_id, title, genres, user_id, gender, age, occupation, rating] -[INFO 2016-11-16 09:10:17,239 networks.py:1289] The output order is [__mse_cost_0__] +[INFO 2016-11-16 09:10:17,239 networks.py:1289] The output order is [__square_error_cost_0__] I1116 09:10:17.392917 50 Trainer.cpp:170] trainer mode: Normal I1116 09:10:17.613910 50 PyDataProvider2.cpp:257] loading dataprovider dataprovider::process I1116 09:10:17.680917 50 PyDataProvider2.cpp:257] loading dataprovider dataprovider::process diff --git a/paddle/framework/attribute.cc b/paddle/framework/attribute.cc index 9eb07acdff..27132eaa0b 100644 --- a/paddle/framework/attribute.cc +++ b/paddle/framework/attribute.cc @@ -43,6 +43,10 @@ template <> AttrType AttrTypeID>() { return STRINGS; } +template <> +AttrType AttrTypeID>>() { + return INT_PAIRS; +} Attribute GetAttrValue(const OpDesc::Attr& attr_desc) { switch (attr_desc.type()) { @@ -76,6 +80,14 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc) { } return val; } + case paddle::framework::AttrType::INT_PAIRS: { + std::vector> val(attr_desc.int_pairs_size()); + for (int i = 0; i < attr_desc.int_pairs_size(); ++i) { + val[i].first = attr_desc.int_pairs(i).first(); + val[i].second = attr_desc.int_pairs(i).second(); + } + return val; + } } PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !"); return boost::blank(); diff --git a/paddle/framework/attribute.h b/paddle/framework/attribute.h index 08b47cabd4..071879a9d4 100644 --- a/paddle/framework/attribute.h +++ b/paddle/framework/attribute.h @@ -28,7 +28,8 @@ namespace paddle { namespace framework { typedef boost::variant, - std::vector, std::vector> + std::vector, std::vector, + std::vector>> Attribute; typedef std::unordered_map AttributeMap; diff --git a/paddle/framework/backward.cc b/paddle/framework/backward.cc index 6b4c612cd8..c5d4662215 100644 --- a/paddle/framework/backward.cc +++ b/paddle/framework/backward.cc @@ -182,7 +182,7 @@ static std::unique_ptr BackwardRecursive( }); // process recurrent gradient op as a special operator. - if (forwardOp.Type() == "recurrent_op") { + if (forwardOp.Type() == "recurrent") { // NOTE clean up cycle call somewhere (RNN's stepnet constains itself), or // this will result in infinite loop. 
const auto& rnnop = diff --git a/paddle/framework/backward_test.cc b/paddle/framework/backward_test.cc index f100c4d054..ad8003420d 100644 --- a/paddle/framework/backward_test.cc +++ b/paddle/framework/backward_test.cc @@ -127,8 +127,8 @@ class FillZeroOpMaker : public OpProtoAndCheckerMaker { public: FillZeroOpMaker(OpProto *proto, OpAttrChecker *op_checker) : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("x", "x"); - AddOutput("out", "out"); + AddInput("Src", "x"); + AddOutput("Dst", "out"); AddComment(""); } }; @@ -138,7 +138,7 @@ class AddOpMaker : public OpProtoAndCheckerMaker { AddOpMaker(OpProto *proto, OpAttrChecker *op_checker) : OpProtoAndCheckerMaker(proto, op_checker) { AddInput("X", "x").AsDuplicable(); - AddOutput("Y", "y"); + AddOutput("Out", "out"); AddComment(""); } }; diff --git a/paddle/framework/ddim.cc b/paddle/framework/ddim.cc index cfd3e8dfde..85b7de7974 100644 --- a/paddle/framework/ddim.cc +++ b/paddle/framework/ddim.cc @@ -21,16 +21,16 @@ namespace framework { /// @cond HIDDEN template -Dim make_dim(const int* d) { +Dim make_dim(const int64_t* d) { return Dim(*d, make_dim(d + 1)); } template <> -Dim<1> make_dim<1>(const int* d) { +Dim<1> make_dim<1>(const int64_t* d) { return Dim<1>(*d); } -void make_ddim(DDim& ddim, const int* dims, int n) { +void make_ddim(DDim& ddim, const int64_t* dims, int n) { switch (n) { case 1: ddim = make_dim<1>(dims); @@ -67,13 +67,13 @@ void make_ddim(DDim& ddim, const int* dims, int n) { /// @endcond -DDim make_ddim(std::initializer_list dims) { +DDim make_ddim(std::initializer_list dims) { DDim result(make_dim(0)); make_ddim(result, dims.begin(), dims.size()); return result; } -DDim make_ddim(const std::vector& dims) { +DDim make_ddim(const std::vector& dims) { DDim result(make_dim(0)); make_ddim(result, &dims[0], dims.size()); return result; @@ -81,12 +81,12 @@ DDim make_ddim(const std::vector& dims) { /// @cond HIDDEN // XXX For some reason, putting this in an anonymous namespace causes errors -class DynamicMutableIndexer : public boost::static_visitor { +class DynamicMutableIndexer : public boost::static_visitor { public: explicit DynamicMutableIndexer(int idx) : idx_(idx) {} template - int& operator()(Dim& dim) const { + int64_t& operator()(Dim& dim) const { return dim[idx_]; } @@ -94,12 +94,12 @@ class DynamicMutableIndexer : public boost::static_visitor { int idx_; }; -class DynamicConstIndexer : public boost::static_visitor { +class DynamicConstIndexer : public boost::static_visitor { public: explicit DynamicConstIndexer(int idx) : idx_(idx) {} template - int operator()(const Dim& dim) const { + int64_t operator()(const Dim& dim) const { return dim[idx_]; } @@ -109,22 +109,22 @@ class DynamicConstIndexer : public boost::static_visitor { /// @endcond -int& DDim::operator[](int idx) { +int64_t& DDim::operator[](int idx) { return boost::apply_visitor(DynamicMutableIndexer(idx), var); } -int DDim::operator[](int idx) const { +int64_t DDim::operator[](int idx) const { return boost::apply_visitor(DynamicConstIndexer(idx), var); } -ssize_t DDim::size() const { return arity(*this); } +int64_t DDim::size() const { return arity(*this); } bool DDim::operator==(DDim d) const { if (var.which() != d.getVar().which()) { return false; } else { - std::vector v1 = vectorize(*this); - std::vector v2 = vectorize(d); + std::vector v1 = vectorize(*this); + std::vector v2 = vectorize(d); for (unsigned int i = 0; i < v1.size(); i++) { if (v1[i] != v2[i]) { @@ -139,10 +139,10 @@ bool DDim::operator==(DDim d) const { bool 
DDim::operator!=(DDim d) const { return !(*this == d); } DDim DDim::operator+(DDim d) const { - std::vector v1 = vectorize(*this); - std::vector v2 = vectorize(d); + std::vector v1 = vectorize(*this); + std::vector v2 = vectorize(d); - std::vector v3; + std::vector v3; assert(v1.size() == v2.size()); @@ -154,10 +154,10 @@ DDim DDim::operator+(DDim d) const { } DDim DDim::operator*(DDim d) const { - std::vector v1 = vectorize(*this); - std::vector v2 = vectorize(d); + std::vector v1 = vectorize(*this); + std::vector v2 = vectorize(d); - std::vector v3; + std::vector v3; assert(v1.size() == v2.size()); @@ -168,15 +168,15 @@ DDim DDim::operator*(DDim d) const { return make_ddim(v3); } -int get(const DDim& ddim, int idx) { return ddim[idx]; } +int64_t get(const DDim& ddim, int idx) { return ddim[idx]; } void set(DDim& ddim, int idx, int value) { ddim[idx] = value; } /// @cond HIDDEN struct VectorizeVisitor : public boost::static_visitor<> { - std::vector& vector; + std::vector& vector; - explicit VectorizeVisitor(std::vector& v) : vector(v) {} + explicit VectorizeVisitor(std::vector& v) : vector(v) {} template void operator()(const T& t) { @@ -188,31 +188,31 @@ struct VectorizeVisitor : public boost::static_visitor<> { }; /// @endcond -std::vector vectorize(const DDim& ddim) { - std::vector result; +std::vector vectorize(const DDim& ddim) { + std::vector result; VectorizeVisitor visitor(result); boost::apply_visitor(visitor, ddim); return result; } -struct ProductVisitor : public boost::static_visitor { +struct ProductVisitor : public boost::static_visitor { template - ssize_t operator()(const Dim& dim) { + int64_t operator()(const Dim& dim) { return product(dim); } }; -ssize_t product(const DDim& ddim) { +int64_t product(const DDim& ddim) { ProductVisitor visitor; return boost::apply_visitor(visitor, ddim); } struct SliceVectorizeVisitor : public boost::static_visitor<> { - std::vector& vector; + std::vector& vector; int begin; int end; - SliceVectorizeVisitor(std::vector& v, int b, int e) + SliceVectorizeVisitor(std::vector& v, int b, int e) : vector(v), begin(b), end(e) { PADDLE_ENFORCE(begin < end, "Begin index must be less than end index in ddim slice."); @@ -240,7 +240,7 @@ struct SliceVectorizeVisitor : public boost::static_visitor<> { }; DDim slice_ddim(const DDim& dim, int begin, int end) { - std::vector vec; + std::vector vec; vec.reserve(end - begin); SliceVectorizeVisitor visitor(vec, begin, end); boost::apply_visitor(visitor, dim); @@ -280,7 +280,7 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) { return os; } -DDim::DDim(std::initializer_list init_list) { +DDim::DDim(std::initializer_list init_list) { *this = make_ddim(init_list); } } // namespace framework diff --git a/paddle/framework/ddim.h b/paddle/framework/ddim.h index 95f294b627..db30c52394 100644 --- a/paddle/framework/ddim.h +++ b/paddle/framework/ddim.h @@ -40,7 +40,7 @@ struct DDim { template explicit DDim(const Dim& in) : var(in) {} - /*implicit*/ DDim(std::initializer_list init_list); + /*implicit*/ DDim(std::initializer_list init_list); template DDim& operator=(const Dim& in) { @@ -48,8 +48,8 @@ struct DDim { return *this; } - int& operator[](int idx); - int operator[](int idx) const; + int64_t& operator[](int idx); + int64_t operator[](int idx) const; template typename Visitor::result_type apply_visitor(Visitor& visitor) { @@ -71,15 +71,15 @@ struct DDim { DDim operator*(DDim d) const; - ssize_t size() const; + int64_t size() const; }; /** - * \brief Make a DDim from std::vector + * \brief Make a 
DDim from std::vector * * \param dims An vector of ints. Must be sized between [1, 9] */ -DDim make_ddim(const std::vector& dims); +DDim make_ddim(const std::vector& dims); /** * \brief Make a DDim from an initializer list @@ -87,14 +87,14 @@ DDim make_ddim(const std::vector& dims); * \param dims An initializer list of ints. Must be sized between [1, 9] * */ -DDim make_ddim(std::initializer_list dims); +DDim make_ddim(std::initializer_list dims); -int get(const DDim& dim, int idx); +int64_t get(const DDim& dim, int idx); void set(DDim& dim, int idx, int val); -std::vector vectorize(const DDim& ddim); +std::vector vectorize(const DDim& ddim); -ssize_t product(const DDim& ddim); +int64_t product(const DDim& ddim); /** * \brief Slice a ddim diff --git a/paddle/framework/ddim_test.cc b/paddle/framework/ddim_test.cc index 9d18a2972c..756232b1b5 100644 --- a/paddle/framework/ddim_test.cc +++ b/paddle/framework/ddim_test.cc @@ -12,7 +12,7 @@ TEST(DDim, Equality) { EXPECT_EQ(ddim[2], 5); // construct a DDim from a vector - std::vector vec({9, 1, 5}); + std::vector vec({9, 1, 5}); paddle::framework::DDim vddim = paddle::framework::make_ddim(vec); EXPECT_EQ(ddim[0], 9); EXPECT_EQ(ddim[1], 1); @@ -25,7 +25,7 @@ TEST(DDim, Equality) { EXPECT_EQ(paddle::framework::get(ddim, 0), 6); // vectorize a DDim - std::vector res_vec = paddle::framework::vectorize(vddim); + std::vector res_vec = paddle::framework::vectorize(vddim); EXPECT_EQ(res_vec[0], 9); EXPECT_EQ(res_vec[1], 1); EXPECT_EQ(res_vec[2], 5); diff --git a/paddle/framework/dim.h b/paddle/framework/dim.h index 883fdc55eb..04d4b0e604 100644 --- a/paddle/framework/dim.h +++ b/paddle/framework/dim.h @@ -17,13 +17,13 @@ struct Dim { static constexpr int dimensions = i; template - HOSTDEVICE Dim(int _head, Args... _tail) : head(_head), tail(_tail...) { + HOSTDEVICE Dim(int64_t _head, Args... _tail) : head(_head), tail(_tail...) { static_assert(sizeof...(_tail) == i - 1, "Dim initialized with the wrong number of parameters"); } HOSTDEVICE - Dim(int _head, const Dim& _tail) : head(_head), tail(_tail) {} + Dim(int64_t _head, const Dim& _tail) : head(_head), tail(_tail) {} HOSTDEVICE Dim() : head(0), tail() {} @@ -31,12 +31,12 @@ struct Dim { /** Construct a Dim from a linear index and size. Uses Fortran order * indexing. 
*/ HOSTDEVICE - Dim(int idx, const Dim& size) + Dim(int64_t idx, const Dim& size) : head(idx % size.head), tail(idx / size.head, size.tail) {} /** Construct a Dim with each dimension set to the given index */ HOSTDEVICE - Dim(int idx) : head(idx), tail(idx) {} + Dim(int64_t idx) : head(idx), tail(idx) {} HOSTDEVICE bool operator==(const Dim& o) const { @@ -47,13 +47,13 @@ struct Dim { bool operator!=(const Dim& o) const { return !(*this == o); } HOSTDEVICE - int& operator[](int idx); + int64_t& operator[](int idx); HOSTDEVICE - int operator[](int idx) const; + int64_t operator[](int idx) const; HOST std::string to_string() const; - int head; + int64_t head; Dim tail; }; @@ -63,7 +63,7 @@ struct Dim<1> { static constexpr int dimensions = 1; HOSTDEVICE - Dim(int _head) : head(_head) {} + Dim(int64_t _head) : head(_head) {} HOSTDEVICE Dim() : head(0) {} @@ -86,11 +86,11 @@ struct Dim<1> { bool operator!=(const Dim<1>& o) const { return !(*this == o); } HOSTDEVICE - int& operator[](int idx); + int64_t& operator[](int idx); HOSTDEVICE - int operator[](int idx) const; + int64_t operator[](int idx) const; - int head; + int64_t head; }; namespace { @@ -100,12 +100,12 @@ template struct DimGetter { // Return a copy if Dim is const template - HOSTDEVICE static int impl(const D& d) { + HOSTDEVICE static int64_t impl(const D& d) { return DimGetter::impl(d.tail); } // Return a reference if Dim is mutable template - HOSTDEVICE static int& impl(D& d) { + HOSTDEVICE static int64_t& impl(D& d) { return DimGetter::impl(d.tail); } }; @@ -115,18 +115,18 @@ template <> struct DimGetter<0> { // Return a copy if Dim is const template - HOSTDEVICE static int impl(const D& d) { + HOSTDEVICE static int64_t impl(const D& d) { return d.head; } // Return a reference if Dim is mutable template - HOSTDEVICE static int& impl(D& d) { + HOSTDEVICE static int64_t& impl(D& d) { return d.head; } }; template -HOSTDEVICE int& indexer(Dim& dim, int idx) { +HOSTDEVICE int64_t& indexer(Dim& dim, int idx) { #ifndef __CUDA_ARCH__ if (idx < 0) { throw std::invalid_argument("Tried to access a negative dimension"); @@ -141,7 +141,7 @@ HOSTDEVICE int& indexer(Dim& dim, int idx) { } template <> -HOSTDEVICE int& indexer<1>(Dim<1>& dim, int idx) { +HOSTDEVICE int64_t& indexer<1>(Dim<1>& dim, int idx) { #ifndef __CUDA_ARCH__ if (idx != 0) { throw std::invalid_argument("Invalid index"); @@ -153,7 +153,7 @@ HOSTDEVICE int& indexer<1>(Dim<1>& dim, int idx) { } template -HOSTDEVICE int indexer(const Dim& dim, int idx) { +HOSTDEVICE int64_t indexer(const Dim& dim, int idx) { #ifndef __CUDA_ARCH__ if (idx < 0) { throw std::invalid_argument("Tried to access a negative dimension"); @@ -168,7 +168,7 @@ HOSTDEVICE int indexer(const Dim& dim, int idx) { } template <> -HOSTDEVICE int indexer<1>(const Dim<1>& dim, int idx) { +HOSTDEVICE int64_t indexer<1>(const Dim<1>& dim, int idx) { #ifndef __CUDA_ARCH__ if (idx != 0) { throw std::invalid_argument("Invalid index"); @@ -182,73 +182,76 @@ HOSTDEVICE int indexer<1>(const Dim<1>& dim, int idx) { } // namespace // Static access to constant Dim template -HOSTDEVICE int get(const Dim& d) { +HOSTDEVICE int64_t get(const Dim& d) { return DimGetter::impl(d); } // Static access to mutable Dim template -HOSTDEVICE int& get(Dim& d) { +HOSTDEVICE int64_t& get(Dim& d) { return DimGetter::impl(d); } // Dynamic access to constant Dim template -HOSTDEVICE int Dim::operator[](int i) const { +HOSTDEVICE int64_t Dim::operator[](int i) const { return indexer(*this, i); } // Dynamic access to mutable Dim template 
-HOSTDEVICE int& Dim::operator[](int i) { +HOSTDEVICE int64_t& Dim::operator[](int i) { return indexer(*this, i); } // Dynamic access to constant Dim -inline HOSTDEVICE int Dim<1>::operator[](int i) const { +inline HOSTDEVICE int64_t Dim<1>::operator[](int i) const { return indexer(*this, i); } // Dynamic access to mutable Dim -inline HOSTDEVICE int& Dim<1>::operator[](int i) { return indexer(*this, i); } +inline HOSTDEVICE int64_t& Dim<1>::operator[](int i) { + return indexer(*this, i); +} // Dynamic access to constant Dim // without std::enable_if will try to instantiate this on get<0>(d) template -HOSTDEVICE typename std::enable_if<(l > 0), int>::type get(const Dim& d, - int i) { +HOSTDEVICE typename std::enable_if<(l > 0), int64_t>::type get(const Dim& d, + int i) { return d[i]; } // Dynamic access to mutable Dim template -HOSTDEVICE typename std::enable_if<(l > 0), int&>::type get(Dim& d, int i) { +HOSTDEVICE typename std::enable_if<(l > 0), int64_t&>::type get(Dim& d, + int i) { return d[i]; } // Dot product of two dims template -HOSTDEVICE int linearize(const Dim& a, const Dim& b) { +HOSTDEVICE int64_t linearize(const Dim& a, const Dim& b) { return a.head * b.head + linearize(a.tail, b.tail); } // Base case dot product of two Dims // Notice it is inline because it is no longer a template template <> -HOSTDEVICE inline int linearize(const Dim<1>& a, const Dim<1>& b) { +HOSTDEVICE inline int64_t linearize(const Dim<1>& a, const Dim<1>& b) { return a.head * b.head; } // Product of a Dim template -HOSTDEVICE int product(const Dim& a, int prod = 1) { +HOSTDEVICE int64_t product(const Dim& a, int prod = 1) { return prod * a.head * product(a.tail); } // Base case product of a Dim // Notice it is inline because it is no longer a template template <> -HOSTDEVICE inline int product(const Dim<1>& a, int prod) { +HOSTDEVICE inline int64_t product(const Dim<1>& a, int prod) { return prod * a.head; } diff --git a/paddle/framework/dim_test.cu b/paddle/framework/dim_test.cu index 3898d0a447..0a6a87669c 100644 --- a/paddle/framework/dim_test.cu +++ b/paddle/framework/dim_test.cu @@ -8,7 +8,7 @@ __global__ void test(paddle::framework::Dim<2>* o) { o[0] = paddle::framework::make_dim(5, 6); } -__global__ void dyn_idx_gpu(int* o) { +__global__ void dyn_idx_gpu(int64_t* o) { auto d = paddle::framework::make_dim(5, 6); o[0] = d[1]; } @@ -47,9 +47,9 @@ TEST(Dim, Equality) { EXPECT_EQ(b[1], 11); // dynamic access on GPU - thrust::device_vector r(1); + thrust::device_vector r(1); dyn_idx_gpu<<<1, 1>>>(thrust::raw_pointer_cast(r.data())); - int res = r[0]; + int64_t res = r[0]; EXPECT_EQ(res, 6); // ex_prefix_mul diff --git a/paddle/framework/eigen.h b/paddle/framework/eigen.h index a4667cc51f..2d8d9ae10c 100644 --- a/paddle/framework/eigen.h +++ b/paddle/framework/eigen.h @@ -28,7 +28,7 @@ struct EigenDim { static Type From(const DDim& dims) { PADDLE_ENFORCE(arity(dims) == D, "D must match arity(DDim)"); Type ret; - for (int d = 0; d < arity(dims); d++) { + for (int64_t d = 0; d < arity(dims); d++) { ret[d] = dims[d]; } return ret; diff --git a/paddle/framework/framework.proto b/paddle/framework/framework.proto index ae44a1ffd4..368136a972 100644 --- a/paddle/framework/framework.proto +++ b/paddle/framework/framework.proto @@ -22,8 +22,14 @@ enum AttrType { INTS = 3; FLOATS = 4; STRINGS = 5; + INT_PAIRS = 6; } +message IntPair { + required int32 first = 1; + required int32 second = 2; +}; + // OpDesc describes an instance of a C++ framework::OperatorBase // derived class type. 
message OpDesc { @@ -37,6 +43,7 @@ message OpDesc { repeated int32 ints = 6; repeated float floats = 7; repeated string strings = 8; + repeated IntPair int_pairs = 9; }; message Var { diff --git a/paddle/framework/grad_op_builder_test.cc b/paddle/framework/grad_op_builder_test.cc index 902c2655e9..9e3ca563c6 100644 --- a/paddle/framework/grad_op_builder_test.cc +++ b/paddle/framework/grad_op_builder_test.cc @@ -3,7 +3,7 @@ #include "paddle/framework/op_registry.h" #include "paddle/framework/operator.h" -USE_OP(add_two); +USE_OP(add); namespace paddle { namespace framework { @@ -41,7 +41,7 @@ namespace f = paddle::framework; TEST(GradOpBuilder, AddTwo) { std::shared_ptr add_op(f::OpRegistry::CreateOp( - "add_two", {{"X", {"x"}}, {"Y", {"y"}}}, {{"Out", {"out"}}}, {})); + "add", {{"X", {"x"}}, {"Y", {"y"}}}, {{"Out", {"out"}}}, {})); std::shared_ptr grad_add_op = f::OpRegistry::CreateGradOp(*add_op); EXPECT_EQ(grad_add_op->Inputs().size(), 4UL); diff --git a/paddle/framework/lod_tensor.cc b/paddle/framework/lod_tensor.cc index 2b17890774..71eac4a10b 100644 --- a/paddle/framework/lod_tensor.cc +++ b/paddle/framework/lod_tensor.cc @@ -19,25 +19,24 @@ namespace paddle { namespace framework { -LODTensor::LOD LODTensor::LOD::SliceLevels(size_t level_begin, - size_t level_end) const { +LOD SliceLevels(const LOD& in, size_t level_begin, size_t level_end) { LOD new_lod; new_lod.reserve(level_end - level_begin); for (size_t i = level_begin; i < level_end; i++) { - new_lod.emplace_back(at(i)); + new_lod.emplace_back(in.at(i)); } return new_lod; } -LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin, - size_t elem_end) const { +LOD SliceInLevel(const LOD& in, size_t level, size_t elem_begin, + size_t elem_end) { // slice the lod. LOD new_lod; - new_lod.reserve(size() - level); - auto start = this->at(level)[elem_begin]; - auto end = this->at(level)[elem_end]; + new_lod.reserve(in.size() - level); + auto start = in.at(level)[elem_begin]; + auto end = in.at(level)[elem_end]; - for (auto it = this->begin() + level; it != this->end(); it++) { + for (auto it = in.begin() + level; it != in.end(); it++) { auto it_begin = std::find(it->begin(), it->end(), start); auto it_end = std::find(it_begin, it->end(), end); PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info"); @@ -49,11 +48,11 @@ LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin, [start](int v) { return v - start; }); PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LOD"); } - PADDLE_ENFORCE_LE(new_lod.size(), this->size()); + PADDLE_ENFORCE_LE(new_lod.size(), in.size()); return new_lod; } -bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) { +bool operator==(const LOD& a, const LOD& b) { if (a.size() != b.size()) { return false; } @@ -70,9 +69,27 @@ bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) { } } } - return true; } +void LODTensor::SliceLevels(size_t level_begin, size_t level_end) { + auto new_lod = framework::SliceLevels(lod_, level_begin, level_end); + lod_ = new_lod; +} + +void LODTensor::SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) { + PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level, + NumLevels()); + PADDLE_ENFORCE(elem_begin < NumElements(level), + "element begin [%d] out of range [%d]", elem_begin, + NumElements(level)); + PADDLE_ENFORCE(elem_end < NumElements(level) + 1, + "element end [%d] out of range [%d]", elem_end, + NumElements(level)); + + auto new_lod = 
framework::SliceInLevel(lod_, level, elem_begin, elem_end);
+  lod_ = new_lod;
+}
+
 } // namespace framework
 } // namespace paddle
diff --git a/paddle/framework/lod_tensor.h b/paddle/framework/lod_tensor.h
index 9e27aec38d..9e6b6b4aca 100644
--- a/paddle/framework/lod_tensor.h
+++ b/paddle/framework/lod_tensor.h
@@ -15,7 +15,7 @@
 #pragma once
 #include <memory>
-#if !defined(PADDLE_ONLY_CPU)
+#ifndef PADDLE_ONLY_CPU
 #include <thrust/device_vector.h>
 #include <thrust/host_vector.h>
 #endif
@@ -27,33 +27,39 @@
 namespace paddle {
 namespace framework {
 
+#ifdef PADDLE_ONLY_CPU
+template <typename T>
+using Vector = std::vector<T>;
+#else
+template <typename T>
+using Vector = thrust::host_vector<T>;
+#endif
+
+using LOD = std::vector<Vector<size_t>>;
+
+LOD SliceLevels(const LOD& in, size_t level_begin, size_t level_end);
+
+LOD SliceInLevel(const LOD& in, size_t level, size_t elem_begin,
+                 size_t elem_end);
+
+bool operator==(const LOD& a, const LOD& b);
+
 /*
  * LODTensor (Level of details Tensor)
  * see https://en.wikipedia.org/wiki/Level_of_details for reference.
  */
-class LODTensor : public Tensor {
+class LODTensor {
  public:
-// Level save offsets of each unit.
-#ifdef PADDLE_ONLY_CPU
-  template <typename T>
-  using Vector = std::vector<T>;
-#else
-  template <typename T>
-  using Vector = thrust::host_vector<T>;
-#endif
-  // LoD stores offsets of each level of units, the largest units level first,
-  // then the smaller units level. Each Level stores the offsets of units in
-  // Tesor.
-  class LOD : public std::vector<Vector<size_t>> {
-   public:
-    LOD SliceLevels(size_t level_begin, size_t level_end) const;
-    LOD SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) const;
-  };
-
   LODTensor() {}
-  explicit LODTensor(const LOD &lod) : lod_(lod) {}
+  LODTensor(const LOD& lod, Tensor* t) : lod_(lod), tensor_(t) {}
+
+  void set_lod(const LOD& lod) { lod_ = lod; }
 
-  virtual Tensor *Clone() const { return new LODTensor(lod_); }
+  void set_tensor(Tensor* tensor) { tensor_ = tensor; }
+
+  Tensor& tensor() { return *tensor_; }
+
+  LOD lod() { return lod_; }
 
   /*
    * Get a element from LOD.
@@ -79,71 +85,23 @@ class LODTensor : public Tensor {
     PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
                    NumLevels());
     // the last offset is the end of last element
-    return lod_[level].size() - 1;
+    return (lod_)[level].size() - 1;
   }
 
   /*
-   * Slice of levels[level_begin:level_end], with tensor shared.
+   * Slice of levels[level_begin:level_end]
    */
-  template <typename T>
-  LODTensor SliceLevels(size_t level_begin, size_t level_end) const;
+  void SliceLevels(size_t level_begin, size_t level_end);
 
   /*
-   * Slice of elements of a level, [elem_begin: elem_end], with tensor shared.
+   * Slice of elements of a level, [elem_begin: elem_end]
    * @note: low performance in slice lod_.
    */
-  template <typename T>
-  LODTensor SliceInLevel(size_t level, size_t elem_begin,
-                         size_t elem_end) const;
-
-  /*
-   * Copy other's lod_'s content, free to mutate.
-   */
-  void CopyLOD(const LODTensor &other) { lod_ = other.lod_; }
-  /*
-   * Determine whether LODTensor has a valid LOD info.
-   */
-  const LOD &lod() const { return lod_; }
-  LOD *mutable_lod() { return &lod_; }
-
-  virtual ~LODTensor() {}
+  void SliceInLevel(size_t level, size_t elem_begin, size_t elem_end);
 
  private:
   LOD lod_;
+  Tensor* tensor_;  // not owned
 };
-
-bool operator==(const LODTensor::LOD &a, const LODTensor::LOD &b);
-
-template <typename T>
-LODTensor LODTensor::SliceLevels(size_t level_begin, size_t level_end) const {
-  auto new_lod = lod_.SliceLevels(level_begin, level_end);
-  // slice levels just need to update LOD info, each level will contains the
-  // whole tensor_, so no need to modify tensor_.
-  LODTensor new_tensor(new_lod);
-  new_tensor.ShareDataWith<T>(*this);
-  return new_tensor;
-}
-
-template <typename T>
-LODTensor LODTensor::SliceInLevel(size_t level, size_t elem_begin,
-                                  size_t elem_end) const {
-  PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
-                 NumLevels());
-  PADDLE_ENFORCE(elem_begin < NumElements(level),
-                 "element begin [%d] out of range [%d]", elem_begin,
-                 NumElements(level));
-  PADDLE_ENFORCE(elem_end < NumElements(level) + 1,
-                 "element end [%d] out of range [%d]", elem_end,
-                 NumElements(level));
-
-  auto new_lod = lod_.SliceInLevel(level, elem_begin, elem_end);
-
-  // slice elements just need to update LOD info, because offsets are not
-  // changed, so the original tensor_ can be reused.
-  LODTensor new_tensor(new_lod);
-  new_tensor.ShareDataWith<T>(*this);
-  return new_tensor;
-}
-
 } // namespace framework
 } // namespace paddle
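To make the refactored interface concrete, a minimal usage sketch (illustrative only; it assumes the declarations above, and that a thrust::host_vector level can be constructed from a std::vector): LODTensor now holds the LoD index by value plus a borrowed Tensor pointer, and the Slice* methods mutate the index in place while the tensor stays shared.

```c++
#include <vector>
#include "paddle/framework/lod_tensor.h"

void LodTensorUsageSketch(paddle::framework::Tensor* t) {
  using paddle::framework::LOD;
  using paddle::framework::LODTensor;

  LOD lod;
  lod.push_back(std::vector<size_t>{0, 10, 20});         // 2 top-level units
  lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20});  // 4 units on level 1

  LODTensor lt(lod, t);   // t is borrowed, not owned
  LODTensor top = lt;     // copying shares the same Tensor pointer
  top.SliceLevels(0, 1);  // keep only levels [0, 1): the top level
  // top.tensor() and lt.tensor() still refer to the same *t.
}
```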
diff --git a/paddle/framework/lod_tensor.md b/paddle/framework/lod_tensor.md
new file mode 100644
index 0000000000..8dfe3ee823
--- /dev/null
+++ b/paddle/framework/lod_tensor.md
@@ -0,0 +1,122 @@
+# Design Doc: LoD (Level-of-Detail) Tensor
+
+PaddlePaddle's RNN doesn't require that all instances have the same length. To support this, we introduce an extension to Tensor, namely, LoD Tensor.
+
+## Challenge of Variable-length Inputs
+
+People usually represent a mini-batch by a Tensor. For example, a mini-batch of 32 images, each of size 32x32, is a 32x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 32x32x32 Tensor.
+
+Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.
+
+## LoD as a Solution
+
+### Mini-Batch of variable-length sentences
+
+Let's imagine a mini-batch of 3 variable-length sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information:
+
+```
+ 3
+3 1 2
+||| | ||
+```
+
+Each `|` represents a D-dimensional word vector. The number 3 on top indicates 3 sentences, and the numbers 3, 1, and 2 on the second level represent the number of words in each sentence.
+
+### Mini-Batch of variable-length videos
+
+This approach generalizes to the case where elements are not words, but higher dimensional objects, like images. Suppose that a mini-batch contains 3 videos of the same frame size 640x480, with 3, 1, and 2 frames respectively. The underlying tensor is of size (3+1+2)x640x480, and the index information is illustrated as:
+
+```
+ 3
+3 1 2
+口口口 口 口口
+```
+
+where each `口` represents an image.
+
+### Mini-Batch of fixed-size images
+
+Let's get back to a typical example, image classification, where each mini-batch has M fixed-sized images. The LoD Tensor representation is
+
+```
+ M
+1 1 1 1     1
+口口口口 ... 口
+```
+
+The many 1's on the second level seem duplicated. For this particular case of 2 levels, where the second level always has length 1, we can ignore the LoD index.
+
+### Design and summarization
+
+In summary, as long as the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:
+
+- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and
+- the first dimension size L has an additional property -- a LoD index as a nested vector:
+
+  ```c++
+  typedef std::vector<std::vector<size_t> > LoD;
+  ```
+
+- The LoD index is not necessary when there are only two levels and all elements of the second level have length 1.
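+To make the index concrete, here is how the 3-sentence mini-batch above could be built (a sketch only; the diagrams show per-unit lengths, while this sketch assumes the offset convention used by the C++ tests in this PR, where entry i marks where unit i begins and a trailing entry marks the end):
+
+```c++
+#include <vector>
+
+typedef std::vector<std::vector<size_t> > LoD;
+
+int main() {
+  // The mini-batch of 3 sentences with 3, 1, and 2 words, as offsets:
+  // sentence i occupies rows [lod[0][i], lod[0][i+1]) of the underlying
+  // (3+1+2) x D tensor.
+  LoD lod = {{0, 3, 4, 6}};
+  return 0;
+}
+```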
+## Slicing of LoD Tensor
+
+Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches be represented by a 3-level LoD Tensor, for example,
+
+```
+ 3
+3 1 2
+3 2 4 1 2 3
+||| || |||| | || |||
+```
+
+To allow each level of RNN to handle its input, we define **the slicing of a LoD Tensor as getting the j-th sequence on level i, or the <i, j>-slice**.
+
+For example, the <2,1>-slice of the above example is
+
+```
+2
+||
+```
+
+and the <1,2>-slice of the above example is
+
+```
+2
+2 3
+|| |||
+```
+
+Let's go on slicing this slice. Its <1,1>-slice is
+
+```
+3
+|||
+```
+
+### The General Slicing Algorithm
+
+The algorithm, with an over-simplified data structure, can be sketched as follows (a sketch only, assuming, as in the examples above, that every level stores absolute offsets into the underlying tensor):
+
+```c++
+typedef std::vector<std::vector<int> > LoD;
+
+struct LoDTensor {
+  LoD lod_;
+  float* tensor_;
+};
+
+LoDTensor Slice(const LoDTensor& lodt, int level, int sequence) {
+  // Sketch: the requested sequence spans elements [start, end) of the
+  // underlying tensor; deeper levels are kept and re-based to start.
+  int start = lodt.lod_[level][sequence];
+  int end = lodt.lod_[level][sequence + 1];
+  LoDTensor res;
+  for (size_t l = level + 1; l < lodt.lod_.size(); ++l) {
+    std::vector<int> offsets;
+    for (int offset : lodt.lod_[l])
+      if (offset >= start && offset <= end) offsets.push_back(offset - start);
+    res.lod_.push_back(offsets);
+  }
+  res.tensor_ = lodt.tensor_ + start;  // points at the start-th element
+  return res;
+}
+```
+
+### Slicing the Top Level
+
+Please be aware that an RNN operator only slices the top level of a LoD Tensor to get the step inputs.
+
+```c++
+LoDTensor Slice(const LoDTensor& lodt, int sequence) {
+  // Sketch: slicing the top level is the <0, sequence>-slice.
+  return Slice(lodt, 0, sequence);
+}
+```
diff --git a/paddle/framework/lod_tensor_test.cc b/paddle/framework/lod_tensor_test.cc
index 2881136ced..9a351605ed 100644
--- a/paddle/framework/lod_tensor_test.cc
+++ b/paddle/framework/lod_tensor_test.cc
@@ -24,13 +24,12 @@ namespace framework {
 class LODTensorTester : public ::testing::Test {
  public:
   virtual void SetUp() override {
-    lod_tensor.reset(new LODTensor);
     // tensor's batch_size: 30
     // 3 levels
     // 0 10 20
     // 0 5 10 15 20
     // 0 2 5 7 10 12 15 20
-    LODTensor::LOD lod;
+    LOD lod;
     lod.push_back(std::vector<size_t>{0, 10, 20});
     lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20});
     lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
@@ -41,75 +40,65 @@ class LODTensorTester : public ::testing::Test {
     // malloc memory
     tensor.mutable_data<float>(place);
 
-    lod_tensor.reset(new LODTensor(lod));
-    lod_tensor->Resize({20 /*batch size*/, 128 /*dim*/});
-
-    lod_tensor->ShareDataWith<float>(tensor);
-    // lod_tensor->ShareDataWith<float>(tensor);
+    lod_tensor.set_lod(lod);
+    lod_tensor.set_tensor(&tensor);
   }
 
  protected:
-  std::unique_ptr<LODTensor> lod_tensor;
   platform::CPUPlace place;
   Tensor tensor;
+  LODTensor lod_tensor;
 };
 
-TEST_F(LODTensorTester, NumLevels) { ASSERT_EQ(lod_tensor->NumLevels(), 3UL); }
+TEST_F(LODTensorTester, NumLevels) { ASSERT_EQ(lod_tensor.NumLevels(), 3UL); }
 
 TEST_F(LODTensorTester, NumElements) {
-  ASSERT_EQ(lod_tensor->NumElements(0), 2UL);
-  ASSERT_EQ(lod_tensor->NumElements(1), 4UL);
-  ASSERT_EQ(lod_tensor->NumElements(2), 8UL);
+  ASSERT_EQ(lod_tensor.NumElements(0), 2UL);
+  ASSERT_EQ(lod_tensor.NumElements(1), 4UL);
+  ASSERT_EQ(lod_tensor.NumElements(2), 8UL);
 }
 
 TEST_F(LODTensorTester, SliceLevels) {
   // slice 1 level
   for (size_t level = 0; level < 3UL; ++level) {
-    auto new_lod_tensor = lod_tensor->SliceLevels<float>(level, level + 1);
+    LODTensor new_lod_tensor = lod_tensor;
+    new_lod_tensor.SliceLevels(level, level + 1);
     ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
-    ASSERT_EQ(new_lod_tensor.NumElements(0UL), lod_tensor->NumElements(level));
-    // ASSERT_EQ(new_lod_tensor, *lod_tensor);
+    ASSERT_EQ(new_lod_tensor.NumElements(0),
lod_tensor.NumElements(level)); + ASSERT_EQ(new_lod_tensor.tensor().data(), + lod_tensor.tensor().data()); } // slice 2 level for (size_t level = 0; level < 2UL; ++level) { - auto new_lod_tensor = lod_tensor->SliceLevels(level, level + 2); + LODTensor new_lod_tensor = lod_tensor; + new_lod_tensor.SliceLevels(level, level + 2); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); - ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor->NumElements(level)); - ASSERT_EQ(new_lod_tensor.NumElements(1), - lod_tensor->NumElements(level + 1)); - ASSERT_EQ(new_lod_tensor.data(), lod_tensor->data()); + ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor.NumElements(level)); + ASSERT_EQ(new_lod_tensor.NumElements(1), lod_tensor.NumElements(level + 1)); + ASSERT_EQ(new_lod_tensor.tensor().data(), + lod_tensor.tensor().data()); } } TEST_F(LODTensorTester, SliceInLevel) { size_t level = 0; - auto new_lod_tensor = lod_tensor->SliceInLevel(level, 0, 2); + LODTensor new_lod_tensor = lod_tensor; + new_lod_tensor.SliceInLevel(level, 0, 2); EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL); EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL); EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL); EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL); - ASSERT_EQ(new_lod_tensor.data(), lod_tensor->data()); + ASSERT_EQ(new_lod_tensor.tensor().data(), + lod_tensor.tensor().data()); level = 1; - new_lod_tensor = lod_tensor->SliceInLevel(level, 0, 2); + new_lod_tensor = lod_tensor; + new_lod_tensor.SliceInLevel(level, 0, 2); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL); ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL); - ASSERT_EQ(new_lod_tensor.data(), lod_tensor->data()); -} - -TEST_F(LODTensorTester, ShareLOD) { - LODTensor new_lod_tensor; - new_lod_tensor.CopyLOD(*lod_tensor); - ASSERT_EQ(new_lod_tensor.lod(), lod_tensor->lod()); -} - -TEST_F(LODTensorTester, CopyLOD) { - LODTensor new_lod_tensor; - new_lod_tensor.CopyLOD(*lod_tensor); - bool equals = std::equal(lod_tensor->lod().begin(), lod_tensor->lod().end(), - new_lod_tensor.lod().begin()); - ASSERT_TRUE(equals); + ASSERT_EQ(new_lod_tensor.tensor().data(), + lod_tensor.tensor().data()); } } // namespace framework diff --git a/paddle/framework/op_info.h b/paddle/framework/op_info.h index 94245c6c44..b98d8f23a1 100644 --- a/paddle/framework/op_info.h +++ b/paddle/framework/op_info.h @@ -80,9 +80,19 @@ class OpInfoMap { } const OpInfo& Get(const std::string& type) const { + auto op_info_ptr = GetNullable(type); + PADDLE_ENFORCE_NOT_NULL(op_info_ptr, "Operator %s has not been registered", + type); + return *op_info_ptr; + } + + const OpInfo* GetNullable(const std::string& type) const { auto it = map_.find(type); - PADDLE_ENFORCE(it != map_.end(), "Operator %s are not found", type); - return it->second; + if (it == map_.end()) { + return nullptr; + } else { + return &it->second; + } } template diff --git a/paddle/framework/op_registry.h b/paddle/framework/op_registry.h index 2d09cde41e..572dff860a 100644 --- a/paddle/framework/op_registry.h +++ b/paddle/framework/op_registry.h @@ -199,6 +199,8 @@ class OpKernelRegistrar : public Registrar { USE_OP_DEVICE_KERNEL(op_type, GPU) #endif +#define USE_NO_KERNEL_OP(op_type) USE_OP_ITSELF(op_type); + #define USE_CPU_ONLY_OP(op_type) \ USE_OP_ITSELF(op_type); \ USE_OP_DEVICE_KERNEL(op_type, CPU); diff --git a/paddle/framework/op_registry_test.cc b/paddle/framework/op_registry_test.cc index 50c45919c5..b43f6a8cc5 100644 --- a/paddle/framework/op_registry_test.cc +++ b/paddle/framework/op_registry_test.cc @@ -174,36 
+174,4 @@ TEST(OpRegistry, CustomChecker) { op->Run(scope, dev_ctx); int test_attr = op->GetAttr("test_attr"); ASSERT_EQ(test_attr, 4); -} - -class TestAttrProtoMaker : public pd::OpProtoAndCheckerMaker { - public: - TestAttrProtoMaker(pd::OpProto* proto, pd::OpAttrChecker* op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddAttr("scale", "scale of test op"); - AddAttr("scale", "scale of test op"); - } -}; - -TEST(ProtoMaker, DuplicatedAttr) { - pd::OpProto op_proto; - pd::OpAttrChecker op_checker; - auto proto_maker = TestAttrProtoMaker(&op_proto, &op_checker); - ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); -} - -class TestInOutProtoMaker : public pd::OpProtoAndCheckerMaker { - public: - TestInOutProtoMaker(pd::OpProto* proto, pd::OpAttrChecker* op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("input", "input of test op"); - AddInput("input", "input of test op"); - } -}; - -TEST(ProtoMaker, DuplicatedInOut) { - pd::OpProto op_proto; - pd::OpAttrChecker op_checker; - auto proto_maker = TestInOutProtoMaker(&op_proto, &op_checker); - ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); -} +} \ No newline at end of file diff --git a/paddle/framework/operator.cc b/paddle/framework/operator.cc index 7abbde610f..790cfc4746 100644 --- a/paddle/framework/operator.cc +++ b/paddle/framework/operator.cc @@ -33,12 +33,12 @@ ExecutionContext::GetEigenDevice() const { } #endif -const std::string& OperatorBase::Input(const std::string& name) const { +std::string OperatorBase::Input(const std::string& name) const { auto& ins = Inputs(name); - PADDLE_ENFORCE_EQ(ins.size(), 1UL, + PADDLE_ENFORCE_LE(ins.size(), 1UL, "Op %s input %s should contain only one variable", type_, name); - return ins[0]; + return ins.empty() ? kEmptyVarName : ins[0]; } const std::vector& OperatorBase::Inputs( @@ -49,12 +49,12 @@ const std::vector& OperatorBase::Inputs( return it->second; } -const std::string& OperatorBase::Output(const std::string& name) const { +std::string OperatorBase::Output(const std::string& name) const { auto& outs = Outputs(name); - PADDLE_ENFORCE_EQ(outs.size(), 1UL, + PADDLE_ENFORCE_LE(outs.size(), 1UL, "Op %s output %s should contain only one variable", type_, name); - return outs[0]; + return outs.empty() ? 
kEmptyVarName : outs[0]; } const std::vector& OperatorBase::Outputs( @@ -119,16 +119,8 @@ OperatorBase::OperatorBase(const std::string& type, const VariableNameMap& outputs, const AttributeMap& attrs) : type_(type), inputs_(inputs), outputs_(outputs), attrs_(attrs) { - static std::atomic gUniqId(0UL); - for (auto& output : outputs_) { - for (auto& output_name : output.second) { - if (output_name == kTempVarName) { - output_name += type_; - output_name += "@"; - output_name += std::to_string(gUniqId.fetch_add(1)); - } - } - } + GenerateTemporaryNames(); + CheckAllInputOutputSet(); } std::vector OperatorBase::OutputVars(bool has_intermediate) const { @@ -156,6 +148,35 @@ std::vector OperatorBase::OutputVars(bool has_intermediate) const { return ret_val; } +void OperatorBase::CheckAllInputOutputSet() const { + auto& info_map = OpInfoMap::Instance(); + auto* op_info = info_map.GetNullable(Type()); + if (op_info == nullptr || op_info->proto_ == nullptr) return; + + for (auto& in : op_info->Proto().inputs()) { + PADDLE_ENFORCE(inputs_.find(in.name()) != inputs_.end(), + "Type %s's input %s is not set", Type(), in.name()); + } + + for (auto& out : op_info->Proto().outputs()) { + PADDLE_ENFORCE(outputs_.find(out.name()) != outputs_.end(), + "Type %s's output %s is not set", Type(), out.name()); + } +} + +void OperatorBase::GenerateTemporaryNames() { + static std::atomic gUniqId(0UL); + for (auto& output : outputs_) { + for (auto& output_name : output.second) { + if (output_name == kTempVarName) { + output_name += type_; + output_name += "@"; + output_name += std::to_string(gUniqId.fetch_add(1)); + } + } + } +} + void OpProtoAndCheckerMaker::Validate() { validated_ = true; CheckNoDuplicatedInOutAttrs(); diff --git a/paddle/framework/operator.h b/paddle/framework/operator.h index 8397570d26..da92220b04 100644 --- a/paddle/framework/operator.h +++ b/paddle/framework/operator.h @@ -95,12 +95,12 @@ class OperatorBase { const VariableNameMap& Inputs() const { return inputs_; } const VariableNameMap& Outputs() const { return outputs_; } //! Get a input with argument's name described in `op_proto` - const std::string& Input(const std::string& name) const; + std::string Input(const std::string& name) const; //! Get a input which has multiple variables. const std::vector& Inputs(const std::string& name) const; //! Get a output with argument's name described in `op_proto` - const std::string& Output(const std::string& name) const; + std::string Output(const std::string& name) const; //! Get an output which has multiple variables. //! TODO add a vector_view to prevent memory copy. const std::vector& Outputs(const std::string& name) const; @@ -127,6 +127,10 @@ class OperatorBase { // IG (Inputs Gradients) VariableNameMap outputs_; AttributeMap attrs_; + + private: + void GenerateTemporaryNames(); + void CheckAllInputOutputSet() const; }; // Macro for define a clone method. @@ -229,6 +233,15 @@ class InferShapeContext { InferShapeContext(const OperatorBase& op, const Scope& scope) : op_(op), scope_(scope) {} + const OperatorBase& op() const { return op_; } + + const Scope& scope() const { return scope_; } + + template + inline const T& GetAttr(const std::string& name) const { + return op_.GetAttr(name); + } + size_t InputSize(const std::string& name) const { return op_.Inputs(name).size(); } @@ -238,11 +251,13 @@ class InferShapeContext { } const Variable* InputVar(const std::string& name) const { - return scope_.FindVar(op_.Input(name)); + auto ipt = op_.Input(name); + return ipt == kEmptyVarName ? 
nullptr : scope_.FindVar(ipt); } Variable* OutputVar(const std::string& name) const { - return scope_.FindVar(op_.Output(name)); + auto opt = op_.Output(name); + return opt == kEmptyVarName ? nullptr : scope_.FindVar(opt); } const std::vector MultiInputVar( @@ -250,9 +265,11 @@ class InferShapeContext { auto names = op_.Inputs(name); std::vector res; res.reserve(names.size()); - std::transform( - names.begin(), names.end(), std::back_inserter(res), - [this](const std::string& name) { return scope_.FindVar(name); }); + std::transform(names.begin(), names.end(), std::back_inserter(res), + [this](const std::string& name) { + return name == kEmptyVarName ? nullptr + : scope_.FindVar(name); + }); return res; } @@ -260,24 +277,24 @@ class InferShapeContext { auto names = op_.Outputs(name); std::vector res; res.reserve(names.size()); - std::transform( - names.begin(), names.end(), std::back_inserter(res), - [this](const std::string& name) { return scope_.FindVar(name); }); + std::transform(names.begin(), names.end(), std::back_inserter(res), + [this](const std::string& name) { + return name == kEmptyVarName ? nullptr + : scope_.FindVar(name); + }); return res; } template const T* Input(const std::string& name) const { auto* var = InputVar(name); - PADDLE_ENFORCE_NOT_NULL(var, "Input(%s) should not be nullptr", name); - return &var->Get(); + return var == nullptr ? nullptr : &var->Get(); } template T* Output(const std::string& name) const { auto var = OutputVar(name); - PADDLE_ENFORCE_NOT_NULL(var, "Output(%s) should not be nullptr", name); - return var->GetMutable(); + return var == nullptr ? nullptr : var->GetMutable(); } template @@ -288,10 +305,7 @@ class InferShapeContext { std::transform(names.begin(), names.end(), std::back_inserter(res), [&](const std::string& sub_name) { auto var = scope_.FindVar(sub_name); - PADDLE_ENFORCE_NOT_NULL( - var, "MultiInput(%s:%s) should not be nullptr", name, - sub_name); - return &var->Get(); + return var == nullptr ? nullptr : &var->Get(); }); return res; } @@ -304,14 +318,12 @@ class InferShapeContext { std::transform(names.begin(), names.end(), std::back_inserter(res), [&](const std::string& sub_name) { auto var = scope_.FindVar(sub_name); - PADDLE_ENFORCE_NOT_NULL( - var, "MultiOutput(%s:%s) should not be nullptr.", name, - sub_name); - return var->GetMutable(); + return var == nullptr ? 
nullptr : var->GetMutable(); }); return res; } + private: const OperatorBase& op_; const Scope& scope_; }; diff --git a/paddle/framework/operator_test.cc b/paddle/framework/operator_test.cc index 1d7efb7b94..8a1970c7a8 100644 --- a/paddle/framework/operator_test.cc +++ b/paddle/framework/operator_test.cc @@ -122,10 +122,10 @@ class CPUKernelTest : public OpKernel { public: void Compute(const ExecutionContext& ctx) const { std::cout << "this is cpu kernel" << std::endl; - std::cout << ctx.op_.DebugString() << std::endl; + std::cout << ctx.op().DebugString() << std::endl; cpu_kernel_run_num++; - ASSERT_EQ(ctx.op_.Input("x"), "IN1"); - ASSERT_EQ(ctx.op_.Output("y"), "OUT1"); + ASSERT_EQ(ctx.op().Input("x"), "IN1"); + ASSERT_EQ(ctx.op().Output("y"), "OUT1"); } }; @@ -148,7 +148,7 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker class CPUKernalMultiInputsTest : public OpKernel { public: void Compute(const ExecutionContext& ctx) const { - auto xs = ctx.op_.Inputs("xs"); + auto xs = ctx.op().Inputs("xs"); ASSERT_EQ(xs.size(), 3UL); ASSERT_EQ(xs[0], "x0"); ASSERT_EQ(xs[1], "x1"); @@ -172,10 +172,10 @@ class CPUKernalMultiInputsTest : public OpKernel { auto outTensor0 = ctx.MultiOutput("ys"); ASSERT_EQ(outTensor0.size(), 2U); - auto k = ctx.op_.Input("k"); + auto k = ctx.op().Input("k"); ASSERT_EQ(k, "k0"); - auto ys = ctx.op_.Outputs("ys"); + auto ys = ctx.op().Outputs("ys"); ASSERT_EQ(ys.size(), 2UL); ASSERT_EQ(ys[0], "y0"); ASSERT_EQ(ys[1], "y1"); @@ -263,4 +263,38 @@ TEST(Operator, Clone) { OperatorClone a("ABC", {}, {}, {}); auto b = a.Clone(); ASSERT_EQ(a.Type(), b->Type()); +} + +class TestAttrProtoMaker : public paddle::framework::OpProtoAndCheckerMaker { + public: + TestAttrProtoMaker(paddle::framework::OpProto* proto, + paddle::framework::OpAttrChecker* op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddAttr("scale", "scale of test op"); + AddAttr("scale", "scale of test op"); + } +}; + +TEST(ProtoMaker, DuplicatedAttr) { + paddle::framework::OpProto op_proto; + paddle::framework::OpAttrChecker op_checker; + auto proto_maker = TestAttrProtoMaker(&op_proto, &op_checker); + ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); +} + +class TestInOutProtoMaker : public paddle::framework::OpProtoAndCheckerMaker { + public: + TestInOutProtoMaker(paddle::framework::OpProto* proto, + paddle::framework::OpAttrChecker* op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("input", "input of test op"); + AddInput("input", "input of test op"); + } +}; + +TEST(ProtoMaker, DuplicatedInOut) { + paddle::framework::OpProto op_proto; + paddle::framework::OpAttrChecker op_checker; + auto proto_maker = TestInOutProtoMaker(&op_proto, &op_checker); + ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); } \ No newline at end of file diff --git a/paddle/framework/tensor_impl.h b/paddle/framework/tensor_impl.h index 7893e233b7..94f436294f 100644 --- a/paddle/framework/tensor_impl.h +++ b/paddle/framework/tensor_impl.h @@ -58,7 +58,7 @@ inline T* Tensor::mutable_data(platform::Place place) { "Tensor's numel must be larger than zero to call " "Tensor::mutable_data. 
Call Tensor::set_dim first."); /* some versions of boost::variant don't have operator!= */ - size_t size = product(dims_) * sizeof(T); + int64_t size = product(dims_) * sizeof(T); if (holder_ == nullptr || !(holder_->place() == place) || holder_->size() < size + offset_) { if (platform::is_cpu_place(place)) { @@ -131,7 +131,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { PADDLE_ENFORCE_LT(begin_idx, end_idx, "Begin index must be less than end index."); PADDLE_ENFORCE_NE(dims_[0], 1, "Can not slice a tensor with dims_[0] = 1."); - int base = product(dims_) / dims_[0]; + size_t base = product(dims_) / dims_[0]; Tensor dst; dst.holder_ = holder_; DDim dst_dims = dims_; diff --git a/paddle/gserver/evaluators/CTCErrorEvaluator.cpp b/paddle/gserver/evaluators/CTCErrorEvaluator.cpp index 132119015f..92087fa32b 100644 --- a/paddle/gserver/evaluators/CTCErrorEvaluator.cpp +++ b/paddle/gserver/evaluators/CTCErrorEvaluator.cpp @@ -14,18 +14,20 @@ limitations under the License. */ #include "Evaluator.h" #include "paddle/gserver/gradientmachines/NeuralNetwork.h" +#include "paddle/utils/StringUtil.h" namespace paddle { /** * calculate sequence-to-sequence edit distance */ -class CTCErrorEvaluator : public NotGetableEvaluator { +class CTCErrorEvaluator : public Evaluator { private: MatrixPtr outActivations_; int numTimes_, numClasses_, numSequences_, blank_; real deletions_, insertions_, substitutions_; int seqClassficationError_; + mutable std::unordered_map evalResults_; std::vector path2String(const std::vector& path) { std::vector str; @@ -183,6 +185,18 @@ private: return stringAlignment(gtStr, recogStr); } + void storeLocalValues() const { + evalResults_["error"] = numSequences_ ? totalScore_ / numSequences_ : 0; + evalResults_["deletion_error"] = + numSequences_ ? deletions_ / numSequences_ : 0; + evalResults_["insertion_error"] = + numSequences_ ? insertions_ / numSequences_ : 0; + evalResults_["substitution_error"] = + numSequences_ ? substitutions_ / numSequences_ : 0; + evalResults_["sequence_error"] = + (real)seqClassficationError_ / numSequences_; + } + public: CTCErrorEvaluator() : numTimes_(0), @@ -245,16 +259,12 @@ public: } virtual void printStats(std::ostream& os) const { - os << config_.name() << "=" - << (numSequences_ ? totalScore_ / numSequences_ : 0); - os << " deletions error" - << "=" << (numSequences_ ? deletions_ / numSequences_ : 0); - os << " insertions error" - << "=" << (numSequences_ ? insertions_ / numSequences_ : 0); - os << " substitutions error" - << "=" << (numSequences_ ? substitutions_ / numSequences_ : 0); - os << " sequences error" - << "=" << (real)seqClassficationError_ / numSequences_; + storeLocalValues(); + os << config_.name() << " error = " << evalResults_["error"]; + os << " deletions error = " << evalResults_["deletion_error"]; + os << " insertions error = " << evalResults_["insertion_error"]; + os << " substitution error = " << evalResults_["substitution_error"]; + os << " sequence error = " << evalResults_["sequence_error"]; } virtual void distributeEval(ParameterClient2* client) { @@ -272,6 +282,37 @@ public: seqClassficationError_ = (int)buf[4]; numSequences_ = (int)buf[5]; } + + void getNames(std::vector* names) { + storeLocalValues(); + names->reserve(names->size() + evalResults_.size()); + for (auto it = evalResults_.begin(); it != evalResults_.end(); ++it) { + names->push_back(config_.name() + "." 
+ it->first); + } + } + + real getValue(const std::string& name, Error* err) const { + storeLocalValues(); + + std::vector buffers; + paddle::str::split(name, '.', &buffers); + auto it = evalResults_.find(buffers[buffers.size() - 1]); + + if (it == evalResults_.end()) { + *err = Error("Evaluator does not have the key %s", name.c_str()); + return 0.0f; + } + + return it->second; + } + + std::string getType(const std::string& name, Error* err) const { + this->getValue(name, err); + if (!err->isOK()) { + return ""; + } + return "ctc_edit_distance"; + } }; REGISTER_EVALUATOR(ctc_edit_distance, CTCErrorEvaluator); diff --git a/paddle/gserver/evaluators/ChunkEvaluator.cpp b/paddle/gserver/evaluators/ChunkEvaluator.cpp index 1658282f3a..a2ab15eede 100644 --- a/paddle/gserver/evaluators/ChunkEvaluator.cpp +++ b/paddle/gserver/evaluators/ChunkEvaluator.cpp @@ -268,7 +268,13 @@ public: } // get type of evaluator - std::string getTypeImpl() const { return "chunk"; } + std::string getType(const std::string& name, Error* err) const { + this->getValue(name, err); + if (!err->isOK()) { + return ""; + } + return "chunk"; + } private: void storeLocalValues() const { diff --git a/paddle/gserver/evaluators/Evaluator.h b/paddle/gserver/evaluators/Evaluator.h index b114500e2b..90203553e0 100644 --- a/paddle/gserver/evaluators/Evaluator.h +++ b/paddle/gserver/evaluators/Evaluator.h @@ -211,6 +211,7 @@ public: *err = Error("Not implemented"); return .0f; } + std::string getType(const std::string& name, Error* err) const { *err = Error("Not implemented"); return ""; @@ -331,6 +332,7 @@ private: protected: std::string getTypeImpl() const; }; + /** * @brief precision, recall and f1 score Evaluator * \f[ @@ -358,6 +360,12 @@ public: virtual void distributeEval(ParameterClient2* client); + void getNames(std::vector* names); + + real getValue(const std::string& name, Error* err) const; + + std::string getType(const std::string& name, Error* err) const; + struct StatsInfo { /// numbers of true positives double TP; @@ -428,11 +436,6 @@ private: mutable std::unordered_map values_; void storeLocalValues() const; - // Evaluator interface -public: - void getNames(std::vector* names); - real getValue(const std::string& name, Error* err) const; - std::string getType(const std::string& name, Error* err) const; }; /* diff --git a/paddle/gserver/layers/Conv3DLayer.cpp b/paddle/gserver/layers/Conv3DLayer.cpp index 7cc9937cce..3887aa58b2 100644 --- a/paddle/gserver/layers/Conv3DLayer.cpp +++ b/paddle/gserver/layers/Conv3DLayer.cpp @@ -42,10 +42,10 @@ bool Conv3DLayer::init(const LayerMap &layerMap, if (sharedBiases_) { CHECK_EQ((size_t)numFilters_, biasParameter_->getSize()); biases_ = - std::unique_ptr(new Weight(1, numFilters_, biasParameter_)); + std::unique_ptr(new Weight(numFilters_, 1, biasParameter_)); } else { biases_ = - std::unique_ptr(new Weight(1, getSize(), biasParameter_)); + std::unique_ptr(new Weight(getSize(), 1, biasParameter_)); } } return true; @@ -224,20 +224,31 @@ void Conv3DLayer::bpropData(int i) { } void Conv3DLayer::bpropBiases() { + MatrixPtr biases = Matrix::create(biases_->getWGrad()->getData(), + 1, + biases_->getWGrad()->getElementCnt(), + false, + useGpu_); MatrixPtr outGradMat = getOutputGrad(); + if (this->sharedBiases_) { - biases_->getWGrad()->collectSharedBias(*outGradMat, 1.0f); + biases->collectSharedBias(*outGradMat, 1.0f); } else { - biases_->getWGrad()->collectBias(*outGradMat, 1.0f); + biases->collectBias(*outGradMat, 1.0f); } } void Conv3DLayer::addBias() { MatrixPtr outMat = 
getOutputValue(); + MatrixPtr bias = Matrix::create(biases_->getW()->getData(), + 1, + biases_->getW()->getElementCnt(), + false, + useGpu_); if (this->sharedBiases_) { - outMat->addSharedBias(*(biases_->getW()), 1.0f); + outMat->addSharedBias(*(bias), 1.0f); } else { - outMat->addBias(*(biases_->getW()), 1.0f); + outMat->addBias(*(bias), 1.0f); } } diff --git a/paddle/gserver/layers/DeConv3DLayer.cpp b/paddle/gserver/layers/DeConv3DLayer.cpp index 7d5c772c89..2838980a97 100644 --- a/paddle/gserver/layers/DeConv3DLayer.cpp +++ b/paddle/gserver/layers/DeConv3DLayer.cpp @@ -42,10 +42,10 @@ bool DeConv3DLayer::init(const LayerMap &layerMap, if (sharedBiases_) { CHECK_EQ((size_t)numFilters_, biasParameter_->getSize()); biases_ = - std::unique_ptr(new Weight(1, numFilters_, biasParameter_)); + std::unique_ptr(new Weight(numFilters_, 1, biasParameter_)); } else { biases_ = - std::unique_ptr(new Weight(1, getSize(), biasParameter_)); + std::unique_ptr(new Weight(getSize(), 1, biasParameter_)); } } return true; @@ -191,21 +191,31 @@ void DeConv3DLayer::bpropWeights(int i) {} void DeConv3DLayer::bpropData(int i) {} void DeConv3DLayer::bpropBiases() { + MatrixPtr biases = Matrix::create(biases_->getWGrad()->getData(), + 1, + biases_->getWGrad()->getElementCnt(), + false, + useGpu_); const MatrixPtr &outGradMat = getOutputGrad(); if (this->sharedBiases_) { - biases_->getWGrad()->collectSharedBias(*outGradMat, 1.0f); + biases->collectSharedBias(*outGradMat, 1.0f); } else { - biases_->getWGrad()->collectBias(*outGradMat, 1.0f); + biases->collectBias(*outGradMat, 1.0f); } } void DeConv3DLayer::addBias() { MatrixPtr outMat = getOutputValue(); + MatrixPtr bias = Matrix::create(biases_->getW()->getData(), + 1, + biases_->getW()->getElementCnt(), + false, + useGpu_); if (this->sharedBiases_) { - outMat->addSharedBias(*(biases_->getW()), 1.0f); + outMat->addSharedBias(*(bias), 1.0f); } else { - outMat->addBias(*(biases_->getW()), 1.0f); + outMat->addBias(*(bias), 1.0f); } } diff --git a/paddle/gserver/layers/PrintLayer.cpp b/paddle/gserver/layers/PrintLayer.cpp index 0a1e17b9aa..e83ae34bbe 100644 --- a/paddle/gserver/layers/PrintLayer.cpp +++ b/paddle/gserver/layers/PrintLayer.cpp @@ -48,7 +48,16 @@ public: << inputLayers_.size() << ") at " << getName(); } s << format.substr(pos); - LOG(INFO) << s.str(); + + const std::string delimiter("\n"); + std::string content = s.str(); + std::string::size_type foundPos = 0; + std::string::size_type prevPos = 0; + while ((foundPos = content.find(delimiter, prevPos)) != std::string::npos) { + LOG(INFO) << content.substr(prevPos, foundPos - prevPos); + prevPos = foundPos + delimiter.size(); + } + LOG(INFO) << content.substr(prevPos); } void backward(const UpdateCallback& callback) override {} diff --git a/paddle/operators/CMakeLists.txt b/paddle/operators/CMakeLists.txt index e5efcccb0e..8a0ff1eb53 100644 --- a/paddle/operators/CMakeLists.txt +++ b/paddle/operators/CMakeLists.txt @@ -14,27 +14,31 @@ function(op_library TARGET) cmake_parse_arguments(op_library "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN}) - foreach(src ${op_library_SRCS}) - if (${src} MATCHES ".*\\.cu$") - list(APPEND cu_srcs ${src}) - elseif(${src} MATCHES ".*\\.cc$") - list(APPEND cc_srcs ${src}) - else() - message(FATAL_ERROR "${TARGET} Source file ${src} should only be .cc or .cu") + list(LENGTH op_library_SRCS op_library_SRCS_len) + if (${op_library_SRCS_len} EQUAL 0) + if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.cc) + list(APPEND cc_srcs ${TARGET}.cc) endif() - endforeach() + if 
(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.cu) + list(APPEND cu_srcs ${TARGET}.cu) + endif() + else() + foreach(src ${op_library_SRCS}) + if (${src} MATCHES ".*\\.cu$") + list(APPEND cu_srcs ${src}) + elseif(${src} MATCHES ".*\\.cc$") + list(APPEND cc_srcs ${src}) + else() + message(FATAL_ERROR "${TARGET} Source file ${src} should only be .cc or .cu") + endif() + endforeach() + endif() list(LENGTH cc_srcs cc_srcs_len) if (${cc_srcs_len} EQUAL 0) message(FATAL_ERROR "The op library ${TARGET} should contains at least one .cc file") endif() - list(LENGTH cu_srcs cu_srcs_len) - list(LENGTH op_library_DEPS dep_len) - if (${cu_srcs_len} EQUAL 0 AND ${dep_len} EQUAL 0) - message(WARNING "The op library ${TARGET} not support GPU!") - endif() - if (WITH_GPU) nv_library(${TARGET} SRCS ${cc_srcs} ${cu_srcs} DEPS ${op_library_DEPS} ${op_common_deps}) @@ -46,22 +50,22 @@ endfunction() add_subdirectory(math) -list(REMOVE_ITEM GENERAL_OPS - net_op - minus_op - mul_op - recurrent_op - scale_op) - -op_library(net_op SRCS net_op.cc) -op_library(minus_op SRCS minus_op.cc minus_op.cu DEPS scale_op) -op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function) +set(DEPS_OPS + identity_op + minus_op + mul_op + recurrent_op + scale_op) +op_library(identity_op DEPS scale_op) +op_library(minus_op DEPS scale_op) +op_library(mul_op DEPS math_function) op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc DEPS framework_proto tensor operator net_op) -op_library(scale_op SRCS scale_op.cc scale_op.cu DEPS net_op) +op_library(scale_op DEPS net_op) +list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS}) foreach(src ${GENERAL_OPS}) - op_library(${src} SRCS ${src}.cc ${src}.cu) + op_library(${src}) endforeach() set(GLOB_OP_LIB ${OP_LIBRARY} CACHE INTERNAL "Global OP library") diff --git a/paddle/operators/add_op.cc b/paddle/operators/add_op.cc index 8ab748ed71..8dbd47cf0d 100644 --- a/paddle/operators/add_op.cc +++ b/paddle/operators/add_op.cc @@ -57,7 +57,6 @@ class AddOpGrad : public framework::OperatorWithKernel { } // namespace paddle namespace ops = paddle::operators; -REGISTER_OP(add_two, ops::AddOp, ops::AddOpMaker, add_two_grad, ops::AddOpGrad); +REGISTER_OP(add, ops::AddOp, ops::AddOpMaker, add_grad, ops::AddOpGrad); -REGISTER_OP_CPU_KERNEL(add_two, - ops::AddKernel); +REGISTER_OP_CPU_KERNEL(add, ops::AddKernel); diff --git a/paddle/operators/add_op.cu b/paddle/operators/add_op.cu index cec5f558cb..d9c6d20a6c 100644 --- a/paddle/operators/add_op.cu +++ b/paddle/operators/add_op.cu @@ -12,10 +12,7 @@ See the License for the specific language governing permissions and limitations under the License. */ -#define EIGEN_USE_GPU -#include "paddle/framework/op_registry.h" #include "paddle/operators/add_op.h" namespace ops = paddle::operators; -REGISTER_OP_GPU_KERNEL(add_two, - ops::AddKernel); +REGISTER_OP_GPU_KERNEL(add, ops::AddKernel); diff --git a/paddle/operators/cos_sim_op.cc b/paddle/operators/cos_sim_op.cc new file mode 100644 index 0000000000..c033af3b74 --- /dev/null +++ b/paddle/operators/cos_sim_op.cc @@ -0,0 +1,107 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+#include "paddle/operators/cos_sim_op.h"
+
+namespace paddle {
+namespace operators {
+
+using framework::Tensor;
+
+class CosSimOp : public framework::OperatorWithKernel {
+ public:
+  using framework::OperatorWithKernel::OperatorWithKernel;
+
+ protected:
+  void InferShape(const framework::InferShapeContext &ctx) const override {
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null.");
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) must not be null.");
+    PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("X")->dims(),
+                      ctx.Input<Tensor>("Y")->dims(),
+                      "Dimensions of Input(X) and Input(Y) must be the same.");
+
+    auto dims = ctx.Input<Tensor>("X")->dims();
+    ctx.Output<Tensor>("Out")->Resize({dims[0], 1});
+    ctx.Output<Tensor>("XNorm")->Resize({dims[0], 1});
+    ctx.Output<Tensor>("YNorm")->Resize({dims[0], 1});
+  }
+};
+
+class CosSimOpMaker : public framework::OpProtoAndCheckerMaker {
+ public:
+  CosSimOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
+      : OpProtoAndCheckerMaker(proto, op_checker) {
+    AddInput("X", "The first input of cos_sim op.");
+    AddInput("Y", "The second input of cos_sim op.");
+    AddOutput("Out", "The output of cos_sim op.");
+    AddOutput("XNorm", "Row norm of the first input.").AsIntermediate();
+    AddOutput("YNorm", "Row norm of the second input.").AsIntermediate();
+
+    AddComment(R"DOC(
+Cosine Similarity Operator.
+
+The equation is: Out = X^T * Y / (sqrt(X^T * X) * sqrt(Y^T * Y))
+)DOC");
+  }
+};
+
+class CosSimOpGrad : public framework::OperatorWithKernel {
+ public:
+  using framework::OperatorWithKernel::OperatorWithKernel;
+
+ protected:
+  void InferShape(const framework::InferShapeContext &ctx) const override {
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null.");
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) must not be null.");
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("XNorm"),
+                            "Input(XNorm) must not be null.");
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("YNorm"),
+                            "Input(YNorm) must not be null.");
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
+                            "Input(Out@GRAD) must not be null.");
+
+    auto x_dims = ctx.Input<Tensor>("X")->dims();
+    auto y_dims = ctx.Input<Tensor>("Y")->dims();
+    auto xnorm_dims = ctx.Input<Tensor>("XNorm")->dims();
+    auto ynorm_dims = ctx.Input<Tensor>("YNorm")->dims();
+    auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
+    PADDLE_ENFORCE_EQ(x_dims, y_dims,
+                      "Dimensions of Input(X) and Input(Y) must be the same.");
+    PADDLE_ENFORCE_EQ(xnorm_dims[0], x_dims[0],
+                      "1st dimension of XNorm must equal that of Input(X).");
+    PADDLE_ENFORCE_EQ(xnorm_dims[1], 1, "2nd dimension of XNorm must be one.");
+    PADDLE_ENFORCE_EQ(ynorm_dims[0], y_dims[0],
+                      "1st dimension of YNorm must equal that of Input(Y).");
+    PADDLE_ENFORCE_EQ(ynorm_dims[1], 1, "2nd dimension of YNorm must be one.");
+    PADDLE_ENFORCE_EQ(out_dims[0], x_dims[0],
+                      "1st dimension of Out@GRAD must equal that of Input(X)");
+    PADDLE_ENFORCE_EQ(out_dims[1], 1, "2nd dimension of Out@GRAD must be one.");
+
+    auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
+    auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y"));
+    if (x_grad) x_grad->Resize(x_dims);
+    if
(y_grad) y_grad->Resize(y_dims); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OP(cos_sim, ops::CosSimOp, ops::CosSimOpMaker, cos_sim_grad, + ops::CosSimOpGrad); +REGISTER_OP_CPU_KERNEL(cos_sim, + ops::CosSimKernel); +REGISTER_OP_CPU_KERNEL( + cos_sim_grad, ops::CosSimGradKernel); diff --git a/paddle/operators/gather_op.cu b/paddle/operators/cos_sim_op.cu similarity index 72% rename from paddle/operators/gather_op.cu rename to paddle/operators/cos_sim_op.cu index 3f04a7b3f8..0cb8fd26de 100644 --- a/paddle/operators/gather_op.cu +++ b/paddle/operators/cos_sim_op.cu @@ -13,8 +13,10 @@ limitations under the License. */ #define EIGEN_USE_GPU -#include "paddle/operators/gather_op.h" +#include "paddle/operators/cos_sim_op.h" namespace ops = paddle::operators; -REGISTER_OP_GPU_KERNEL(gather, - ops::GatherOpKernel); +REGISTER_OP_GPU_KERNEL(cos_sim, + ops::CosSimKernel); +REGISTER_OP_GPU_KERNEL( + cos_sim_grad, ops::CosSimGradKernel); diff --git a/paddle/operators/cos_sim_op.h b/paddle/operators/cos_sim_op.h new file mode 100644 index 0000000000..9e2bcebe3b --- /dev/null +++ b/paddle/operators/cos_sim_op.h @@ -0,0 +1,107 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
*/ + +#pragma once +#include "paddle/framework/eigen.h" +#include "paddle/framework/op_registry.h" + +namespace paddle { +namespace operators { + +using Tensor = framework::Tensor; +template +using EigenMatrix = framework::EigenMatrix; +template +using EigenVector = framework::EigenVector; + +template +class CosSimKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& context) const override { + auto* input_x = context.Input("X"); + auto* input_y = context.Input("Y"); + auto* output_z = context.Output("Out"); + auto* output_x_norm = context.Output("XNorm"); + auto* output_y_norm = context.Output("YNorm"); + + output_z->mutable_data(context.GetPlace()); + output_x_norm->mutable_data(context.GetPlace()); + output_y_norm->mutable_data(context.GetPlace()); + + auto dims = input_x->dims(); + int size = static_cast(framework::product(dims)); + auto new_dims = framework::make_ddim({dims[0], size / dims[0]}); + auto x = EigenMatrix::From(*input_x, new_dims); + auto y = EigenMatrix::From(*input_y, new_dims); + auto z = EigenVector::Flatten(*output_z); + auto x_norm = EigenVector::Flatten(*output_x_norm); + auto y_norm = EigenVector::Flatten(*output_y_norm); + + auto place = context.GetEigenDevice(); + auto xy = (x * y).sum(Eigen::array({{1}})); + x_norm.device(place) = x.square().sum(Eigen::array({{1}})).sqrt(); + y_norm.device(place) = y.square().sum(Eigen::array({{1}})).sqrt(); + z.device(place) = xy / x_norm / y_norm; + } +}; + +template +class CosSimGradKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& context) const override { + auto* input_x = context.Input("X"); + auto* input_y = context.Input("Y"); + auto* input_z = context.Input("Out"); + auto* input_x_norm = context.Input("XNorm"); + auto* input_y_norm = context.Input("YNorm"); + auto* output_grad_x = context.Output(framework::GradVarName("X")); + auto* output_grad_y = context.Output(framework::GradVarName("Y")); + auto* input_grad_z = context.Input(framework::GradVarName("Out")); + + auto dims = input_x->dims(); + int size = static_cast(framework::product(dims)); + auto new_dims = framework::make_ddim({dims[0], size / dims[0]}); + auto x = EigenMatrix::From(*input_x, new_dims); + auto y = EigenMatrix::From(*input_y, new_dims); + auto z = EigenMatrix::From(*input_z); + auto x_norm = EigenMatrix::From(*input_x_norm); + auto y_norm = EigenMatrix::From(*input_y_norm); + auto dz = EigenMatrix::From(*input_grad_z); + + Eigen::DSizes bcast(1, new_dims[1]); + auto z_bcast = z.broadcast(bcast); + auto dz_bcast = dz.broadcast(bcast); + auto place = context.GetEigenDevice(); + auto x_snorm_bcast = x_norm.square().eval().broadcast(bcast); + auto y_snorm_bcast = y_norm.square().eval().broadcast(bcast); + auto norm_prod_bcast = (x_norm * y_norm).eval().broadcast(bcast); + if (output_grad_x) { + output_grad_x->mutable_data(context.GetPlace()); + auto dx = EigenMatrix::From(*output_grad_x, new_dims); + dx.device(place) = + dz_bcast * (y / norm_prod_bcast - z_bcast * x / x_snorm_bcast); + } + if (output_grad_y) { + output_grad_y->mutable_data(context.GetPlace()); + auto dy = EigenMatrix::From(*output_grad_y, new_dims); + dy.device(place) = + dz_bcast * (x / norm_prod_bcast - z_bcast * y / y_snorm_bcast); + } + } +}; + +} // namespace operators +} // namespace paddle diff --git a/paddle/operators/dropout_op.cc b/paddle/operators/dropout_op.cc index dc416f6194..cd67beed94 100644 --- a/paddle/operators/dropout_op.cc +++ b/paddle/operators/dropout_op.cc @@ -25,7 +25,11 
@@ class DropoutOp : public framework::OperatorWithKernel {
 
  protected:
   void InferShape(const framework::InferShapeContext &ctx) const override {
+    // validity check
     PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null.");
+    PADDLE_ENFORCE_GE(ctx.GetAttr<float>("dropout_prob"), 0);
+    PADDLE_ENFORCE_LE(ctx.GetAttr<float>("dropout_prob"), 1);
+    // resize
     auto dims = ctx.Input<Tensor>("X")->dims();
     ctx.Output<Tensor>("Out")->Resize(dims);
     ctx.Output<Tensor>("Mask")->Resize(dims);
@@ -37,13 +41,24 @@ class DropoutOpMaker : public framework::OpProtoAndCheckerMaker {
   DropoutOpMaker(framework::OpProto *proto,
                  framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddAttr<float>("dropout_prob", "Dropout probability.").SetDefault(.5f);
+    AddAttr<float>("dropout_prob",
+                   "Probability of randomly setting elements "
+                   "to zero.")
+        .SetDefault(.5f);
     AddAttr<int>("seed", "Dropout random seed.").SetDefault(0);
     AddInput("X", "The input of dropout op.");
     AddOutput("Out", "The output of dropout op.");
-    AddOutput("Mask", "The dropout mask.").AsIntermediate();
+    AddOutput("Mask", "The random sampled dropout mask.").AsIntermediate();
 
-    AddComment(R"DOC(Dropout Operator.)DOC");
+    AddComment(R"DOC(
+Dropout Operator.
+
+"Dropout" refers to randomly dropping out units in a neural network. It is a
+regularization technique for reducing overfitting by preventing neuron
+co-adaptation during training. The dropout operator randomly sets (according to
+the given dropout probability) the output of some units to zero, while the
+others are set equal to their inputs.
+)DOC");
   }
 };
 
@@ -53,11 +68,13 @@ class DropoutOpGrad : public framework::OperatorWithKernel {
 
  protected:
   void InferShape(const framework::InferShapeContext &ctx) const override {
+    // validity check
     PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null.");
     PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Mask"), "Mask must not be null.");
     PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
                             "Input(Out@GRAD) must not be null.");
-
+    PADDLE_ENFORCE_GE(ctx.GetAttr<float>("dropout_prob"), 0);
+    PADDLE_ENFORCE_LE(ctx.GetAttr<float>("dropout_prob"), 1);
     auto x_dims = ctx.Input<Tensor>("X")->dims();
     auto mask_dims = ctx.Input<Tensor>("Mask")->dims();
     auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
@@ -65,7 +82,7 @@ class DropoutOpGrad : public framework::OperatorWithKernel {
                       "Dimensions of Input(X) and Out@Grad must be the same.");
     PADDLE_ENFORCE_EQ(x_dims, mask_dims,
                       "Dimensions of Input(X) and Mask must be the same.");
-
+    // resize
     auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
     x_grad->Resize(x_dims);
   }
diff --git a/paddle/operators/dropout_op.h b/paddle/operators/dropout_op.h
index becf89aca3..0d78c80948 100644
--- a/paddle/operators/dropout_op.h
+++ b/paddle/operators/dropout_op.h
@@ -40,8 +40,8 @@ class CPUDropoutKernel : public framework::OpKernel {
     T* y_data = y->mutable_data<T>(context.GetPlace());
     const T* x_data = x->data<T>();
 
-    float dropout_prob = context.op_.GetAttr<float>("dropout_prob");
-    int seed = context.op_.GetAttr<int>("seed");
+    float dropout_prob = context.GetAttr<float>("dropout_prob");
+    int seed = context.GetAttr<int>("seed");
 
     std::minstd_rand engine;
     engine.seed(seed);
@@ -53,26 +53,27 @@ class CPUDropoutKernel : public framework::OpKernel {
         y_data[i] = 0;
       } else {
         mask_data[i] = 1;
-        y_data[i] = (1 - dropout_prob) * x_data[i];
+        y_data[i] = x_data[i];
       }
     }
+    // TODO: add test time logic.
   }
 };
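The TODO above concerns inference behavior: since the diff removes the (1 - dropout_prob) scaling from training, outputs would have to be scaled by (1 - dropout_prob) at test time; the common alternative is inverted dropout, which scales kept units by 1/(1 - dropout_prob) during training so that inference is a plain copy. A sketch of the latter, reusing the kernel's local variables; the `is_training` flag is an assumption, not part of this diff:

```c++
// Inverted dropout (sketch only; `is_training` is an assumed flag).
// Training scales kept units up so that test time needs no scaling.
for (int i = 0; i < size; ++i) {
  if (is_training) {
    if (dist(engine) < dropout_prob) {
      mask_data[i] = 0;
      y_data[i] = 0;
    } else {
      mask_data[i] = 1;
      y_data[i] = x_data[i] / (1.0f - dropout_prob);
    }
  } else {
    y_data[i] = x_data[i];  // test time: identity
  }
}
```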
template <typename T>
struct MaskGenerator {
-  float dropout_prob_;
-  int seed_;
+  float dropout_prob;
+  int seed;
 
   __host__ __device__ MaskGenerator(float dropout_prob, int seed)
-      : dropout_prob_(dropout_prob), seed_(seed) {}
+      : dropout_prob(dropout_prob), seed(seed) {}
 
   __host__ __device__ T operator()(const unsigned int n) const {
     thrust::minstd_rand rng;
-    rng.seed(seed_);
+    rng.seed(seed);
     thrust::uniform_real_distribution<float> dist(0, 1);
     rng.discard(n);
-    if (dist(rng) < dropout_prob_) {
+    if (dist(rng) < dropout_prob) {
       return static_cast<T>(0);
     } else {
       return static_cast<T>(1);
@@ -92,8 +93,8 @@ class GPUDropoutKernel : public framework::OpKernel {
     auto* mask = context.Output<Tensor>("Mask");
     y->mutable_data<T>(context.GetPlace());
 
-    float dropout_prob = context.op_.GetAttr<float>("dropout_prob");
-    int seed = context.op_.GetAttr<int>("seed");
+    float dropout_prob = context.GetAttr<float>("dropout_prob");
+    int seed = context.GetAttr<int>("seed");
     thrust::counting_iterator<unsigned int> index_sequence_begin(0);
     int size = framework::product(mask->dims());
     T* mask_data = mask->mutable_data<T>(context.GetPlace());
@@ -108,7 +109,8 @@ class GPUDropoutKernel : public framework::OpKernel {
     auto M = EigenMatrix<T>::From(*mask, new_dims);
 
     auto place = context.GetEigenDevice<Place>();
-    Y.device(place) = X * M * (1 - dropout_prob);
+    Y.device(place) = X * M;
+    // TODO: add test time logic.
   }
 };
 
@@ -129,8 +131,8 @@ class DropoutGradKernel : public framework::OpKernel {
     auto dY = EigenMatrix<T>::From(*grad_y, new_dims);
 
     auto place = context.GetEigenDevice<Place>();
-    float dropout_prob = context.op_.GetAttr<float>("dropout_prob");
-    dX.device(place) = dY * M * (1 - dropout_prob);
+    dX.device(place) = dY * M;
+    // TODO: add test time logic.
   }
 };
diff --git a/paddle/operators/gaussian_random_op.cc b/paddle/operators/gaussian_random_op.cc
index a85363ad81..8bb61275ba 100644
--- a/paddle/operators/gaussian_random_op.cc
+++ b/paddle/operators/gaussian_random_op.cc
@@ -19,21 +19,20 @@ template <typename T>
 class CPUGaussianRandomKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
-    float mean = context.op_.GetAttr<float>("mean");
-    float std = context.op_.GetAttr<float>("std");
+    float mean = context.GetAttr<float>("mean");
+    float std = context.GetAttr<float>("std");
     auto* tensor = context.Output<framework::Tensor>("Out");
     T* data = tensor->mutable_data<T>(context.GetPlace());
 
-    unsigned int seed =
-        static_cast<unsigned int>(context.op_.GetAttr<int>("seed"));
+    unsigned int seed = static_cast<unsigned int>(context.GetAttr<int>("seed"));
     std::minstd_rand engine;
     if (seed == 0) {
       seed = std::random_device()();
     }
     engine.seed(seed);
     std::normal_distribution<T> dist(mean, std);
-    ssize_t size = framework::product(tensor->dims());
-    for (ssize_t i = 0; i < size; ++i) {
+    int64_t size = framework::product(tensor->dims());
+    for (int64_t i = 0; i < size; ++i) {
       data[i] = dist(engine);
     }
   }
@@ -47,9 +46,14 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
   void InferShape(const framework::InferShapeContext& context) const override {
     auto* tensor = context.Output<framework::Tensor>("Out");
     auto dims = GetAttr<std::vector<int>>("dims");
+    std::vector<int64_t> temp;
+    temp.reserve(dims.size());
+    for (auto dim : dims) {
+      temp.push_back(static_cast<int64_t>(dim));
+    }
     PADDLE_ENFORCE(dims.size() > 0UL,
                    "dims can be one int or array. dims must be set.");
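As a usage sketch, an operator like the one above can be created through the same OpRegistry::CreateOp interface used elsewhere in this PR; the output variable name "W" and the attribute values below are illustrative, not from the patch:

```c++
// Fill variable "W" with N(0, 1) noise of shape 256x128 (sketch only;
// assumes the attribute names declared by gaussian_random above).
auto op = paddle::framework::OpRegistry::CreateOp(
    "gaussian_random",
    /*inputs=*/{},
    /*outputs=*/{{"Out", {"W"}}},
    /*attrs=*/{{"dims", std::vector<int>{256, 128}},
               {"mean", 0.0f},
               {"std", 1.0f},
               {"seed", 0}});
```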
-    tensor->Resize(framework::make_ddim(dims));
+    tensor->Resize(framework::make_ddim(temp));
   }
 };
diff --git a/paddle/operators/gaussian_random_op.cu b/paddle/operators/gaussian_random_op.cu
index 018a4bfcb2..833a82bbf2 100644
--- a/paddle/operators/gaussian_random_op.cu
+++ b/paddle/operators/gaussian_random_op.cu
@@ -42,14 +42,13 @@ class GPUGaussianRandomKernel : public framework::OpKernel {
   void Compute(const framework::ExecutionContext& context) const override {
     auto* tensor = context.Output<framework::Tensor>("Out");
     T* data = tensor->mutable_data<T>(context.GetPlace());
-    unsigned int seed =
-        static_cast<unsigned int>(context.op_.GetAttr<int>("seed"));
+    unsigned int seed = static_cast<unsigned int>(context.GetAttr<int>("seed"));
     if (seed == 0) {
       std::random_device rd;
       seed = rd();
     }
-    T mean = static_cast<T>(context.op_.GetAttr<float>("mean"));
-    T std = static_cast<T>(context.op_.GetAttr<float>("std"));
+    T mean = static_cast<T>(context.GetAttr<float>("mean"));
+    T std = static_cast<T>(context.GetAttr<float>("std"));
     thrust::counting_iterator<unsigned int> index_sequence_begin(0);
     ssize_t N = framework::product(tensor->dims());
     thrust::transform(index_sequence_begin, index_sequence_begin + N,
diff --git a/paddle/operators/identity_op.cc b/paddle/operators/identity_op.cc
new file mode 100644
index 0000000000..be956bf3b3
--- /dev/null
+++ b/paddle/operators/identity_op.cc
@@ -0,0 +1,54 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+#include "paddle/operators/net_op.h"
+#include "paddle/operators/scale_op.h"
+
+namespace paddle {
+namespace operators {
+
+// identity is an alias of the scale op. This is also an example of creating
+// an alias operator.
+template <typename AttrType>
+class IdentityOpMaker : public framework::OpProtoAndCheckerMaker {
+ public:
+  IdentityOpMaker(framework::OpProto *proto,
+                  framework::OpAttrChecker *op_checker)
+      : OpProtoAndCheckerMaker(proto, op_checker) {
+    AddInput("X", "input tensor of identity op");
+    AddOutput("Out", "output tensor of identity op");
+    AddComment("identity operator. Just an alias of the scale op with scale = 1.0");
+  }
+};
+
+template <typename AttrType>
+class IdentityOp : public NetOp {
+ public:
+  IdentityOp(const std::string &type, const framework::VariableNameMap &inputs,
+             const framework::VariableNameMap &outputs,
+             const framework::AttributeMap &attrs)
+      : NetOp(type, inputs, outputs, attrs) {
+    AppendOp(framework::OpRegistry::CreateOp(
+        "scale", {{"X", {Input("X")}}}, {{"Out", {Output("Out")}}},
+        {{"scale", static_cast<AttrType>(1)}}));
+  }
+};
+
+}  // namespace operators
+}  // namespace paddle
+
+namespace ops = paddle::operators;
+
+REGISTER_OP_WITHOUT_GRADIENT(identity, ops::IdentityOp<float>,
+                             ops::IdentityOpMaker<float>);
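The same NetOp composition pattern supports other aliases. For instance, a hypothetical scale-by-two operator could be sketched the same way (the `DoubleOp` name and everything in it are illustrative, not part of this diff):

```c++
// Hypothetical alias built like IdentityOp above: Out = 2 * X.
// Composes the existing "scale" op through NetOp, so no new kernel is needed.
template <typename AttrType>
class DoubleOp : public NetOp {
 public:
  DoubleOp(const std::string &type, const framework::VariableNameMap &inputs,
           const framework::VariableNameMap &outputs,
           const framework::AttributeMap &attrs)
      : NetOp(type, inputs, outputs, attrs) {
    AppendOp(framework::OpRegistry::CreateOp(
        "scale", {{"X", {Input("X")}}}, {{"Out", {Output("Out")}}},
        {{"scale", static_cast<AttrType>(2)}}));
  }
};
```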
diff --git a/paddle/operators/identity_op.cc b/paddle/operators/identity_op.cc new file mode 100644 index 0000000000..be956bf3b3 --- /dev/null +++ b/paddle/operators/identity_op.cc @@ -0,0 +1,54 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. */ + +#include "paddle/operators/net_op.h" +#include "paddle/operators/scale_op.h" + +namespace paddle { +namespace operators { + +// identity is an alias of the scale op. This is also an example of creating an +// alias operator. +template +class IdentityOpMaker : public framework::OpProtoAndCheckerMaker { + public: + IdentityOpMaker(framework::OpProto *proto, + framework::OpAttrChecker *op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("X", "input tensor of identity op"); + AddOutput("Out", "output tensor of identity op"); + AddComment("identity operator. Just an alias of the scale op with scale = 1.0"); } }; + +template +class IdentityOp : public NetOp { + public: + IdentityOp(const std::string &type, const framework::VariableNameMap &inputs, + const framework::VariableNameMap &outputs, + const framework::AttributeMap &attrs) + : NetOp(type, inputs, outputs, attrs) { + AppendOp(framework::OpRegistry::CreateOp( + "scale", {{"X", {Input("X")}}}, {{"Out", {Output("Out")}}}, + {{"scale", static_cast(1)}})); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; + +REGISTER_OP_WITHOUT_GRADIENT(identity, ops::IdentityOp, + ops::IdentityOpMaker); diff --git a/paddle/operators/lookup_table_op.h b/paddle/operators/lookup_table_op.h index 4da8079b91..877b36cef4 100644 --- a/paddle/operators/lookup_table_op.h +++ b/paddle/operators/lookup_table_op.h @@ -30,12 +30,12 @@ class LookupTableKernel : public framework::OpKernel { auto ids_t = context.Input("Ids"); // int tensor auto output_t = context.Output("Out"); // float tensor - size_t N = table_t->dims()[0]; - size_t D = table_t->dims()[1]; + int N = table_t->dims()[0]; + int D = table_t->dims()[1]; auto ids = ids_t->data(); auto table = table_t->data(); auto output = output_t->mutable_data(context.GetPlace()); - for (size_t i = 0; i < product(ids_t->dims()); ++i) { + for (ssize_t i = 0; i < product(ids_t->dims()); ++i) { PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_GE(ids[i], 0); memcpy(output + i * D, table + ids[i] * D, D * sizeof(T)); @@ -51,8 +51,8 @@ class LookupTableGradKernel : public framework::OpKernel { auto d_output_t = context.Input(framework::GradVarName("Out")); auto d_table_t = context.Output(framework::GradVarName("W")); - size_t N = d_table_t->dims()[0]; - size_t D = d_table_t->dims()[1]; + int N = d_table_t->dims()[0]; + int D = d_table_t->dims()[1]; auto ids = ids_t->data(); const T* d_output = d_output_t->data(); T* d_table = d_table_t->mutable_data(context.GetPlace()); @@ -61,10 +61,10 @@ class LookupTableGradKernel : public framework::OpKernel { t.device(context.GetEigenDevice()) = t.constant(static_cast(0)); - for (size_t i = 0; i < product(ids_t->dims()); ++i) { + for (ssize_t i = 0; i < product(ids_t->dims()); ++i) { PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_GE(ids[i], 0); - for (size_t j = 0; j < D; ++j) { + for (int j = 0; j < D; ++j) { d_table[ids[i] * D + j] += d_output[i * D + j]; } }
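The lookup_table kernels above are an embedding gather and its scatter-add gradient; the PADDLE_ENFORCE pair corresponds to the bounds assertion below. A NumPy sketch (helper names are mine), where `table` is the N x D parameter `W` and `ids` is a 1-D index vector:

```python
import numpy as np

def lookup_table_forward(table, ids):
    n, d = table.shape
    assert ids.min() >= 0 and ids.max() < n  # PADDLE_ENFORCE_GE / _LT
    return table[ids]  # row ids[i] is copied to output row i, like the memcpy

def lookup_table_backward(table_shape, ids, d_out):
    d_table = np.zeros(table_shape, dtype=d_out.dtype)
    # ids may repeat, so gradients must accumulate, matching the += loop
    np.add.at(d_table, ids, d_out)
    return d_table
```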
diff --git a/paddle/operators/minus_op.cc b/paddle/operators/minus_op.cc index 1eee9644ba..069fb5e1ab 100644 --- a/paddle/operators/minus_op.cc +++ b/paddle/operators/minus_op.cc @@ -79,7 +79,7 @@ class MinusGradOp : public NetOp { } // namespace paddle USE_OP(scale); -USE_OP_ITSELF(identity); +USE_NO_KERNEL_OP(identity); namespace ops = paddle::operators; REGISTER_OP(minus, ops::MinusOp, ops::MinusOpMaker, minus_grad, ops::MinusGradOp); diff --git a/paddle/operators/mul_op.cc b/paddle/operators/mul_op.cc index 173cc3850c..28a47cdff2 100644 --- a/paddle/operators/mul_op.cc +++ b/paddle/operators/mul_op.cc @@ -29,10 +29,10 @@ class MulOp : public framework::OperatorWithKernel { auto dim1 = ctx.Input("Y")->dims(); PADDLE_ENFORCE_EQ(dim0.size(), 2, "input X(%s) should be a tensor with 2 dims, a matrix", - ctx.op_.Input("X")); + ctx.op().Input("X")); PADDLE_ENFORCE_EQ(dim1.size(), 2, "input Y(%s) should be a tensor with 2 dims, a matrix", - ctx.op_.Input("Y")); + ctx.op().Input("Y")); PADDLE_ENFORCE_EQ( dim0[1], dim1[0], "First matrix's width must be equal with second matrix's height."); @@ -75,8 +75,8 @@ class MulOpGrad : public framework::OperatorWithKernel { PADDLE_ENFORCE(y_dims[1] == out_dims[1], "Out@GRAD M X N must equal to Y dims 1, N "); - x_grad->Resize(x_dims); - y_grad->Resize(y_dims); + if (x_grad) x_grad->Resize(x_dims); + if (y_grad) y_grad->Resize(y_dims); } }; diff --git a/paddle/operators/mul_op.h b/paddle/operators/mul_op.h index 8facc02814..05a79e13b3 100644 --- a/paddle/operators/mul_op.h +++ b/paddle/operators/mul_op.h @@ -31,13 +31,13 @@ template class MulKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Input("Y"); - auto* Z = context.Output("Out"); - Z->mutable_data(context.GetPlace()); + auto* x = context.Input("X"); + auto* y = context.Input("Y"); + auto* z = context.Output("Out"); + z->mutable_data(context.GetPlace()); auto* device_context = const_cast(context.device_context_); - math::matmul(*X, false, *Y, false, 1, Z, 0, device_context); + math::matmul(*x, false, *y, false, 1, z, 0, device_context); } }; @@ -45,20 +45,24 @@ template class MulGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { - auto* X = ctx.Input("X"); - auto* Y = ctx.Input("Y"); - auto* dOut = ctx.Input(framework::GradVarName("Out")); + auto* x = ctx.Input("X"); + auto* y = ctx.Input("Y"); + auto* dout = ctx.Input(framework::GradVarName("Out")); - auto* dX = ctx.Output(framework::GradVarName("X")); - auto* dY = ctx.Output(framework::GradVarName("Y")); - dX->mutable_data(ctx.GetPlace()); - dY->mutable_data(ctx.GetPlace()); + auto* dx = ctx.Output(framework::GradVarName("X")); + auto* dy = ctx.Output(framework::GradVarName("Y")); auto* device_context = const_cast(ctx.device_context_); - // dX = dOut * Y'. dX: M x K, dOut : M x N, Y : K x N - math::matmul(*dOut, false, *Y, true, 1, dX, 0, device_context); - // dY = X' * dOut. dY: K x N, dOut : M x N, X : M x K - math::matmul(*X, true, *dOut, false, 1, dY, 0, device_context); + if (dx) { + dx->mutable_data(ctx.GetPlace()); + // dx = dout * y'. dx: M x K, dout : M x N, y : K x N + math::matmul(*dout, false, *y, true, 1, dx, 0, device_context); + } + if (dy) { + dy->mutable_data(ctx.GetPlace()); + // dy = x' * dout. dy: K x N, dout : M x N, x : M x K + math::matmul(*x, true, *dout, false, 1, dy, 0, device_context); + } } };
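The new `if (dx)` / `if (dy)` guards skip gradient computation for inputs placed in no_grad_set. The matmul gradient identities quoted in the comments are easy to sanity-check numerically; a small sketch with made-up shapes:

```python
import numpy as np

m, k, n = 4, 3, 5
x = np.random.randn(m, k)
y = np.random.randn(k, n)
dout = np.random.randn(m, n)  # upstream gradient of out = x @ y

dx = dout @ y.T  # dx = dout * y': M x K
dy = x.T @ dout  # dy = x' * dout: K x N

# forward-difference check of one entry of dx
eps = 1e-6
xp = x.copy()
xp[0, 0] += eps
numeric = ((xp @ y - x @ y) * dout).sum() / eps
assert abs(numeric - dx[0, 0]) < 1e-3
```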
diff --git a/paddle/operators/recurrent_op.cc b/paddle/operators/recurrent_op.cc index 16bd249cb3..e826703c60 100644 --- a/paddle/operators/recurrent_op.cc +++ b/paddle/operators/recurrent_op.cc @@ -235,5 +235,5 @@ RecurrentGradientOp::RecurrentGradientOp( } // namespace paddle REGISTER_OP_WITHOUT_GRADIENT( - recurrent_op, paddle::operators::RecurrentOp, + recurrent, paddle::operators::RecurrentOp, paddle::operators::RecurrentAlgorithmProtoAndCheckerMaker); diff --git a/paddle/operators/rnn/recurrent_op_utils.cc b/paddle/operators/rnn/recurrent_op_utils.cc index a9b65c30f2..69e723b401 100644 --- a/paddle/operators/rnn/recurrent_op_utils.cc +++ b/paddle/operators/rnn/recurrent_op_utils.cc @@ -61,7 +61,7 @@ void ConcatOutputs(const std::vector& step_scopes, PADDLE_ENFORCE(step_scope_var != nullptr, "%s not in scope", outlinks[i].internal); f::DDim step_dims = step_scope_var->template GetMutable()->dims(); - std::vector dims_vec = vectorize(step_dims); + std::vector dims_vec = vectorize(step_dims); dims_vec.insert(dims_vec.begin(), seq_len); output->Resize(f::make_ddim(dims_vec)); } else { diff --git a/paddle/operators/rowwise_add_op.cc b/paddle/operators/rowwise_add_op.cc index 6825dce332..30b4b40431 100644 --- a/paddle/operators/rowwise_add_op.cc +++ b/paddle/operators/rowwise_add_op.cc @@ -64,8 +64,10 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel { auto dims0 = ctx.Input("X")->dims(); auto dims1 = ctx.Input("b")->dims(); PADDLE_ENFORCE_EQ(1, dims1.size(), "b dims should be 1") - ctx.Output(framework::GradVarName("X"))->Resize(dims0); - ctx.Output(framework::GradVarName("b"))->Resize(dims1); + auto *dx = ctx.Output(framework::GradVarName("X")); + auto *db = ctx.Output(framework::GradVarName("b")); + if (dx) dx->Resize(dims0); + if (db) db->Resize(dims1); } }; diff --git a/paddle/operators/rowwise_add_op.h b/paddle/operators/rowwise_add_op.h index 1cbd8bb31a..4e926d9f29 100644 --- a/paddle/operators/rowwise_add_op.h +++ b/paddle/operators/rowwise_add_op.h @@ -51,20 +51,24 @@ template class RowwiseAddGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { - auto* dOut = context.Input(framework::GradVarName("Out")); - auto* dX = context.Output(framework::GradVarName("X")); + auto* dout = context.Input(framework::GradVarName("Out")); + auto* dx = context.Output(framework::GradVarName("X")); auto* db = context.Output(framework::GradVarName("b")); - dX->mutable_data(context.GetPlace()); - db->mutable_data(context.GetPlace()); - auto OutGrad = EigenMatrix::From(*dOut); + auto out_grad = EigenMatrix::From(*dout); auto place = context.GetEigenDevice(); - EigenMatrix::From(*dX).device(place) = OutGrad; + if (dx) { + dx->mutable_data(context.GetPlace()); + EigenMatrix::From(*dx).device(place) = out_grad; + } - // https://eigen.tuxfamily.org/dox/unsupported/TensorBase_8h_source.html - // colwise add - Eigen::array dims{{0}}; /* dimension to reduce */ - EigenVector::Flatten(*db).device(place) = OutGrad.sum(dims); + if (db) { + db->mutable_data(context.GetPlace()); + // https://eigen.tuxfamily.org/dox/unsupported/TensorBase_8h_source.html + // colwise add + Eigen::array dims{{0}}; /* dimension to reduce */ + EigenVector::Flatten(*db).device(place) = out_grad.sum(dims); + } } }; } // namespace operators
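rowwise_add broadcasts the vector `b` over the rows of `X`, so its gradients are the identity for X and a reduction over dimension 0 for b (the `out_grad.sum(dims)` above); both outputs are now optional. As a sketch:

```python
import numpy as np

def rowwise_add_forward(x, b):
    return x + b  # b is broadcast over the rows of x

def rowwise_add_backward(dout, need_dx=True, need_db=True):
    dx = dout if need_dx else None               # dX = dOut
    db = dout.sum(axis=0) if need_db else None   # colwise add: reduce dim 0
    return dx, db
```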
diff --git a/paddle/operators/scale_op.cc b/paddle/operators/scale_op.cc index 8e96a74c94..005152ed71 100644 --- a/paddle/operators/scale_op.cc +++ b/paddle/operators/scale_op.cc @@ -48,7 +48,7 @@ The equation is: Out = scale*X } }; -// Identity Op's gradient is identity op, too. +// Scale Op's gradient is scale op, too. // Grad(Out=scale(X)) => Grad(X) = scale(Grad(Out)) template class ScaleGradOp : public NetOp { @@ -65,33 +65,6 @@ class ScaleGradOp : public NetOp { } }; -// identity is a alias of scale op. This is also a example for creating a alias -// operator. -template -class IdentityOpMaker : public framework::OpProtoAndCheckerMaker { - public: - IdentityOpMaker(framework::OpProto *proto, - framework::OpAttrChecker *op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("X", "input tensor of identity op"); - AddOutput("Out", "output tensor of identity op"); - AddComment("identity operator. Just a alias of scale op which scale = 1.0"); - } -}; - -template -class IdentityOp : public NetOp { - public: - IdentityOp(const std::string &type, const framework::VariableNameMap &inputs, - const framework::VariableNameMap &outputs, - const framework::AttributeMap &attrs) - : NetOp(type, inputs, outputs, attrs) { - AppendOp(framework::OpRegistry::CreateOp( - "scale", {{"X", {Input("X")}}}, {{"Out", {Output("Out")}}}, - {{"scale", static_cast(1)}})); - } -}; - } // namespace operators } // namespace paddle @@ -101,5 +74,3 @@ REGISTER_OP(scale, ops::ScaleOp, ops::ScaleOpMaker, scale_grad, ops::ScaleGradOp); REGISTER_OP_CPU_KERNEL(scale, ops::ScaleKernel); -REGISTER_OP_WITHOUT_GRADIENT(identity, ops::IdentityOp, - ops::IdentityOpMaker); diff --git a/paddle/operators/scale_op.h b/paddle/operators/scale_op.h index aea64f1b04..65fb77eefa 100644 --- a/paddle/operators/scale_op.h +++ b/paddle/operators/scale_op.h @@ -27,7 +27,7 @@ class ScaleKernel : public framework::OpKernel { auto* in = context.Input("X"); tensor->mutable_data(in->place()); - auto scale = static_cast(context.op_.GetAttr("scale")); + auto scale = static_cast(context.GetAttr("scale")); auto eigen_out = framework::EigenVector::Flatten(*tensor); auto eigen_in = framework::EigenVector::Flatten(*in);
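With the identity code moved out to identity_op.cc, scale_op.cc keeps only the pair its comments document: Out = scale * X, and the gradient of a scale op is another scale op with the same factor. The identity op registered earlier is exactly this with scale = 1. A sketch:

```python
def scale_forward(x, scale=1.0):
    return scale * x  # Out = scale * X

def scale_backward(dout, scale=1.0):
    # Grad(Out = scale(X)) => Grad(X) = scale(Grad(Out))
    return scale * dout
```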
" + "2-D with shape [batch_size, input_feature_dimensions]."); + AddOutput("Y", "The normalized values with the same shape as X."); + AddComment(R"DOC( +The input of softmax operator is a 2-D tensor with shape N x K (N is the +batch_size, K is the dimension of input feature). The output tensor has the +same shape as the input tensor. + +For each row of the input tensor, the softmax operator squashes the +K-dimensional vector of arbitrary real values to a K-dimensional vector of real +values in the range [0, 1] that add up to 1. Specifically, it computes the +exponential of the given dimension and the sum of exponential values of all +the other dimensions in the K-dimensional vector input. Then the ratio of the +exponential of the given dimension and the sum of exponential values of all +the other dimensions is the output of the softmax operator. + +For each row `i` and each column `j` in X, we have: + Y[i, j] = exp(X[i, j]) / sum_j(exp(X[i, j])) + +)DOC"); } }; diff --git a/paddle/operators/uniform_random_op.cc b/paddle/operators/uniform_random_op.cc index 29491137e6..40cef8942a 100644 --- a/paddle/operators/uniform_random_op.cc +++ b/paddle/operators/uniform_random_op.cc @@ -26,18 +26,17 @@ class CPUUniformRandomKernel : public framework::OpKernel { void Compute(const framework::ExecutionContext& context) const override { auto* tensor = context.Output("Out"); T* data = tensor->mutable_data(context.GetPlace()); - unsigned int seed = - static_cast(context.op_.GetAttr("seed")); + unsigned int seed = static_cast(context.GetAttr("seed")); std::minstd_rand engine; if (seed == 0) { seed = std::random_device()(); } engine.seed(seed); std::uniform_real_distribution dist( - static_cast(context.op_.GetAttr("min")), - static_cast(context.op_.GetAttr("max"))); - ssize_t size = framework::product(tensor->dims()); - for (ssize_t i = 0; i < size; ++i) { + static_cast(context.GetAttr("min")), + static_cast(context.GetAttr("max"))); + int64_t size = framework::product(tensor->dims()); + for (int64_t i = 0; i < size; ++i) { data[i] = dist(engine); } } @@ -53,7 +52,12 @@ class UniformRandomOp : public framework::OperatorWithKernel { "uniform_random's min must less then max"); auto* tensor = ctx.Output("Out"); auto dims = GetAttr>("dims"); - tensor->Resize(framework::make_ddim(dims)); + std::vector temp; + temp.reserve(dims.size()); + for (auto dim : dims) { + temp.push_back(static_cast(dim)); + } + tensor->Resize(framework::make_ddim(temp)); } }; diff --git a/paddle/operators/uniform_random_op.cu b/paddle/operators/uniform_random_op.cu index 1d6709934c..df993c0779 100644 --- a/paddle/operators/uniform_random_op.cu +++ b/paddle/operators/uniform_random_op.cu @@ -45,14 +45,13 @@ class GPUUniformRandomKernel : public framework::OpKernel { void Compute(const framework::ExecutionContext& context) const override { auto* tensor = context.Output("Out"); T* data = tensor->mutable_data(context.GetPlace()); - unsigned int seed = - static_cast(context.op_.GetAttr("seed")); + unsigned int seed = static_cast(context.GetAttr("seed")); if (seed == 0) { std::random_device rd; seed = rd(); } - T min = static_cast(context.op_.GetAttr("min")); - T max = static_cast(context.op_.GetAttr("max")); + T min = static_cast(context.GetAttr("min")); + T max = static_cast(context.GetAttr("max")); thrust::counting_iterator index_sequence_begin(0); ssize_t N = framework::product(tensor->dims()); thrust::transform(index_sequence_begin, index_sequence_begin + N, diff --git a/paddle/platform/CMakeLists.txt b/paddle/platform/CMakeLists.txt 
diff --git a/paddle/operators/uniform_random_op.cc b/paddle/operators/uniform_random_op.cc index 29491137e6..40cef8942a 100644 --- a/paddle/operators/uniform_random_op.cc +++ b/paddle/operators/uniform_random_op.cc @@ -26,18 +26,17 @@ class CPUUniformRandomKernel : public framework::OpKernel { void Compute(const framework::ExecutionContext& context) const override { auto* tensor = context.Output("Out"); T* data = tensor->mutable_data(context.GetPlace()); - unsigned int seed = - static_cast(context.op_.GetAttr("seed")); + unsigned int seed = static_cast(context.GetAttr("seed")); std::minstd_rand engine; if (seed == 0) { seed = std::random_device()(); } engine.seed(seed); std::uniform_real_distribution dist( - static_cast(context.op_.GetAttr("min")), - static_cast(context.op_.GetAttr("max"))); - ssize_t size = framework::product(tensor->dims()); - for (ssize_t i = 0; i < size; ++i) { + static_cast(context.GetAttr("min")), + static_cast(context.GetAttr("max"))); + int64_t size = framework::product(tensor->dims()); + for (int64_t i = 0; i < size; ++i) { data[i] = dist(engine); } } @@ -53,7 +52,12 @@ class UniformRandomOp : public framework::OperatorWithKernel { "uniform_random's min must less then max"); auto* tensor = ctx.Output("Out"); auto dims = GetAttr>("dims"); - tensor->Resize(framework::make_ddim(dims)); + std::vector temp; + temp.reserve(dims.size()); + for (auto dim : dims) { + temp.push_back(static_cast(dim)); + } + tensor->Resize(framework::make_ddim(temp)); } }; diff --git a/paddle/operators/uniform_random_op.cu b/paddle/operators/uniform_random_op.cu index 1d6709934c..df993c0779 100644 --- a/paddle/operators/uniform_random_op.cu +++ b/paddle/operators/uniform_random_op.cu @@ -45,14 +45,13 @@ class GPUUniformRandomKernel : public framework::OpKernel { void Compute(const framework::ExecutionContext& context) const override { auto* tensor = context.Output("Out"); T* data = tensor->mutable_data(context.GetPlace()); - unsigned int seed = - static_cast(context.op_.GetAttr("seed")); + unsigned int seed = static_cast(context.GetAttr("seed")); if (seed == 0) { std::random_device rd; seed = rd(); } - T min = static_cast(context.op_.GetAttr("min")); - T max = static_cast(context.op_.GetAttr("max")); + T min = static_cast(context.GetAttr("min")); + T max = static_cast(context.GetAttr("max")); thrust::counting_iterator index_sequence_begin(0); ssize_t N = framework::product(tensor->dims()); thrust::transform(index_sequence_begin, index_sequence_begin + N, diff --git a/paddle/platform/CMakeLists.txt b/paddle/platform/CMakeLists.txt index 120eb1e4af..17bdac8749 100644 --- a/paddle/platform/CMakeLists.txt +++ b/paddle/platform/CMakeLists.txt @@ -22,3 +22,5 @@ ENDIF() cc_library(device_context SRCS device_context.cc DEPS memory buddy_allocator system_allocator memory_block meta_data meta_cache place eigen3 ${GPU_CTX_DEPS}) nv_test(device_context_test SRCS device_context_test.cc DEPS device_context gpu_info) + +nv_test(cudnn_helper_test SRCS cudnn_helper_test.cc DEPS dynload_cuda) diff --git a/paddle/platform/cudnn_helper.h b/paddle/platform/cudnn_helper.h new file mode 100644 index 0000000000..24ddf3441c --- /dev/null +++ b/paddle/platform/cudnn_helper.h @@ -0,0 +1,200 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#pragma once + +#include "paddle/platform/dynload/cudnn.h" +#include "paddle/platform/enforce.h" +#include "paddle/platform/macros.h" + +namespace paddle { +namespace platform { + +enum class DataLayout { + kNHWC, + kNCHW, + kNCHW_VECT_C, +}; + +enum class PoolingMode { + kMaximum, + kAverage, +}; + +template +class CudnnDataType; + +template <> +class CudnnDataType { + public: + static const cudnnDataType_t type = CUDNN_DATA_FLOAT; +}; + +template <> +class CudnnDataType { + public: + static const cudnnDataType_t type = CUDNN_DATA_DOUBLE; +}; + +inline cudnnTensorFormat_t GetCudnnTensorFormat(const DataLayout& order) { + switch (order) { + case DataLayout::kNHWC: + return CUDNN_TENSOR_NHWC; + case DataLayout::kNCHW: + return CUDNN_TENSOR_NCHW; + default: + PADDLE_THROW("Unknown cudnn equivalent for order"); + } + return CUDNN_TENSOR_NCHW; +} + +class ScopedTensorDescriptor { + public: + ScopedTensorDescriptor() { + PADDLE_ENFORCE(dynload::cudnnCreateTensorDescriptor(&desc_)); + } + ~ScopedTensorDescriptor() { + PADDLE_ENFORCE(dynload::cudnnDestroyTensorDescriptor(desc_)); + } + + inline cudnnTensorDescriptor_t descriptor(const cudnnTensorFormat_t format, + const cudnnDataType_t type, + const std::vector& dims) { + // the format argument is not used now, but it may be a useful feature + std::vector strides(dims.size()); + strides[dims.size() - 1] = 1; + for (int i = dims.size() - 2; i >= 0; i--) { + strides[i] = dims[i + 1] * strides[i + 1]; + } + PADDLE_ENFORCE(dynload::cudnnSetTensorNdDescriptor( + desc_, type, dims.size(), dims.data(), strides.data())); + return desc_; + } + + template + inline cudnnTensorDescriptor_t descriptor(const DataLayout& order, + const std::vector& dims) { + return descriptor(GetCudnnTensorFormat(order), CudnnDataType::type, + dims); + } + + private: + cudnnTensorDescriptor_t desc_; + DISABLE_COPY_AND_ASSIGN(ScopedTensorDescriptor); +};
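ScopedTensorDescriptor derives fully packed strides from `dims` before calling cudnnSetTensorNdDescriptor. The same computation in Python, checked against the NCHW shape used in the test further down (strides 144, 36, 6, 1 for dims 2, 4, 6, 6):

```python
def packed_strides(dims):
    # strides[last] = 1; strides[i] = dims[i + 1] * strides[i + 1]
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = dims[i + 1] * strides[i + 1]
    return strides

assert packed_strides([2, 4, 6, 6]) == [144, 36, 6, 1]
```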
+class ScopedFilterDescriptor { + public: + ScopedFilterDescriptor() { + PADDLE_ENFORCE(dynload::cudnnCreateFilterDescriptor(&desc_)); + } + ~ScopedFilterDescriptor() { + PADDLE_ENFORCE(dynload::cudnnDestroyFilterDescriptor(desc_)); + } + + inline cudnnFilterDescriptor_t descriptor(const cudnnTensorFormat_t format, + const cudnnDataType_t type, + const std::vector& kernel) { + // filter layout: output input spatial_dim_y spatial_dim_x + PADDLE_ENFORCE(dynload::cudnnSetFilterNdDescriptor( + desc_, type, format, kernel.size(), kernel.data())); + return desc_; + } + + template + inline cudnnFilterDescriptor_t descriptor(const DataLayout& order, + const std::vector& kernel) { + return descriptor(GetCudnnTensorFormat(order), CudnnDataType::type, + kernel); + } + + private: + cudnnFilterDescriptor_t desc_; + DISABLE_COPY_AND_ASSIGN(ScopedFilterDescriptor); +}; + +class ScopedConvolutionDescriptor { + public: + ScopedConvolutionDescriptor() { + PADDLE_ENFORCE(dynload::cudnnCreateConvolutionDescriptor(&desc_)); + } + ~ScopedConvolutionDescriptor() { + PADDLE_ENFORCE(dynload::cudnnDestroyConvolutionDescriptor(desc_)); + } + + inline cudnnConvolutionDescriptor_t descriptor( + cudnnDataType_t type, const std::vector& pads, + const std::vector& strides, const std::vector& dilations) { + PADDLE_ENFORCE_EQ(pads.size(), strides.size()); + PADDLE_ENFORCE_EQ(pads.size(), dilations.size()); + +#if CUDNN_VERSION < 6000 + // cudnn v5 does not support dilated convolution; the argument is called + // upscale instead of dilations, and it must be one. + for (size_t i = 0; i < dilations.size(); ++i) { + PADDLE_ENFORCE_EQ( + dilations[i], 1, + "Dilated convolution is not supported in this cuDNN version"); + } +#endif + + PADDLE_ENFORCE(dynload::cudnnSetConvolutionNdDescriptor( + desc_, pads.size(), pads.data(), strides.data(), dilations.data(), + CUDNN_CROSS_CORRELATION, type)); + return desc_; + } + + template + inline cudnnConvolutionDescriptor_t descriptor( + const std::vector& pads, const std::vector& strides, + const std::vector& dilations) { + return descriptor(CudnnDataType::type, pads, strides, dilations); + } + + private: + cudnnConvolutionDescriptor_t desc_; + DISABLE_COPY_AND_ASSIGN(ScopedConvolutionDescriptor); +}; + +class ScopedPoolingDescriptor { + public: + ScopedPoolingDescriptor() { + PADDLE_ENFORCE(dynload::cudnnCreatePoolingDescriptor(&desc_)); + } + ~ScopedPoolingDescriptor() { + PADDLE_ENFORCE(dynload::cudnnDestroyPoolingDescriptor(desc_)); + } + + inline cudnnPoolingDescriptor_t descriptor(const PoolingMode& mode, + const std::vector& kernel, + const std::vector& pads, + const std::vector& strides) { + PADDLE_ENFORCE_EQ(kernel.size(), pads.size()); + PADDLE_ENFORCE_EQ(kernel.size(), strides.size()); + PADDLE_ENFORCE(dynload::cudnnSetPoolingNdDescriptor( + desc_, (mode == PoolingMode::kMaximum + ? CUDNN_POOLING_MAX + : CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING), + CUDNN_PROPAGATE_NAN, // Always propagate nans. + kernel.size(), kernel.data(), pads.data(), strides.data())); + return desc_; + } + + private: + cudnnPoolingDescriptor_t desc_; + DISABLE_COPY_AND_ASSIGN(ScopedPoolingDescriptor); +}; + +} // namespace platform +} // namespace paddle diff --git a/paddle/platform/cudnn_helper_test.cc b/paddle/platform/cudnn_helper_test.cc new file mode 100644 index 0000000000..6bd85ae1ca --- /dev/null +++ b/paddle/platform/cudnn_helper_test.cc @@ -0,0 +1,120 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and +limitations under the License. */ + +#include "paddle/platform/cudnn_helper.h" +#include + +TEST(CudnnHelper, ScopedTensorDescriptor) { + using paddle::platform::ScopedTensorDescriptor; + using paddle::platform::DataLayout; + + ScopedTensorDescriptor tensor_desc; + std::vector shape = {2, 4, 6, 6}; + auto desc = tensor_desc.descriptor(DataLayout::kNCHW, shape); + + cudnnDataType_t type; + int nd; + std::vector dims(4); + std::vector strides(4); + paddle::platform::dynload::cudnnGetTensorNdDescriptor( + desc, 4, &type, &nd, dims.data(), strides.data()); + + EXPECT_EQ(nd, 4); + for (size_t i = 0; i < dims.size(); ++i) { + EXPECT_EQ(dims[i], shape[i]); + } + EXPECT_EQ(strides[3], 1); + EXPECT_EQ(strides[2], 6); + EXPECT_EQ(strides[1], 36); + EXPECT_EQ(strides[0], 144); +} + +TEST(CudnnHelper, ScopedFilterDescriptor) { + using paddle::platform::ScopedFilterDescriptor; + using paddle::platform::DataLayout; + + ScopedFilterDescriptor filter_desc; + std::vector shape = {2, 3, 3}; + auto desc = filter_desc.descriptor(DataLayout::kNCHW, shape); + + cudnnDataType_t type; + int nd; + cudnnTensorFormat_t format; + std::vector kernel(3); + paddle::platform::dynload::cudnnGetFilterNdDescriptor(desc, 3, &type, &format, + &nd, kernel.data()); + + EXPECT_EQ(GetCudnnTensorFormat(DataLayout::kNCHW), format); + EXPECT_EQ(nd, 3); + for (size_t i = 0; i < shape.size(); ++i) { + EXPECT_EQ(kernel[i], shape[i]); + } +} + +TEST(CudnnHelper, ScopedConvolutionDescriptor) { + using paddle::platform::ScopedConvolutionDescriptor; + + ScopedConvolutionDescriptor conv_desc; + std::vector src_pads = {2, 2, 2}; + std::vector src_strides = {1, 1, 1}; + std::vector src_dilations = {1, 1, 1}; + auto desc = conv_desc.descriptor(src_pads, src_strides, src_dilations); + + cudnnDataType_t type; + cudnnConvolutionMode_t mode; + int nd; + std::vector pads(3); + std::vector strides(3); + std::vector dilations(3); + paddle::platform::dynload::cudnnGetConvolutionNdDescriptor( + desc, 3, &nd, pads.data(), strides.data(), dilations.data(), &mode, + &type); + + EXPECT_EQ(nd, 3); + for (size_t i = 0; i < src_pads.size(); ++i) { + EXPECT_EQ(pads[i], src_pads[i]); + EXPECT_EQ(strides[i], src_strides[i]); + EXPECT_EQ(dilations[i], src_dilations[i]); + } + EXPECT_EQ(mode, CUDNN_CROSS_CORRELATION); +} + +TEST(CudnnHelper, ScopedPoolingDescriptor) { + using paddle::platform::ScopedPoolingDescriptor; + using paddle::platform::PoolingMode; + + ScopedPoolingDescriptor pool_desc; + std::vector src_kernel = {2, 2, 5}; + std::vector src_pads = {1, 1, 2}; + std::vector src_strides = {2, 2, 3}; + auto desc = pool_desc.descriptor(PoolingMode::kMaximum, src_kernel, src_pads, + src_strides); + + cudnnPoolingMode_t mode; + cudnnNanPropagation_t nan_t = CUDNN_PROPAGATE_NAN; + int nd; + std::vector kernel(3); + std::vector pads(3); + std::vector strides(3); + paddle::platform::dynload::cudnnGetPoolingNdDescriptor( + desc, 3, &mode, &nan_t, &nd, kernel.data(), pads.data(), strides.data()); + + EXPECT_EQ(nd, 3); + for (size_t i = 0; i < src_pads.size(); ++i) { + EXPECT_EQ(kernel[i], src_kernel[i]); + EXPECT_EQ(pads[i], src_pads[i]); + EXPECT_EQ(strides[i], src_strides[i]); + } + EXPECT_EQ(mode, CUDNN_POOLING_MAX); +} diff --git a/paddle/platform/dynload/CMakeLists.txt b/paddle/platform/dynload/CMakeLists.txt index d205ead845..ceb66f84b6 100644 --- a/paddle/platform/dynload/CMakeLists.txt +++ b/paddle/platform/dynload/CMakeLists.txt @@ -1,2 +1,2 @@ cc_library(dynamic_loader SRCS 
dynamic_loader.cc DEPS glog gflags) -nv_library(dynload_cuda SRCS cublas.cc cudnn.cc curand.cc) +nv_library(dynload_cuda SRCS cublas.cc cudnn.cc curand.cc DEPS dynamic_loader) diff --git a/paddle/platform/dynload/cudnn.h b/paddle/platform/dynload/cudnn.h index ef0dd85b08..0120625b7c 100644 --- a/paddle/platform/dynload/cudnn.h +++ b/paddle/platform/dynload/cudnn.h @@ -62,19 +62,27 @@ extern void* cudnn_dso_handle; #define CUDNN_DNN_ROUTINE_EACH(__macro) \ __macro(cudnnSetTensor4dDescriptor); \ __macro(cudnnSetTensor4dDescriptorEx); \ + __macro(cudnnSetTensorNdDescriptor); \ + __macro(cudnnGetTensorNdDescriptor); \ __macro(cudnnGetConvolutionNdForwardOutputDim); \ __macro(cudnnGetConvolutionForwardAlgorithm); \ __macro(cudnnCreateTensorDescriptor); \ __macro(cudnnDestroyTensorDescriptor); \ __macro(cudnnCreateFilterDescriptor); \ __macro(cudnnSetFilter4dDescriptor); \ + __macro(cudnnSetFilterNdDescriptor); \ + __macro(cudnnGetFilterNdDescriptor); \ __macro(cudnnSetPooling2dDescriptor); \ + __macro(cudnnSetPoolingNdDescriptor); \ + __macro(cudnnGetPoolingNdDescriptor); \ __macro(cudnnDestroyFilterDescriptor); \ __macro(cudnnCreateConvolutionDescriptor); \ __macro(cudnnCreatePoolingDescriptor); \ __macro(cudnnDestroyPoolingDescriptor); \ __macro(cudnnSetConvolution2dDescriptor); \ __macro(cudnnDestroyConvolutionDescriptor); \ + __macro(cudnnSetConvolutionNdDescriptor); \ + __macro(cudnnGetConvolutionNdDescriptor); \ __macro(cudnnCreate); \ __macro(cudnnDestroy); \ __macro(cudnnSetStream); \ diff --git a/paddle/operators/scatter_op.cu b/paddle/platform/macros.h similarity index 66% rename from paddle/operators/scatter_op.cu rename to paddle/platform/macros.h index 6716b47883..4a04a38c0c 100644 --- a/paddle/operators/scatter_op.cu +++ b/paddle/platform/macros.h @@ -12,9 +12,12 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */ -#define EIGEN_USE_GPU -#include "paddle/operators/scatter_op.h" +#pragma once -namespace ops = paddle::operators; -REGISTER_OP_GPU_KERNEL(scatter, - ops::ScatterOpKernel); +// Disable the copy and assignment operator for a class. +#ifndef DISABLE_COPY_AND_ASSIGN +#define DISABLE_COPY_AND_ASSIGN(classname) \ + private: \ + classname(const classname&) = delete; \ + classname& operator=(const classname&) = delete +#endif diff --git a/paddle/pybind/pybind.cc b/paddle/pybind/pybind.cc index 42fce51024..997379701a 100644 --- a/paddle/pybind/pybind.cc +++ b/paddle/pybind/pybind.cc @@ -30,7 +30,7 @@ limitations under the License. 
*/ namespace py = pybind11; -USE_OP(add_two); +USE_OP(add); USE_OP(onehot_cross_entropy); USE_OP(sgd); USE_OP(mul); @@ -39,14 +39,15 @@ USE_OP(sigmoid); USE_OP(softmax); USE_OP(rowwise_add); USE_OP(fill_zeros_like); -USE_OP_ITSELF(recurrent_op); +USE_NO_KERNEL_OP(recurrent); USE_OP(gaussian_random); USE_OP(uniform_random); USE_OP(lookup_table); USE_OP(scale); -USE_OP_ITSELF(identity); +USE_NO_KERNEL_OP(identity); USE_OP(minus); USE_OP(dropout); +USE_OP(cos_sim); USE_CPU_ONLY_OP(gather); USE_CPU_ONLY_OP(scatter); @@ -77,7 +78,7 @@ PYBIND11_PLUGIN(core) { .def("get_dims", [](const Tensor &self) { return vectorize(self.dims()); }) .def("set_dims", - [](Tensor &self, const std::vector &dim) { + [](Tensor &self, const std::vector &dim) { self.Resize(make_ddim(dim)); }) .def("alloc_float", diff --git a/paddle/pybind/tensor_py.h b/paddle/pybind/tensor_py.h index 39ba60b4dc..95171acf72 100644 --- a/paddle/pybind/tensor_py.h +++ b/paddle/pybind/tensor_py.h @@ -85,7 +85,7 @@ void PyCPUTensorSetFromArray( framework::Tensor &self, py::array_t array, paddle::platform::CPUPlace &place) { - std::vector dims; + std::vector dims; dims.reserve(array.ndim()); for (size_t i = 0; i < array.ndim(); ++i) { dims.push_back((int)array.shape()[i]); @@ -102,7 +102,7 @@ void PyCUDATensorSetFromArray( framework::Tensor &self, py::array_t array, paddle::platform::GPUPlace &place) { - std::vector dims; + std::vector dims; dims.reserve(array.ndim()); for (size_t i = 0; i < array.ndim(); ++i) { dims.push_back((int)array.shape()[i]); diff --git a/python/paddle/trainer/PyDataProvider2.py b/python/paddle/trainer/PyDataProvider2.py index 7e305e2cd9..248da4ae8d 100644 --- a/python/paddle/trainer/PyDataProvider2.py +++ b/python/paddle/trainer/PyDataProvider2.py @@ -27,6 +27,14 @@ class SequenceType(object): SEQUENCE = 1 SUB_SEQUENCE = 2 + @classmethod + def tostring(cls, value): + for k in cls.__dict__: + if not k.startswith('__'): + if getattr(cls, k) == value: + return cls.__name__ + '.' + k + return 'INVALID(' + str(value) + ')' + # TODO(yuyang18): Add string data type here. class DataType(object): @@ -35,6 +43,14 @@ class DataType(object): SparseValue = 2 Index = 3 + @classmethod + def tostring(cls, value): + for k in cls.__dict__: + if not k.startswith('__'): + if getattr(cls, k) == value: + return cls.__name__ + '.' 
+ k + return 'INVALID(' + str(value) + ')' + class CacheType(object): NO_CACHE = 0 # No cache at all @@ -69,6 +85,26 @@ class InputType(object): self.seq_type = seq_type self.type = tp + def __repr__(self): + """ + Return a human readable representation like 'InputType(dim=25921, + seq_type=SequenceType.NO_SEQUENCE, type=DataType.Dense)' + """ + repr_str = type(self).__name__ + repr_str += '(' + serialize_func_map = { + 'dim': repr, + 'seq_type': SequenceType.tostring, + 'type': DataType.tostring + } + for idx, k in enumerate(self.__slots__): + if idx != 0: + repr_str += ', ' + repr_str += ( + k + '=' + serialize_func_map.get(k, repr)(getattr(self, k))) + repr_str += ')' + return repr_str + def dense_slot(dim, seq_type=SequenceType.NO_SEQUENCE): """ diff --git a/python/paddle/trainer_config_helpers/layers.py b/python/paddle/trainer_config_helpers/layers.py index 2bd274fad2..47ac601e67 100644 --- a/python/paddle/trainer_config_helpers/layers.py +++ b/python/paddle/trainer_config_helpers/layers.py @@ -53,7 +53,7 @@ __all__ = [ 'cos_sim', 'hsigmoid', 'conv_projection', - 'mse_cost', + 'square_error_cost', 'regression_cost', 'classification_cost', 'LayerOutput', @@ -4238,13 +4238,18 @@ def __cost_input__(input, label, weight=None): @wrap_name_default() @layer_support() -def mse_cost(input, label, weight=None, name=None, coeff=1.0, layer_attr=None): +def square_error_cost(input, + label, + weight=None, + name=None, + coeff=1.0, + layer_attr=None): """ - mean squared error cost: + sum of square error cost: .. math:: - \\frac{1}{N}\sum_{i=1}^N(t_i-y_i)^2 + cost = \\sum_{i=1}^N(t_i-y_i)^2 :param name: layer name. :type name: basestring @@ -4273,7 +4278,7 @@ def mse_cost(input, label, weight=None, name=None, coeff=1.0, layer_attr=None): return LayerOutput(name, LayerType.COST, parents=parents, size=1) -regression_cost = mse_cost +regression_cost = square_error_cost @wrap_name_default("cost") @@ -5798,9 +5803,9 @@ def huber_regression_cost(input, coeff=1.0, layer_attr=None): """ - In statistics, the Huber loss is a loss function used in robust regression, - that is less sensitive to outliers in data than the squared error loss. - Given a prediction f(x), a label y and :math:`\delta`, the loss function + In statistics, the Huber loss is a loss function used in robust regression, + that is less sensitive to outliers in data than the squared error loss. + Given a prediction f(x), a label y and :math:`\delta`, the loss function is defined as: .. math: @@ -5848,13 +5853,13 @@ def huber_classification_cost(input, coeff=1.0, layer_attr=None): """ - For classification purposes, a variant of the Huber loss called modified Huber - is sometimes used. Given a prediction f(x) (a real-valued classifier score) and - a true binary class label :math:`y\in \left \{-1, 1 \right \}`, the modified Huber + For classification purposes, a variant of the Huber loss called modified Huber + is sometimes used. Given a prediction f(x) (a real-valued classifier score) and + a true binary class label :math:`y\in \left \{-1, 1 \right \}`, the modified Huber loss is defined as: .. 
math: - loss = \max \left ( 0, 1-yf(x) \right )^2, yf(x)\geq 1 + loss = \max \left ( 0, 1-yf(x) \right )^2, yf(x)\geq 1 loss = -4yf(x), \text{otherwise} The example usage is: diff --git a/python/paddle/trainer_config_helpers/tests/configs/protostr/test_cost_layers_with_weight.protostr b/python/paddle/trainer_config_helpers/tests/configs/protostr/test_cost_layers_with_weight.protostr index 96fb1d4ebd..cec8a73db6 100644 --- a/python/paddle/trainer_config_helpers/tests/configs/protostr/test_cost_layers_with_weight.protostr +++ b/python/paddle/trainer_config_helpers/tests/configs/protostr/test_cost_layers_with_weight.protostr @@ -45,7 +45,7 @@ layers { coeff: 1.0 } layers { - name: "__mse_cost_0__" + name: "__square_error_cost_0__" type: "square_error" size: 1 active_type: "" @@ -130,7 +130,7 @@ input_layer_names: "label" input_layer_names: "weight" input_layer_names: "multi_class_label" output_layer_names: "__cost_0__" -output_layer_names: "__mse_cost_0__" +output_layer_names: "__square_error_cost_0__" output_layer_names: "__nce_layer_0__" evaluators { name: "classification_error_evaluator" @@ -146,7 +146,7 @@ sub_models { layer_names: "weight" layer_names: "__fc_layer_0__" layer_names: "__cost_0__" - layer_names: "__mse_cost_0__" + layer_names: "__square_error_cost_0__" layer_names: "multi_class_label" layer_names: "__nce_layer_0__" input_layer_names: "input" @@ -154,7 +154,7 @@ sub_models { input_layer_names: "weight" input_layer_names: "multi_class_label" output_layer_names: "__cost_0__" - output_layer_names: "__mse_cost_0__" + output_layer_names: "__square_error_cost_0__" output_layer_names: "__nce_layer_0__" evaluator_names: "classification_error_evaluator" is_recurrent_layer_group: false diff --git a/python/paddle/trainer_config_helpers/tests/configs/protostr/test_seq_select_layers.protostr b/python/paddle/trainer_config_helpers/tests/configs/protostr/test_sub_nested_seq_select_layer.protostr similarity index 100% rename from python/paddle/trainer_config_helpers/tests/configs/protostr/test_seq_select_layers.protostr rename to python/paddle/trainer_config_helpers/tests/configs/protostr/test_sub_nested_seq_select_layer.protostr diff --git a/python/paddle/trainer_config_helpers/tests/configs/test_cost_layers_with_weight.py b/python/paddle/trainer_config_helpers/tests/configs/test_cost_layers_with_weight.py index c369062930..caa6aaa943 100644 --- a/python/paddle/trainer_config_helpers/tests/configs/test_cost_layers_with_weight.py +++ b/python/paddle/trainer_config_helpers/tests/configs/test_cost_layers_with_weight.py @@ -10,7 +10,7 @@ fc = fc_layer(input=data, size=10, act=SoftmaxActivation()) outputs( classification_cost( input=fc, label=lbl, weight=wt), - mse_cost( + square_error_cost( input=fc, label=lbl, weight=wt), nce_layer( input=fc, diff --git a/python/paddle/v2/framework/op.py b/python/paddle/v2/framework/op.py index 6ac656321e..0349407a85 100644 --- a/python/paddle/v2/framework/op.py +++ b/python/paddle/v2/framework/op.py @@ -94,9 +94,14 @@ class OpDescCreationMethod(object): new_attr.floats.extend(user_defined_attr) elif attr.type == framework_pb2.STRINGS: new_attr.strings.extend(user_defined_attr) + elif attr.type == framework_pb2.INT_PAIRS: + for p in user_defined_attr: + pair = new_attr.pairs.add() + pair.first = p[0] + pair.second = p[1] else: raise NotImplementedError("Not support attribute type " + - attr.type) + str(attr.type)) return op_desc @@ -179,7 +184,7 @@ class OperatorFactory(object): class __RecurrentOp__(object): __proto__ = None - type = 'recurrent_op' + type = 
'recurrent' def __init__(self): # cache recurrent_op's proto diff --git a/python/paddle/v2/framework/tests/CMakeLists.txt b/python/paddle/v2/framework/tests/CMakeLists.txt index 850910363d..e21f692f65 100644 --- a/python/paddle/v2/framework/tests/CMakeLists.txt +++ b/python/paddle/v2/framework/tests/CMakeLists.txt @@ -5,6 +5,7 @@ py_test(test_scope SRCS test_scope.py) py_test(test_tensor SRCS test_tensor.py) py_test(test_mul_op SRCS test_mul_op.py) py_test(test_dropout_op SRCS test_dropout_op.py) +py_test(test_cos_sim_op SRCS test_cos_sim_op.py) py_test(test_mean_op SRCS test_mean_op.py) diff --git a/python/paddle/v2/framework/tests/gradient_checker.py b/python/paddle/v2/framework/tests/gradient_checker.py index 518f828bac..fdb06b7988 100644 --- a/python/paddle/v2/framework/tests/gradient_checker.py +++ b/python/paddle/v2/framework/tests/gradient_checker.py @@ -36,13 +36,13 @@ def get_numeric_gradient(op, in_place=False): """ Get Numeric Gradient for an operator's input. - - :param op: C++ operator instance, could be an network - :param input_values: The input variables. Should be an dictionary, key is + + :param op: C++ operator instance, could be a network + :param input_values: The input variables. Should be a dictionary, key is variable name. Value is numpy array. - :param output_name: The final output variable name. + :param output_name: The final output variable name. :param input_to_check: The input variable need to get gradient. - :param delta: The perturbation value for numeric gradient method. The + :param delta: The perturbation value for numeric gradient method. The smaller delta is, the more accurate result will get. But if that delta is too small, it could occur numerical stability problem. :param local_scope: The local scope used for get_numeric_gradient. @@ -229,9 +229,9 @@ class GradientChecker(unittest.TestCase): """Use relative error for the comparison. :param numeric_grads: the numerical graidents. - :type numeric_grads: a list of numpy.array + :type numeric_grads: a list of numpy.array :param analytic_grads: the analytical graidents. - :type analytic_grads: a list of numpy.array + :type analytic_grads: a list of numpy.array :param name: the names of gradients, used to print for debug. :type names: a list of string :param msg_prefix: string info, used to print for debug.
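The docstring above describes the delta trade-off in get_numeric_gradient; underneath, it is ordinary finite differencing of the op's output with respect to one input element at a time. A self-contained sketch of the idea (central differences are my choice here; the checker's exact scheme is not shown in this diff):

```python
import numpy as np

def numeric_gradient(f, x, delta=0.005):
    # f maps an ndarray to a scalar; perturb one element at a time.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + delta
        f_plus = f(x)
        x[idx] = orig - delta
        f_minus = f(x)
        x[idx] = orig  # restore
        grad[idx] = (f_plus - f_minus) / (2 * delta)
        it.iternext()
    return grad

g = numeric_gradient(lambda a: (a ** 2).sum(), np.arange(4.0))
assert np.allclose(g, 2 * np.arange(4.0))  # d/da sum(a^2) = 2a
```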
@@ -286,6 +286,9 @@ class GradientChecker(unittest.TestCase): for no_grad in no_grad_set: if no_grad not in in_names: raise ValueError("no_grad should be in in_names") + if no_grad in inputs_to_check: + raise ValueError("no_grad should not be in inputs_to_check") + backward_op = core.Operator.backward(forward_op, no_grad_set) places = [core.CPUPlace()] @@ -301,7 +304,6 @@ class GradientChecker(unittest.TestCase): check_names = [grad_var_name(name) for name in inputs_to_check] for place in places: - # get analytical gradients according to different device analytic_grads = self.__get_gradient(forward_op, backward_op, input_vars, check_names, place) self.__assert_is_close(numeric_grads, analytic_grads, check_names, diff --git a/python/paddle/v2/framework/tests/test_add_two_op.py b/python/paddle/v2/framework/tests/test_add_two_op.py index 0def484edd..a578e74eca 100644 --- a/python/paddle/v2/framework/tests/test_add_two_op.py +++ b/python/paddle/v2/framework/tests/test_add_two_op.py @@ -11,7 +11,7 @@ class TestAddOp(unittest.TestCase): __metaclass__ = OpTestMeta def setUp(self): - self.type = "add_two" + self.type = "add" self.inputs = { 'X': numpy.random.random((102, 105)).astype("float32"), 'Y': numpy.random.random((102, 105)).astype("float32") diff --git a/python/paddle/v2/framework/tests/test_cos_sim_op.py b/python/paddle/v2/framework/tests/test_cos_sim_op.py new file mode 100644 index 0000000000..32013a7999 --- /dev/null +++ b/python/paddle/v2/framework/tests/test_cos_sim_op.py @@ -0,0 +1,60 @@ +import unittest +import numpy as np +from gradient_checker import GradientChecker, create_op +from op_test_util import OpTestMeta + + +class TestCosSimOp(unittest.TestCase): + __metaclass__ = OpTestMeta + + def setUp(self): + self.type = "cos_sim" + self.inputs = { + 'X': np.random.random((32, 64)).astype("float32"), + 'Y': np.random.random((32, 64)).astype("float32") + } + expect_x_norm = np.linalg.norm(self.inputs['X'], axis=1) + expect_y_norm = np.linalg.norm(self.inputs['Y'], axis=1) + expect_out = (self.inputs['X'] * self.inputs['Y']).sum(axis=1) / \ + expect_x_norm / expect_y_norm + self.outputs = { + 'XNorm': np.expand_dims(expect_x_norm, 1), + 'YNorm': np.expand_dims(expect_y_norm, 1), + 'Out': np.expand_dims(expect_out, 1) + } + + +class TestCosSimGradOp(GradientChecker): + def setUp(self): + self.op = create_op("cos_sim") + self.inputs = { + 'X': np.random.random((10, 5)).astype("float32"), + 'Y': np.random.random((10, 5)).astype("float32") + } + + def test_cpu_gpu_compare(self): + self.compare_grad(self.op, self.inputs) + + def test_normal(self): + self.check_grad( + self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.05) + + def test_ignore_x(self): + self.check_grad( + self.op, + self.inputs, ["Y"], + "Out", + max_relative_error=0.05, + no_grad_set={"X"}) + + def test_ignore_y(self): + self.check_grad( + self.op, + self.inputs, ["X"], + "Out", + max_relative_error=0.05, + no_grad_set={"Y"}) + + +if __name__ == '__main__': + unittest.main() diff --git a/python/paddle/v2/framework/tests/test_dropout_op.py b/python/paddle/v2/framework/tests/test_dropout_op.py index c5ff55f74c..3b742b8e1c 100644 --- a/python/paddle/v2/framework/tests/test_dropout_op.py +++ b/python/paddle/v2/framework/tests/test_dropout_op.py @@ -4,7 +4,7 @@ from gradient_checker import GradientChecker, create_op from op_test_util import OpTestMeta -class TestDropoutOpProbZero(unittest.TestCase): +class TestDropoutOpWithProbZero(unittest.TestCase): __metaclass__ = OpTestMeta def setUp(self): @@ -14,7 +14,7 @@ 
class TestDropoutOpProbZero(unittest.TestCase): self.outputs = {'Out': self.inputs['X'], 'Mask': np.ones((32, 64))} -class TestDropoutOpProbOne(unittest.TestCase): +class TestDropoutOpWithProbOne(unittest.TestCase): __metaclass__ = OpTestMeta def setUp(self): @@ -24,18 +24,32 @@ class TestDropoutOpProbOne(unittest.TestCase): self.outputs = {'Out': np.zeros((32, 64)), 'Mask': np.zeros((32, 64))} +class TestDropoutOpWithRank3(unittest.TestCase): + __metaclass__ = OpTestMeta + + def setUp(self): + self.type = "dropout" + self.inputs = {'X': np.random.random((32, 64, 8)).astype("float32")} + self.attrs = {'dropout_prob': 0.0} + self.outputs = {'Out': self.inputs['X'], 'Mask': np.ones((32, 64, 8))} + + class TestDropoutGradOp(GradientChecker): - def test_dropout_2d(self): - op = create_op("dropout") - inputs = {'X': np.random.random((10, 5)).astype("float32")} - self.compare_grad(op, inputs) - self.check_grad(op, inputs, set(["X"]), "Out") - - def test_dropout_3d(self): - op = create_op("dropout") - inputs = {'X': np.random.random((10, 5, 4)).astype("float32")} - self.compare_grad(op, inputs) - self.check_grad(op, inputs, set(["X"]), "Out") + def setUp(self): + self.op = create_op("dropout") + self.inputs = {'X': np.random.random((10, 5)).astype("float32")} + + def test_cpu_gpu_compare(self): + self.compare_grad(self.op, self.inputs) + + def test_normal(self): + self.check_grad(self.op, self.inputs, set(["X"]), "Out") + + +class TestDropoutGradOpWithRank3(TestDropoutGradOp): + def setUp(self): + self.op = create_op("dropout") + self.inputs = {'X': np.random.random((10, 5, 4)).astype("float32")} if __name__ == '__main__': diff --git a/python/paddle/v2/framework/tests/test_gradient_checker.py b/python/paddle/v2/framework/tests/test_gradient_checker.py index e0b3151208..857427cdfb 100644 --- a/python/paddle/v2/framework/tests/test_gradient_checker.py +++ b/python/paddle/v2/framework/tests/test_gradient_checker.py @@ -7,7 +7,7 @@ from gradient_checker import get_numeric_gradient class GetNumericGradientTest(unittest.TestCase): def test_add_op(self): - add_op = Operator('add_two', X="X", Y="Y", Out="Z") + add_op = Operator('add', X="X", Y="Y", Out="Z") x = numpy.random.random((10, 1)).astype("float32") y = numpy.random.random((10, 1)).astype("float32") diff --git a/python/paddle/v2/framework/tests/test_mul_op.py b/python/paddle/v2/framework/tests/test_mul_op.py index ee0d81a64e..b58e4266d1 100644 --- a/python/paddle/v2/framework/tests/test_mul_op.py +++ b/python/paddle/v2/framework/tests/test_mul_op.py @@ -16,16 +16,37 @@ class TestMulOp(unittest.TestCase): self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} -class MulGradOpTest(GradientChecker): - def test_mul(self): - op = create_op("mul") - inputs = { +class TestMulGradOp(GradientChecker): + def setUp(self): + self.op = create_op("mul") + self.inputs = { 'X': np.random.random((32, 84)).astype("float32"), 'Y': np.random.random((84, 100)).astype("float32") } + + def test_cpu_gpu_compare(self): + self.compare_grad(self.op, self.inputs) + + def test_normal(self): # mul op will enlarge the relative error self.check_grad( - op, inputs, set(["X", "Y"]), "Out", max_relative_error=0.5) + self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5) + + def test_ignore_x(self): + self.check_grad( + self.op, + self.inputs, ["Y"], + "Out", + max_relative_error=0.5, + no_grad_set={"X"}) + + def test_ignore_y(self): + self.check_grad( + self.op, + self.inputs, ["X"], + "Out", + max_relative_error=0.5, + no_grad_set={"Y"}) # TODO(dzh,qijun) : 
mulgrad test case need transpose feature of blas library diff --git a/python/paddle/v2/framework/tests/test_net.py b/python/paddle/v2/framework/tests/test_net.py index 9339cf28da..e4b7cd480c 100644 --- a/python/paddle/v2/framework/tests/test_net.py +++ b/python/paddle/v2/framework/tests/test_net.py @@ -15,7 +15,7 @@ def fc(X, W, Y): class TestNet(unittest.TestCase): def test_net_all(self): net = core.Net.create() - op1 = Operator("add_two", X="X", Y="Y", Out="Out") + op1 = Operator("add", X="X", Y="Y", Out="Out") net.append_op(op1) net2 = core.Net.create() @@ -26,7 +26,7 @@ class TestNet(unittest.TestCase): expected = ''' Op(plain_net), inputs:{all[W, X, Y]}, outputs:{all[Out, fc.out, pre_activation]}. - Op(add_two), inputs:{X[X], Y[Y]}, outputs:{Out[Out]}. + Op(add), inputs:{X[X], Y[Y]}, outputs:{Out[Out]}. Op(plain_net), inputs:{all[W, X]}, outputs:{all[fc.out, pre_activation]}. Op(plain_net), inputs:{all[W, X]}, outputs:{all[fc.out, pre_activation]}. Op(mul), inputs:{X[X], Y[W]}, outputs:{Out[pre_activation]}. diff --git a/python/paddle/v2/framework/tests/test_operator.py b/python/paddle/v2/framework/tests/test_operator.py index 1abc4eeb57..040556322d 100644 --- a/python/paddle/v2/framework/tests/test_operator.py +++ b/python/paddle/v2/framework/tests/test_operator.py @@ -193,10 +193,10 @@ class TestOpDescCreationMethod(unittest.TestCase): class TestOpCreations(unittest.TestCase): def test_all(self): - add_op = op.Operator("add_two", X="a", Y="b", Out="z") + add_op = op.Operator("add", X="a", Y="b", Out="z") self.assertIsNotNone(add_op) # Invoke C++ DebugString() - self.assertEqual('Op(add_two), inputs:{X[a], Y[b]}, outputs:{Out[z]}.', + self.assertEqual('Op(add), inputs:{X[a], Y[b]}, outputs:{Out[z]}.', str(add_op)) diff --git a/python/paddle/v2/framework/tests/test_recurrent_op.py b/python/paddle/v2/framework/tests/test_recurrent_op.py index d6000ab9f9..22e680fd78 100644 --- a/python/paddle/v2/framework/tests/test_recurrent_op.py +++ b/python/paddle/v2/framework/tests/test_recurrent_op.py @@ -146,7 +146,7 @@ class TestRecurrentOp(unittest.TestCase): stepnet = core.Net.create() x_fc_op = Operator("mul", X="x@alias", Y="W", Out="Wx") h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh") - sum_op = Operator("add_two", X="Wx", Y="Uh", Out="sum") + sum_op = Operator("add", X="Wx", Y="Uh", Out="sum") sig_op = Operator("sigmoid", X="sum", Y="h@alias") for op in [x_fc_op, h_fc_op, sum_op, sig_op]: diff --git a/python/paddle/v2/framework/tests/test_rowwise_add_op.py b/python/paddle/v2/framework/tests/test_rowwise_add_op.py index 45d569da29..2ddb85e2e7 100644 --- a/python/paddle/v2/framework/tests/test_rowwise_add_op.py +++ b/python/paddle/v2/framework/tests/test_rowwise_add_op.py @@ -16,14 +16,22 @@ class TestRowwiseAddOp(unittest.TestCase): self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])} -class RowwiseAddGradOpTest(GradientChecker): - def test_rowwise_add(self): - op = create_op("rowwise_add") - inputs = { +class TestRowwiseAddGradOp(GradientChecker): + def setUp(self): + self.op = create_op("rowwise_add") + self.inputs = { "X": np.random.uniform(0.1, 1, [5, 10]).astype("float32"), "b": np.random.uniform(0.1, 1, [10]).astype("float32") } - self.check_grad(op, inputs, set(["X", "b"]), "Out") + + def test_normal(self): + self.check_grad(self.op, self.inputs, ["X", "b"], "Out") + + def test_ignore_b(self): + self.check_grad(self.op, self.inputs, ["X"], "Out", no_grad_set={"b"}) + + def test_ignore_x(self): + self.check_grad(self.op, self.inputs, ["b"], "Out", 
no_grad_set={"X"}) if __name__ == '__main__': diff --git a/python/paddle/v2/tests/test_layer.py b/python/paddle/v2/tests/test_layer.py index 783a0ca85d..de932ad715 100644 --- a/python/paddle/v2/tests/test_layer.py +++ b/python/paddle/v2/tests/test_layer.py @@ -134,8 +134,9 @@ class CostLayerTest(unittest.TestCase): cost3 = layer.cross_entropy_cost(input=inference, label=label) cost4 = layer.cross_entropy_with_selfnorm_cost( input=inference, label=label) - cost5 = layer.mse_cost(input=inference, label=label) - cost6 = layer.mse_cost(input=inference, label=label, weight=weight) + cost5 = layer.square_error_cost(input=inference, label=label) + cost6 = layer.square_error_cost( + input=inference, label=label, weight=weight) cost7 = layer.multi_binary_label_cross_entropy_cost( input=inference, label=label) cost8 = layer.rank_cost(left=score, right=score, label=score)