From 9d569c5a38582cbf9022578c046f89a88697c493 Mon Sep 17 00:00:00 2001
From: fengjiayi
Date: Thu, 3 Aug 2017 17:57:00 -0700
Subject: [PATCH 01/11] Update Backward.md

Add the "Backward Operator Registry" section

---
 paddle/framework/backward.md | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index 74c001b06a..61f308b469 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -1,8 +1,28 @@
-## Operator/expression's Backward
+# Operator/expression's Backward
 
-### Motivation
+## Motivation
 
 In neural networks, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation lineage; the operator/expression's Backward feature generates the backward pass with respect to the forward pass.
+
+## Backward Operator Registry
+
+A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients, and then calculate the gradients of the forward operators' inputs. In most cases, there is a one-to-one correspondence between forward and backward operators. We use a registry mechanism to save these correspondences, which is quite similar to the operator registry itself.
+
+For example, suppose we have an `add_two_op` registered by the following code:
+
+```cpp
+REGISTER_OP(add_two, AddTwoOp, AddTwoOpMaker);
+```
+
+`add_two` is the operator's type. `AddTwoOp` and `AddTwoOpMaker` are the operator class and the operator maker class respectively.
+
+Assume that we also have the backward operator of `add_two_op`, which calculates the gradients of `add_two_op`'s inputs. Then we register it in the following way:
+
+```cpp
+REGISTER_GRADIENT_OP(add_two, add_two_grad, AddTwoGradOp);
+```
+
+`add_two_grad` is the type of the backward operator, and `AddTwoGradOp` is its class name.
 
 ### Implement : gradient operator registry
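To make the "global hash map" idea behind this registry concrete, here is a minimal, self-contained sketch of the kind of map a macro pair like `REGISTER_OP`/`REGISTER_GRADIENT_OP` can maintain. All names in the sketch (`GradOpRegistry`, `Creator`, `creators_`) are illustrative stand-ins, not the actual PaddlePaddle classes:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the real operator hierarchy.
struct OperatorBase {
  virtual ~OperatorBase() = default;
};

// Conceptual registry: maps a forward operator type (e.g. "add_two") to a
// factory for its gradient operator (e.g. one creating an AddTwoGradOp).
class GradOpRegistry {
 public:
  using Creator = std::function<OperatorBase*()>;

  static GradOpRegistry& Instance() {
    static GradOpRegistry registry;  // the single global map
    return registry;
  }

  void Register(const std::string& fwd_type, Creator creator) {
    creators_[fwd_type] = std::move(creator);
  }

  OperatorBase* CreateGradOp(const std::string& fwd_type) const {
    return creators_.at(fwd_type)();
  }

 private:
  std::unordered_map<std::string, Creator> creators_;
};
```

A registration macro then amounts to inserting one entry into this map during static initialization, which is what keeps gradient operators pluggable units.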
From 7304006b7121c844d071227a6c2d24245a06e32e Mon Sep 17 00:00:00 2001
From: fengjiayi
Date: Tue, 8 Aug 2017 16:38:27 -0700
Subject: [PATCH 02/11] Update backward.md

---
 paddle/framework/backward.md | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index 61f308b469..c717c2f30b 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -24,20 +24,31 @@ REGISTER_GRADIENT_OP(add_two, add_two_grad, AddTwoGradOp);
 `add_two_grad` is the type of the backward operator, and `AddTwoGradOp` is its class name.
 
-### Implement : gradient operator registry
+## Backward Operator Creating
 
-| | forward operator | backward operator |
-| ---------------------- | ---------------- | -------------------------------- |
-| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
-| **Operator::outputs_** | Outputs | InputGradients |
+### Usage
 
-Inputs/Outputs are the inputs/outputs of the operator; InputGradients/OutputGradients are the gradients with respect to the forward operator. The forward operator and the backward operator are isomorphic, and each saves what it needs in its member attributes.
+Given a certain forward operator, we can get its corresponding backward operator by calling:
 
-We use a global hash map to record the available gradient operators, following the philosophy of a minimum core that makes operators pluggable units. Each gradient operator needs to register itself.
+```cpp
+OperatorBase* BuildGradOp(const OperatorBase* fwd_op);
+```
+
+The function `BuildGradOp` sequentially executes the following steps:
+
+1. Getting the `type_` of the given forward operator, and then creating the corresponding backward operator.
+
+2. Copying all the attributes of the forward operator except `input_format` and `output_format` (if present), because their elements differ between forward and backward operators.
+
+3. Copying the forward operator's `inputs_` and `outputs_` to the backward operator's `inputs_`, adding the forward inputs' gradient variables into the backward `outputs_`, and adding the forward outputs' gradient variables into the backward `inputs_`.
+
+4. Building the backward operator's `input_format`, `output_format` (if necessary) and `in_out_idxs_` according to the `inputs_` and `outputs_` just created.
+
+## Backward Network Building
 
-grad_op_builder(fengjiayi)
+A backward network is a series of backward operators. The main idea of building a backward network is to create backward operators in the inverted sequence and put them together.
 
-### Implement : Backward network
+In our design, the network itself is also a kind of operator, so the operators contained in a big network may themselves be small networks.
 
 Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.

From bb5c656b574b1e518da981d781db0e1e0a0e4d75 Mon Sep 17 00:00:00 2001
From: fengjiayi
Date: Sat, 26 Aug 2017 19:15:31 -0700
Subject: [PATCH 03/11] test

---
 paddle/framework/backward.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index c717c2f30b..d5dbd57d19 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -6,7 +6,7 @@ In neural networks, the backpropagation algorithm follows the chain rule, so we n
 
 ## Backward Operator Registry
 
-A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients, and then calculate the gradients of the forward operators' inputs. In most cases, there is a one-to-one correspondence between forward and backward operators. We use a registry mechanism to save these correspondences, which is quite similar to the operator registry itself.
+A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients and then calculate the gradients of the forward operators' inputs. In most cases, there is a one-to-one correspondence between forward and backward operators. We use a registry mechanism to save these correspondences, which is quite similar to the operator registry itself.
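The "inverted sequence" idea of the Backward Network Building section added above can be sketched roughly as follows. This is a conceptual outline only: `BuildGradOp` is taken from that section, while `BuildBackwardNet` and the flat `std::vector` representation of a network are simplifying assumptions, not the framework's actual data structures.

```cpp
#include <vector>

struct OperatorBase;  // as in the framework

// Assumed available, as described in the Backward Operator Creating section.
OperatorBase* BuildGradOp(const OperatorBase* fwd_op);

// Conceptual sketch: walk the forward operators in reverse creation order
// and collect the corresponding backward operators.
std::vector<OperatorBase*> BuildBackwardNet(
    const std::vector<OperatorBase*>& fwd_ops) {
  std::vector<OperatorBase*> bwd_ops;
  for (auto it = fwd_ops.rbegin(); it != fwd_ops.rend(); ++it) {
    bwd_ops.push_back(BuildGradOp(*it));
  }
  return bwd_ops;
}
```

The real implementation additionally has to handle shared variables and variables that need no gradient, which is why the recursive `BackwardRecursive` in `paddle/framework/backward.cc` (touched by a later patch in this series) is more involved than this loop.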
From 4a83dde594d0aa6d19aeff7471b040277a8a839f Mon Sep 17 00:00:00 2001
From: caoying03
Date: Sun, 27 Aug 2017 11:28:05 +0800
Subject: [PATCH 04/11] save parameters into ordered dict.

---
 python/paddle/v2/parameters.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/python/paddle/v2/parameters.py b/python/paddle/v2/parameters.py
index b8af5abaea..475067ef22 100644
--- a/python/paddle/v2/parameters.py
+++ b/python/paddle/v2/parameters.py
@@ -14,6 +14,7 @@
 import numpy as np
 from paddle.proto.ParameterConfig_pb2 import ParameterConfig
+from collections import OrderedDict
 import paddle.trainer.config_parser as cp
 import struct
 import tarfile
@@ -62,7 +63,7 @@ class Parameters(object):
     """
 
     def __init__(self):
-        self.__param_conf__ = dict()
+        self.__param_conf__ = OrderedDict()
         self.__gradient_machines__ = []
         self.__tmp_params__ = dict()
@@ -231,6 +232,9 @@ class Parameters(object):
         :rtype: np.ndarray
         """
         import py_paddle.swig_paddle as api
+        if self.__param_conf__[key].is_static:
+            return np.zeros(self.__param_conf__[key].size, dtype=np.float32)
+
         return self.__getter_inner(key, api.PARAMETER_GRADIENT)
 
     def set(self, parameter_name, value):

From 4590f793f111dd4fc5134ca9bbd0a213b41962b7 Mon Sep 17 00:00:00 2001
From: fengjiayi
Date: Sun, 27 Aug 2017 17:37:41 -0700
Subject: [PATCH 05/11] Update backward document

---
 paddle/framework/backward.md | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index b4205fed2e..133b17c7be 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -2,32 +2,24 @@
 
 ## Motivation
 
-In neural networks, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation lineage; the operator/expression's Backward feature generates the backward pass with respect to the forward pass.
+In neural networks, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation lineage; the operator/expression's backward pass is generated with respect to the forward pass.
 
 ## Backward Operator Registry
 
-A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients and then calculate the gradients of the forward operators' inputs. In most cases, there is a one-to-one correspondence between forward and backward operators. We use a registry mechanism to save these correspondences, which is quite similar to the operator registry itself.
+A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients and then calculate the gradients of the forward operators' inputs. In most cases, there is a one-to-one correspondence between forward and backward operators. We use a registry mechanism to save these correspondences.
 
 For example, suppose we have an `add_two_op` registered by the following code:
 
 ```cpp
-REGISTER_OP(add_two, AddTwoOp, AddTwoOpMaker);
+REGISTER_OP(add_two, AddTwoOp, AddTwoOpMaker, add_two_grad, AddTwoGradOp);
 ```
 
 `add_two` is the operator's type. `AddTwoOp` and `AddTwoOpMaker` are the operator class and the operator maker class respectively.
 
-Assume that we also have the backward operator of `add_two_op`, which calculates the gradients of `add_two_op`'s inputs. Then we register it in the following way:
-
-```cpp
-REGISTER_GRADIENT_OP(add_two, add_two_grad, AddTwoGradOp);
-```
-
 `add_two_grad` is the type of the backward operator, and `AddTwoGradOp` is its class name.
 
 ## Backward Operator Creating
 
-### Usage
-
 Given a certain forward operator, we can get its corresponding backward operator by calling:
 
 ```cpp
@@ -36,13 +28,13 @@ OperatorBase* BuildGradOp(const OperatorBase* fwd_op);
 ```
 
 The function `BuildGradOp` sequentially executes the following steps:
 
-1. Getting the `type_` of the given forward operator, and then creating the corresponding backward operator.
+1. Get the `type_` of the given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.
 
-2. Copying all the attributes of the forward operator except `input_format` and `output_format` (if present), because their elements differ between forward and backward operators.
+2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to the map `inputs`, except the entries that are not necessary for the gradient computation.
 
-3. Copying the forward operator's `inputs_` and `outputs_` to the backward operator's `inputs_`, adding the forward inputs' gradient variables into the backward `outputs_`, and adding the forward outputs' gradient variables into the backward `inputs_`.
+3. Add the forward inputs' gradient variables into the map `outputs`, and add the forward outputs' gradient variables into the map `inputs`.
 
-4. Building the backward operator's `input_format`, `output_format` (if necessary) and `in_out_idxs_` according to the `inputs_` and `outputs_` just created.
+4. Build the backward operator with `inputs`, `outputs` and the forward operator's attributes.
 
 ## Backward Network Building
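The four steps this patch describes can be condensed into the following self-contained sketch. The stand-in types and helper functions (`VarNameMap`, `GradOpType`, `GradVarName`, `NeededByGradOp`, `CreateOp`) are assumptions introduced for illustration; the real definitions live in `paddle/framework` and differ in detail:

```cpp
#include <map>
#include <string>
#include <vector>

using VarNameMap = std::map<std::string, std::vector<std::string>>;

struct OperatorBase {
  std::string type_;
  VarNameMap inputs_, outputs_;
  // attributes omitted
};

// Illustrative helpers; the real framework derives these from OpInfoMap,
// per-operator "no gradient" annotations, and the operator registry.
std::string GradOpType(const std::string& t) { return t + "_grad"; }
std::string GradVarName(const std::string& n) { return n + "@GRAD"; }
bool NeededByGradOp(const std::string& /*name*/) { return true; }
OperatorBase* CreateOp(const std::string& type, const VarNameMap& inputs,
                       const VarNameMap& outputs) {
  return new OperatorBase{type, inputs, outputs};
}

OperatorBase* BuildGradOp(const OperatorBase* fwd_op) {
  VarNameMap inputs, outputs;
  // Step 2: copy the forward inputs/outputs needed by the gradient op.
  for (const auto& kv : fwd_op->inputs_)
    if (NeededByGradOp(kv.first)) inputs[kv.first] = kv.second;
  for (const auto& kv : fwd_op->outputs_)
    if (NeededByGradOp(kv.first)) inputs[kv.first] = kv.second;
  // Step 3: forward inputs' gradients become backward outputs, and
  // forward outputs' gradients become backward inputs.
  for (const auto& kv : fwd_op->inputs_)
    for (const auto& name : kv.second)
      outputs[GradVarName(kv.first)].push_back(GradVarName(name));
  for (const auto& kv : fwd_op->outputs_)
    for (const auto& name : kv.second)
      inputs[GradVarName(kv.first)].push_back(GradVarName(name));
  // Steps 1 and 4: look up the backward type and build the operator
  // (attribute copying omitted here).
  return CreateOp(GradOpType(fwd_op->type_), inputs, outputs);
}
```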
From be4c0123c4c6cccfaa8fafa9063ce84415854c28 Mon Sep 17 00:00:00 2001
From: caoying03
Date: Mon, 28 Aug 2017 10:11:54 +0800
Subject: [PATCH 06/11] follow comments.

---
 python/paddle/v2/parameters.py | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/python/paddle/v2/parameters.py b/python/paddle/v2/parameters.py
index 475067ef22..cc3adf6f48 100644
--- a/python/paddle/v2/parameters.py
+++ b/python/paddle/v2/parameters.py
@@ -43,9 +43,26 @@ def create(layers):
 
 class Parameters(object):
     """
-    Parameters is a dictionary that contains Paddle's parameters. The key of
-    Parameters is the name of a parameter. The value of Parameters is a plain
-    :code:`numpy.ndarray`.
+    `Parameters` manages all the learnable parameters in a neural network.
+    It stores parameters' information in an OrderedDict, the key of which is
+    the name of a parameter, and the value related to a key is the
+    parameter's configuration, such as the initialization mean and std, its
+    size, whether it is a static parameter, and so on.
+
+    :param __param_conf__: this member stores the configurations of the
+        learnable parameters in a network in an OrderedDict. The parameters
+        are added one by one, following their creation order in the neural
+        network: parameters of the previous layers in a network are created
+        first. When a user iterates over this dict, he can visit parameters
+        in the network from bottom to top.
+    :type __param_conf__: OrderedDict
+    :param __gradient_machines__: all of the parameters in a neural network are
+        appended to a Paddle gradient machine, which is used internally to copy
+        the parameter values between the C++ and Python end.
+    :type __gradient_machines__: list
+    :param __tmp_params__: a dict to store dummy parameters if no
+        __gradient_machines__ is appended to `Parameters`.
+    :type __tmp_params__: dict
 
     The basic usage is

From 346630f413a2e9aa9cbbdf2af4595a461ec09ac0 Mon Sep 17 00:00:00 2001
From: Luo Tao
Date: Mon, 28 Aug 2017 11:19:53 +0800
Subject: [PATCH 07/11] Remove "About" tab in "Documentation"

---
 doc/about/index_cn.md  | 11 -----------
 doc/about/index_en.rst | 14 --------------
 doc/index_en.rst       |  1 -
 3 files changed, 26 deletions(-)
 delete mode 100644 doc/about/index_cn.md
 delete mode 100644 doc/about/index_en.rst

diff --git a/doc/about/index_cn.md b/doc/about/index_cn.md
deleted file mode 100644
index 3bf030004d..0000000000
--- a/doc/about/index_cn.md
+++ /dev/null
@@ -1,11 +0,0 @@
-About PaddlePaddle
-================
-
-PaddlePaddle is a parallel, distributed deep learning platform originally developed jointly by Baidu scientists and engineers. It is easy to use, efficient, flexible and scalable, and is already widely used by many product lines within Baidu.
-PaddlePaddle is now open source but still far from complete, and we hope to keep improving, extending and expanding it on this foundation.
-We also hope that developers will actively provide feedback and contribute source code, so as to build an active open-source community.
-
-Acknowledgements
---------
-
-Special thanks go to [all contributors](https://github.com/PaddlePaddle/Paddle/graphs/contributors) of PaddlePaddle.

diff --git a/doc/about/index_en.rst b/doc/about/index_en.rst
deleted file mode 100644
index 065c430cde..0000000000
--- a/doc/about/index_en.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-ABOUT
-=======
-
-PaddlePaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
-which was originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
-
-PaddlePaddle is now open source but far from complete, and it is intended to be built upon, improved, scaled, and extended.
-We hope to build an active open source community where users both provide feedback and actively contribute to the source code.
-
-
-Credits
---------
-
-We owe many thanks to `all contributors and developers `_ of PaddlePaddle!

diff --git a/doc/index_en.rst b/doc/index_en.rst
index 168c7667c6..64684b8b9b 100644
--- a/doc/index_en.rst
+++ b/doc/index_en.rst
@@ -7,4 +7,3 @@ PaddlePaddle Documentation
    getstarted/index_en.rst
    howto/index_en.rst
    api/index_en.rst
-   about/index_en.rst

From f0b25c4cfb21b41e8bc7222d44f05a9818dc9b47 Mon Sep 17 00:00:00 2001
From: caoying03
Date: Mon, 28 Aug 2017 12:20:28 +0800
Subject: [PATCH 08/11] follow comments to refine the comments.

---
 python/paddle/v2/parameters.py | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/python/paddle/v2/parameters.py b/python/paddle/v2/parameters.py
index cc3adf6f48..4cfd91882e 100644
--- a/python/paddle/v2/parameters.py
+++ b/python/paddle/v2/parameters.py
@@ -44,21 +44,20 @@ def create(layers):
 class Parameters(object):
     """
     `Parameters` manages all the learnable parameters in a neural network.
-    It stores parameters' information in an OrderedDict, the key of which is
-    the name of a parameter, and the value related to a key is the
-    parameter's configuration, such as the initialization mean and std, its
-    size, whether it is a static parameter, and so on.
-
-    :param __param_conf__: this member stores the configurations of the
-        learnable parameters in a network in an OrderedDict. The parameters
-        are added one by one, following their creation order in the neural
-        network: parameters of the previous layers in a network are created
-        first. When a user iterates over this dict, he can visit parameters
-        in the network from bottom to top.
+    It stores parameters' information in an OrderedDict. The key is
+    the name of a parameter, and the value is the parameter's configuration
+    (in protobuf format), such as the initialization mean and std, its size,
+    whether it is a static parameter, and so on.
+
+    :param __param_conf__: stores the configurations of the learnable
+        parameters in the network in an OrderedDict. Parameters are added
+        one by one into the dict, following their creation order in the
+        network: parameters of the previous layers in a network are created
+        first. You can visit the parameters from bottom to top by iterating
+        over this dict.
     :type __param_conf__: OrderedDict
     :param __gradient_machines__: all of the parameters in a neural network are
-        appended to a Paddle gradient machine, which is used internally to copy
-        the parameter values between the C++ and Python end.
+        appended to a PaddlePaddle gradient machine, which is used internally
+        to copy parameter values between the C++ and Python ends.
     :type __gradient_machines__: list
@@ -271,7 +270,7 @@ class Parameters(object):
         Append a gradient machine to the parameters. This method is used
         internally in Trainer.train.
 
-        :param gradient_machine: Paddle C++ GradientMachine object.
+        :param gradient_machine: PaddlePaddle C++ GradientMachine object.
         :type gradient_machine: api.GradientMachine
         :return:
         """

From 4f0c071e4909ff041f3a86c3a40c482becf50845 Mon Sep 17 00:00:00 2001
From: qijun
Date: Mon, 28 Aug 2017 22:18:11 +0800
Subject: [PATCH 09/11] refine backward

---
 paddle/framework/backward.cc | 5 ++++-
 paddle/operators/net_op.cc   | 9 ++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/paddle/framework/backward.cc b/paddle/framework/backward.cc
index bfda18724c..6b4c612cd8 100644
--- a/paddle/framework/backward.cc
+++ b/paddle/framework/backward.cc
@@ -124,6 +124,9 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
   std::list insert_position;
   for (auto& dup_output_op : dup_output_ops) {
     const std::string& name = dup_output_op.first;
+    // Duplicates of @Empty@ don't need to be added.
+    if (name == kEmptyVarName) continue;
+
     auto& dup_op = dup_output_op.second;
     // no duplicate output
     if (dup_op.size() == 1) continue;
@@ -209,7 +212,7 @@ std::unique_ptr<OperatorBase> Backward(
     const OperatorBase& forwardOp,
     const std::unordered_set<std::string>& no_grad_vars) {
   std::unordered_set<std::string> no_grad_names;
-  no_grad_names.reserve(no_grad_vars.size());
+  no_grad_names.reserve(no_grad_vars.size() + 1);
 
   no_grad_names.insert(std::string(kEmptyVarName) + kGradVarSuffix);
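The `backward.cc` hunk above pre-inserts the gradient name of the empty placeholder variable into `no_grad_names`, which is why the reserved capacity grows by one. The snippet below shows the naming convention involved; the constant values are assumptions based on the `@Empty@` reference in the patch comment, not verified against the framework:

```cpp
#include <string>

// Assumed values, following the conventions referenced in the patch above;
// the real constants live in paddle/framework.
const char kEmptyVarName[] = "@EMPTY@";
const char kGradVarSuffix[] = "@GRAD";

int main() {
  // The pre-inserted entry is then "@EMPTY@@GRAD", so gradients of empty
  // placeholder variables are always treated as "no gradient needed".
  std::string entry = std::string(kEmptyVarName) + kGradVarSuffix;
  return entry == "@EMPTY@@GRAD" ? 0 : 1;
}
```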
diff --git a/paddle/operators/net_op.cc b/paddle/operators/net_op.cc
index 44d925f0b0..78b5e27678 100644
--- a/paddle/operators/net_op.cc
+++ b/paddle/operators/net_op.cc
@@ -31,10 +31,13 @@ void NetOp::CompleteAddOp(bool calc) {
   for (auto& op : ops_) {
     for (auto& ipt : op->Inputs()) {
       for (auto& var_name : ipt.second) {
-        if (!Contains(output_set, var_name)) {  // Not another op's output
-          input_set.insert(var_name);
-        } else {
+        // If the input variable is already in the output set, it is an
+        // intermediate output and is added into intermediate_outputs_.
+        // Otherwise, it is added into the input set.
+        if (Contains(output_set, var_name)) {
           intermediate_outputs_.insert(var_name);
+        } else {
+          input_set.insert(var_name);
+        }
       }
     }
   }

From 980edfa69a72f57dea689d1d5b1bff6b388e7a71 Mon Sep 17 00:00:00 2001
From: fengjiayi
Date: Mon, 28 Aug 2017 11:34:24 -0700
Subject: [PATCH 10/11] Refine backward document

---
 paddle/framework/backward.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index 133b17c7be..ce324a73f0 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -6,9 +6,16 @@ In neural networks, the backpropagation algorithm follows the chain rule, so we n
 
 ## Backward Operator Registry
 
-A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients and then calculate the gradients of the forward operators' inputs. In most cases, there is a one-to-one correspondence between forward and backward operators. We use a registry mechanism to save these correspondences.
+A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients and then calculate the gradients of the forward operators' inputs.
 
-For example, suppose we have an `add_two_op` registered by the following code:
+-| | forward operator | backward operator
+-| ---------------------- | ---------------- |------------------------- |
+-| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
+-| **Operator::outputs_** | Outputs | InputGradients |
+
+ In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded by a global hash map (`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
+
+For example, suppose we have an `add_two_op`, and we can register its information and the corresponding backward operator by the following macro:
 
 ```cpp
 REGISTER_OP(add_two, AddTwoOp, AddTwoOpMaker, add_two_grad, AddTwoGradOp);
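The inputs/outputs table this patch introduces can be made concrete with `add_two`: for Z = X + Y, the backward operator's inputs are the forward inputs and outputs plus the OutputGradients (X, Y, Z, dZ), and its outputs are the InputGradients (dX, dY). For element-wise addition both input gradients simply equal the output gradient. The kernel below is purely illustrative, not the actual `AddTwoGradOp`:

```cpp
#include <vector>

// For Z = X + Y, the chain rule gives dX = dZ and dY = dZ, since
// dZ/dX = dZ/dY = 1 element-wise.
void AddTwoGradKernel(const std::vector<float>& dz,
                      std::vector<float>* dx, std::vector<float>* dy) {
  *dx = dz;
  *dy = dz;
}
```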
From eaeb69f98f70bbea4fe4aae9f7c7b830f75959c5 Mon Sep 17 00:00:00 2001
From: fengjiayi
Date: Mon, 28 Aug 2017 13:47:37 -0700
Subject: [PATCH 11/11] Follow reviewer's comments

---
 paddle/framework/backward.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index ce324a73f0..8aa6728a95 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -2,28 +2,28 @@
 
 ## Motivation
 
-In neural networks, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation lineage; the operator/expression's backward pass is generated with respect to the forward pass.
+In neural networks, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass is generated with respect to the forward pass.
 
 ## Backward Operator Registry
 
 A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs and output gradients and then calculate the gradients of the forward operators' inputs.
 
--| | forward operator | backward operator
--| ---------------------- | ---------------- |------------------------- |
--| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
--| **Operator::outputs_** | Outputs | InputGradients |
+| | forward operator | backward operator |
+| ---------------------- | ---------------- | -------------------------------- |
+| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
+| **Operator::outputs_** | Outputs | InputGradients |
 
  In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded by a global hash map (`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
 
-For example, suppose we have an `add_two_op`, and we can register its information and the corresponding backward operator by the following macro:
+For example, suppose we have a `mul_op`, and we can register its information and the corresponding backward operator by the following macro:
 
 ```cpp
-REGISTER_OP(add_two, AddTwoOp, AddTwoOpMaker, add_two_grad, AddTwoGradOp);
+REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
 ```
 
-`add_two` is the operator's type. `AddTwoOp` and `AddTwoOpMaker` are the operator class and the operator maker class respectively.
+`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
 
-`add_two_grad` is the type of the backward operator, and `AddTwoGradOp` is its class name.
+`mul_grad` is the type of the backward operator, and `MulOpGrad` is its class name.
 
 ## Backward Operator Creating
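For the `mul` example that closes this series, the gradient computation `MulOpGrad` is expected to perform follows directly from the chain rule: for Z = X * Y (matrix multiplication), dX = dZ * Y^T and dY = X^T * dZ. The kernel below is a self-contained illustration of these two formulas and not the actual PaddlePaddle implementation; the `Mat` type is a stand-in introduced here:

```cpp
#include <vector>

// Plain row-major matrix, for illustration only.
struct Mat {
  int rows, cols;
  std::vector<float> data;
  float& at(int r, int c) { return data[r * cols + c]; }
  float at(int r, int c) const { return data[r * cols + c]; }
};

// Gradients of Z = X * Y under the chain rule:
//   dX = dZ * Y^T   (same shape as X: m x k)
//   dY = X^T * dZ   (same shape as Y: k x n)
void MulGrad(const Mat& X, const Mat& Y, const Mat& dZ, Mat* dX, Mat* dY) {
  *dX = {X.rows, X.cols, std::vector<float>(X.data.size(), 0.f)};
  *dY = {Y.rows, Y.cols, std::vector<float>(Y.data.size(), 0.f)};
  for (int i = 0; i < X.rows; ++i)
    for (int j = 0; j < Y.cols; ++j)
      for (int k = 0; k < X.cols; ++k) {
        dX->at(i, k) += dZ.at(i, j) * Y.at(k, j);  // dZ * Y^T
        dY->at(k, j) += X.at(i, k) * dZ.at(i, j);  // X^T * dZ
      }
}
```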