From 6a329f0c8c392932d2885d0ada939353b5291f19 Mon Sep 17 00:00:00 2001
From: Yu Yang
Date: Wed, 24 May 2017 16:22:33 +0800
Subject: [PATCH 1/5] design doc for implementation parameters in CPP.

---
 doc/design/parameters_in_cpp.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 doc/design/parameters_in_cpp.md

diff --git a/doc/design/parameters_in_cpp.md b/doc/design/parameters_in_cpp.md
new file mode 100644
index 0000000000..4989a3d16b
--- /dev/null
+++ b/doc/design/parameters_in_cpp.md
@@ -0,0 +1,27 @@
+# Parameters in CPP
+
+`Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters, and it lets Paddle share parameters between topologies. We described the usage of `Parameters` in [api.md](./api.md).
+
+We used Python to implementation Parameters before during API design phase. There are several defects for current implementation:
+* We just use `memcpy` to share Parameters between topologies, but this is very inefficient.
+* We did not implement share Parameters while training is not complete. We just trigger `memcpy` when start training.
+
+It is necessary we implement Parameters in CPP side. However, it could be a refactorization for Paddle, because Paddle was designed for training only one topology before. In current Paddle implementation, there are three concepts associated with `Parameters`:
+
+1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`. It is evident that we should use `paddle::Parameter` when developing `Parameters`. However, the `Parameter` class contains many functions and does not have a clear interface. It contains `create/store Parameter`, `serialize/deserialize`, `optimize(i.e SGD)`, `randomize/zero`. We just need `paddle::Parameter` just create and store `Tensors (or Matrix currently)`. We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.
+2. `paddle::GradientMachine` and its sub-classes, i.e., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`. We should pass `Parameters` to `paddle::GradientMachine` when `forward/backward` to avoid `memcpy` between topologies. Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would perform on multi-GPUs and multi-CPUs. `Parameters` should dispatch the parameter value to each device, and gather the parameter gradient from each device.
+
+3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle. So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).
+
+
+The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR merged into Paddle.
+
+1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.
+
+2. Implement a `Parameters` class. It just stores the `paddle::Parameter` inside. Make `GradientMachine` use `Parameters` as a class member.
+
+3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameter` between topologies. Because we need sharing `Parameter` between topologies, it is `Parameters`'s response to exchange Parameter between GPUs not `GradientMachine`, because `GradientMachine` only used for one topology.
+
+4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies.
+
+5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. In the end, we could change `ParameterUpdater` directly uses `Parameters` to make Paddle implementation clear.
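To make step 4's proposed interface concrete, here is a minimal C++ sketch. Only the `forward(const Parameters& params, ...)` and `backward(Parameters* params, ...)` signatures come from the design doc above; `Argument` and every other name is an illustrative placeholder, not Paddle's actual API.

```cpp
// Sketch only: `Argument` and all names besides the Parameters
// arguments are placeholders, not Paddle's real interface.
#include <vector>

struct Argument {};   // stands in for a layer's input/output data
class Parameters {};  // the proposed container of paddle::Parameter

class GradientMachine {
 public:
  virtual ~GradientMachine() = default;

  // Parameters are passed in per call instead of being owned by the
  // machine, so several topologies can share one Parameters object
  // without any memcpy.
  virtual void forward(const Parameters& params,
                       const std::vector<Argument>& inputs,
                       std::vector<Argument>* outputs) = 0;

  // backward writes gradients, so it takes a mutable pointer.
  virtual void backward(Parameters* params) = 0;
};
```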
From c57e98d4fea0af85a6ceff8eb838fb0a071f2222 Mon Sep 17 00:00:00 2001
From: Yu Yang
Date: Wed, 24 May 2017 16:38:32 +0800
Subject: [PATCH 2/5] Refine english

---
 doc/design/parameters_in_cpp.md | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/doc/design/parameters_in_cpp.md b/doc/design/parameters_in_cpp.md
index 4989a3d16b..357c057897 100644
--- a/doc/design/parameters_in_cpp.md
+++ b/doc/design/parameters_in_cpp.md
@@ -2,26 +2,38 @@
 
 `Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters, and it lets Paddle share parameters between topologies. We described the usage of `Parameters` in [api.md](./api.md).
 
-We used Python to implementation Parameters before during API design phase. There are several defects for current implementation:
+We used Python to implement Parameters when disigning V2 API before. There are several defects for current implementation:
 * We just use `memcpy` to share Parameters between topologies, but this is very inefficient.
-* We did not implement share Parameters while training is not complete. We just trigger `memcpy` when start training.
+* We did not implement sharing Parameters while training. We just trigger `memcpy` when training starts.
 
-It is necessary we implement Parameters in CPP side. However, it could be a refactorization for Paddle, because Paddle was designed for training only one topology before. In current Paddle implementation, there are three concepts associated with `Parameters`:
+It is necessary that we implement Parameters in CPP side. However, it could be a refactorization for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with `Parameters`:
 
-1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`. It is evident that we should use `paddle::Parameter` when developing `Parameters`. However, the `Parameter` class contains many functions and does not have a clear interface. It contains `create/store Parameter`, `serialize/deserialize`, `optimize(i.e SGD)`, `randomize/zero`. We just need `paddle::Parameter` just create and store `Tensors (or Matrix currently)`. We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.
-2. `paddle::GradientMachine` and its sub-classes, i.e., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`. We should pass `Parameters` to `paddle::GradientMachine` when `forward/backward` to avoid `memcpy` between topologies. Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would perform on multi-GPUs and multi-CPUs. `Parameters` should dispatch the parameter value to each device, and gather the parameter gradient from each device.
+1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`.
+It is evident that we should use `paddle::Parameter` when developing `Parameters`.
+However, the `Parameter` class contains many functions and does not have a clear interface.
+It contains `create/store Parameter`, `serialize/deserialize`, `optimize` (i.e., SGD), and `randomize/zero`.
+When developing `Parameters`, we only use the `create/store Parameter` functionality.
+We should extract the other functionalities of `Parameter` into separate classes to clean up the Paddle CPP implementation.
 
-3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle. So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).
+2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
+We should pass `Parameters` to `paddle::GradientMachine` in `forward/backward` calls to avoid `memcpy` between topologies.
+Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would run on multiple GPUs and CPUs.
+`Parameters` should dispatch the parameter values to each device, and gather the parameter gradients from each device.
+
+3. `paddle::ParameterUpdater`. The `ParameterUpdater` is used to update parameters in Paddle.
+So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (e.g., by SGD).
 
-The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR merged into Paddle.
+
+The step-by-step approach for implementing Parameters in the Paddle C++ core is listed below. Each step should be a separate PR that can be merged into Paddle one by one.
 
 1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.
 
 2. Implement a `Parameters` class. It just stores the `paddle::Parameter` inside. Make `GradientMachine` use `Parameters` as a class member.
 
-3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameter` between topologies. Because we need sharing `Parameter` between topologies, it is `Parameters`'s response to exchange Parameter between GPUs not `GradientMachine`, because `GradientMachine` only used for one topology.
+3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameters` between topologies.
+Because we need to share `Parameters` between topologies, it is `Parameters`'s responsibility to exchange Parameters between GPUs.
+`GradientMachine` should not handle how to exchange Parameters, because `GradientMachine` is only used to train one topology and we need to support training many topologies in Paddle, i.e., there could be many GradientMachines using one `Parameters` object.
 
 4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies.
 
-5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. In the end, we could change `ParameterUpdater` directly uses `Parameters` to make Paddle implementation clear.
+5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. In the end of this refactorization, we could change `ParameterUpdater` directly uses `Parameters` to make `ParameterUpdater`'s implementation clear.
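Step 2 of the plan says the first `Parameters` class "just stores the `paddle::Parameter` inside". A minimal sketch of such a container might look as follows; the member functions and the use of `shared_ptr` are assumptions for illustration, not the final interface.

```cpp
// Sketch of the step-2 `Parameters` class: just a named collection of
// paddle::Parameter objects. The interface here is an assumption, not
// the final design.
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace paddle {
class Parameter;  // Paddle's existing per-parameter class
}

class Parameters {
 public:
  // Look up one parameter by its layer-qualified name, e.g. "fc1.w".
  std::shared_ptr<paddle::Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }

  void set(const std::string& name,
           std::shared_ptr<paddle::Parameter> param) {
    params_[name] = std::move(param);
  }

 private:
  // shared_ptr so two topologies can refer to the same parameter
  // without any memcpy.
  std::map<std::string, std::shared_ptr<paddle::Parameter>> params_;
};
```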
From bb777f8e4abc8e4e191074864a1851fe1d725fe4 Mon Sep 17 00:00:00 2001
From: Yu Yang
Date: Wed, 24 May 2017 16:44:14 +0800
Subject: [PATCH 3/5] Refine English

---
 doc/design/parameters_in_cpp.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/design/parameters_in_cpp.md b/doc/design/parameters_in_cpp.md
index 357c057897..d603de3114 100644
--- a/doc/design/parameters_in_cpp.md
+++ b/doc/design/parameters_in_cpp.md
@@ -6,7 +6,7 @@ We used Python to implement Parameters when disigning V2 API before. There are s
 * We just use `memcpy` to share Parameters between topologies, but this is very inefficient.
 * We did not implement sharing Parameters while training. We just trigger `memcpy` when training starts.
 
-It is necessary that we implement Parameters in CPP side. However, it could be a refactorization for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with `Parameters`:
+It is necessary that we implement Parameters on the CPP side. However, it requires a code refactoring of Paddle, because Paddle was previously designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current Paddle implementation, there are three concepts associated with `Parameters`:
 
 1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`.
 It is evident that we should use `paddle::Parameter` when developing `Parameters`.
@@ -36,4 +36,4 @@ Because we need to share `Parameters` between topologies, it is `Parameters`'s r
 
 4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies.
 
-5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. In the end of this refactorization, we could change `ParameterUpdater` directly uses `Parameters` to make `ParameterUpdater`'s implementation clear.
+5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this code refactoring, we could change `ParameterUpdater` to use `Parameters` directly, to make `ParameterUpdater`'s implementation clear.

From 27c70f39ec3917cb72b63a0dd7ccec08abb494fa Mon Sep 17 00:00:00 2001
From: Yu Yang
Date: Wed, 24 May 2017 16:51:24 +0800
Subject: [PATCH 4/5] typo

---
 doc/design/parameters_in_cpp.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design/parameters_in_cpp.md b/doc/design/parameters_in_cpp.md
index d603de3114..4c0dcc1823 100644
--- a/doc/design/parameters_in_cpp.md
+++ b/doc/design/parameters_in_cpp.md
@@ -2,7 +2,7 @@
 
 `Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters, and it lets Paddle share parameters between topologies. We described the usage of `Parameters` in [api.md](./api.md).
 
-We used Python to implement Parameters when disigning V2 API before. There are several defects for current implementation:
+We used Python to implement Parameters when designing the V2 API. There are several defects in the current implementation:
 * We just use `memcpy` to share Parameters between topologies, but this is very inefficient.
 * We did not implement sharing Parameters while training. We just trigger `memcpy` when training starts.
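For concept 3 and step 5, a `ParameterUpdater` that optimizes `Parameters` by SGD could look roughly like the sketch below; the dense per-block value/gradient layout and all names are assumptions for illustration, not Paddle's real storage or API.

```cpp
// Sketch of concept 3 / step 5: a ParameterUpdater that optimizes a
// Parameters object directly with vanilla SGD. The per-block float
// buffers are an illustrative assumption.
#include <cstddef>
#include <vector>

struct ParameterBlock {
  std::vector<float> value;
  std::vector<float> grad;
};

class SgdParameterUpdater {
 public:
  explicit SgdParameterUpdater(float learningRate)
      : learningRate_(learningRate) {}

  // value <- value - lr * grad, for every parameter block.
  void update(std::vector<ParameterBlock>* params) const {
    for (ParameterBlock& block : *params) {
      for (std::size_t i = 0; i < block.value.size(); ++i) {
        block.value[i] -= learningRate_ * block.grad[i];
      }
    }
  }

 private:
  float learningRate_;
};
```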
From a038163eefd4c9868f15eaeb5be4839c8e990e9d Mon Sep 17 00:00:00 2001
From: Yu Yang
Date: Thu, 25 May 2017 13:38:51 +0800
Subject: [PATCH 5/5] Follow comments

---
 doc/design/parameters_in_cpp.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/design/parameters_in_cpp.md b/doc/design/parameters_in_cpp.md
index 4c0dcc1823..b6f99bc7d9 100644
--- a/doc/design/parameters_in_cpp.md
+++ b/doc/design/parameters_in_cpp.md
@@ -1,4 +1,4 @@
-# Parameters in CPP
+# Design Doc: The C++ Class `Parameters`
 
 `Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters, and it lets Paddle share parameters between topologies. We described the usage of `Parameters` in [api.md](./api.md).
 
@@ -33,6 +33,8 @@ The step-by-step approach for implementing Parameters in the Paddle C++ core is
 3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameters` between topologies.
 Because we need to share `Parameters` between topologies, it is `Parameters`'s responsibility to exchange Parameters between GPUs.
 `GradientMachine` should not handle how to exchange Parameters, because `GradientMachine` is only used to train one topology and we need to support training many topologies in Paddle, i.e., there could be many GradientMachines using one `Parameters` object.
+   * We should use a global function to exchange Parameters between GPUs, not a member function of `Parameters`. `MultiGradientMachine` invokes this function, which takes `Parameters` as its input.
+   * `MultiGradientMachine` contains many functionalities. Extracting the Parameters-exchanging logic could make `MultiGradientMachine` clearer and simpler.
 
 4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies.
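The bullets added in the last patch propose a global function, invoked by `MultiGradientMachine`, that exchanges `Parameters` between GPUs. A rough sketch of that shape follows; the plain-vector "device copies" stand in for real device memory, and every type and name here is an illustrative assumption rather than Paddle's actual code.

```cpp
// Sketch of the patch-5 bullet: free functions, not Parameters member
// functions, handle the cross-device exchange. MultiGradientMachine
// would call them with a Parameters object as input.
#include <cstddef>
#include <vector>

struct DeviceBuffer { std::vector<float> data; };  // stand-in for GPU memory

struct Parameters {
  std::vector<float> value;  // master copy of the parameter values
  std::vector<float> grad;   // aggregated gradient
};

// Dispatch the master values to each device's buffer before forward().
void dispatchParameters(const Parameters& params,
                        std::vector<DeviceBuffer>* deviceCopies) {
  for (DeviceBuffer& dev : *deviceCopies) {
    dev.data = params.value;  // stands in for a host-to-device copy
  }
}

// Gather (sum) the per-device gradients into the master copy after
// backward(), so the updater sees one aggregated gradient.
void gatherParameterGradients(const std::vector<DeviceBuffer>& deviceGrads,
                              Parameters* params) {
  params->grad.assign(params->value.size(), 0.0f);
  for (const DeviceBuffer& dev : deviceGrads) {
    for (std::size_t i = 0; i < dev.data.size(); ++i) {
      params->grad[i] += dev.data[i];  // stands in for a device-to-host reduce
    }
  }
}
```

Keeping these as free functions means `Parameters` stays a plain container, and `MultiGradientMachine` does not own the exchange logic either, which is exactly the simplification the patch argues for.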