# Design Doc: Prefetching Parameters from Parameter Server

## Abstract

We propose an approach to prefetch parameters from the Parameter Server during distributed training, so that Fluid can train a model whose parameters are too large to be stored in one trainer's memory.

## Background

For an embedding layer, the trainable parameter may be very large and may not fit in one trainer's memory. In Fluid distributed training, the [Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits every large parameter into a number of small parameters stored on the Parameter Servers, so we can prefetch the needed rows of a parameter from the corresponding Parameter Server according to the input `Ids`.
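
To make the mapping concrete, here is a minimal sketch of how an input id could be routed to the Parameter Server that stores its row slice; the even row-slicing and the helper name `shard_for_id` are assumptions for illustration, not Fluid's actual API.

```python
# Hypothetical helper: map a global row id to the Parameter Server shard
# that stores it, assuming the parameter is split evenly by rows.
def shard_for_id(row_id, total_rows, num_pservers):
    """Return (pserver_index, local_row) for a row-sliced parameter."""
    rows_per_shard = (total_rows + num_pservers - 1) // num_pservers
    return row_id // rows_per_shard, row_id % rows_per_shard

# Example: a 10000-row embedding table split across 4 Parameter Servers.
print(shard_for_id(7321, total_rows=10000, num_pservers=4))  # (2, 2321)
```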

## Design

This is a feature of Fluid distributed training; you may want to read [Distributed Architecture](./distributed_architecture.md) and [Parameter Server](./parameter_server.md) before reading the following content.

### Partitioned Parameter

<img src="src/split_parameter.png" width="400" />
|
<img src="src/split_parameter.png" width="400" />
|
||||||
|
|
||||||
- **Distributed Transpiler** would split the large parameter (weight) into several partitioned parameters (weight_0, weight_1, weight_2), as the figure above shows.
- We could use `round-robin` to distribute the partitioned parameters across the Parameter Servers; see the sketch after this list.
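
A minimal sketch of round-robin placement, assuming partitioned parameters are assigned to Parameter Server endpoints in turn; the function name and endpoint values are illustrative, not Fluid's actual transpiler code.

```python
# Hypothetical sketch: assign partitioned parameters (weight_0, weight_1, ...)
# to Parameter Server endpoints in round-robin order.
def round_robin(partitioned_params, pserver_endpoints):
    placement = {}
    for i, name in enumerate(partitioned_params):
        placement[name] = pserver_endpoints[i % len(pserver_endpoints)]
    return placement

print(round_robin(["weight_0", "weight_1", "weight_2"],
                  ["127.0.0.1:6170", "127.0.0.1:6171"]))
# {'weight_0': '127.0.0.1:6170', 'weight_1': '127.0.0.1:6171',
#  'weight_2': '127.0.0.1:6170'}
```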

### Prefetching Parameters

<img src="src/prefetch_parameters.png" width="400" />
|
<img src="src/prefetch_parameters.png" width="400" />
|
||||||
|
|
||||||
- The `prefetch_rpc` operator prefetches the parameter rows from the different Parameter Servers according to the input `Ids`; we use [SelectedRows](../../../design/selected_rows.md) as the type of the received variable.
- The `merge_selected_rows` operator merges the received parameters into one `SelectedRows` variable; a sketch of this step follows the list.
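
As a rough illustration of the merge step, here is a minimal sketch that represents each server's reply as a (rows, values) pair and concatenates them; the actual `SelectedRows` type and operator are implemented in C++, so this Python is only conceptual.

```python
# Conceptual sketch: each Parameter Server replies with a SelectedRows-like
# result, i.e. the global row indices it served plus the row values.
def merge_selected_rows(partial_results):
    """partial_results: list of (rows, values) pairs, one per Parameter Server."""
    merged_rows, merged_values = [], []
    for rows, values in partial_results:
        merged_rows.extend(rows)
        merged_values.extend(values)
    return merged_rows, merged_values

# Example: two servers each return two prefetched embedding rows.
server_0 = ([3, 9], [[0.1, 0.2], [0.3, 0.4]])
server_1 = ([4, 7], [[0.5, 0.6], [0.7, 0.8]])
print(merge_selected_rows([server_0, server_1]))
# ([3, 9, 4, 7], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
```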

## TODO

- `prefetch_rpc` operator to send the row indices and receive `SelectedRows` variables.
- `lookup_table` needs to support the `SelectedRows` variable type as its input `Weight`; see the sketch below.
- Async update: to avoid slow nodes, asynchronous update is important for distributed training; we need a design doc for it and will implement it in the future.
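
To illustrate the `lookup_table` TODO item, here is a minimal sketch of a lookup over a SelectedRows-style weight that holds only the prefetched rows; the helper name `lookup` and the plain-list representation are assumptions for illustration.

```python
# Hypothetical sketch: a lookup over a SelectedRows-style weight. A global id
# is first mapped to its position in `rows`, then the embedding vector is
# read from `values`.
def lookup(ids, rows, values):
    row_position = {row: pos for pos, row in enumerate(rows)}
    return [values[row_position[i]] for i in ids]

rows = [3, 4, 7, 9]                      # prefetched global row indices
values = [[0.1], [0.5], [0.7], [0.3]]    # embedding vectors for those rows
print(lookup([9, 3], rows, values))      # [[0.3], [0.1]]
```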