# Design Doc: Prefetching Parameter From Parameter Server

## Abstract

We propose an approach to prefetch parameters from the Parameter Server during
distributed training so that Fluid can train a model whose parameters are too
large to be stored in one trainer's memory. With this approach, a Trainer
prefetches the sliced parameters it needs from different Parameter Server
instances according to the input `Ids`, runs forward and backward, and then
sends the gradients to the Parameter Server to execute the optimize program.

## Background

For an embedding layer, the trainable parameter may be very large and may not
fit in one trainer's memory. In Fluid distributed training, the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) splits
every parameter into a number of small parameters that are stored on the
Parameter Servers, so we can prefetch the parameter from the specified
Parameter Server according to the input `Ids`.

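To make the memory problem concrete, here is a back-of-the-envelope sketch.
The vocabulary size, embedding width and number of distinct `Ids` per
mini-batch below are illustrative assumptions, not figures taken from Fluid.

```python
# Rough illustration of why a large embedding table cannot live in one
# trainer, while the rows touched by a single mini-batch easily can.
# All sizes are illustrative assumptions.

VOCAB_SIZE = 100_000_000   # rows in the embedding table (assumed)
EMBED_DIM = 512            # width of each embedding row (assumed)
BYTES_PER_FLOAT = 4        # float32

full_table_gib = VOCAB_SIZE * EMBED_DIM * BYTES_PER_FLOAT / 1024 ** 3
print(f"full table: {full_table_gib:.0f} GiB")        # ~191 GiB, too large for one trainer

BATCH_UNIQUE_IDS = 10_000  # distinct Ids in one mini-batch (assumed)
prefetched_mib = BATCH_UNIQUE_IDS * EMBED_DIM * BYTES_PER_FLOAT / 1024 ** 2
print(f"prefetched rows: {prefetched_mib:.0f} MiB")   # ~20 MiB, easily fits
```
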
## Design

This is a feature of Fluid distributed training; you may want to read
[Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.

Fluid large model distributed training uses the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split
a large parameter into multiple parameters, which are stored on the Parameter
Servers, and the Trainer prefetches them through the `RPC` interface.

### Partitioned Parameter

<img src="src/split_parameter.png" width="400" />

- **Distributed Transpiler** would split the large parameter (weight) into some
  partitioned parameters (weight_0, weight_1, weight_2), as shown in the figure
  above.
- We could use `round-robin` to distribute the partitioned parameters, as
  sketched after this list.

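To illustrate the `round-robin` placement, the snippet below maps each
partition to the Parameter Server that stores it, assuming `weight` is split
row-wise into equally sized blocks. This is a plain-Python sketch of the idea,
not the Distributed Transpiler's actual code; the function name and the sizes
in the example are made up.

```python
# Plain-Python sketch of round-robin placement of the partitioned parameters,
# assuming `weight` is split row-wise into equally sized blocks
# (weight_0, weight_1, ...).  Not the Distributed Transpiler's actual code.

def place_partitions(total_rows, num_partitions, num_pservers):
    """Return, for every partition, its row range and the index of the
    Parameter Server that stores it."""
    rows_per_part = (total_rows + num_partitions - 1) // num_partitions
    placement = []
    for part in range(num_partitions):
        row_begin = part * rows_per_part
        row_end = min(row_begin + rows_per_part, total_rows)
        pserver = part % num_pservers          # round-robin assignment
        placement.append((f"weight_{part}", (row_begin, row_end), pserver))
    return placement

# Example: 10 rows split into 3 partitions over 2 Parameter Servers.
for name, rows, pserver in place_partitions(10, 3, 2):
    print(name, "rows", rows, "-> pserver", pserver)
# weight_0 rows (0, 4) -> pserver 0
# weight_1 rows (4, 8) -> pserver 1
# weight_2 rows (8, 10) -> pserver 0
```
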

### Prefetching Parameter

<img src="src/prefetch_parameters.png" width="400" />

- The `prefetch_rpc` operator would send the row indices to multiple Parameter
  Servers and prefetch the corresponding parameter rows according to the input
  `Ids`. Different from normal Fluid distributed training, we only prefetch the
  rows selected by the input `Ids` instead of the whole parameter. We use
  [SelectedRows](../../../design/selected_rows.md) as the received variable type.
- The `merge_selected_rows` operator would merge the received parameters into
  one `SelectedRows` variable, as sketched after this list.

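The following sketch mimics what `prefetch_rpc` and `merge_selected_rows` do,
using a plain dictionary per Parameter Server in place of real RPC and a small
`(rows, values)` pair in place of the real `SelectedRows` type. The names and
data are illustrative only, not the operator implementations.

```python
# Plain-Python sketch of the prefetch-and-merge step.  Each "server" is just a
# dict {row_id: row_vector}; a SelectedRows variable is modelled as a pair
# (rows, values).  Real Fluid uses RPC and the SelectedRows type instead.

def prefetch(ids, servers):
    """Mimic `prefetch_rpc`: ask each server only for the ids it owns."""
    results = []
    for table in servers:
        rows = [i for i in ids if i in table]      # ids stored on this server
        values = [table[i] for i in rows]
        results.append((rows, values))
    return results

def merge_selected_rows(results):
    """Mimic `merge_selected_rows`: concatenate the partial results into one
    SelectedRows-like (rows, values) pair."""
    merged_rows, merged_values = [], []
    for rows, values in results:
        merged_rows.extend(rows)
        merged_values.extend(values)
    return merged_rows, merged_values

# Example: two servers holding disjoint rows of the embedding table.
servers = [
    {0: [0.0, 0.1], 2: [0.2, 0.3]},   # rows owned by Parameter Server 0
    {1: [1.0, 1.1], 3: [1.2, 1.3]},   # rows owned by Parameter Server 1
]
rows, values = merge_selected_rows(prefetch([2, 1, 3], servers))
print(rows)    # [2, 1, 3]  (grouped per server, then concatenated)
print(values)  # [[0.2, 0.3], [1.0, 1.1], [1.2, 1.3]]
```
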
## TODO

- `prefetch_rpc` operator to send the row indices and receive `SelectedRows`
  variables.
- `lookup_table` needs to support the `SelectedRows` variable type as the input
  `Weight` (see the sketch after this list).
- Async Update: to avoid slow nodes, asynchronous update is important for
  distributed training; we need a design doc for it and will implement it in
  the future.
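As a rough illustration of the `lookup_table` item above: with a
`SelectedRows`-style weight, the lookup must first map each input id to its
position in the `rows` list before reading the dense value tensor. The snippet
below is a plain-Python sketch under that assumption, not the operator
implementation.

```python
# Plain-Python sketch of `lookup_table` reading from a SelectedRows-style
# weight: `rows` lists which global row ids are present, and `values[i]`
# holds the vector for rows[i].  Illustrative only, not the Fluid operator.

def lookup_table_selected_rows(ids, rows, values):
    position = {row_id: i for i, row_id in enumerate(rows)}  # global id -> local index
    return [values[position[i]] for i in ids]

rows = [2, 1, 3]                                   # prefetched row ids
values = [[0.2, 0.3], [1.0, 1.1], [1.2, 1.3]]      # their embedding vectors
print(lookup_table_selected_rows([1, 1, 3], rows, values))
# [[1.0, 1.1], [1.0, 1.1], [1.2, 1.3]]
```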