diff --git a/doc/fluid/design/dist_train/large_model.md b/doc/fluid/design/dist_train/large_model.md
index f82fa6f81e..9689582130 100644
--- a/doc/fluid/design/dist_train/large_model.md
+++ b/doc/fluid/design/dist_train/large_model.md
@@ -11,6 +11,10 @@ the gradient to Parameter Server to execute the optimize program.
 
 ## Design
 
+**NOTE**: this approach is a feature of Fluid distributed training; you may want
+to read [Distributed Architecture](./distributed_architecture.md) and
+[Parameter Server](./parameter_server.md) before reading the following content.
+
 Fluid large model distributed training use
 [Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split
 a large parameter into multiple parameters which stored on Parameter Server, and