parent 36036c0ea5
commit dd229dc740

@@ -1,38 +1,90 @@

Removed (the file's previous content, a pseudo-code sketch of the user program and the `train()` entry point):

```python
import yi_json

g = 100

def read():
    q = queue()
    # warm up q
    for i in range(1000):
        q.push(read())
    yield q.shuffle_get()

input = paddle.layer.data(...)
intermediate = paddle.layers.fc(input)
output = paddle.layer.softmax(intermediate)

model = paddle.model.create(output)

train(model, data_provider=read, cluster="clusterId")

# --------------------------------------------------------------------------------
# 1. package, docker build, docker push
# 2. kubectl, clusterId Kubernetes job, 10 trainer containers, 5 parameter server containers
# --------------------------------------------------------------------------------

def train():
    if os.environ.get("kube_api_server") is None:
        docker_build()
        docker_push()
        kube_ctrl_start_job()
    else:
        rank = kube_mpi_rank()
        if rank == 0:
            master()
        elif rank >= 15:
            parameter_server()
        else:
            _train()
```

Added (the new design document):

# Design Doc: PaddlePaddle API

## Ingredients

As the first step of our design, we list important concepts in deep
learning and try to figure out their relationships, as shown below:

```
Model = {topology, parameters}

Evaluator = {Model*, activations}
- forward
- test

GradientMachine = {Model*, gradients}
- backward

Optimizer = {Model*, Evaluator*, GradientMachine*}
- train
- update
- checkpoint
```

where the pair of curly braces `{` and `}` indicates *composition*, `*`
indicates a *reference*, and `-` marks a "class method".
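
To make the notation concrete, here is one rough Python reading of these relationships. The class and member names are illustrative only, not the proposed API; they merely restate what the braces, `*`, and `-` mean:

```python
# Illustrative only: a literal Python reading of the notation above.
# `{...}` becomes owned member fields, `*` becomes a reference to an object
# owned elsewhere, and `-` becomes a method.

class Model:
    def __init__(self, topology, parameters):
        self.topology = topology        # composition
        self.parameters = parameters    # composition


class Evaluator:
    def __init__(self, model):
        self.model = model              # reference: the Model is shared, not copied
        self.activations = {}           # composition: per-Evaluator layer outputs

    def forward(self, input): ...
    def test(self, input, label): ...


class GradientMachine:
    def __init__(self, model):
        self.model = model              # reference
        self.gradients = {}             # composition: per-GradientMachine gradients

    def backward(self, cost): ...


class Optimizer:
    def __init__(self, model, evaluator, gradient_machine):
        self.model = model                          # reference
        self.evaluator = evaluator                  # reference
        self.gradient_machine = gradient_machine    # reference

    def train(self, data): ...
    def update(self): ...
    def checkpoint(self, path): ...
```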

### Model

We used to think that parameters are part of the topology (or layers).
But that is not true, because multiple layers could share the same
parameter matrix. An example is a network that compares two text
segments in a semantic space:

```
          semantic
text A -> projection ---\
          layer A        \
                          cosine
                          similarity -> output
                          layer
          semantic       /
text B -> projection ---/
          layer B
```

In this network, the two semantic projection layers (A and B) share
the same parameter matrix.
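
Since that API is still being designed (see the [TODO: API] reference below), the following is only a hypothetical sketch of how the two projection layers might be told to share one matrix; the `size`, `param_name`, and `cos_sim` names are invented for illustration:

```python
# Hypothetical sketch only; the real topology / parameter-sharing API is still [TODO: API].
text_a = paddle.layer.data(name="text_a")
text_b = paddle.layer.data(name="text_b")

# Both projection layers refer to the same parameter name, so they would
# share a single parameter matrix rather than each owning a copy.
proj_a = paddle.layer.fc(text_a, size=128, param_name="semantic_projection.w")
proj_b = paddle.layer.fc(text_b, size=128, param_name="semantic_projection.w")

output = paddle.layer.cos_sim(proj_a, proj_b)  # cosine similarity layer
```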

For more information about our API that specifies topology and
parameter sharing, please refer to [TODO: API].

### Evaluator

Suppose that we have a trained ranking model; we should be able to
use it in our search engine. The search engine's Web server is a
concurrent program that serves many HTTP requests simultaneously. It
doesn't make sense for each of these threads to have its own copy of
the model, because that would duplicate topologies and parameters.
However, each thread should be able to record layer outputs, i.e.,
activations computed from an input derived from the request. With an
*Evaluator* that saves activations, we can write the over-simplified
server program as:

```python
m = paddle.model.load("trained.model")

def handle_request(req):
    e = paddle.evaluator.create(m)
    e.forward(req)
    return e.activation(layer="output")  # returns activations of layer "output"

http.handle("/", handle_request)
```
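
The same idea in thread-per-request form: all threads share the one `m`, but each creates its own Evaluator, so activations from different requests never mix. Only `paddle.model.load`, `paddle.evaluator.create`, `forward`, and `activation` come from the text above; the threading wrapper and the `requests` list are illustrative:

```python
import threading

m = paddle.model.load("trained.model")        # one shared Model: topology + parameters

def serve_one(req, results, i):
    e = paddle.evaluator.create(m)            # per-thread Evaluator: private activations
    e.forward(req)
    results[i] = e.activation(layer="output")

requests = [...]                              # placeholder: whatever the Web server hands us
results = [None] * len(requests)
workers = [threading.Thread(target=serve_one, args=(r, results, i))
           for i, r in enumerate(requests)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```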

### GradientMachine

Similar to evaluation, training needs to compute gradients so as
to update model parameters. Because an [optimizer](#optimizer) might
run multiple simultaneous threads to update the same model, gradients
should be separated from the model. And because gradients are only used
in training, not in serving, they should also be kept separate from the
Evaluator. Hence the `GradientMachine`.
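
The document does not spell out the GradientMachine interface, so the snippet below is only a guess at its shape: gradients live inside a per-thread GradientMachine while the Model and its parameters stay shared. Every call name here is assumed:

```python
m = paddle.model.load("initial.model")        # shared Model

def trainer_thread(batches):
    gm = paddle.gradient_machine.create(m)    # assumed constructor; gradients are thread-local
    for batch in batches:
        cost = gm.forward(batch)              # like Evaluator.forward, but keeps what backward needs
        gm.backward(cost)                     # fills gm's own gradients; other threads are unaffected
        # an Optimizer (next section) would now turn gm's gradients into parameter updates
```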
|
||||||
|
|
||||||
|
### Optimizer
|
||||||
|
|
||||||
|
None of Model, Evaluator, nor GradientMachine implements the training
|
||||||
|
loop, hence Optimizer. We can define a concurrent optimizer that runs
|
||||||
|
multiple simultaneious threads to train a model -- just let each
|
||||||
|
thread has its own GradientMachine object.
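
Continuing the same guesswork, a concurrent optimizer might look like the sketch below. The only rule taken from the text is that each thread owns its own GradientMachine; all function names are assumptions:

```python
import threading

def optimize(model, data_shards):
    def worker(shard):
        gm = paddle.gradient_machine.create(model)     # per-thread GradientMachine (assumed API)
        for batch in shard:
            cost = gm.forward(batch)
            gm.backward(cost)
            model.parameters.update(gm.gradients())    # assumed update on the shared parameters

    threads = [threading.Thread(target=worker, args=(shard,)) for shard in data_shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    model.save("checkpoint.model")                     # assumed; corresponds to `- checkpoint`
```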