|
|
|
# PaddlePaddle Design Doc
|
|
|
|
|
|
|
|
## Ingredients
|
|
|
|
|
|
|
|
Our design principle is to start from the essence: how can we allow
users to express and solve their problems as neural networks?

Some essential concepts that our API has to provide include:
|
|
|
|
|
|
|
|
1. A *topology* is an expression of *layers*.
|
|
|
|
|
|
|
|
1. A layer could be any kind of computation, including *cost*.
|
|
|
|
|
|
|
|
1. Some layers have parameters, some don't. Most costs don't have
   parameters.
|
|
|
|
|
|
|
|
1. In some topologies, layers share parameters. For example,
   [the network for training a ranking model](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850).
|
|
|
|
|
|
|
|
1. At programming time, users specify topologies and possible sharing
   of parameters. PaddlePaddle can figure out and create parameters
   required (and possibly shared) by one or more topologies.
|
|
|
|
|
|
|
|
|
|
|
|
## Starting from Examples
|
|
|
|
|
|
|
|
As a summary of
[our discussion](https://github.com/PaddlePaddle/Paddle/issues/1315),
let us present two examples here:
|
|
|
|
|
|
|
|
|
|
|
|
### Example 1. Sharing Parameters between Layers
|
|
|
|
|
|
|
|
We use the
[3-branch ranking](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850) model
in this example. For your convenience, I copy-and-paste the model's
topology as follows:
|
|
|
|
|
|
|
|
```
A -> f -\
Q -> f --> cost
B -> f -/
```
|
|
|
|
|
|
|
|
The following program trains the topology, including the cost, and then
uses the sub-network of the trained topology for inference:
|
|
|
|
|
|
|
|
```python
def f(input):
    e = paddle.layer.embedding(input, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o

# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))

topology = paddle.layer.less_than(
    paddle.layer.cross_entropy(fA, fQ),
    paddle.layer.cross_entropy(fB, fQ))

# Derive parameters required in topology and create them in model.
parameters = paddle.parameters.create(topology)

# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)

# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The semantic-vector of testA: ", paddle.infer(fA, parameters, testA)
```
|
|
|
|
|
|
|
|
|
|
|
|
### Example 2. Sharing Parameters between "Models"
|
|
|
|
|
|
|
|
We use [GAN](https://github.com/PaddlePaddle/book/tree/develop/gan) in
this example. In the following example program, `d0` and `d1`
correspond to the two networks in the following figure:
|
|
|
|
|
|
|
|
<img src="https://github.com/wangyang59/book/raw/00036f4b0da5225041a6824587c1a01cf20159b1/gan/image/gan_ig.png" width=400 />
|
|
|
|
|
|
|
|
```python
def G(input):
    # over-simplified example as G has only one layer:
    return paddle.layer.fc(input, parameter_name="G")

def D(input):
    # again, over-simplified:
    return paddle.layer.fc(input, parameter_name="D")

# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))

# Construct a second topology d1, which contains only D. By
# training this topology, we update parameters of D. Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))

# Create parameters from a list of multiple topologies (models) for
# the chance to share parameters between these topologies.
parameters = paddle.parameters.create([d0, d1])

# Iterative training of GAN.
for ...:
    train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
    train(d1, parameters, reader=read_from_realistic_images)

# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
```
|
|
|
|
|
|
|
|
|
|
|
|
### Summary
|
|
|
|
|
|
|
|
|
|
|
|
The two programs above reveal some important design concerns:
|
|
|
|
|
|
|
|
1. Users describe a topology as an expression of layers. Every layer
   has a *parameter name*. If users don't specify it explicitly, it's
   automatically generated as a unique name. By specifying the
   parameter name, users can specify the sharing of parameters between
   layers and even between topologies.
|
|
|
|
|
|
|
|
1. `paddle.parameters.create` figures out the parameters required by
   one or more topologies from the parameter names of their layers. It
   creates these parameters and returns a `ParameterSet` object, which
   is in essence a map from *parameter names* to *parameters* (see the
   sketch after this list).
|
|
|
|
|
|
|
|
1. At training and inference time, `paddle.train` and `paddle.infer`
   require both a topology and the parameter set that holds the
   parameters of that topology. There are some reasons:
|
|
|
|
|
|
|
|
   1. This prevents users from forgetting to call
      `paddle.parameters.create`.
   1. `paddle.train` needs to know which parameter set to update.
   1. Users could load another (pre-trained) parameter set and use it
      with a topology in `paddle.infer`.
|
|
|
|
|
|
|
|
1. By specifying the `immutable_parameters` parameter of
   `paddle.train`, we can forbid the update of these parameters.
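
The exact form of `ParameterSet` is not settled in this document; the
following is a minimal sketch, assuming it simply wraps a Python
dictionary from parameter names to numpy arrays, just to make the
"map from *parameter names* to *parameters*" idea concrete:

```python
import numpy as np

class ParameterSet(object):
    """Sketch only: in essence a map from parameter names to parameter values."""

    def __init__(self):
        self._params = {}  # parameter name -> numpy array

    def create(self, name, shape):
        # Create each named parameter only once, so layers (and topologies)
        # that refer to the same name end up sharing the same array.
        if name not in self._params:
            self._params[name] = np.random.uniform(-0.1, 0.1, shape)
        return self._params[name]

    def get(self, name):
        return self._params[name]

    def names(self):
        return list(self._params.keys())
```

Under this assumption, `paddle.parameters.create` would walk the layers
of the given topology (or list of topologies), collect their parameter
names and shapes, and call `create` for each of them.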
|
|
|
|
|
|
|
|
|
|
|
|
## Reader
|
|
|
|
|
|
|
|
Not all programming frameworks allow users to define I/O functions.
An example is Google MapReduce, which can only read from text,
SSTable, and RecordIO files. Hadoop MapReduce allows users to define
readers and writers by deriving from base classes `Reader` and
`Writer`. The former is less flexible but also less error-prone. We
decide to give users the flexibility to define their own readers.
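
To make "users define their own readers" concrete, here is a minimal
sketch under the assumption that a reader is simply a Python generator
function yielding one training instance at a time. The function name,
field layout, and fake data below are made up for illustration, and
whether a reader should yield a dictionary is exactly one of the open
questions listed next:

```python
import random

def read_my_training_data():
    # A hypothetical user-defined reader: yields one (A, B, Q, label)
    # instance per iteration. Here we fake the data with random vectors
    # instead of reading real files.
    for _ in range(1000):
        yield {
            "A": [random.random() for _ in range(128)],
            "B": [random.random() for _ in range(128)],
            "Q": [random.random() for _ in range(128)],
            "label": random.randint(0, 1),
        }
```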
|
|
|
|
|
|
|
|
|
|
|
|
There are some open questions here:
|
|
|
|
|
|
|
|
1. **Should a reader return a Python dictionary?**
|
|
|
|
|
|
|
|
1. **How to map multiple outputs from a reader to multiple data layers?**
|
|
|
|
|
|
|
|
1. **How to easily compose some existing readers to read more data and
   feed a topology with more data layers?** (One possible shape is
   sketched after this list.)
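
These questions are left open here. Just to make the last one concrete,
one possible (hypothetical, not decided) shape is to treat readers as
generator factories and compose them by zipping their outputs and
merging the resulting dictionaries:

```python
def compose(*readers):
    # Hypothetical helper: zip several readers and merge their output
    # dictionaries, so that the combined reader can feed a topology
    # that has more data layers.
    def composed_reader():
        for items in zip(*[r() for r in readers]):
            merged = {}
            for item in items:
                merged.update(item)
            yield merged
    return composed_reader

# e.g., with two existing (hypothetical) readers read_images and read_labels:
# reader = compose(read_images, read_labels)
```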
|
|
|
|
|
|
|
|
|
|
|
|
## Training
|
|
|
|
|
|
|
|
The recommended way to train a model is to call `paddle.train`,
which simply calls `paddle.trainer.Default`, a global variable of
type `paddle.trainer.SGD`. Equivalently, we can do
|
|
|
|
|
|
|
|
```python
opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
opt.train(topology, parameters, reader=read, ...)
```
|
|
|
|
|
|
|
|
### Updater
|
|
|
|
|
|
|
|
Please be aware that a trainer can accept an updater as its data
member, where an updater is a class derived from
`paddle.trainer.Updater`. This is to make it easier to customize
trainers, as discussed
[here](https://github.com/PaddlePaddle/Paddle/issues/1319).
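
As a minimal sketch of what such a customization might look like
(assuming the derived class overrides a per-parameter `update` hook;
the exact interface is what the linked issue discusses):

```python
class MySGDUpdater(paddle.trainer.Updater):
    # Hypothetical custom updater: plain SGD with a fixed learning rate.
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def update(self, parameter, gradient):
        # Called by the trainer for each parameter after backpropagation.
        return parameter - self.learning_rate * gradient

# Plug the customized updater into a trainer:
opt = paddle.trainer.SGD(..., MySGDUpdater(0.01))
opt.train(topology, parameters, reader=read)
```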
|
|
|
|
|
|
|
|
### Event Handler
|
|
|
|
|
|
|
|
`paddle.train` and `paddle.trainer.XXX.train` take an optional
parameter `event_handler`, which should be either `None` or a function
that handles some events:
|
|
|
|
|
|
|
|
1. BeginTraining
1. EndTraining
1. BeginIteration
1. EndIteration
1. BeginPass
1. EndPass
|
|
|
|
|
|
|
|
where EndPass is sent if and only if the reader yields
`end_pass=True`.
|
|
|
|
|
|
|
|
An example follows:
|
|
|
|
|
|
|
|
```python
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        print paddle.test(...)

paddle.train(topology, parameters, reader, event_handler)
```
|
|
|
|
|
|
|
|
If we are writing a PaddlePaddle program in and for IPython/Jupyter,
we can use matplotlib in the event handler to plot a curve of
cost/error versus iterations, as shown
[here](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/).
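
For instance, a minimal sketch could look like the following (assuming
the `EndIteration` event carries the current cost as `event.cost`,
which is an assumption made here for illustration):

```python
import matplotlib.pyplot as plt

costs = []

def event_handler(event):
    # Assumed field: event.cost holds the cost of the finished iteration.
    if isinstance(event, paddle.event.EndIteration):
        costs.append(event.cost)
        plt.clf()
        plt.plot(costs)           # cost versus iterations
        plt.xlabel("iteration")
        plt.ylabel("cost")
        plt.show()                # or IPython.display tricks in a notebook

paddle.train(topology, parameters, reader, event_handler)
```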
|
|
|
|
|
|
|
|
### Distributed Training
|
|
|
|
|
|
|
|
If a user wants to do distributed training on a cluster, s/he should
call `paddle.dist_train` and provide access tokens to the cluster as
a parameter.
|
|
|
|
|
|
|
|
For example, if the user has a TLS certificate that allows him/her to
access a Kubernetes cluster, s/he should be able to call
|
|
|
|
|
|
|
|
```python
paddle.dist_train(model,
                  trainer=paddle.trainer.SGD(...,
                                             paddle.updater.Adam(...)),
                  reader=read,
                  k8s_user="yi",
                  k8s_token="kube_cluster_tls.pem",
                  k8s_job="hello",
                  num_parameter_servers=15)
```
|
|
|
|
|
|
|
|
The pseudocode of `paddle.dist_train` is as follows:
|
|
|
|
|
|
|
|
```python
def dist_train(topology, parameters, trainer, reader, ...):
    if os.getenv("KUBERNETES_SERVICE_HOST") == None:
        image_name = k8s_user + '/' + k8s_job
        docker_build(image_name)
        docker_push()
        kube_ctrl_start_job(image_name, k8s_user, k8s_token)
    else:
        rank = kube_list_containers_in_job_and_return_current_containers_rank()
        if rank == 0:
            master()
        elif rank < 15:
            parameter_server()
        else:
            trainer.train(model, reader=read)
```
|
|
|
|
|
|
|
|
Please be aware that if a process is running on the Kubernetes
cluster, it will have some environment variables pre-defined.
|
|
|
|
|
|
|
|
If `dist_train` doesn't see these environment variables, it knows
that it's running on a user's personal computer, and it should work as
a *launcher*. Otherwise, it knows that it's running on the cluster and
needs to figure out its role as either the master, a trainer, or a
parameter server.
|