Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_add_axis

commit 86655cb963

@@ -0,0 +1,51 @@

# Design Doc: Computations as Graphs

A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before.

This document explains the construction of such a graph in three steps:

- construct the forward part
- construct the backward part
- construct the optimization part

Let us take the problem of image classification as a simple example.  The application program that trains the model looks like:

```python
x = layer.data("images")
l = layer.data("label")
y = layer.fc(x)
cost = layer.mse(y, l)
optimize(cost)
train(cost, reader=mnist.train())
```

### Forward Part

The first four lines of the above program build the forward part of the graph.

![The forward part of the graph](graph_construction_example_forward_only.png)

In particular, the first line `x = layer.data("images")` creates variable `x` and a Feed operator that copies a column from the minibatch to `x`.  `y = layer.fc(x)` creates not only the FC operator and the output variable `y`, but also two parameters, `W` and `b`.

In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc` messages.  These protobuf messages are saved in a `BlockDesc` protobuf message.
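
As a hedged illustration of this bookkeeping, the sketch below records the four forward lines using plain Python dicts in place of the actual protobuf classes; the helper names `add_var` and `add_op` are invented for the example.

```python
# A sketch of forward construction using dicts instead of the
# VarDesc/OpDesc/BlockDesc protobuf messages; names are illustrative.
block = {"vars": [], "ops": []}           # stands in for a BlockDesc

def add_var(name):
    block["vars"].append({"name": name})  # stands in for a VarDesc
    return name

def add_op(op_type, inputs, outputs):
    block["ops"].append(                  # stands in for an OpDesc
        {"type": op_type, "inputs": inputs, "outputs": outputs})

x = add_var("x")
l = add_var("l")
add_op("feed", [], [x])                   # x = layer.data("images")
add_op("feed", [], [l])                   # l = layer.data("label")
W, b, y = add_var("W"), add_var("b"), add_var("y")
add_op("fc", [x, W, b], [y])              # y = layer.fc(x)
cost = add_var("cost")
add_op("mse", [y, l], [cost])             # cost = layer.mse(y, l)
```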

### Backward Part

The fifth line `optimize(cost)` calls two functions, `ConstructBackwardGraph` and `ConstructOptimizationGraph`.

`ConstructBackwardGraph` traverses the forward graph in the `BlockDesc` protobuf message and builds the backward part.

![The forward and backward parts of the graph](graph_construction_example_forward_backward.png)

According to the chain rule of gradient computation, `ConstructBackwardGraph` would

1. create a gradient operator G for each operator F,
1. take all inputs, outputs, and the gradients of outputs of F as inputs of G,
1. create gradients for all inputs of F, except for those that don't have gradients, like x and l, and
1. make all these gradients the outputs of G (see the sketch after this list).
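
Here is a minimal sketch of these four rules, reusing the dict-based `block` from the earlier sketch; the function name and the `_grad` suffix convention are illustrative, not Paddle's actual API.

```python
# Walk the forward ops in reverse and append one gradient op G per
# forward op F, following the four rules above.
def construct_backward_graph(block, no_grad=("x", "l")):
    for op in reversed(list(block["ops"])):
        # rule 3: gradients for all inputs of F, except x and l
        outputs = [v + "_grad" for v in op["inputs"] if v not in no_grad]
        if not outputs:                     # e.g. Feed: nothing to differentiate
            continue
        block["ops"].append({
            "type": op["type"] + "_grad",   # rule 1: one G per F
            # rule 2: F's inputs, outputs, and output gradients feed G
            "inputs": op["inputs"] + op["outputs"]
                      + [v + "_grad" for v in op["outputs"]],
            "outputs": outputs,             # rule 4
        })
```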

### Optimization Part

For each parameter, like W and b created by `layer.fc`, marked as double circles in the graphs above, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient, as sketched after the figure.  This results in the complete graph:

![The complete graph](graph_construction_example_all.png)
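
Under the same dict-based sketch, `ConstructOptimizationGraph` might append one SGD operator per parameter; the attribute layout here is an assumption, not Paddle's actual message format.

```python
# One optimization op per parameter: it reads the parameter and its
# gradient and writes the updated parameter back.
def construct_optimization_graph(block, params=("W", "b"), lr=0.01):
    for p in params:
        block["ops"].append({
            "type": "sgd",
            "inputs": [p, p + "_grad"],
            "outputs": [p],                       # updated in place
            "attrs": {"learning_rate": lr},       # assumed attribute layout
        })
```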
@@ -0,0 +1,59 @@

IfOp should have only one branch. An IfOp operator takes a `cond` variable whose value must be a vector of N boolean elements. Its return value has M (M <= N) instances, each corresponding to a true element in `cond`.

```python
import paddle as pd

x = var()
y = var()
cond = var()

b = pd.create_ifop(inputs=[x], output_num=1)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
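
To make the selection semantics concrete, here is a small NumPy illustration (not Paddle code) of how the true block sees only the M true instances:

```python
import numpy as np

# cond has N = 4 boolean elements; the true block receives only the
# M = 2 instances of x whose cond element is True.
x_data = np.arange(8).reshape(4, 2)
cond_data = np.array([True, False, True, False])
true_input = x_data[cond_data]
print(true_input.shape)  # (2, 2): M = 2 of the N = 4 instances
```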

If we want the output to still have N instances, we can use IfElseOp with a default value, whose minibatch size must be N:

```python
import paddle as pd

x = var()
y = var()
cond = var()
b = pd.create_ifelseop(inputs=[x], output_num=1)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

with b.false_block():
    x = b.inputs(0)
    z = layer.fc(x)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
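
The reassembly can be pictured with NumPy (an illustration of the semantics, not Paddle's implementation): the true block's M outputs and the false block's N - M outputs are scattered back into an N-instance result.

```python
import numpy as np

cond_data = np.array([True, False, True, False])  # N = 4
true_out = np.ones((2, 3))    # M = 2 instances from true_block
false_out = np.zeros((2, 3))  # N - M = 2 instances from false_block

out = np.empty((4, 3))
out[cond_data] = true_out     # scatter true-branch rows
out[~cond_data] = false_out   # scatter false-branch rows
```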

If only the true_block is set in an IfElseOp, we can provide a default value for the false instances:

```python
import paddle as pd

x = var()
y = var()
cond = var()
default_value = var()
b = pd.create_ifelseop(inputs=[x], output_num=1,
                       default_value=[default_value])

with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```

where `default_value` is a list of vars supplying the output for the instances where `cond` is false.
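
In the same NumPy picture, the default value simply fills the rows where `cond` is false (again a hedged illustration of the rule above):

```python
import numpy as np

cond_data = np.array([True, False, True, False])
true_out = np.ones((2, 3))            # outputs of the only (true) block
default_row = np.full((1, 3), -1.0)   # plays the role of default_value

out = np.empty((4, 3))
out[cond_data] = true_out
out[~cond_data] = default_row         # broadcast default into false rows
```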

@@ -0,0 +1,11 @@

# Forward part only: hide the backward (red) and optimization (green) edges.
cat ./graph_construction_example.dot | \
    sed 's/color=red/color=red, style=invis/g' | \
    sed 's/color=green/color=green, style=invis/g' | \
    dot -Tpng > graph_construction_example_forward_only.png

# Forward and backward parts: hide only the optimization (green) edges.
cat ./graph_construction_example.dot | \
    sed 's/color=green/color=green, style=invis/g' | \
    dot -Tpng > graph_construction_example_forward_backward.png

# The complete graph.
cat ./graph_construction_example.dot | \
    dot -Tpng > graph_construction_example_all.png

@@ -0,0 +1,65 @@

digraph ImageClassificationGraph {
        ///////// The forward part /////////
        FeedX [label="Feed", color=blue, shape=box];
        FeedY [label="Feed", color=blue, shape=box];
        FC [label="FC", color=blue, shape=box];
        MSE [label="MSE", color=blue, shape=box];

        x [label="x", color=blue, shape=oval];
        l [label="l", color=blue, shape=oval];
        y [label="y", color=blue, shape=oval];
        W [label="W", color=blue, shape=doublecircle];
        b [label="b", color=blue, shape=doublecircle];
        cost [label="cost", color=blue, shape=oval];

        FeedX -> x -> FC -> y -> MSE -> cost [color=blue];
        FeedY -> l [color=blue];
        W -> FC [color=blue];
        b -> FC [color=blue];
        l -> MSE [color=blue];

        ////////// The backward part /////////
        MSE_Grad [label="MSE_grad", color=red, shape=box];
        FC_Grad [label="FC_grad", color=red, shape=box];

        d_cost [label="d cost", color=red, shape=oval];
        d_y [label="d y", color=red, shape=oval];
        d_b [label="d b", color=red, shape=oval];
        d_W [label="d W", color=red, shape=oval];

        cost -> MSE_Grad [color=red];
        d_cost -> MSE_Grad [color=red];
        x -> MSE_Grad [color=red];
        l -> MSE_Grad [color=red];
        y -> MSE_Grad -> d_y [color=red];

        x -> FC_Grad [color=red];
        y -> FC_Grad [color=red];
        d_y -> FC_Grad [color=red];
        W -> FC_Grad -> d_W [color=red];
        b -> FC_Grad -> d_b [color=red];

        ////////// The optimization part //////////

        OPT_W [label="SGD", color=green, shape=box];
        OPT_b [label="SGD", color=green, shape=box];

        W -> OPT_W [color=green];
        b -> OPT_b [color=green];
        d_W -> OPT_W -> W [color=green];
        d_b -> OPT_b -> b [color=green];

        ////////// Groupings //////////

        subgraph clusterMSE {
                style=invis;
                MSE;
                MSE_Grad;
        }

        subgraph clusterFC {
                style=invis;
                FC;
                FC_Grad;
        }
}