# Executor Design Doc

## Motivation

We use an executor to do the runtime evaluation of a `ProgramDesc`.

## Overview

An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks, and each block contains the protobuf definitions of all the parameters and operators. The `block_id` specifies the entrance block. The `Scope` is the container of all the variable instances and is persistent across different runs.
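A rough sketch of this interface is below; the constructor argument and the exact `Run` signature are illustrative assumptions rather than the exact interface in the codebase.

```cpp
class Executor {
 public:
  // The devices (places) on which operators may run.
  explicit Executor(const std::vector<platform::Place>& places);

  // Evaluate the `block_id`-th block of `pdesc`, creating and updating
  // variable instances inside `scope`.
  void Run(const ProgramDesc& pdesc, Scope* scope, int block_id);
};
```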
### What does the executor do?

It evaluates all the operators in the `block_id`th block of a `ProgramDesc`.

### What does the executor NOT do?

It does not do runtime optimization, meaning it does not intelligently parse the dependencies of each op, choose which ones to run, or decide the order in which to run them.

It does not do graph partitioning, meaning it does not divide the `ProgramDesc` into several small pieces and execute them on different devices.

## Implementation

`Executor` evaluates a `ProgramDesc`. Essentially, it instantiates the Variables and Operators, then runs all the operators in sequence. [[code]](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.cc)
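A simplified sketch of that sequence follows; the `Var`, `CreateOp`, and device-context details are assumptions for illustration, not necessarily the exact code in `executor.cc`.

```cpp
void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
  auto& block = pdesc.blocks(block_id);

  // Instantiate every variable declared in the block.
  for (auto& var : block.vars()) {
    scope->Var(var.name());
  }

  // Create each operator from its protobuf description and run it,
  // in the order the operators appear in the block.
  for (auto& op_desc : block.ops()) {
    auto op = OpRegistry::CreateOp(op_desc);
    op->Run(*scope, device_context_);  // device_context_: context of the target place
  }
}
```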
# Design Doc: InferVarType

## The Problem Posed

A variable in our design can hold different types, such as `LoDTensor` and `SelectedRows`. An operator should be able to infer the variable types of its outputs.

For example, a `lookup table` operator takes two `LoDTensor`s: one is a float tensor used as the embedding table, the other is an int tensor holding the word IDs. The gradient operator of `lookup table` will generate a `SelectedRows` as its output. A `sum` operator can take both `LoDTensor` and `SelectedRows` as its inputs; it will generate a `LoDTensor` if any of its inputs is a `LoDTensor`, otherwise it will generate a `SelectedRows` as its output.

The variable type is constant at runtime. Every variable's type can either be set by the user (input data and parameters) or be inferred by the operator at compile time.

## Proposed Solution

`InferVarType` is a compile-time function which is registered to each operator. The interface of that function is:
```c++
using InferVarTypeFN = std::function<
    void (const OpDescBind& /*op_desc*/, BlockDescBind* /*block*/)>;
```
It takes an operator description as its input, infers the types of the output variables, and stores them in the block description.
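For example, the rule described above for the `sum` operator could be expressed as an `InferVarTypeFN` roughly like the sketch below; the `Input`, `Output`, `Var`, `GetType`, and `SetType` accessors and the `VarDesc` type enum are assumed names used for illustration.

```cpp
void SumInferVarType(const OpDescBind& op_desc, BlockDescBind* block) {
  // The output is a LoDTensor if any input is a LoDTensor,
  // otherwise it is a SelectedRows.
  auto out_type = VarDesc::SELECTED_ROWS;
  for (auto& input_name : op_desc.Input("X")) {
    if (block->Var(input_name)->GetType() == VarDesc::LOD_TENSOR) {
      out_type = VarDesc::LOD_TENSOR;
      break;
    }
  }
  block->Var(op_desc.Output("Out")[0])->SetType(out_type);
}
```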
The `InferVarTypeFN` will be registered in `OpInfo` as the `infer_var_type_` field. The `OpInfo` should be:

```cpp
struct OpInfo {
  InferVarTypeFN infer_var_type_;
  ...
};
```
The default `InferVarType` will set the output type to `LoDTensor`. This can be done by `GetInferVarType()`:

```cpp
void DefaultInferVarType(const OpDescBind& op_desc, BlockDescBind* block) {
  // set the output type of variable as `LoDTensor`.
  // ...
}

struct OpInfo {
  InferVarTypeFN infer_var_type_;

  InferVarTypeFN GetInferVarType() const {
    if (infer_var_type_) {
      return infer_var_type_;
    } else {
      return DefaultInferVarType;
    }
  }
};
```
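As an illustration of how the framework might invoke these functions at compile time, a type-inference pass over one block could look like the sketch below; `OpInfoMap` and `BlockDescBind::AllOps()` are assumed names, not necessarily the real API.

```cpp
// Infer variable types for every operator in a block, in program order.
void InferVarTypesInBlock(BlockDescBind* block) {
  for (OpDescBind* op_desc : block->AllOps()) {
    // Look up the registered (or default) InferVarTypeFN for this op type.
    const OpInfo& info = OpInfoMap::Instance().Get(op_desc->Type());
    info.GetInferVarType()(*op_desc, block);
  }
}
```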
## Register InferVarType

We provide a thin base class for registering an `InferVarTypeFN`. Using a base class eases the implementation of the registry, since we can detect whether a registry entry is an `InferVarTypeFN` or not.
```cpp
class VarTypeInferer {
 public:
  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0;
};
```
Operator developers can write their specialized `VarTypeInferer` as follows:

```cpp
class SpecialVarTypeInferer : public VarTypeInferer {
 public:
  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const {
    // ... own logic
  }
};
```
Then users can register the `InferVarType` just like `GradOpDescMaker` and `OpInfoMaker`:

```cpp
REGISTER_OPERATOR(some_op, OpType, SpecialVarTypeInferer, ...);
```
## Optimizer Design

### The Problem

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

1. the forward pass, which computes intermediate results and the cost(s),
1. the backward pass, which derives gradients from intermediate results and costs, and
1. the optimization pass, which updates model parameters to optimize the cost(s).

These passes rely on three kinds of operators:

1. forward operators,
1. gradient operators, and
1. optimization operators.

It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they only had to describe the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and its operators from the forward pass.
### High-level Python API to describe the training process

1. Users write code to describe the network:

   ```python
   images = layer.data("images")
   labels = layer.data("labels")
   w1 = pd.var("w1")
   b1 = pd.var("b1")
   hidden = layer.fc(images, w=w1, b=b1)
   cost = layer.mse(hidden, labels)
   ```

   The above code snippet will create forward operators in [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
2. Users create a certain kind of Optimizer with some arguments.

   ```python
   optimizer = AdagradOptimizer(learning_rate=0.001)
   ```

3. Users use the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`.

   ```python
   opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
   ```

   The above code snippet will create gradient and optimization operators in Block. The return value of `minimize()` is a list of optimization operators that will be run by the session.
4. Users use Session/Executor to run this `opt_op_list` as the target to do training.

   ```python
   sess.run(target=opt_op_list, ...)
   ```

#### Optimizer Python interface:
```python
class Optimizer(object):
    """Optimizer Base class.
    """

    def __init__(self):
        pass

    def create_backward_pass(self, loss, parameter_list=None):
        """
        Create and add gradient Operators in BlockDesc to compute
        gradients of `loss` for the parameters in `parameter_list`.

        Args:
          loss: a variable generated by the cost function.
          parameter_list: parameters that need gradients computed and
            updated to optimize the loss.

        Returns:
          A list of (parameter, gradient) pairs.
        """
        return None

    def create_optimization_pass(self, parameters_and_grads):
        """Add optimization operators to update gradients to variables.

        Args:
          parameters_and_grads: a list of (variable, gradient) pairs to update.

        Returns:
          optimization_op_list: a list of optimization operators that will
            update the parameters using the gradients.
        """
        return None

    def minimize(self, loss, parameter_list):
        """Add operations to minimize `loss` by updating `parameter_list`.

        This method combines interface `create_backward_pass()` and
        `create_optimization_pass()` into one.
        """
        params_grads = self.create_backward_pass(loss, parameter_list)
        update_ops = self.create_optimization_pass(params_grads)
        return update_ops
```
Users can inherit from the `Optimizer` class above to create their own optimizer with some special logic, such as `AdagradOptimizer`.
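For illustration, a minimal sketch of such a subclass is given below; `pd.var()` and `pd.adagrad_op()` are hypothetical helpers standing in for whatever API creates accumulator variables and appends optimization operators to the block, and are not defined in this document.

```python
class AdagradOptimizer(Optimizer):
    """A sketch only: overrides create_optimization_pass with Adagrad updates."""

    def __init__(self, learning_rate, epsilon=1.0e-6):
        super(AdagradOptimizer, self).__init__()
        self._learning_rate = learning_rate
        self._epsilon = epsilon

    def create_optimization_pass(self, parameters_and_grads):
        optimize_ops = []
        for param, grad in parameters_and_grads:
            # Each parameter keeps an accumulator of squared gradients.
            moment = pd.var(param.name + "_moment")
            # Append one Adagrad update operator per (parameter, gradient) pair.
            optimize_ops.append(
                pd.adagrad_op(
                    param=param,
                    grad=grad,
                    moment=moment,
                    learning_rate=self._learning_rate,
                    epsilon=self._epsilon))
        return optimize_ops
```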