commit 2fc012c54c

# Design Doc: Functions, Operators, and Layers

In a DL system, we can compose one or more fine-grained operators into a coarse-grained one. For example, the FC layer can be composed of a multiplication operator and an add operator.

Historically, some fine-grained operations are known as operators, and some coarse-grained ones are known as layers. But we need a well-defined separation.

In general, operators are very fine-grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions:
```c++
template <typename T> T add(T x, T y) { return x + y; }
template <typename T> T mul(T x, T y) { return x * y; }
```
Then we can wrap them into operators, which are C++ classes that can be created from Python bindings by name. A C macro can do this. For example, the following macro invocation
```c++
MAKE_FUNCTION_OPERATOR(mul);
```
generates
```c++
template <typename T> class mulOp : public OperatorBase {...};
REGISTER_OP(mulOp<float32>, "mul");
```
so that in Python we can create operator mul by:
```python
X1 = Var()
X2 = Var()
Y = Var()
paddle.cpp.create_operator("mul", input=[X1, X2], output=Y)
```
At the same time, we can compose a coarse-grained C++ operator class by composing the functions `mul` and `add`:
```c++
template <typename T>
class FCOp : public OperatorBase {
 public:
  void Run(...) {
    add(mul(Input<T>("X"), Input<T>("W")), Input<T>("b"));
  }
};
REGISTER_OP(FCOp, "fc");
```
We need to support such composition in Python as well. To do so, we need a higher-level Python wrapping of operator creation than `paddle.cpp.create_operator`. This higher-level operator API should be compatible with the layer API.

Let's explain using an example. Suppose that we are going to compose the FC layer using mul and add in Python; we'd like to have Python functions `mul` and `add` defined in module `operator`:
```python
def operator.mul(X1, X2):
    O = Var()
    paddle.cpp.create_operator("mul", input=[X1, X2], output=O)
    return O

def operator.add(X1, X2):
    O = Var()
    paddle.cpp.create_operator("add", input=[X1, X2], output=O)
    return O
```
The above code snippets are automatically generated. Given them, users can define
```python
def layer.fc(X):
    W = Var()
    b = Var()
    return operator.add(operator.mul(X, W), b)
```
If we don't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be complicated:
```python
def layer.fc(X):
    W = Var()
    b = Var()
    O1 = Var()
    paddle.cpp.create_operator("mul", input=[X, W], output=O1)
    O2 = Var()
    paddle.cpp.create_operator("add", input=[O1, b], output=O2)
    return O2
```
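The text above says the `operator.mul`-style wrappers are automatically generated. As a hedged sketch of one way such thin wrappers could be produced (the `graph` list, the `create_operator` stand-in, and the `Var` placeholder are illustrative assumptions, not the real `paddle.cpp` binding):

```python
# Illustrative sketch only: a stand-in for the C++-backed graph builder.
class Var:
    pass

graph = []  # records (op_name, inputs, output) tuples

def create_operator(name, input, output):
    # stand-in for paddle.cpp.create_operator
    graph.append((name, input, output))

def make_wrapper(op_name):
    # generate one thin Python wrapper per registered operator name
    def wrapper(*args):
        out = Var()
        create_operator(op_name, input=list(args), output=out)
        return out
    wrapper.__name__ = op_name
    return wrapper

mul = make_wrapper("mul")
add = make_wrapper("add")

def fc(X):
    # a layer: a Python composition of operator wrappers
    W, b = Var(), Var()
    return add(mul(X, W), b)
```

Calling `fc(Var())` records a "mul" node followed by an "add" node in `graph`, mirroring the verbose snippet above while keeping the layer definition one line.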
We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in the above illustrative example:
```
| C++ functions/functors | mul          | add          |             |          |
| C++ operator class     | mulOp        | addOp        | FCOp        |          |
| Python binding         | operator.mul | operator.add | operator.fc |          |
| Python function        |              |              |             | layer.fc |
```
This is how we differentiate layers and operators in PaddlePaddle:

- those defined in C++ with a lightweight Python wrapper in module `operator` are operators; whereas
- those that don't have a C++ implementation but a Python implementation that composes C++ operators are known as layers.

---

IfOp should have only one branch. An IfOp operator takes a `cond` variable whose value must be a vector of N boolean elements. Its return value has M (M <= N) instances, each corresponding to a true element in `cond`.
```python
import paddle as pd

x = var()
y = var()
cond = var()

b = pd.create_ifop(inputs=[x], output_num=1)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
If we want the output to still have N instances, we can use IfElseOp, where a false branch or a default value (whose minibatch size must be N) supplies the instances for which `cond` is false:
```python
import paddle as pd

x = var()
y = var()
cond = var()
default_value = var()
b = pd.create_ifelseop(inputs=[x], output_num=1)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

with b.false_block():
    x = b.inputs(0)
    z = layer.fc(x)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
If only the true_block is set in an IfElseOp, we can provide a default value for the false branch:
```python
import paddle as pd

x = var()
y = var()
cond = var()
default_value = var()
b = pd.create_ifelseop(inputs=[x], output_num=1, default_value=default_value)

with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
where `default_value` is a list of variables used for the instances where `cond` is false.
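As a hedged sketch of the semantics only (not the PaddlePaddle API), the instance selection of IfOp and IfElseOp can be illustrated with plain Python lists, where each element of `x` is one instance and `cond` holds one boolean per instance:

```python
# Illustrative semantics sketch; `true_fn`/`false_fn` stand in for the blocks.

def if_op(cond, x, true_fn):
    # IfOp: apply true_fn only to the M instances where cond is True;
    # the output has just those M instances.
    return [true_fn(row) for c, row in zip(cond, x) if c]

def if_else_op(cond, x, true_fn, false_fn):
    # IfElseOp: the output keeps all N instances; false instances go
    # through false_fn (or take a default value).
    return [true_fn(row) if c else false_fn(row) for c, row in zip(cond, x)]

cond = [True, False, True]
x = [1.0, 2.0, 3.0]
print(if_op(cond, x, lambda v: v + 10))                      # 2 instances
print(if_else_op(cond, x, lambda v: v + 10, lambda v: 0.0))  # 3 instances
```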

---

# Design Doc: LoD (Level-of-Detail) Tensor

PaddlePaddle's RNN doesn't require that all instances have the same length. To support this, we introduce an extension to Tensor, namely, the LoD Tensor.

## Challenge of Variable-length Inputs
People usually represent a mini-batch by a Tensor. For example, a mini-batch of 10 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation T of all images can be a multiplication of a 32x32xO-dimensional tensor T with the 10x32x32 Tensor.
Another example: each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we need an index data structure to record these variable lengths.

## LoD as a Solution
### Mini-Batch of variable-length sentences

Let's imagine a mini-batch of 3 sentences of variable length, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information:
```
3
3 1 2
||| | ||
```
Each `|` represents a D-dimensional word vector. The number 3 on top indicates 3 sentences, and the numbers 3, 1, and 2 on the second level represent the number of words in each sentence.
### Mini-Batch of variable-length videos

This approach generalizes to the case where elements are not words but higher-dimensional objects, like images. Suppose that a mini-batch contains 3 videos of 3, 1, and 2 frames respectively, all with frame size 640x480. The underlying tensor is of size (3+1+2)x640x480, and the index information is illustrated as:
```
3
3 1 2
口口口 口 口口
```
where each `口` represents an image.

### Mini-Batch of fixed-size images

Let's get back to a typical example, image classification, where each mini-batch has M fixed-size images. The LoD Tensor representation is
```
M
1 1 1 1 1
口口口口 ... 口
```
The many 1's on the second level seem duplicated. For this particular case of 2 levels, where the second level always has length 1, we can ignore the LoD index.
### Design and summarization

In summary, as long as the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:

- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and
- the first dimension size L has an additional property -- a LoD index as a nested vector:
```c++
typedef std::vector<std::vector<int> > LoD;
```
- The LoD index is not necessary when there are only two levels and all elements of the second level have length 1.
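As an illustration, the LoD index of the 3-sentence mini-batch above can be written down directly. This is a sketch using per-level sequence lengths (one possible representation, not the C++ type itself):

```python
# Illustrative sketch: LoD index for the 3-sentence mini-batch,
# stored as per-level sequence lengths.
sentence_lod = [[3], [3, 1, 2]]   # 3 sentences with 3, 1, and 2 words
L = sum(sentence_lod[-1])         # first dimension of the underlying tensor
assert L == 6                     # matches the (3+1+2)xD tensor above

# the video mini-batch has the same index; only the elements differ
video_lod = [[3], [3, 1, 2]]      # 3 videos with 3, 1, and 2 frames
```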
## Slicing of LoD Tensor

Consider a network with three levels of RNN: the top level handles articles, the second level handles sentences, and the basic level handles words. This network requires that mini-batches be represented by a 3-level LoD Tensor, for example:
```
3
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
```
To allow each level of RNN to handle its input, we define **the <i,j>-slice of a LoD Tensor as the j-th sequence on level i**.

For example, the <2,1>-slice of the above example is
```
2
||
```
and the <1,2>-slice of the above example is
```
2
2 3
|| |||
```
Let's go on slicing this slice. Its <1,1>-slice is
```
3
|||
```
### The General Slicing Algorithm

The algorithm, with an over-simplified data structure, is defined as
```c++
typedef std::vector<std::vector<int> > LoD;

struct LoDTensor {
  LoD lod_;
  float* tensor_;
};

LoDTensor Slice(const LoDTensor& lodt, int level, int sequence);
```
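The body of `Slice` is left unspecified above. Under the assumption that each LoD level stores per-sequence lengths, and that levels and sequences are 0-indexed with level 0 being the top count, a minimal Python sketch of the algorithm might be:

```python
# Hedged sketch, not PaddlePaddle code: `lod` is a list of levels, each a
# list of sequence lengths; `data` stands for the rows of the underlying
# tensor. The <i,j>-slice keeps levels i and deeper, restricted to the
# j-th sequence on level i.
def lod_slice(lod, data, i, j):
    begin, end = j, j + 1              # entry range on level i
    new_lod = []
    for level in lod[i:]:
        new_lod.append(level[begin:end])
        # entries on this level count entries on the next level
        # (or data rows, at the deepest level)
        begin, end = sum(level[:begin]), sum(level[:end])
    return new_lod, data[begin:end]

# the 3-level example from the text: 3 articles, 3/1/2 sentences,
# word counts 3 2 4 / 1 / 2 3
lod = [[3], [3, 1, 2], [3, 2, 4, 1, 2, 3]]
data = list(range(15))                 # 15 word vectors, as placeholders

print(lod_slice(lod, data, 2, 1))      # the <2,1>-slice: 2 words
print(lod_slice(lod, data, 1, 2))      # the <1,2>-slice: 2 sentences, 5 words
```

Under these assumptions, the <2,1>-slice yields index `[[2]]` with 2 rows and the <1,2>-slice yields `[[2], [2, 3]]` with 5 rows, matching the diagrams above.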
### Slicing the Top Level

Please be aware that an RNN operator only slices the top level of a LoD Tensor to get the step inputs.
```c++
LoDTensor Slice(const LoDTensor& lodt, int sequence);
```