From 345499b0feeff2a09394529fe74f5a705bbdc5e2 Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Tue, 12 Sep 2017 19:17:48 -0700 Subject: [PATCH 01/13] Add Skeleton of Python API design --- doc/design/python_api.md | 104 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 doc/design/python_api.md diff --git a/doc/design/python_api.md b/doc/design/python_api.md new file mode 100644 index 0000000000..395341d37e --- /dev/null +++ b/doc/design/python_api.md @@ -0,0 +1,104 @@ +# Design Doc: Python API + + + +The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in CPP in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and writing layers in Python. + + + +This implementation of Python API includes two steps. + +1. Implement the Python API using current C++ runtime concepts. +2. Replace the implementation by using compile-time concepts when they are completed. + +... + + +## Python Class about compile-time concepts + + +| Python Class | Compile-time protobuf | +| --- | --- | +| Block | BlockDesc | +| Operator | OpDesc | +| Variable | VarDesc | + + +### Block + + + +```python +class Block(objects): + def __init__(self, parent=None): + self.vars_ = map() + self.ops_ = vector() + if parent is None: + self.global_vars = map() + self.parent=None + else: + self.parent = parent + self.global_vars = None + + def create_global_vars(...): + if self.parent is not None: + return self.parent.create_global_vars(...) + else: + return self.global_vars.new() +``` + + +### Operator + + + +```python +class Operator(object): + def __init__(self, type, inputs, outputs, attrs): + # create OpDesc in Python + op_desc = ... 
+ self.cpp_op_desc_ptr = cpp.to_cpp_op_desc(op_desc) + cpp.infer_shapes(self.cpp_op_desc_ptr, inputs, outputs) + outputs.op = self + + def type(self): + return self.cpp_op_desc_ptr.type() +``` + +### Variable + + + +```python +class Variable(object): + def __init__(self, shape, dtype="float32", name=None, block=None): + if name is None: + if prefix is not None: + name = unique_name_generator(prefix) + else: + name = unique_name_generator("unknown") + self.name = name + self.block = block + self.cpp_var_desc_ptr = ... + self.op = None + + def shape(self): + cpp_shape = self.cpp_var_desc_ptr.shape() + return [None if elem < 0 else elem for elem in cpp_shape] +``` + +### Parameter + + + + + +```python +class Parameter(Variable): + def __init__(self, trainable, initialize_attrs, optimize_attrs): + pass +``` + +## Layer Functions + + From 4b623c1e1178a2ee7dc27243cee2fbae6ea9eea4 Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Tue, 12 Sep 2017 20:08:42 -0700 Subject: [PATCH 02/13] Update design --- doc/design/python_api.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 395341d37e..e314d1049c 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -1,22 +1,20 @@ # Design Doc: Python API - - The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in CPP in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and writing layers in Python. - - This implementation of Python API includes two steps. 1. Implement the Python API using current C++ runtime concepts. 2. Replace the implementation by using compile-time concepts when they are completed. -... +The implementation of the first step is a temporary implementation. 
We should design our Python API concepts based on `compile-time` concepts. We just use `runtime` classes to implement it for now. + + +## Python Class and compile-time protobuf +As we design our Python API concepts based on `compile-time`, we try to map our Python classes to every compile-time result, i.e., the protobuf messages. They are: -## Python Class about compile-time concepts - | Python Class | Compile-time protobuf | | --- | --- | | Block | BlockDesc | From ea22f838290f2d44887b4e9da2b8b5f73669fd63 Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Tue, 12 Sep 2017 20:41:21 -0700 Subject: [PATCH 03/13] Update design --- doc/design/python_api.md | 42 +++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index e314d1049c..02bfcde04e 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -12,7 +12,7 @@ The implementation of the first step is a temporary implementation. We should de ## Python Class and compile-time protobuf -As we design our Python API concepts based on `compile-time`, we try to map our Python classes to every compile-time result, i.e., the protobuf messages. They are: +Since we design our Python API concepts based on `compile-time`, we try to map our Python classes to every compile-time result, i.e., the protobuf messages. They are: | Python Class | Compile-time protobuf | @@ -24,27 +24,43 @@ As we design our Python API concepts based on `compile-time`, we try to map our ### Block - +Block is just like programming languages `{}`, which contains many operators and variables. There are two data fields in `Block`. 1) An associate map, whose key is variable name and value is variable itself; 2) A list of operators. + +The block is hierarchical because PaddlePaddle supports RNN and IfElse. For example, RNN is like `for-loop` in programming languages. There is new `block` inside a `for-loop`. 
To represent hierarchies, `Block` stores the `parent Block` inside. If `parent=None`, the `Block` is the outermost block, i.e., the `global` block. + ```python class Block(objects): def __init__(self, parent=None): - self.vars_ = map() - self.ops_ = vector() - if parent is None: - self.global_vars = map() - self.parent=None - else: - self.parent = parent - self.global_vars = None + self.vars = map() + self.ops = vector() + self.parent = parent + + def create_var(self, ...): + # create variable in `self.vars` + return Variable(...) - def create_global_vars(...): + + def create_global_var(self, ...): if self.parent is not None: - return self.parent.create_global_vars(...) + return self.parent.create_global_var(...) else: - return self.global_vars.new() + return self.create_var(...) + + def create_parameter(self, ...): + return self.create_global_var(...) + + def append_operator(self, ...): + self.ops.append(...) + + def prepend_operator(self, ...): + self.ops.prepend(...) ``` +Users are able to create a global variable inside any block since they many create parameters inside a RNN or IfElseOp. All parameters should be stored in the global block, not the step block in RNN. + +Users can create local variables for outputs of operators. Users can also append and prepend an operator in current block. Prepending `random initialize` operator or `load` operator is very useful to initialize parameters before training. 
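The block behavior described above can be sketched in runnable Python. This is only an illustration under stated assumptions, not the implementation in this patch: `SketchBlock` is a hypothetical name, and plain strings stand in for the design's `Variable` and `Operator` objects.

```python
class SketchBlock(object):
    """Illustrative stand-in for the Block class in this design."""

    def __init__(self, parent=None):
        self.vars = {}        # variable name -> value (stand-in for Variable)
        self.ops = []         # ordered operator list (stand-in for Operator)
        self.parent = parent  # None marks the outermost (global) block

    def create_var(self, name, value=None):
        # Local variables live in the block that creates them.
        self.vars[name] = value
        return name

    def create_global_var(self, name, value=None):
        # Walk up to the outermost block, so a parameter created inside
        # an RNN step block still ends up in the global block.
        if self.parent is not None:
            return self.parent.create_global_var(name, value)
        return self.create_var(name, value)

    def append_operator(self, op):
        self.ops.append(op)

    def prepend_operator(self, op):
        # Python lists have no prepend(); insert at index 0 instead.
        self.ops.insert(0, op)


global_block = SketchBlock()
step_block = SketchBlock(parent=global_block)

step_block.create_global_var("w")   # stored in global_block
step_block.create_var("hidden")     # stored in step_block
global_block.append_operator("fc")
global_block.prepend_operator("random_initialize")  # placed before "fc"
```

In this sketch the prepended initializer ends up ahead of every operator appended earlier, which is the property the design relies on for initializing parameters before training.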
+ ### Operator From c33a9bddfdd86a273a342c1be857423d0e331476 Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Tue, 12 Sep 2017 20:52:22 -0700 Subject: [PATCH 04/13] Update doc --- doc/design/python_api.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 02bfcde04e..0a036d1613 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -1,6 +1,6 @@ # Design Doc: Python API -The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in CPP in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and writing layers in Python. +The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in C++ in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and writing layers in Python. This implementation of Python API includes two steps. @@ -64,21 +64,22 @@ Users can create local variables for outputs of operators. Users can also append ### Operator - +Operator class will take inputs, outputs and attributes of the operator into `protobuf` OpDesc and create a C++ `OpDesc` instance. The `infer_shape` perform on C++ objects. ```python class Operator(object): def __init__(self, type, inputs, outputs, attrs): # create OpDesc in Python op_desc = ... 
- self.cpp_op_desc_ptr = cpp.to_cpp_op_desc(op_desc) - cpp.infer_shapes(self.cpp_op_desc_ptr, inputs, outputs) - outputs.op = self + self.cpp_op_desc_ptr = core.OpDesc(op_desc) + cpp.infer_shape(self.cpp_op_desc_ptr, inputs, outputs) def type(self): return self.cpp_op_desc_ptr.type() ``` +After creating a C++ `OpDesc`, `Operator` in Python can only reads the attribute from C++ side. + ### Variable From 5f2bb1fdcb8435c39d8e74e10dc54083d7cb8f55 Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Tue, 12 Sep 2017 20:55:26 -0700 Subject: [PATCH 05/13] Tab to space --- doc/design/python_api.md | 67 ++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 34 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 0a036d1613..08255db6e9 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -31,30 +31,30 @@ The block is hierarchical because PaddlePaddle supports RNN and IfElse. For exam ```python class Block(objects): - def __init__(self, parent=None): - self.vars = map() - self.ops = vector() - self.parent = parent - - def create_var(self, ...): - # create variable in `self.vars` - return Variable(...) - - - def create_global_var(self, ...): - if self.parent is not None: - return self.parent.create_global_var(...) - else: - return self.create_var(...) - - def create_parameter(self, ...): - return self.create_global_var(...) - - def append_operator(self, ...): - self.ops.append(...) - - def prepend_operator(self, ...): - self.ops.prepend(...) + def __init__(self, parent=None): + self.vars = map() + self.ops = vector() + self.parent = parent + + def create_var(self, ...): + # create variable in `self.vars` + return Variable(...) + + + def create_global_var(self, ...): + if self.parent is not None: + return self.parent.create_global_var(...) + else: + return self.create_var(...) + + def create_parameter(self, ...): + return self.create_global_var(...) + + def append_operator(self, ...): + self.ops.append(...) 
+ + def prepend_operator(self, ...): + self.ops.prepend(...) ``` Users are able to create a global variable inside any block since they many create parameters inside a RNN or IfElseOp. All parameters should be stored in the global block, not the step block in RNN. @@ -68,14 +68,14 @@ Operator class will take inputs, outputs and attributes of the operator into `pr ```python class Operator(object): - def __init__(self, type, inputs, outputs, attrs): - # create OpDesc in Python - op_desc = ... - self.cpp_op_desc_ptr = core.OpDesc(op_desc) - cpp.infer_shape(self.cpp_op_desc_ptr, inputs, outputs) - - def type(self): - return self.cpp_op_desc_ptr.type() + def __init__(self, type, inputs, outputs, attrs): + # create OpDesc in Python + op_desc = ... + self.cpp_op_desc_ptr = core.OpDesc(op_desc) + cpp.infer_shape(self.cpp_op_desc_ptr, inputs, outputs) + + def type(self): + return self.cpp_op_desc_ptr.type() ``` After creating a C++ `OpDesc`, `Operator` in Python can only reads the attribute from C++ side. 
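One way to picture the read-only relationship described above is the sketch below. It assumes a hypothetical `FakeOpDesc` class standing in for the C++ `OpDesc`; neither `FakeOpDesc` nor `SketchOperator` is part of Paddle, and the `attr` accessor is an assumption added for illustration.

```python
class FakeOpDesc(object):
    """Hypothetical stand-in for the C++ OpDesc; not a Paddle API."""

    def __init__(self, type, inputs, outputs, attrs):
        self._type = type
        self._inputs = dict(inputs)
        self._outputs = dict(outputs)
        self._attrs = dict(attrs)

    def type(self):
        return self._type

    def attr(self, name):
        return self._attrs[name]


class SketchOperator(object):
    def __init__(self, type, inputs, outputs, attrs):
        # Everything is handed over to the (fake) C++ side once.
        self._desc = FakeOpDesc(type, inputs, outputs, attrs)

    # Afterwards the Python object only reads values back; it keeps
    # no copy of its own and offers no setters.
    def type(self):
        return self._desc.type()

    def attr(self, name):
        return self._desc.attr(name)


op = SketchOperator("scale", {"X": "x"}, {"Out": "y"}, {"scale": 2.0})
```

Keeping a single source of truth on the C++ side avoids the Python and C++ descriptions drifting apart after shape inference has run.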
@@ -108,8 +108,7 @@ class Variable(object): -```python -class Parameter(Variable): +```pythonVclass Parameter(Variable): def __init__(self, trainable, initialize_attrs, optimize_attrs): pass ``` From 889f5a41cacf155e89122df4f1722700ceb26464 Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Tue, 12 Sep 2017 20:56:44 -0700 Subject: [PATCH 06/13] Update --- doc/design/python_api.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 08255db6e9..05875eeb21 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -108,7 +108,8 @@ class Variable(object): -```pythonVclass Parameter(Variable): +```python +class Parameter(Variable): def __init__(self, trainable, initialize_attrs, optimize_attrs): pass ``` From 6bf1283cbebba1b795896c6dea4eb7f5a407e159 Mon Sep 17 00:00:00 2001 From: fengjiayi Date: Wed, 13 Sep 2017 13:48:39 -0700 Subject: [PATCH 07/13] Add doc for `Variable` --- doc/design/python_api.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 05875eeb21..e515f8594d 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -82,18 +82,16 @@ After creating a C++ `OpDesc`, `Operator` in Python can only reads the attribute ### Variable - +Operators' intputs, outputs and parameters are all variables. In our design, a variable has four key attributes: its name(`name`), the block it belongs to(`block`), a pointer pointed to its C++ Protobuf object(`cpp_var_desc_ptr`), and the operator it is created by(`op`). All of these attributes are initialized in constructor, except the `op`. The `op` will keep being `None` till the variable is taken as an operator's output. 
```python class Variable(object): def __init__(self, shape, dtype="float32", name=None, block=None): if name is None: - if prefix is not None: - name = unique_name_generator(prefix) - else: - name = unique_name_generator("unknown") + name = unique_name_generator() self.name = name self.block = block + # build C++ Protobuf object self.cpp_var_desc_ptr = ... self.op = None @@ -102,6 +100,10 @@ class Variable(object): return [None if elem < 0 else elem for elem in cpp_shape] ``` +Protobuf object should be created in C++ not Python because it is needed by infershape, and infershape is implementated by C++ code. The C++ Protobuf object is accessible for Python through the `cpp_var_desc_ptr` pointer. + +The user is allowed to build an variable without specifying its name. If so, it is going to be assigned with an automatically generated unique name. + ### Parameter From 216b87abd894d09177eca9b4c72c953ba0e5c451 Mon Sep 17 00:00:00 2001 From: fengjiayi Date: Wed, 13 Sep 2017 14:41:14 -0700 Subject: [PATCH 08/13] Add doc for `Parameter` --- doc/design/python_api.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index e515f8594d..4c012ddd19 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -82,7 +82,7 @@ After creating a C++ `OpDesc`, `Operator` in Python can only reads the attribute ### Variable -Operators' intputs, outputs and parameters are all variables. In our design, a variable has four key attributes: its name(`name`), the block it belongs to(`block`), a pointer pointed to its C++ Protobuf object(`cpp_var_desc_ptr`), and the operator it is created by(`op`). All of these attributes are initialized in constructor, except the `op`. The `op` will keep being `None` till the variable is taken as an operator's output. +Operators' inputs, outputs, and parameters are all variables. 
In our design, a variable has four key attributes: its name(`name`), the block it belongs to(`block`), a pointer pointed to its C++ Protobuf object(`cpp_var_desc_ptr`), and the operator it is created by(`op`). All of these attributes are initialized in the constructor, except the `op`. The `op` will keep being `None` till the variable is taken as an operator's output. ```python class Variable(object): @@ -100,15 +100,13 @@ class Variable(object): return [None if elem < 0 else elem for elem in cpp_shape] ``` -Protobuf object should be created in C++ not Python because it is needed by infershape, and infershape is implementated by C++ code. The C++ Protobuf object is accessible for Python through the `cpp_var_desc_ptr` pointer. +The Protobuf object should be created in C++ not Python because it is needed by infershape, and infershape is implemented by C++ code. The C++ Protobuf object is accessible for Python through the `cpp_var_desc_ptr`, just like how `shape()` function does. -The user is allowed to build an variable without specifying its name. If so, it is going to be assigned with an automatically generated unique name. +The user is allowed to build a variable without specifying its name. If so, it is going to be assigned with an automatically generated unique name. ### Parameter - - - +The parameter is a kind of special variable. They need to be initialized at the very beginning and updated after each batch training. So if a variable is a parameter, our compiler will add an initializer op and an optimizer op for it during the building process of computation graph. Apart from these, there is no more difference between variable and parameter. In other words, 'parameter' is only a label attached to variables, to tell the compiler these ones require additional processing. ```python class Parameter(Variable): @@ -116,6 +114,9 @@ class Parameter(Variable): pass ``` +The class `Parameter` is derived from class `Variable`. 
In addition to variables have, parameters are able to hold their initializing and updating information. A parameter's `self.op` will always be `None` because it can never be an operator's output. + + ## Layer Functions From 709a3481482b5b6f40e8df14949bfeb3c311738d Mon Sep 17 00:00:00 2001 From: fengjiayi Date: Wed, 13 Sep 2017 16:07:20 -0700 Subject: [PATCH 09/13] Add doc for `Layer function` --- doc/design/python_api.md | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 4c012ddd19..2ca6dffae8 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -119,4 +119,38 @@ The class `Parameter` is derived from class `Variable`. In addition to variables ## Layer Functions - +A layer is a Python function. When it is invoked, it creates a serise of operators and variables then inserts them into the block. It is something like the macro in C++. It is called 'Layer' because the combination of added operators acts just like what a neural network layer does. + +Here are examples about how to write a date layer and FC layer: + +### Data Layer + +```python +def data_layer(name, type, block=None): + if block is None: + block = g_block + # type = dense_vector(size=10) / integer_value(range=10) + return block.create_global_var( + name=name, + shape=[None] + type.dims(), + dtype=type.dtype) + +``` + +Before building new variables, we need to specify which block to use. If we don't, the default one `g_block` will be used. In the above `data_layer` code, a variable is created and be inserted into the root block to make it global. This varibale is going to be used as input data of the whole network. + +### FC Layer + +```python +def fc_layer(input, size, block=None, ...): + if block is None: + block = g_block + w = block.create_parameter(...) + b = block.create_parameter(...) 
+ out = stack.create_var() + op = block.append_operator(Operator("FC", X=input, W=w, b=b, Out=out)) + out.op = op + return out +``` + +In the `fc_layer` code, we create two parameters(`w` and `b`), one variable(`out`) and one operator(`FC operator`), then insert all of them into specify block. From 6b222d550bbeeac6d1278ee6b7a4145f7ba8d6d3 Mon Sep 17 00:00:00 2001 From: fengjiayi Date: Wed, 13 Sep 2017 16:10:24 -0700 Subject: [PATCH 10/13] Fix typo --- doc/design/python_api.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 2ca6dffae8..f0a300bed0 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -119,9 +119,9 @@ The class `Parameter` is derived from class `Variable`. In addition to variables ## Layer Functions -A layer is a Python function. When it is invoked, it creates a serise of operators and variables then inserts them into the block. It is something like the macro in C++. It is called 'Layer' because the combination of added operators acts just like what a neural network layer does. +A layer is a Python function. When it is invoked, it creates a series of operators and variables then inserts them into the block. It is something like the macro in C++. It is called 'Layer' because the combination of added operators acts just like what a neural network layer does. -Here are examples about how to write a date layer and FC layer: +Here are examples of how to write a data layer and FC layer: ### Data Layer @@ -137,7 +137,7 @@ def data_layer(name, type, block=None): ``` -Before building new variables, we need to specify which block to use. If we don't, the default one `g_block` will be used. In the above `data_layer` code, a variable is created and be inserted into the root block to make it global. This varibale is going to be used as input data of the whole network. +Before building new variables, we need to specify which block to use. 
If we don't, the default one `g_block` will be used. In the above `data_layer` code, a variable is created and be inserted into the root block to make it global. This variable is going to be used as input data of the whole network. ### FC Layer @@ -153,4 +153,4 @@ def fc_layer(input, size, block=None, ...): return out ``` -In the `fc_layer` code, we create two parameters(`w` and `b`), one variable(`out`) and one operator(`FC operator`), then insert all of them into specify block. +In the `fc_layer` code, we create two parameters(`w` and `b`), one variable(`out`) and one operator(`FC operator`), then insert all of them into the specified block. From 3c63b0bc4bf90befcd08fe7620ed0329f79e5fdd Mon Sep 17 00:00:00 2001 From: Yu Yang Date: Wed, 20 Sep 2017 10:25:54 -0700 Subject: [PATCH 11/13] Change typo --- doc/design/python_api.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index f0a300bed0..74d4d3239a 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -1,6 +1,6 @@ # Design Doc: Python API -The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in C++ in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and writing layers in Python. +The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in C++ in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and write layers in Python. This implementation of Python API includes two steps. 
@@ -57,7 +57,7 @@ class Block(objects): self.ops.prepend(...) ``` -Users are able to create a global variable inside any block since they many create parameters inside a RNN or IfElseOp. All parameters should be stored in the global block, not the step block in RNN. +Users are able to create a global variable inside any block since they many create parameters inside a RNN or IfElse. All parameters should be stored in the global block, not the step block in RNN. Users can create local variables for outputs of operators. Users can also append and prepend an operator in current block. Prepending `random initialize` operator or `load` operator is very useful to initialize parameters before training. @@ -147,7 +147,7 @@ def fc_layer(input, size, block=None, ...): block = g_block w = block.create_parameter(...) b = block.create_parameter(...) - out = stack.create_var() + out = block.create_var() op = block.append_operator(Operator("FC", X=input, W=w, b=b, Out=out)) out.op = op return out From 531455651b720e8bd3b8eabf7b461e8d4138c3a9 Mon Sep 17 00:00:00 2001 From: fengjiayi Date: Wed, 20 Sep 2017 17:40:15 -0700 Subject: [PATCH 12/13] Add `Program` to Python API design doc --- doc/design/python_api.md | 56 ++++++++++++++++++++++++++++++---------- 1 file changed, 43 insertions(+), 13 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 74d4d3239a..3122f5bfd8 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -17,24 +17,55 @@ Since we design our Python API concepts based on `compile-time`, we try to map o | Python Class | Compile-time protobuf | | --- | --- | +| Program | ProgramDesc | | Block | BlockDesc | | Operator | OpDesc | | Variable | VarDesc | +### Program + +`Program` is the description of the whole training process and there can only be one `Program` object, which is created automatically by the system at the very beginning. `Program` is formed by a series of `Block`. 
+ +```python +class Program(objects): + def __init__(self): + self.blocks = vector() + self.blocks.append(Block(None)) + self.current_block_idx = 0 + + def get_block(block_idx): + return self.blocks[block_idx] + + def current_block(): + return self.get_block(self.current_block_idx) + + def fallback_current_block(): + self.current_block_idx = self.current_block().parent_idx + + + def create_block(): + new_block_idx = len(self.block) + self.blocks.append(Block(parent_idx=self.current_block_idx, + idx=new_block_idx)) + self.current_block_idx = new_block_idx +``` + +`Program` will create the first block in its constructor. The first block is called 'global block'. It is where all parameters are stored. ### Block Block is just like programming languages `{}`, which contains many operators and variables. There are two data fields in `Block`. 1) An associate map, whose key is variable name and value is variable itself; 2) A list of operators. -The block is hierarchical because PaddlePaddle supports RNN and IfElse. For example, RNN is like `for-loop` in programming languages. There is new `block` inside a `for-loop`. To represent hierarchies, `Block` stores the `parent Block` inside. If `parent=None`, the `Block` is the outermost block, i.e., the `global` block. +The block is hierarchical because PaddlePaddle supports RNN and IfElse. For example, RNN is like `for-loop` in programming languages. There is new `block` inside a `for-loop`. To represent hierarchies, `Block` stores the index of `parent Block` inside. The 'index' means the block's position in `Program`'s `blocks`. If `parent_idx=None`, the block itself is the outermost block, i.e., the 'global block'. 
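The index-based hierarchy described above can be sketched as follows. This is a simplified, runnable illustration rather than the code in this patch: `IndexedBlock`, `SketchProgram`, and the `ancestors` helper are hypothetical names, and Python lists and dicts replace `vector()` and `map()`.

```python
class IndexedBlock(object):
    def __init__(self, idx, parent_idx):
        self.idx = idx
        self.parent_idx = parent_idx  # None marks the global block
        self.vars = {}


class SketchProgram(object):
    def __init__(self):
        # The constructor creates the global block at index 0.
        self.blocks = [IndexedBlock(idx=0, parent_idx=None)]
        self.current_block_idx = 0

    def current_block(self):
        return self.blocks[self.current_block_idx]

    def create_block(self):
        new_idx = len(self.blocks)
        self.blocks.append(IndexedBlock(new_idx, self.current_block_idx))
        self.current_block_idx = new_idx
        return self.current_block()

    def fallback_current_block(self):
        # Leave the current block and return to its parent.
        self.current_block_idx = self.current_block().parent_idx

    def ancestors(self, block):
        # Follow parent indices until the global block is reached.
        chain = []
        while block.parent_idx is not None:
            block = self.blocks[block.parent_idx]
            chain.append(block.idx)
        return chain


program = SketchProgram()
rnn_block = program.create_block()    # child of the global block
inner_block = program.create_block()  # child of rnn_block
program.fallback_current_block()      # back to rnn_block
```

Storing an index instead of an object reference mirrors how a block would refer to its parent inside a serialized array of blocks.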
```python class Block(objects): - def __init__(self, parent=None): + def __init__(self, parent_idx, idx): self.vars = map() self.ops = vector() - self.parent = parent + self.idx = idx + self.parent_idx = parent_idx def create_var(self, ...): # create variable in `self.vars` @@ -42,8 +73,9 @@ class Block(objects): def create_global_var(self, ...): - if self.parent is not None: - return self.parent.create_global_var(...) + if self.parent_idx is not None: + parent_block = program.get_block(parent_idx) + return parent_block.create_global_var(...) else: return self.create_var(...) @@ -126,9 +158,8 @@ Here are examples of how to write a data layer and FC layer: ### Data Layer ```python -def data_layer(name, type, block=None): - if block is None: - block = g_block +def data_layer(name, type): + block = program.current_block() # type = dense_vector(size=10) / integer_value(range=10) return block.create_global_var( name=name, @@ -137,14 +168,13 @@ def data_layer(name, type, block=None): ``` -Before building new variables, we need to specify which block to use. If we don't, the default one `g_block` will be used. In the above `data_layer` code, a variable is created and be inserted into the root block to make it global. This variable is going to be used as input data of the whole network. +All the new variables and operators will be built in the `current block`. In the above `data_layer` code, a variable is created and be inserted into the root block to make it global. This variable is going to be used as input data of the whole network. ### FC Layer ```python -def fc_layer(input, size, block=None, ...): - if block is None: - block = g_block +def fc_layer(input, size, ...): + block = program.current_block() w = block.create_parameter(...) b = block.create_parameter(...) 
out = block.create_var() @@ -153,4 +183,4 @@ def fc_layer(input, size, block=None, ...): return out ``` -In the `fc_layer` code, we create two parameters(`w` and `b`), one variable(`out`) and one operator(`FC operator`), then insert all of them into the specified block. +In the `fc_layer` code, we create two parameters(`w` and `b`), one variable(`out`) and one operator(`FC operator`), then insert all of them into the `current block`. From e6f2da489c2d69bae2150edd7537d6d40094b8b3 Mon Sep 17 00:00:00 2001 From: Yi Wang Date: Fri, 29 Sep 2017 20:19:55 -0700 Subject: [PATCH 13/13] Update --- doc/design/python_api.md | 220 ++++++++++++++++++++++----------------- 1 file changed, 125 insertions(+), 95 deletions(-) diff --git a/doc/design/python_api.md b/doc/design/python_api.md index 3122f5bfd8..5c68354274 100644 --- a/doc/design/python_api.md +++ b/doc/design/python_api.md @@ -1,174 +1,206 @@ # Design Doc: Python API -The top level user API in Python should be as same as API in `paddle.v2` after refactoring Paddle from a layer based framework to an operator based framework. There are many new classes in C++ in [compile time] for describing neural networks, such as `Variable`, `Operator`, `Block`. The issue about current design is how to give a proper way to wrap the C++ API to `paddle.v2` API and write layers in Python. +Due to the refactorization of the PaddlePaddle core, we need Python classes to construct corresponding protobuf messages that describe a DL program. -This implementation of Python API includes two steps. - -1. Implement the Python API using current C++ runtime concepts. -2. Replace the implementation by using compile-time concepts when they are completed. - -The implementation of the first step is a temporary implementation. We should design our Python API concepts based on `compile-time` concepts. We just use `runtime` classes to implement it for now. 
- - -## Python Class and compile-time protobuf - -Since we design our Python API concepts based on `compile-time`, we try to map our Python classes to every compile-time result, i.e., the protobuf messages. They are: - - -| Python Class | Compile-time protobuf | +| Python classes | Protobuf messages | | --- | --- | | Program | ProgramDesc | | Block | BlockDesc | | Operator | OpDesc | | Variable | VarDesc | +Please be aware that these Python classes need to maintain some construction-time information, which is not part of the protobuf messages. + +## Core Concepts + ### Program -`Program` is the description of the whole training process and there can only be one `Program` object, which is created automatically by the system at the very beginning. `Program` is formed by a series of `Block`. +A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks. + +Whenever we create a block, we need to set its parent block to the current block, so the Python class `Program` needs to maintain a data member `current_block`. ```python class Program(objects): def __init__(self): + self.proto = core.NewProgram() # a C++ ProgramDesc pointer. 
self.blocks = vector() - self.blocks.append(Block(None)) - self.current_block_idx = 0 + self.blocks.append(Block(self, -1)) # the global block + self.current_block = 0 # initialized to the global block - def get_block(block_idx): - return self.blocks[block_idx] + def global_block(): + return self.blocks[0] def current_block(): - return self.get_block(self.current_block_idx) - - def fallback_current_block(): - self.current_block_idx = self.current_block().parent_idx + return self.blocks[self.current_block] + def rollback(): + self.current_block = self.current_block().parent_idx def create_block(): new_block_idx = len(self.blocks) - self.blocks.append(Block(parent_idx=self.current_block_idx, - idx=new_block_idx)) - self.current_block_idx = new_block_idx + self.blocks.append(Block(self, self.current_block)) + self.current_block = new_block_idx + return self.current_block() ``` -`Program` will create the first block in its constructor. The first block is called 'global block'. It is where all parameters are stored. +`Program` is an accessor to the protobuf message `ProgramDesc`, which is created in C++ space, because the InferShape function is in C++, which manipulates `VarDesc` messages, which are in turn members of `BlockDesc`, which is a member of `ProgramDesc`. -### Block +`Program` creates the first block as the global block in its constructor. All parameters and their initializer operators are in the global block. -Block is just like programming languages `{}`, which contains many operators and variables. There are two data fields in `Block`. 1) An associate map, whose key is variable name and value is variable itself; 2) A list of operators. +### Block -The block is hierarchical because PaddlePaddle supports RNN and IfElse. For example, RNN is like `for-loop` in programming languages. There is new `block` inside a `for-loop`. To represent hierarchies, `Block` stores the index of `parent Block` inside. The 'index' means the block's position in `Program`'s `blocks`.
If `parent_idx=None`, the block itself is the outermost block, i.e., the 'global block'. +A [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md) includes +1. a map from variable names to an instance of the Python `Variable` class, and +1. a list of `Operator` instances. ```python class Block(object): - def __init__(self, parent_idx, idx): + def __init__(self, program, parent_idx): + self.proto = core.NewBlock(program.proto) + self.program = program self.vars = map() self.ops = vector() - self.idx = idx self.parent_idx = parent_idx - + def create_var(self, ...): - # create variable in `self.vars` - return Variable(...) - - - def create_global_var(self, ...): - if self.parent_idx is not None: - parent_block = program.get_block(parent_idx) - return parent_block.create_global_var(...) - else: - return self.create_var(...) - - def create_parameter(self, ...): - return self.create_global_var(...) - + return Variable(self, ...) + + def _create_global_var(self, ...): + return self.program.global_block().create_var(...) + + def create_parameter(self, name, ...): + # Parameter is a subclass of Variable. See Parameter section for details. + self.vars[name] = Parameter(self._create_global_var(...), ...) + return self.vars[name] + def append_operator(self, ...): - self.ops.append(...) - - def prepend_operator(self, ...): - self.ops.prepend(...) -``` + self.ops.append(Operator(self, ...)) -Users are able to create a global variable inside any block since they many create parameters inside a RNN or IfElse. All parameters should be stored in the global block, not the step block in RNN. + def prepend_operator(self, ...): # Parameter's ctor prepends initialize operators. + self.ops.prepend(Operator(self, ...)) +``` -Users can create local variables for outputs of operators. Users can also append and prepend an operator in current block. Prepending `random initialize` operator or `load` operator is very useful to initialize parameters before training.
+`create_parameter` is necessary because parameters are global variables, i.e., variables defined in the global block, but they can be created in sub-blocks, e.g., by an FC layer in the step block of an RNN operator. +`prepend_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block. ### Operator -Operator class will take inputs, outputs and attributes of the operator into `protobuf` OpDesc and create a C++ `OpDesc` instance. The `infer_shape` perform on C++ objects. +The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer the output shapes from the input shapes. ```python class Operator(object): - def __init__(self, type, inputs, outputs, attrs): - # create OpDesc in Python - op_desc = ... - self.cpp_op_desc_ptr = core.OpDesc(op_desc) - cpp.infer_shape(self.cpp_op_desc_ptr, inputs, outputs) + def __init__(self, + block, # Block + type, # string + inputs, # dict + outputs,# dict + attrs # dict + ): + self.proto = core.NewOpDesc(block.proto, type, inputs, outputs, attrs) + core.infer_shape(self.proto, inputs, outputs) def type(self): - return self.cpp_op_desc_ptr.type() + return self.proto.type() ``` -After creating a C++ `OpDesc`, `Operator` in Python can only reads the attribute from C++ side. +`Operator` creates the `OpDesc` message in C++ space so that it can call the `InferShape` function, which is in C++. ### Variable -Operators' inputs, outputs, and parameters are all variables. In our design, a variable has four key attributes: its name(`name`), the block it belongs to(`block`), a pointer pointed to its C++ Protobuf object(`cpp_var_desc_ptr`), and the operator it is created by(`op`). All of these attributes are initialized in the constructor, except the `op`. The `op` will keep being `None` till the variable is taken as an operator's output. +Operators take Variables as their inputs and outputs.
```python class Variable(object): - def __init__(self, shape, dtype="float32", name=None, block=None): + def __init__(self, + block=None, # Block + name=None, # string + shape, # tuple + dtype="float32", # string + lod_level=None # int + ): if name is None: name = unique_name_generator() self.name = name self.block = block - # build C++ Protobuf object - self.cpp_var_desc_ptr = ... - self.op = None - - def shape(self): - cpp_shape = self.cpp_var_desc_ptr.shape() - return [None if elem < 0 else elem for elem in cpp_shape] + self.proto = core.NewVarDesc(block.proto, name, shape, lod_level) + self.writer = None ``` -The Protobuf object should be created in C++ not Python because it is needed by infershape, and infershape is implemented by C++ code. The C++ Protobuf object is accessible for Python through the `cpp_var_desc_ptr`, just like how `shape()` function does. - -The user is allowed to build a variable without specifying its name. If so, it is going to be assigned with an automatically generated unique name. +Please be aware of `self.writer`, which tracks the operator that creates the variable. It is possible that more than one operator writes a variable, but in Python space, each write to a variable is represented by a `Variable` instance. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**. ### Parameter -The parameter is a kind of special variable. They need to be initialized at the very beginning and updated after each batch training. So if a variable is a parameter, our compiler will add an initializer op and an optimizer op for it during the building process of computation graph. Apart from these, there is no more difference between variable and parameter. In other words, 'parameter' is only a label attached to variables, to tell the compiler these ones require additional processing. +A parameter is a global variable with an initializer (or load) operator.
```python class Parameter(Variable): - def __init__(self, trainable, initialize_attrs, optimize_attrs): - pass + def __init__(self, + block=None, # Block + name=None, # string + shape, # tuple + dtype="float32", # string + lod_level=None, # int + trainable, # bool + initialize_op_attrs, + optimize_op_attrs): + super(Parameter, self).__init__(block, name, shape, dtype, lod_level) + self.trainable = trainable + self.optimize_op_attrs = optimize_op_attrs + block.prepend(Operator(block, # Block + initialize_op_attrs['type'], # string + None, # no inputs + self, # output is the parameter + initialize_op_attrs)) ``` -The class `Parameter` is derived from class `Variable`. In addition to variables have, parameters are able to hold their initializing and updating information. A parameter's `self.op` will always be `None` because it can never be an operator's output. +When a user creates a parameter, they can call: +```python +program.create_parameter( + ..., + init_attr={ + "type": "uniform_random", + "min": -1.0, + "max": 1.0, + }) +``` -## Layer Functions +In the above example, `init_attr["type"]` names an initializer operator. It can also name the load operator: + +```python +init_attr={ + "type": "load", + "filename": "something.numpy", +} +``` -A layer is a Python function. When it is invoked, it creates a series of operators and variables then inserts them into the block. It is something like the macro in C++. It is called 'Layer' because the combination of added operators acts just like what a neural network layer does. +`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message. + +## Layer Functions -Here are examples of how to write a data layer and FC layer: +A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.
### Data Layer ```python -def data_layer(name, type): - block = program.current_block() - # type = dense_vector(size=10) / integer_value(range=10) - return block.create_global_var( - name=name, - shape=[None] + type.dims(), +def data_layer(name, type, column_name): + block = the_current_program.global_block() + var = block.create_global_var( + name=name, + shape=[None] + type.dims(), dtype=type.dtype) + block.prepend_operator(block, + type="Feed", + inputs=None, + outputs=[var], + attrs={"column_name": column_name}) + return var +``` -``` - -All the new variables and operators will be built in the `current block`. In the above `data_layer` code, a variable is created and be inserted into the root block to make it global. This variable is going to be used as input data of the whole network. +The input to the feed operator is a special variable in the global scope, which is the output of [Python readers](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md). ### FC Layer @@ -178,9 +210,7 @@ def fc_layer(input, size, ...): w = block.create_parameter(...) b = block.create_parameter(...) out = block.create_var() - op = block.append_operator(Operator("FC", X=input, W=w, b=b, Out=out)) - out.op = op + op = block.append_operator("FC", X=input, W=w, b=b, out=out) + out.writer = op return out ``` - -In the `fc_layer` code, we create two parameters(`w` and `b`), one variable(`out`) and one operator(`FC operator`), then insert all of them into the `current block`.
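The block bookkeeping described in the `Program` and `Block` sections can be tried out in plain Python. The following is only a rough sketch under stated assumptions: `SketchProgram` and `SketchBlock` are hypothetical names, no protobuf messages or C++ `core` calls are involved, variables are plain dicts, and a parameter is recorded only in the global block (the design above also keeps a `Parameter` entry in the creating block's `vars`). The guard in `rollback` is likewise an assumption, added so that rolling back from the global block is a no-op.

```python
class SketchBlock(object):
    """Pure-Python stand-in for Block (hypothetical; no protobuf involved)."""

    def __init__(self, program, idx, parent_idx):
        self.program = program
        self.idx = idx
        self.parent_idx = parent_idx  # -1 marks the global block
        self.vars = {}  # variable name -> placeholder dict
        self.ops = []   # operator descriptions, in execution order

    def create_var(self, name):
        # Record a local variable in this block.
        self.vars[name] = {"name": name, "block": self.idx}
        return self.vars[name]

    def create_parameter(self, name):
        # Parameters live in the global block, whichever block asks for them.
        return self.program.global_block().create_var(name)


class SketchProgram(object):
    """Pure-Python stand-in for Program that tracks the current block."""

    def __init__(self):
        self.blocks = [SketchBlock(self, 0, -1)]  # the global block
        self.current_idx = 0

    def global_block(self):
        return self.blocks[0]

    def current_block(self):
        return self.blocks[self.current_idx]

    def create_block(self):
        # The new block's parent is whatever block is current right now.
        new_idx = len(self.blocks)
        self.blocks.append(SketchBlock(self, new_idx, self.current_idx))
        self.current_idx = new_idx
        return self.current_block()

    def rollback(self):
        # Assumption: rolling back from the global block does nothing.
        parent = self.current_block().parent_idx
        if parent >= 0:
            self.current_idx = parent


program = SketchProgram()
step = program.create_block()  # e.g. the step block of an RNN
step.create_parameter("w")     # ends up in the global block
step.create_var("tmp")         # stays local to the step block
program.rollback()             # the global block is current again
```

Storing the parent as an index rather than an object reference mirrors how a `BlockDesc` refers to its parent by position in the `ProgramDesc` array.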