commit
515981d714
@@ -0,0 +1,105 @@
## Optimizer Design

### The Problem

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

1. the forward pass, which computes intermediate results and the cost(s),
1. the backward pass, which derives gradients from the intermediate results and costs, and
1. the optimization pass, which updates model parameters to optimize the cost(s).

These passes rely on three kinds of operators:

1. forward operators,
1. gradient operators, and
1. optimization operators.

It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they only had to describe the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and operators from the forward pass.

### High-level Python API to describe the training process

1. Users write code to describe the network:

   ```python
   images = layer.data("images")
   labels = layer.data("labels")
   w1 = pd.var("w1")
   b1 = pd.var("b1")
   hidden = layer.fc(images, w=w1, b=b1)
   cost = layer.mse(hidden, labels)
   ```

   The above code snippet will create forward operators in [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).

2. Users create a certain kind of Optimizer with some arguments.

   ```python
   optimizer = AdagradOptimizer(learning_rate=0.001)
   ```

3. Users use the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`.

   ```python
   opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
   ```

   The above code snippet will create gradient and optimization operators in the Block. The return value of `minimize()` is a list of optimization operators that will be run by the session.

4. Users use the Session/Executor to run this `opt_op_list` as the target to do training.

   ```python
   sess.run(target=opt_op_list, ...)
   ```

#### Optimizer Python interface:

```python
class Optimizer(object):
    """Optimizer Base class."""

    def __init__(self):
        pass

    def create_backward_pass(self, loss, parameter_list=None):
        """
        Create and add gradient operators in BlockDesc to compute
        gradients of `loss` for the parameters in parameter_list.

        Args:
          loss: a variable generated by the cost function.
          parameter_list: parameters that need gradients computed and updated to optimize the loss.

        Returns:
          list of (parameter, gradient) pairs.
        """
        return None

    def create_optimization_pass(self, parameters_and_grads):
        """Add optimization operators to update gradients to variables.

        Args:
          parameters_and_grads: a list of (variable, gradient) pairs to update.

        Returns:
          optimization_op_list: a list of optimization operators that will update the parameters using the gradients.
        """
        return None

    def minimize(self, loss, parameter_list):
        """Add operations to minimize `loss` by updating `parameter_list`.

        This method combines the interfaces `create_backward_pass()` and
        `create_optimization_pass()` into one.
        """
        params_grads = self.create_backward_pass(loss, parameter_list)
        update_ops = self.create_optimization_pass(params_grads)
        return update_ops
```

Users can inherit the Optimizer above to create their own Optimizer with some special logic, such as AdagradOptimizer. A minimal sketch of such a subclass follows.
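The sketch below is illustrative only: `SGDOptimizer` and the helper `append_sgd_op` are hypothetical names, not part of the interface above. Only `create_optimization_pass()` is overridden; `create_backward_pass()` and `minimize()` are inherited unchanged.

```python
# A minimal sketch, not the actual PaddlePaddle implementation.
# `SGDOptimizer` and `append_sgd_op` are hypothetical names used for
# illustration only.
class SGDOptimizer(Optimizer):
    """Plain SGD: parameter -= learning_rate * gradient."""

    def __init__(self, learning_rate=0.01):
        super(SGDOptimizer, self).__init__()
        self._learning_rate = learning_rate

    def create_optimization_pass(self, parameters_and_grads):
        opt_op_list = []
        for param, grad in parameters_and_grads:
            # Hypothetical helper: appends one "sgd" update operator for this
            # (parameter, gradient) pair to the Block and returns it.
            opt_op_list.append(append_sgd_op(param, grad, self._learning_rate))
        return opt_op_list
```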
@@ -0,0 +1,74 @@
# Design Doc: Selected Rows

`SelectedRows` is a sparse tensor data type, designed to support `embedding` operators. The gradient of an embedding table is a sparse tensor: only a few rows of it are non-zero. It is straightforward to represent such a sparse tensor with the following data structure:

```cpp
class SelectedRows {
 private:
  vector<int> rows_;
  Tensor value_;
  int height_;
};
```

The field `height_` is the first dimension of the `SelectedRows`. The `rows_` field holds the indices of the non-zero rows. The `value_` field is an N-dim tensor of shape `[rows_.size() /* NUM_ROWS */, ...]`, which supplies the values for those rows. The shape of the full `SelectedRows` is therefore `[height_] + value_.shape[1:]`.

Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`. The `SelectedRows` representation would be:

```
x = SelectedRows {
  rows = [73, 84],
  value = [[1, 2], [3, 4]]
}
```
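To make the shape relation concrete, here is a small NumPy sketch (illustration only, not part of the design); the height of 100 is an assumed value:

```python
import numpy as np

# Hypothetical illustration: densify the SelectedRows example above,
# assuming height_ = 100 (any value larger than the biggest row index works).
height = 100
rows = [73, 84]
value = np.array([[1.0, 2.0], [3.0, 4.0]])

dense = np.zeros([height] + list(value.shape[1:]))  # shape: [height_] + value.shape[1:]
dense[rows] = value  # only rows 73 and 84 are non-zero

assert dense.shape == (100, 2)
```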
## SelectedRows in Protobuf

`SelectedRows` is a kind of `Variable`, so `VarDesc` in protobuf should describe it. Only the tensor dimensions of a `SelectedRows` are described at compile time, since `rows_` and `value_` depend on the training data.
So we use `TensorDesc` to unify `data_type` and `dims`. A `LodTensorDesc` contains a `TensorDesc` and a `lod_level`. The description of a `SelectedRows` is a plain tensor description.

```proto
message TensorDesc {
  required DataType data_type = 1;
  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}

message LodTensorDesc {
  required TensorDesc tensor = 1;
  optional int32 lod_level = 2;
}

message VarDesc {
  required string name = 1;
  enum VarType {
    LOD_TENSOR = 0;
    SELECTED_ROWS = 1;
  }
  required VarType type = 2;
  optional LodTensorDesc lod_desc = 3;
  optional TensorDesc selected_rows_desc = 4;
  optional bool persistable = 5 [ default = false ];
}
```

## InferShape for Selected Rows

Just like `LoD` information, the `InferShape` method will infer the output tensor type as well. The operator should decide whether its output is a `SelectedRows` or a dense tensor.

For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should look like the following:

```cpp
void TableLookupGrad::InferShape(context) {
  ...
  context.SetDataType("Embedding.Grad", kSelectedRows);
}
```

## Sparse Operators

Several operators need to be written to support `SelectedRows`:

1. Operators that generate a `SelectedRows` gradient, e.g. the gradient of `TableLookupOp`.
2. Optimization operators that can consume a `SelectedRows` gradient, e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator; `OpWithKernel::Run` should select a suitable kernel for either a `dense` tensor or a `SelectedRows` input. The sketch after this list illustrates the intended update semantics.
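As a hedged illustration of what such a sparse-aware update means, the NumPy sketch below applies an SGD step using only the selected rows of the gradient. The function name, shapes, and learning rate are assumed for illustration; this is not the actual operator kernel.

```python
import numpy as np

# Sketch of an SGD update for a SelectedRows gradient (not the actual
# PaddlePaddle kernel). `param` is the dense parameter table; only the rows
# listed in `grad_rows` are read and written.
def sgd_selected_rows(param, grad_rows, grad_value, learning_rate):
    param[grad_rows] -= learning_rate * grad_value
    return param

# Hypothetical example: a [100, 2] table and the sparse gradient from above.
param = np.ones((100, 2))
grad_rows = [73, 84]
grad_value = np.array([[1.0, 2.0], [3.0, 4.0]])
param = sgd_selected_rows(param, grad_rows, grad_value, learning_rate=0.1)
# All rows except 73 and 84 are still untouched (all ones).
```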
@@ -0,0 +1,35 @@
digraph Test {
  // Solid edges: forward computation of the GAN.
  // Red dashed edges: backward (gradient) flow for the discriminator loss d_loss.
  // Green dashed edges: backward (gradient) flow for the generator loss g_loss.
  z -> generator -> G_img;
  G_img -> discriminator -> D_f -> d_loss_f;
  label0 -> d_loss_f -> d_loss;

  img -> discriminator -> D_t -> d_loss_t;
  label1 -> d_loss_t -> d_loss;

  d_loss -> d_loss_t[color=red, style=dashed];
  d_loss -> d_loss_f[color=red, style=dashed];
  d_loss_t -> D_t[color=red, style=dashed];
  d_loss_f -> D_f[color=red, style=dashed];
  D_t -> discriminator[color=red, style=dashed];
  D_f -> discriminator[color=red, style=dashed];

  D_f -> g_loss;
  label2 -> g_loss;

  g_loss -> D_f[color=green, style=dashed];
  D_f -> discriminator[color=green, style=dashed];
  discriminator -> G_img[color=green, style=dashed];
  G_img -> generator[color=green, style=dashed];

  discriminator [color=red, shape=box];
  generator [color=green, shape=box];
  z [shape=diamond];
  img [shape=diamond];
  label0 [shape=diamond];
  label1 [shape=diamond];
  label2 [shape=diamond];

  d_loss [color=red];
  g_loss [color=green];
}
@@ -1,27 +1,32 @@
 add_subdirectory(cuda)
 add_subdirectory(function)
 add_subdirectory(utils)
-add_subdirectory(testing)
 add_subdirectory(math)
-add_subdirectory(parameter)
 add_subdirectory(gserver)
-add_subdirectory(pserver)
-add_subdirectory(trainer)
-add_subdirectory(scripts)
-add_subdirectory(string)
+add_subdirectory(parameter)
+add_subdirectory(testing)
 
+if(MOBILE_INFERENCE)
+  add_subdirectory(capi)
+else()
+  add_subdirectory(pserver)
+  add_subdirectory(trainer)
+  add_subdirectory(string)
+  add_subdirectory(scripts)
+
-if(Boost_FOUND)
+  if(WITH_C_API)
+    add_subdirectory(capi)
+  endif()
+
+  if(Boost_FOUND)
   add_subdirectory(memory)
   add_subdirectory(platform)
   add_subdirectory(framework)
   add_subdirectory(operators)
   add_subdirectory(pybind)
-endif()
-
-if(WITH_C_API)
-  add_subdirectory(capi)
-endif()
+  endif()
 
-if(WITH_SWIG_PY)
+  if(WITH_SWIG_PY)
   add_subdirectory(api)
 endif()
+endif()
@@ -0,0 +1,163 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/framework/executor.h"

#include <algorithm>
#include <iostream>
#include <memory>
#include <set>
#include <vector>

#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/op_registry.h"
#include "paddle/framework/scope.h"

namespace paddle {
namespace framework {

const std::string kFeedOpType = "feed";
const std::string kFetchOpType = "fetch";

Executor::Executor(const std::vector<platform::Place>& places) {
  PADDLE_ENFORCE_GT(places.size(), 0);
  device_contexts_.resize(places.size());
  for (size_t i = 0; i < places.size(); i++) {
    if (platform::is_cpu_place(places[i])) {
      device_contexts_[i] = new platform::CPUDeviceContext(
          boost::get<platform::CPUPlace>(places[i]));
    } else if (platform::is_gpu_place(places[i])) {
#ifdef PADDLE_WITH_CUDA
      device_contexts_[i] = new platform::CUDADeviceContext(
          boost::get<platform::GPUPlace>(places[i]));
#else
      PADDLE_THROW(
          "'GPUPlace' is not supported, Please re-compile with WITH_GPU "
          "option");
#endif
    }
  }
}

Executor::~Executor() {
  for (auto& device_context : device_contexts_) {
    delete device_context;
  }
}

void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
  // TODO(tonyyang-svail):
  // - only runs on the first device (i.e. no interdevice communication)
  // - will change to use multiple blocks for RNN op and Cond Op
  PADDLE_ENFORCE_GT(pdesc.blocks_size(), block_id);
  auto& block = pdesc.blocks(block_id);
  auto& device = device_contexts_[0];

  // Instantiate all the vars in the global scope
  for (auto& var : block.vars()) {
    scope->NewVar(var.name());
  }

  Scope& local_scope = scope->NewScope();

  std::vector<bool> should_run = Prune(pdesc, block_id);
  PADDLE_ENFORCE_EQ(should_run.size(), static_cast<size_t>(block.ops_size()));
  for (size_t i = 0; i < should_run.size(); ++i) {
    if (should_run[i]) {
      for (auto& var : block.ops(i).outputs()) {
        for (auto& argu : var.arguments()) {
          if (local_scope.FindVar(argu) == nullptr) {
            local_scope.NewVar(argu);
          }
        }
      }
      auto op = paddle::framework::OpRegistry::CreateOp(block.ops(i));
      op->Run(local_scope, *device);
    }
  }

  // TODO(tonyyang-svail):
  // - Destroy local_scope
}

std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id) {
  // TODO(tonyyang-svail):
  // - will change to use multiple blocks for RNN op and Cond Op

  auto& block = pdesc.blocks(block_id);
  auto& ops = block.ops();

  bool expect_feed = true;
  for (auto& op_desc : ops) {
    PADDLE_ENFORCE(op_desc.type() != kFeedOpType || expect_feed,
                   "All FeedOps are at the beginning of the ProgramDesc");
    expect_feed = (op_desc.type() == kFeedOpType);
  }

  bool expect_fetch = true;
  for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
    auto& op_desc = *op_iter;
    PADDLE_ENFORCE(op_desc.type() != kFetchOpType || expect_fetch,
                   "All FetchOps must at the end of the ProgramDesc");
    expect_fetch = (op_desc.type() == kFetchOpType);
  }

  std::set<std::string> dependent_vars;
  std::vector<bool> should_run;
  for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
    auto& op_desc = *op_iter;

    bool found_dependent_vars = false;
    for (auto& var : op_desc.outputs()) {
      for (auto& argu : var.arguments()) {
        if (dependent_vars.count(argu) != 0) {
          found_dependent_vars = true;
        }
      }
    }

    if (op_desc.type() == kFetchOpType || found_dependent_vars) {
      // erase the op's outputs from the dependency graph
      for (auto& var : op_desc.outputs()) {
        for (auto& argu : var.arguments()) {
          dependent_vars.erase(argu);
        }
      }

      // insert the op's inputs into the dependency graph
      for (auto& var : op_desc.inputs()) {
        for (auto& argu : var.arguments()) {
          dependent_vars.insert(argu);
        }
      }

      should_run.push_back(true);
    } else {
      should_run.push_back(false);
    }
  }

  // TODO(tonyyang-svail):
  // - check this after integration of Init
  // PADDLE_ENFORCE(dependent_vars.empty());

  // since we are traversing the ProgramDesc in reverse order
  // we reverse the should_run vector
  std::reverse(should_run.begin(), should_run.end());

  return should_run;
}

}  // namespace framework
}  // namespace paddle
@@ -0,0 +1,55 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include "paddle/framework/framework.pb.h"
#include "paddle/framework/op_info.h"
#include "paddle/framework/scope.h"
#include "paddle/framework/tensor.h"

namespace paddle {
namespace framework {

class Executor {
 public:
  explicit Executor(const std::vector<platform::Place>& places);
  ~Executor();

  /* @Brief
   * Runtime evaluation of the given ProgramDesc under certain Scope
   *
   * @param
   *  ProgramDesc
   *  Scope
   */
  void Run(const ProgramDesc&, Scope*, int);

 private:
  std::vector<platform::DeviceContext*> device_contexts_;
};

/* @Brief
 * Pruning the graph
 *
 * @param
 *  ProgramDesc
 *
 * @return
 *  vector<bool> Same size as ops. Indicates whether an op should be run.
 */
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);

}  // namespace framework
}  // namespace paddle