commit 5e78c7aea6

@@ -0,0 +1,105 @@
## Optimizer Design

### The Problem

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

1. the forward pass, which computes intermediate results and the cost(s),
1. the backward pass, which derives gradients from the intermediate results and costs, and
1. the optimization pass, which updates model parameters to minimize the cost(s).

These kinds of work rely on three kinds of operators:

1. forward operators,
1. gradient operators, and
1. optimization operators.

It's true that users could create all these operators manually by calling some low-level API, but it would be much more convenient if they only had to describe the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and its operators from the forward pass.
### High-level Python API to describe the training process

1. Users write code to describe the network:

   ```python
   images = layer.data("images")
   labels = layer.data("labels")
   w1 = pd.var("w1")
   b1 = pd.var("b1")
   hidden = layer.fc(images, w=w1, b=b1)
   cost = layer.mse(hidden, labels)
   ```

   The above code snippet creates forward operators in a [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
2. Users create a certain kind of Optimizer with some arguments.

   ```python
   optimizer = AdagradOptimizer(learning_rate=0.001)
   ```

3. Users use the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`.

   ```python
   opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
   ```

   The above code snippet creates gradient and optimization operators in the Block. The return value of `minimize()` is a list of optimization operators that will be run by the session.
4. Users use the Session/Executor to run `opt_op_list` as the target to do training.

   ```python
   sess.run(target=opt_op_list, ...)
   ```

#### Optimizer Python interface:
```python
class Optimizer(object):
    """Optimizer base class."""

    def __init__(self):
        pass

    def create_backward_pass(self, loss, parameter_list=None):
        """
        Create and add gradient operators in BlockDesc to compute
        gradients of `loss` for the parameters in parameter_list.

        Args:
          loss: a variable generated by the cost function.
          parameter_list: parameters that need gradients computed and updated to optimize the loss.

        Returns:
          list of (parameter, gradient) pairs.
        """
        return None

    def create_optimization_pass(self, parameters_and_grads):
        """Add optimization operators to update parameters using gradients.

        Args:
          parameters_and_grads: a list of (variable, gradient) pairs to update.

        Returns:
          optimization_op_list: a list of optimization operators that will update the parameters using the gradients.
        """
        return None

    def minimize(self, loss, parameter_list):
        """Add operations to minimize `loss` by updating `parameter_list`.

        This method combines the interfaces `create_backward_pass()` and
        `create_optimization_pass()` into one.
        """
        params_grads = self.create_backward_pass(loss, parameter_list)
        update_ops = self.create_optimization_pass(params_grads)
        return update_ops
```

Users can inherit from the Optimizer above to create their own Optimizer with some special logic, such as AdagradOptimizer.
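
As an illustration, such a subclass only needs to override the hook that emits the update operators. The following is a minimal sketch, not the actual Paddle implementation; `append_adagrad_op` is a hypothetical helper standing in for whatever operator-creation API the framework exposes:

```python
class AdagradOptimizer(Optimizer):
    def __init__(self, learning_rate=0.001, epsilon=1e-6):
        super(AdagradOptimizer, self).__init__()
        self.learning_rate = learning_rate
        self.epsilon = epsilon

    def create_optimization_pass(self, parameters_and_grads):
        # For each (parameter, gradient) pair, append one Adagrad update
        # operator to the block. `append_adagrad_op` is hypothetical.
        opt_ops = []
        for param, grad in parameters_and_grads:
            opt_ops.append(append_adagrad_op(
                param, grad,
                learning_rate=self.learning_rate,
                epsilon=self.epsilon))
        return opt_ops
```

`minimize()` is inherited unchanged, so `AdagradOptimizer(learning_rate=0.001).minimize(cost, parameter_list=[w1, b1])` still returns the list of update operators to run.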

@@ -0,0 +1,74 @@

# Design Doc: Selected Rows

`SelectedRows` is a kind of sparse tensor data type, designed to support `embedding` operators. The gradient of an embedding table is a sparse tensor: only a few rows of that tensor hold non-zero values. It is straightforward to represent such a sparse tensor with the following data structure:

```cpp
class SelectedRows {
 private:
  vector<int> rows_;
  Tensor value_;
  int height_;
};
```

The field `height_` is the first dimension of the `SelectedRows`. The `rows_` field lists the indices of the non-zero rows. The `value_` field is an N-dim tensor of shape `[rows_.size() /* NUM_ROWS */, ...]`, which supplies the values of those rows. The shape of the `SelectedRows` as a whole is `[height_] + value_.shape[1:]`.
Suppose that a `SelectedRows`-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`. The `SelectedRows` representation would be:

```
x = SelectedRows {
  rows = [73, 84],
  value = [[1, 2], [3, 4]]
}
```
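To make the relationship between `height_`, `rows_`, and `value_` concrete, the sketch below scatters a `SelectedRows` back into a dense buffer. It is illustrative only and uses plain STL containers instead of the `Tensor` class, since the accessors of the real class are not spelled out above:

```cpp
#include <vector>

// Dense buffer of shape [height x width], row-major; rows not listed in
// `rows` stay zero, which is exactly what SelectedRows leaves implicit.
std::vector<float> ToDense(const std::vector<int>& rows,
                           const std::vector<float>& value,  // rows.size() x width
                           int height, int width) {
  std::vector<float> dense(static_cast<size_t>(height) * width, 0.0f);
  for (size_t i = 0; i < rows.size(); ++i) {
    for (int j = 0; j < width; ++j) {
      dense[static_cast<size_t>(rows[i]) * width + j] = value[i * width + j];
    }
  }
  return dense;
}
```

For the example above, `ToDense({73, 84}, {1, 2, 3, 4}, /*height=*/100, /*width=*/2)` returns a 100x2 buffer that is zero everywhere except rows 73 and 84.
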
## SelectedRows in Protobuf

`SelectedRows` is a kind of `Variable`, so `VarDesc` in protobuf should be able to describe it. Only the tensor dimensions of a `SelectedRows` are described at compile time, since `rows_` and `value_` depend on the training data.
So we use `TensorDesc` to unify `data_type` and `dims`. A `LodTensorDesc` contains a `TensorDesc` and a `lod_level`. The description of a `SelectedRows` is just a tensor description.

```proto
message TensorDesc {
  required DataType data_type = 1;
  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}

message LodTensorDesc {
  required TensorDesc tensor = 1;
  optional int32 lod_level = 2;
}

message VarDesc {
  required string name = 1;
  enum VarType {
    LOD_TENSOR = 0;
    SELECTED_ROWS = 1;
  }
  required VarType type = 2;
  optional LodTensorDesc lod_desc = 3;
  optional TensorDesc selected_rows_desc = 4;
  optional bool persistable = 5 [ default = false ];
}
```
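As a concrete, purely illustrative example, the compile-time description of the embedding-table gradient from the previous section could be filled in through the generated protobuf API roughly like this; the `FP32` data type value and the concrete dims are assumptions, not taken from the design above:

```cpp
VarDesc grad_desc;
grad_desc.set_name("Embedding.Grad");
grad_desc.set_type(VarDesc::SELECTED_ROWS);

// Only the dense shape is known at compile time; rows_ and value_ are
// runtime data and therefore do not appear here.
TensorDesc* tensor = grad_desc.mutable_selected_rows_desc();
tensor->set_data_type(FP32);  // assuming the DataType enum defines FP32
tensor->add_dims(100000);     // height_: the full table height (illustrative)
tensor->add_dims(64);         // embedding width (illustrative)
```
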
## InferShape for Selected Rows

Just like `LoD` information, the `InferShape` method will infer the output tensor type as well. The operator should decide whether its output is a `SelectedRows` or a dense tensor.

For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should look like the following:

```cpp
void TableLookupGrad::InferShape(context) {
  ...
  context.SetDataType("Embedding.Grad", kSelectedRows);
}
```
## Sparse Operators

Several operators need to be written to support `SelectedRows`:

1. Operators which generate `SelectedRows` gradients, e.g. the gradient of `TableLookupOp`.
2. Optimization operators which support `SelectedRows` gradients, e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator; `OpWithKernel::Run` should select a suitable kernel for both dense tensors and `SelectedRows`, as sketched below.
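
The following is a rough sketch of that dispatch, not the actual Paddle implementation; the kernel helpers `SGDDenseKernel` and `SGDSparseKernel` are hypothetical names used only to show the idea of one `SGD` operator choosing a kernel based on the gradient's type:

```cpp
// Inside the single SGD operator: pick a kernel based on whether the
// incoming gradient variable holds a dense Tensor or a SelectedRows.
void SGDOp::Run(const Scope& scope, const platform::DeviceContext& dev_ctx) const {
  auto* grad_var = scope.FindVar(Input("Grad"));
  auto* param_var = scope.FindVar(Input("Param"));

  if (grad_var->IsType<SelectedRows>()) {
    // Sparse update: only touch the rows listed in rows_.
    SGDSparseKernel(param_var, grad_var->Get<SelectedRows>(), dev_ctx);
  } else {
    // Dense update: apply the gradient to every element of the parameter.
    SGDDenseKernel(param_var, grad_var->Get<Tensor>(), dev_ctx);
  }
}
```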

@@ -0,0 +1,35 @@

digraph Test {
  // Forward pass: the generator maps noise z to a fake image, which the
  // discriminator scores; the fake-image score feeds the discriminator loss.
  z -> generator -> G_img;
  G_img -> discriminator -> D_f -> d_loss_f;
  label0 -> d_loss_f -> d_loss;

  // Forward pass on real images.
  img -> discriminator -> D_t -> d_loss_t;
  label1 -> d_loss_t -> d_loss;

  // Red dashed edges: backward pass for the discriminator loss.
  d_loss -> d_loss_t[color=red, style=dashed];
  d_loss -> d_loss_f[color=red, style=dashed];
  d_loss_t -> D_t[color=red, style=dashed];
  d_loss_f -> D_f[color=red, style=dashed];
  D_t -> discriminator[color=red, style=dashed];
  D_f -> discriminator[color=red, style=dashed];

  // Generator loss is computed from the discriminator's score of the fake image.
  D_f -> g_loss;
  label2 -> g_loss;

  // Green dashed edges: backward pass for the generator loss.
  g_loss -> D_f[color=green, style=dashed];
  D_f -> discriminator[color=green, style=dashed];
  discriminator -> G_img[color=green, style=dashed];
  G_img -> generator[color=green, style=dashed];

  discriminator [color=red, shape=box];
  generator [color=green, shape=box];
  z [shape=diamond];
  img [shape=diamond];
  label0 [shape=diamond];
  label1 [shape=diamond];
  label2 [shape=diamond];

  d_loss [color=red];
  g_loss [color=green];
}

@@ -1,27 +1,32 @@

add_subdirectory(cuda)
add_subdirectory(function)
add_subdirectory(utils)
add_subdirectory(testing)
add_subdirectory(math)
add_subdirectory(parameter)
add_subdirectory(gserver)
add_subdirectory(pserver)
add_subdirectory(trainer)
add_subdirectory(scripts)
add_subdirectory(string)

if(Boost_FOUND)
add_subdirectory(memory)
add_subdirectory(platform)
add_subdirectory(framework)
add_subdirectory(operators)
add_subdirectory(pybind)
endif()
add_subdirectory(parameter)
add_subdirectory(testing)

if(WITH_C_API)
if(MOBILE_INFERENCE)
add_subdirectory(capi)
endif()
else()
add_subdirectory(pserver)
add_subdirectory(trainer)
add_subdirectory(string)
add_subdirectory(scripts)

if(WITH_C_API)
add_subdirectory(capi)
endif()

if(Boost_FOUND)
add_subdirectory(memory)
add_subdirectory(platform)
add_subdirectory(framework)
add_subdirectory(operators)
add_subdirectory(pybind)
endif()

if(WITH_SWIG_PY)
add_subdirectory(api)
if(WITH_SWIG_PY)
add_subdirectory(api)
endif()
endif()

@@ -0,0 +1,163 @@

/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/framework/executor.h"

#include <algorithm>
#include <iostream>
#include <memory>
#include <set>
#include <vector>

#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/op_registry.h"
#include "paddle/framework/scope.h"

namespace paddle {
namespace framework {

const std::string kFeedOpType = "feed";
const std::string kFetchOpType = "fetch";

Executor::Executor(const std::vector<platform::Place>& places) {
  PADDLE_ENFORCE_GT(places.size(), 0);
  device_contexts_.resize(places.size());
  for (size_t i = 0; i < places.size(); i++) {
    if (platform::is_cpu_place(places[i])) {
      device_contexts_[i] = new platform::CPUDeviceContext(
          boost::get<platform::CPUPlace>(places[i]));
    } else if (platform::is_gpu_place(places[i])) {
#ifdef PADDLE_WITH_CUDA
      device_contexts_[i] = new platform::CUDADeviceContext(
          boost::get<platform::GPUPlace>(places[i]));
#else
      PADDLE_THROW(
          "'GPUPlace' is not supported, Please re-compile with WITH_GPU "
          "option");
#endif
    }
  }
}

Executor::~Executor() {
  for (auto& device_context : device_contexts_) {
    delete device_context;
  }
}

void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
  // TODO(tonyyang-svail):
  //   - only runs on the first device (i.e. no interdevice communication)
  //   - will change to use multiple blocks for RNN op and Cond Op
  PADDLE_ENFORCE_GT(pdesc.blocks_size(), block_id);
  auto& block = pdesc.blocks(block_id);
  auto& device = device_contexts_[0];

  // Instantiate all the vars in the global scope
  for (auto& var : block.vars()) {
    scope->NewVar(var.name());
  }

  Scope& local_scope = scope->NewScope();

  std::vector<bool> should_run = Prune(pdesc, block_id);
  PADDLE_ENFORCE_EQ(should_run.size(), static_cast<size_t>(block.ops_size()));
  for (size_t i = 0; i < should_run.size(); ++i) {
    if (should_run[i]) {
      for (auto& var : block.ops(i).outputs()) {
        for (auto& argu : var.arguments()) {
          if (local_scope.FindVar(argu) == nullptr) {
            local_scope.NewVar(argu);
          }
        }
      }
      auto op = paddle::framework::OpRegistry::CreateOp(block.ops(i));
      op->Run(local_scope, *device);
    }
  }

  // TODO(tonyyang-svail):
  //   - Destroy local_scope
}

std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id) {
  // TODO(tonyyang-svail):
  //   - will change to use multiple blocks for RNN op and Cond Op

  auto& block = pdesc.blocks(block_id);
  auto& ops = block.ops();

  bool expect_feed = true;
  for (auto& op_desc : ops) {
    PADDLE_ENFORCE(op_desc.type() != kFeedOpType || expect_feed,
                   "All FeedOps are at the beginning of the ProgramDesc");
    expect_feed = (op_desc.type() == kFeedOpType);
  }

  bool expect_fetch = true;
  for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
    auto& op_desc = *op_iter;
    PADDLE_ENFORCE(op_desc.type() != kFetchOpType || expect_fetch,
                   "All FetchOps must be at the end of the ProgramDesc");
    expect_fetch = (op_desc.type() == kFetchOpType);
  }

  // Walk the ops in reverse order; an op should run if it is a fetch op or
  // if any of its outputs is needed by an op that we already decided to run.
  std::set<std::string> dependent_vars;
  std::vector<bool> should_run;
  for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
    auto& op_desc = *op_iter;

    bool found_dependent_vars = false;
    for (auto& var : op_desc.outputs()) {
      for (auto& argu : var.arguments()) {
        if (dependent_vars.count(argu) != 0) {
          found_dependent_vars = true;
        }
      }
    }

    if (op_desc.type() == kFetchOpType || found_dependent_vars) {
      // erase its outputs from the dependency graph
      for (auto& var : op_desc.outputs()) {
        for (auto& argu : var.arguments()) {
          dependent_vars.erase(argu);
        }
      }

      // insert its inputs into the dependency graph
      for (auto& var : op_desc.inputs()) {
        for (auto& argu : var.arguments()) {
          dependent_vars.insert(argu);
        }
      }

      should_run.push_back(true);
    } else {
      should_run.push_back(false);
    }
  }

  // TODO(tonyyang-svail):
  //   - check this after integration of Init
  // PADDLE_ENFORCE(dependent_vars.empty());

  // since we are traversing the ProgramDesc in reverse order
  // we reverse the should_run vector
  std::reverse(should_run.begin(), should_run.end());

  return should_run;
}

}  // namespace framework
}  // namespace paddle

@@ -0,0 +1,55 @@

/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include "paddle/framework/framework.pb.h"
#include "paddle/framework/op_info.h"
#include "paddle/framework/scope.h"
#include "paddle/framework/tensor.h"

namespace paddle {
namespace framework {

class Executor {
 public:
  explicit Executor(const std::vector<platform::Place>& places);
  ~Executor();

  /* @Brief
   * Runtime evaluation of the given ProgramDesc under a certain Scope
   *
   * @param
   *  ProgramDesc
   *  Scope
   */
  void Run(const ProgramDesc&, Scope*, int);

 private:
  std::vector<platform::DeviceContext*> device_contexts_;
};

/* @Brief
 * Pruning the graph
 *
 * @param
 *  ProgramDesc
 *
 * @return
 *  vector<bool> Same size as ops. Indicates whether an op should be run.
 */
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);

}  // namespace framework
}  // namespace paddle
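
A hypothetical usage sketch of this interface (not taken from the Paddle sources): construct an `Executor` over a single CPU place, then run block 0 of a `ProgramDesc` against a fresh `Scope`. The function and variable names here are illustrative only.

```cpp
#include <vector>

#include "paddle/framework/executor.h"

void RunMainBlock(const paddle::framework::ProgramDesc& program) {
  // One device context is created per place passed to the constructor;
  // Run() currently uses only the first one.
  std::vector<paddle::platform::Place> places{paddle::platform::CPUPlace()};
  paddle::framework::Executor executor(places);

  // All variables of the block are instantiated inside this scope.
  paddle::framework::Scope scope;
  executor.Run(program, &scope, 0 /* block_id */);
}
```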