commit f6106ffa7e
@@ -0,0 +1,30 @@
if(NOT WITH_GPU)
  return()
endif()

set(NCCL_ROOT "/usr" CACHE PATH "NCCL ROOT")
find_path(NCCL_INCLUDE_DIR nccl.h PATHS
  ${NCCL_ROOT} ${NCCL_ROOT}/include
  $ENV{NCCL_ROOT} $ENV{NCCL_ROOT}/include ${CUDA_TOOLKIT_INCLUDE}
  NO_DEFAULT_PATH)

get_filename_component(__libpath_hist ${CUDA_CUDART_LIBRARY} PATH)

# Default to x86_64; use the detected processor when CMake provides one.
set(TARGET_ARCH "x86_64")
if(CMAKE_SYSTEM_PROCESSOR)
  set(TARGET_ARCH ${CMAKE_SYSTEM_PROCESSOR})
endif()

list(APPEND NCCL_CHECK_LIBRARY_DIRS
  ${NCCL_ROOT}
  ${NCCL_ROOT}/lib64
  ${NCCL_ROOT}/lib
  ${NCCL_ROOT}/lib/${TARGET_ARCH}-linux-gnu
  $ENV{NCCL_ROOT}
  $ENV{NCCL_ROOT}/lib64
  $ENV{NCCL_ROOT}/lib
  /usr/lib)
find_library(NCCL_LIBRARY NAMES libnccl.so libnccl.dylib # libnccl_static.a
  PATHS ${NCCL_CHECK_LIBRARY_DIRS} ${NCCL_INCLUDE_DIR} ${__libpath_hist}
  NO_DEFAULT_PATH
  DOC "Path to nccl library.")
@@ -0,0 +1,23 @@
# Executor Design Doc

## Motivation

We use an executor to do the runtime evaluation of a `ProgramDesc`.

## Overview

An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks and each block contains the protobuf definition of all the parameters and operators. The `block_id` specifies the entrance block. And the `Scope` is the container of all the variable instances, which is persistent across different runs.

### What does the executor do?

It evaluates all the operators in the `block_id`-th block of a `ProgramDesc`.

### What does the executor NOT do?

It does not do runtime optimization, meaning it does not intelligently parse the dependencies of each op and choose which ops to run and in which order.

It does not do graph partitioning, meaning it does not divide the `ProgramDesc` into several small pieces and execute them on different devices.

## Implementation

`Executor` evaluates a `ProgramDesc`. Essentially, it instantiates the Variables and Operators, then runs all the operators in sequence. [[code]](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.cc)
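The run loop can be sketched as follows. This is a minimal illustration, not the real implementation: the `Scope`, `OpDesc`, `BlockDesc` and `ProgramDesc` structs below are hypothetical stand-ins for the classes in `paddle/framework`, with an op reduced to a callable acting on the scope.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the paddle/framework types.
struct Scope {
  std::map<std::string, float> vars;  // variable name -> value
};

struct OpDesc {
  std::function<void(Scope*)> run;  // the operator's kernel
};

struct BlockDesc {
  std::vector<OpDesc> ops;
};

struct ProgramDesc {
  std::vector<BlockDesc> blocks;
};

// Minimal executor: run every op of the entrance block, in order,
// against the given (persistent) scope.
void Run(const ProgramDesc& program, int block_id, Scope* scope) {
  for (const OpDesc& op : program.blocks[block_id].ops) {
    op.run(scope);
  }
}
```

Note there is no dependency analysis and no device placement here, matching the "NOT do" list above: the ops run strictly in the order they appear in the block.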
@@ -0,0 +1,78 @@
# Design Doc: InferVarType

## The Problem Posed

The variable in our design can hold variant types, such as `LoDTensor` and `SelectedRows`. An operator should be able to infer the variable types of its outputs.

For example, a `lookup table` operator takes two `LoDTensor`s; one is a float tensor as the embedding table, the other is an int tensor as the word ID. The gradient operator of `lookup table` will generate a `SelectedRows` as its output. A `sum` operator can take both `LoDTensor` and `SelectedRows` as its inputs, and will generate a `LoDTensor` if any of its inputs is a `LoDTensor`; otherwise, the `sum` operator will generate `SelectedRows` as its output.

The variable type will be constant at runtime. Every variable's type can either be set by the user (input data and parameters) or be inferred by the operator at compile time.

## Proposed Solution

The `InferVarType` is a compile-time function which is registered to each operator. The interface of that function is:

```c++
using InferVarTypeFN = std::function<
    void (const OpDescBind& /*op_desc*/, BlockDescBind* /*block*/)>;
```

It takes an operator description as its input, infers the output variable types, and stores them in the block description.

The `InferVarTypeFN` will be registered in `OpInfo`, to replace the `infer_var_type_` field. The `OpInfo` should be

```cpp
struct OpInfo {
  InferVarTypeFN infer_var_type_;
  ...
};
```

The default `InferVarType` will set the output type as `LoDTensor`. It can be done by `GetInferVarType()`.

```cpp
void DefaultInferVarType(const OpDescBind& op_desc, BlockDescBind* block) {
  // set the output type of variable as `LoDTensor`.
  // ...
}

struct OpInfo {
  InferVarTypeFN infer_var_type_;
  InferVarTypeFN GetInferVarType() const {
    if (infer_var_type_) {
      return infer_var_type_;
    } else {
      return DefaultInferVarType;
    }
  }
};
```

## Register InferVarType

We provide a thin base class for registering an `InferVarTypeFN`. Using a base class eases the implementation of the registry, since we can detect whether a registry entry is an `InferVarTypeFN` or not.

```cpp
class VarTypeInferer {
 public:
  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0;
};
```

Operator developers can write their specialized `VarTypeInferer` as follows.

```cpp
class SpecialVarTypeInferer : public VarTypeInferer {
 public:
  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const {
    // .. own logic
  }
};
```

Then the user can register the `InferVarType` just like `GradOpDescMaker` and `OpInfoMaker`.

```cpp
REGISTER_OPERATOR(some_op, OpType, SpecialVarTypeInferer, ...);
```
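The "detect whether a registry entry is an `InferVarTypeFN`" idea can be illustrated with a compile-time check against the base class. This is a hypothetical sketch, not the actual `REGISTER_OPERATOR` machinery: `OpDescBind` and `BlockDescBind` are reduced to empty stand-ins, and `IsVarTypeInferer` is an invented helper showing the kind of trait the registry could dispatch on.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the real paddle/framework types.
struct OpDescBind {};
struct BlockDescBind {};

class VarTypeInferer {
 public:
  virtual void operator()(const OpDescBind& op_desc, BlockDescBind* block) const = 0;
  virtual ~VarTypeInferer() = default;
};

class SpecialVarTypeInferer : public VarTypeInferer {
 public:
  void operator()(const OpDescBind&, BlockDescBind*) const override {
    // .. own logic
  }
};

// The registry can tell a VarTypeInferer entry apart from other
// REGISTER_OPERATOR arguments purely at compile time.
template <typename T>
constexpr bool IsVarTypeInferer() {
  return std::is_base_of<VarTypeInferer, T>::value;
}
```

With such a trait, the variadic registrar can route `SpecialVarTypeInferer` into the `infer_var_type_` slot while other argument types (gradient makers, proto makers) go to their own slots.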
@@ -0,0 +1,63 @@
# Prune

## Motivation

We want to support running inference, training and checkpointing in one `ProgramDesc`. We implement a
`void Prune(const ProgramDesc* input, ProgramDesc* output)` function, which takes a `ProgramDesc`
and generates a pruned `ProgramDesc`.

## Challenge

Pruning needs to support both variables and operators being evaluation targets. Consider the following
different situations.

```python
# Case 1: run the forward pass.
cost_np = session.run(target=cost)
# Case 2: run the backward pass.
opts_np, _ = session.run(target=[cost, opt])
# Case 3: run checkpointing.
_ = session.run(target=checkpoint)
```

## Solution

To support evaluation of operators, we add an `is_target` field to `OpDesc`.

```protobuf
message OpDesc {
  required string type = 3;
  repeated Var inputs = 1;
  repeated Var outputs = 2;
  repeated Attr attrs = 4;
  optional bool is_target = 5 [ default = false ];
};
```

To support evaluation of variables, we add [fetch_op](https://github.com/PaddlePaddle/Paddle/pull/4599).
For each variable in the `target`, we insert a `fetch_op` into the `ProgramDesc` with the variable being
the `fetch_op`'s input. Then we also mark the `fetch_op` as a target.

### Algorithm

If an operator needs to be run, it must fall into one of the following cases:

1. It is a target.
2. It is depended on by some other ops, meaning its output is some other op's input.

The first case can be checked by `op_desc.is_target()`. The second case can be implemented as

```c++
bool HasDependentVar(const OpDesc& op_desc, const std::set<std::string>& dependent_vars) {
  for (auto& var : op_desc.outputs()) {
    for (auto& argu : var.arguments()) {
      if (dependent_vars.count(argu) != 0) {
        return true;
      }
    }
  }
  return false;
}
```

Then the whole algorithm can be implemented as the following [code](https://github.com/tonyyang-svail/Paddle/blob/prune_impl/paddle/framework/prune.cc).
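The linked pass walks the op list and keeps exactly the ops covered by the two cases above. A simplified sketch, using a hypothetical flat `Op` struct instead of the real protobuf `OpDesc`: iterate the ops in reverse order, keep an op if it is a target or if one of its outputs is already needed, and fold the kept op's inputs into the dependency set.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified op description: the real pass works on the
// protobuf OpDesc, with inputs/outputs grouped into named arguments.
struct Op {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  bool is_target = false;
};

// Sketch of the pruning pass over one block's op list.
std::vector<Op> Prune(const std::vector<Op>& ops) {
  std::set<std::string> dependent_vars;
  std::vector<Op> kept;
  for (auto it = ops.rbegin(); it != ops.rend(); ++it) {
    // Case 1: the op is a target. Case 2: some kept op consumes its output.
    bool needed = it->is_target;
    for (const auto& out : it->outputs) {
      if (dependent_vars.count(out) != 0) needed = true;
    }
    if (needed) {
      // Everything this op reads is now needed by some kept op upstream.
      for (const auto& in : it->inputs) dependent_vars.insert(in);
      kept.push_back(*it);
    }
  }
  return {kept.rbegin(), kept.rend()};  // restore original execution order
}
```

The reverse traversal is what makes a single pass sufficient: by the time an op is visited, every op that could consume its outputs has already been classified.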