# PaddlePaddle Fluid: Towards a Compiled Programming Language

As described in [fluid.md](fluid.md), when a Fluid application program
runs, it generates a `ProgramDesc` protobuf message as an intermediate
representation of itself. The C++ class `Executor` can run this
protobuf message as an interpreter. This article describes the Fluid
compiler.

## ProgramDesc

Before we go deeper into the idea of compiled language, let us take a
look at a simple example Fluid application.

```python
import "fluid"

func paddlepaddle() {
  X = fluid.read(...)
  W = fluid.Tensor(...)
  Y = fluid.mult(X, W)
}
```

This program consists of a [block](block.md) of three operators --
`read`, `assign`, and `mult`. Its `ProgramDesc` message looks like
the following:

```protobuf
message ProgramDesc {
  block[0] = Block {
    vars = [X, W, Y],
    ops = [
      read(output = X)
      assign(input = ..., output = W)
      mult(input = {X, W}, output = Y)
    ],
  }
}
```

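Since the `Executor` mentioned above interprets exactly this kind of message, a minimal sketch of that path could look like the following. The constructor and `Run` signatures here are assumptions for illustration; the actual classes live under `paddle/framework` and may differ in detail.

```c++
#include "paddle/framework/executor.h"
#include "paddle/framework/program_desc.h"
#include "paddle/framework/scope.h"
#include "paddle/platform/place.h"

// Run block 0 of the program with the interpreter-style Executor.
void InterpretProgram(const paddle::framework::ProgramDesc& program) {
  paddle::framework::Scope scope;                // will hold X, W, and Y
  paddle::platform::CPUPlace place;              // or a CUDA place for GPU
  paddle::framework::Executor executor(place);
  executor.Run(program, &scope, /*block_id=*/0);
}
```
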
## Transpilers

We can write a transpiler program that takes a `ProgramDesc`, e.g.,
the above one, and outputs another `ProgramDesc`. Let us take some
examples, followed by a sketch of what such a pass could look like in
code:

1. *Memory optimization transpiler*: We can write a transpiler that
   inserts some `FreeMemoryOp`s into the above example `ProgramDesc` so
   as to free memory early, before the end of an iteration, and thus
   keep a small memory footprint.

1. *Distributed training transpiler*: We can write a transpiler that
   converts a `ProgramDesc` into its distributed version: two
   `ProgramDesc`s -- one to be run by the trainer processes and the
   other by the parameter server.

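As promised above, here is a sketch of what such a pass could look like in code: a transpiler is simply a function from one `ProgramDesc` message to another, so passes can be chained. The header path, the `proto` namespace alias, and the `FreeMemoryTranspile` name are assumptions for illustration only.

```c++
#include "paddle/framework/framework.pb.h"

namespace proto = paddle::framework::proto;

// A hypothetical memory-optimization pass: it copies the input program
// and would append FreeMemoryOp entries after the last use of each
// variable; the lifetime analysis itself is elided.
proto::ProgramDesc FreeMemoryTranspile(const proto::ProgramDesc& input) {
  proto::ProgramDesc output = input;  // start from a copy of the input program
  // ... walk output.blocks(0).ops(), find the last use of every variable,
  //     and insert a FreeMemoryOp right after that use ...
  return output;
}
```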

In the rest of this article, we talk about a special kind of
transpiler, the *native code generator*, which takes a `ProgramDesc`
and generates a `.cu` (or `.cc`) file that can be built by a C++
compiler (gcc, nvcc, icc) into a binary.

## Native Code Generator

For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:

```c++
int main() {
  auto X = fluid_cuda_read(...);
  auto W = fluid_cuda_create_tensor(...);
  auto Y = fluid_cuda_mult(X, W);
}
```

and the definitions of the functions `fluid_cuda_read`,
`fluid_cuda_create_tensor`, and `fluid_cuda_mult`. Please be aware
that each function could just define a C++ instance of an operator and
run it. For example:

```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operators::Read r(&t, ...);
  r.Run();
  return t;
}
```

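The document does not spell out `fluid_cuda_create_tensor`; one plausible definition, assumed here purely for illustration, follows the same pattern and wraps the `assign` operator that produces `W` in the `ProgramDesc` above:

```c++
// Hypothetical: create the output tensor W and run the assign operator
// that initializes it, mirroring fluid_cuda_read above.  The
// paddle::operators::Assign name and its constructor arguments are
// assumptions.
paddle::Tensor fluid_cuda_create_tensor(...) {
  paddle::Tensor t;
  paddle::operators::Assign a(&t, ...);
  a.Run();
  return t;
}
```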

For computational operators that have multiple *kernels*, each for a
specific hardware platform, for example, the `mult` operator, the
generated code should call its CUDA kernel:

```c++
paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
                               const paddle::Tensor& b) {
  paddle::Tensor t;
  paddle::operators::Mult m(a, b, ...);
  m.Run(cuda_context);
  return t;
}
```

where `cuda_context` could be a global variable of type
`paddle::CUDADeviceContext`.

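For completeness, the generated file would also have to define that global. A minimal sketch, using the type name as written in this document (the real class lives in the `paddle::platform` namespace and its constructor arguments may differ):

```c++
// Hypothetical definition of the global device context used by
// fluid_cuda_mult; one context per generated binary, bound to GPU 0.
paddle::CUDADeviceContext cuda_context(/* device id = */ 0);
```
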
## Multi-Block Code Generation

Most Fluid application programs may have more than one block. To
execute them, we need to trace [scopes](scope.md).

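As a rough illustration of what tracing scopes could mean for generated code, the sketch below keeps one scope per block and lets lookups fall back to the enclosing block's scope; the `GeneratedScope` name and layout are assumptions, not the design in scope.md.

```c++
#include <map>
#include <string>

// Hypothetical per-block scope for generated multi-block code: a
// variable lookup first checks the current block's scope and then
// walks outward through the enclosing blocks' scopes.
struct GeneratedScope {
  GeneratedScope* parent = nullptr;
  std::map<std::string, paddle::Tensor> vars;

  paddle::Tensor* Find(const std::string& name) {
    auto it = vars.find(name);
    if (it != vars.end()) return &it->second;
    return parent != nullptr ? parent->Find(name) : nullptr;
  }
};
```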