You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Paddle/doc/design/fluid_compiler.md

111 lines
3.1 KiB

# PaddlePaddle Fluid: Towards a Compiled Programming Language
As described in [fluid.md](fluid.md), when a Fluid application program
runs, it generates a `ProgramDesc` protobuf message as an intermediate
representation of itself. The C++ class `Executor` can run this
protobuf message as an interpreter. This article describes the Fluid
compiler.
![](fluid-compiler.png)
## ProgramDesc
Before we go deeper into the idea of compiled language, let us take a
look at a simple example Fluid application.
```python
import "fluid"
func paddlepaddle() {
X = fluid.read(...)
W = fluid.Tensor(...)
Y = fluid.mult(X, W)
}
```
This program consists of a [block](block.md) of three operators --
`read`, `assign`, and `mult`. Its `ProgramDesc` message looks like
the following
```protobuf
message ProgramDesc {
block[0] = Block {
vars = [X, W, Y],
ops = [
read(output = X)
assign(input = ..., output = W)
mult(input = {X, W}, output = Y)
],
}
}
```
## Transpilers
We can write a transpiler program that takes a `ProgramDesc`, e.g.,
the above one, and outputs another `ProgramDesc`. Let us take some
examples:
1. *Memory optimization transpiler*: We can write a transpiler that
inserts some `FreeMemoryOp`s in the above example `ProgramDesc` so
to free memory early, before the end of an iteration, so to keep a
small memory footprint.
1. *Distributed training transpiler*: We can write a transpiler that
converts a`ProgramDesc` into its distributed version of two
`ProgramDesc`s -- one for running by the trainer processes and the
other for the parameter server.
In the rest of this article, we talk about a special kind of
transpiler, *Native code generator*, which takes a `ProgramDesc` and
generates a `.cu` (or `.cc`) file, which could be built by C++
compilers (gcc, nvcc, icc) into binaries.
## Native Code Generator
For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:
```c++
void main() {
auto X = fluid_cuda_read(...);
auto W = fluid_cuda_create_tensor(...);
auto Y = fluid_cuda_mult(X, W);
}
```
and the definitions of functions `fluid_cuda_read`,
`fluid_cuda_create_tensor`, and `fluid_cuda_mult`. Please be aware
that each function could just define a C++ instance of an operator and
run it. For example
```c++
paddle::Tensor fluid_cuda_read(...) {
paddle::Tensor t;
paddle::operator::Read r(&t, ...);
r.Run();
return t;
}
```
For computational operators that have multiple *kernels*, each for a
specific hardware platform, for example, the `mult` operator, the
generated code should call its CUDA kernel:
```c++
paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
const paddle::Tensor& b) {
paddle::Tensor t;
paddle::operator::Mult m(a, b, ...);
Mult.Run(cuda_context);
}
```
where `cuda_context` could be a global variable of type
`paddle::CUDADeviceContext`.
## Multi-Block Code Generation
Most Fluid application programs may have more than one blocks. To
execute them, we need to trace [scopes](scope.md).