Paddle/doc/design/fluid_compiler.md

# PaddlePaddle Fluid: Towards a Compiled Programming Language

As described in [fluid.md](fluid.md), when a Fluid application program
runs, it generates a `ProgramDesc` protobuf message as an intermediate
representation of itself.  The C++ class `Executor` can run this
protobuf message as an interpreter.  This article describes the Fluid
compiler.

![](fluid-compiler.png)

## ProgramDesc

Before we go deeper into the idea of compiled language, let us take a
look at a simple example Fluid application.

```python
import "fluid"

func paddlepaddle() {
  X = fluid.read(...)
  W = fluid.Tensor(...)
  Y = fluid.mult(X, W)
}
```

This program consists of a [block](block.md) of three operators --
`read`, `assign`, and `mult`.  Its `ProgramDesc` message looks like
the following

```protobuf
message ProgramDesc {
  block[0] = Block {
    vars = [X, W, Y],
    ops = [
      read(output = X)
      assign(input = ..., output = W)
      mult(input = {X, W}, output = Y)
    ],
  }
}
```

## Transpilers

We can write a transpiler program that takes a `ProgramDesc`, e.g.,
the above one, and outputs another `ProgramDesc`.  Let us take some
examples:

1. *Memory optimization transpiler*: We can write a transpiler that
   inserts some `FreeMemoryOp`s in the above example `ProgramDesc` so
   to free memory early, before the end of an iteration, so to keep a
   small memory footprint.

1. *Distributed training transpiler*: We can write a transpiler that
   converts a`ProgramDesc` into its distributed version of two
   `ProgramDesc`s -- one for running by the trainer processes and the
   other for the parameter server.

In the rest of this article, we talk about a special kind of
transpiler, *Native code generator*, which takes a `ProgramDesc` and
generates a `.cu` (or `.cc`) file, which could be built by C++
compilers (gcc, nvcc, icc) into binaries.

## Native Code Generator

For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:

```c++
void main() {
  auto X = fluid_cuda_read(...);
  auto W = fluid_cuda_create_tensor(...);
  auto Y = fluid_cuda_mult(X, W);
}
```

and the definitions of functions `fluid_cuda_read`,
`fluid_cuda_create_tensor`, and `fluid_cuda_mult`.  Please be aware
that each function could just define a C++ instance of an operator and
run it.  For example

```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operator::Read r(&t, ...);
  r.Run();
  return t;
}
```

For computational operators that have multiple *kernels*, each for a
specific hardware platform, for example, the `mult` operator, the
generated code should call its CUDA kernel:

```c++
paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
                               const paddle::Tensor& b) {
  paddle::Tensor t;
  paddle::operator::Mult m(a, b, ...);
  Mult.Run(cuda_context);
}
```

where `cuda_context` could be a global variable of type
`paddle::CUDADeviceContext`.

## Multi-Block Code Generation

Most Fluid application programs may have more than one blocks.  To
execute them, we need to trace [scopes](scope.md).