# Overview

Imperative programming makes it easier to learn, debug, and try new ideas.

# Related Works

## Pytorch

https://pytorch.org/

## TensorFlow Eager

https://www.tensorflow.org/guide/eager

# Design

## API

```python
class Layer(object):

  def __call__(self, inputs):
    # Build parameters once, on the first call.
    # ...
    return self.forward(inputs)

  def forward(self, inputs):
    # Forward logic written with paddle operators; backward is auto-generated.
    pass


class PyLayer(core.PyLayer):

  def __call__(self, inputs):
    # Trace the logic.
    pass

  @staticmethod
  def forward(inputs):
    # Any forward logic implemented with numpy I/O.
    pass

  @staticmethod
  def backward(inputs):
    # Any backward logic implemented with numpy I/O.
    pass
```

## Tracer

Current: Python Variable -> C++ VarBase -> C++ Variable -> C++ Tensor

Longer term:


```python
# Parent class.
class PyVarBase(object):
  pass


# Current python variable.
class Variable(PyVarBase):
  pass


class IVariable(PyVarBase):
  def __init__(self):
    self._ivar = core.VarBase()

  # Move var to a device.
  def to(self, device): pass
  # Get var value.
  def value(self): pass
  # Trigger backward.
  def backward(self): pass
  # Get var's gradient value.
  def gradient_value(self): pass
  # Operators to override.
```

```cpp
class Tracer {
 public:
  explicit Tracer(framework::BlockDesc* root_block) : root_block_(root_block) {}

  virtual ~Tracer() {}

  void Trace(OpBase* op,
             const std::map<std::string, std::vector<VarBase*>>& inputs,
             const std::map<std::string, std::vector<VarBase*>>& outputs,
             framework::BlockDesc* block, const bool stop_gradient = false);

  std::vector<VarBase*> PyTrace(OpBase* op, const std::vector<VarBase*>& inputs,
                                bool stop_gradient = false);

 private:
  framework::BlockDesc* root_block_;
};
```

The tracer will:

* Trace forward operations.
* Perform quick shape/type inference, push the kernel to the execution engine, and return to the user (see the sketch below).
* Perform autograd to generate gradients.
* Clear the trace.
* Apply gradients with optimizers.
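
Below is a minimal Python sketch of the trace-and-push flow described above. All names (`Tape`, `TracedOp`, `infer_shape_and_dtype`, `engine.push`) are illustrative assumptions, not Paddle's actual classes or functions.

```python
# Illustrative sketch only: hypothetical names, not Paddle's tracer.

def infer_shape_and_dtype(op_type, inputs, outputs):
    # Placeholder for the quick shape/dtype inference performed at trace time.
    pass


class TracedOp(object):
    def __init__(self, op_type, inputs, outputs):
        self.op_type = op_type
        self.inputs = inputs    # slot name -> list of variables
        self.outputs = outputs  # slot name -> list of variables


class Tape(object):
    """Records traced ops so autograd can later replay them in reverse."""

    def __init__(self, engine):
        self.ops = []
        self.engine = engine

    def trace(self, op_type, inputs, outputs, stop_gradient=False):
        # Quick shape/type inference so the outputs are immediately usable.
        infer_shape_and_dtype(op_type, inputs, outputs)
        # Remember the op for later autograd, unless gradients are not needed.
        if not stop_gradient:
            self.ops.append(TracedOp(op_type, inputs, outputs))
        # Push the kernel to the execution engine and return to the caller.
        self.engine.push(op_type, inputs, outputs)

    def clear(self):
        self.ops = []
```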

## Autodiff

There is already a lot of research on this topic: https://autodiff-workshop.github.io/ and https://en.wikipedia.org/wiki/Automatic_differentiation

Basically, we trace the forward execution and perform autodiff when needed.

* Can be triggered by `backward()`.
* Can select a block of code to trace and autodiff.
* Use `require_grad` to drop forward subgraphs that don't need autodiff (see the sketch below).
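
A correspondingly minimal sketch of reverse-mode autodiff over the recorded tape. `GRAD_FN`, a `require_grad` attribute, and the gradient-function signature are assumptions made for illustration, not Paddle's implementation.

```python
# Registry mapping an op type to a function that turns output grads into input grads,
# e.g. GRAD_FN['tanh'] = lambda ins, outs, dout: [dout[0] * (1 - outs[0] ** 2)].
GRAD_FN = {}


def backward(tape, loss_var):
    grads = {id(loss_var): 1.0}        # seed d(loss)/d(loss) = 1
    for op in reversed(tape.ops):      # replay the trace in reverse order
        out_vars = [v for vs in op.outputs.values() for v in vs]
        in_vars = [v for vs in op.inputs.values() for v in vs]
        dout = [grads.get(id(v), 0.0) for v in out_vars]
        if all(g == 0.0 for g in dout):
            continue                   # nothing downstream needs this op's gradient
        din = GRAD_FN[op.op_type](in_vars, out_vars, dout)
        for v, g in zip(in_vars, din):
            if getattr(v, 'require_grad', True):  # drop subgraphs that opted out
                grads[id(v)] = grads.get(id(v), 0.0) + g  # accumulate gradients
    return grads
```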

## Execution Engine

Lazy execution of pushed C++ operations.
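
A toy version of such an engine, for illustration only (the queue-and-flush structure, `run_kernel`, and `sync` are assumptions, not Paddle's execution engine):

```python
def run_kernel(op_type, inputs, outputs):
    # Placeholder for the real C++ kernel launch.
    pass


class LazyEngine(object):
    """Queues pushed ops and runs them in a batch when a value is actually needed."""

    def __init__(self):
        self.pending = []

    def push(self, op_type, inputs, outputs):
        # Queue the op instead of executing it immediately.
        self.pending.append((op_type, inputs, outputs))

    def sync(self):
        # Flush the queue, e.g. when the user asks for a tensor's value.
        for op_type, inputs, outputs in self.pending:
            run_kernel(op_type, inputs, outputs)
        self.pending = []
```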

## Device Placement

* An operator executes on its inputs' device.
* All inputs of an operator should live on the same device.
* Use `Var.to()` to explicitly move a var to a device (see the snippet below).
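
A small runnable illustration of these rules. `Var`, `run_op`, and the device strings are toy stand-ins for the proposed `to()` API above, not a released Paddle interface.

```python
class Var(object):
    def __init__(self, device='cpu'):
        self.device = device

    def to(self, device):
        # Explicitly move the var; returns a var that lives on `device`.
        return Var(device)


def run_op(op_type, *inputs):
    # An operator executes on its inputs' device; all inputs must agree.
    devices = {v.device for v in inputs}
    assert len(devices) == 1, "all inputs should live on the same device"
    print("running %s on %s" % (op_type, devices.pop()))


x = Var().to('gpu:0')
y = Var().to('gpu:0')
run_op('elementwise_add', x, y)  # runs on gpu:0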

## Save/Load Models

TODO

## I/O

TODO

## Refactor

* All function layers with parameters are converted to class `Layer`s.
* Existing models are converted to imperative mode.
* All op tests run once in static graph mode and once in imperative mode (see the sketch below).
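
A sketch of what such a dual-mode op test could look like. It uses the `fluid` 1.x API (`fluid.data`, `fluid.Executor`, `fluid.dygraph.guard`); exact names may differ between Paddle versions.

```python
import numpy as np
import paddle.fluid as fluid


def test_relu_in_both_modes():
    np_inp = np.random.random([2, 2]).astype('float32')

    # Static graph mode: build a program, then run it with an executor.
    main = fluid.Program()
    with fluid.program_guard(main):
        inp = fluid.data(name='inp', shape=[2, 2], dtype='float32')
        out = fluid.layers.relu(inp)
    exe = fluid.Executor(fluid.CPUPlace())
    static_out = exe.run(main, feed={'inp': np_inp}, fetch_list=[out])[0]

    # Imperative (dygraph) mode: run the same op eagerly.
    with fluid.dygraph.guard():
        dy_out = fluid.layers.relu(fluid.dygraph.to_variable(np_inp)).numpy()

    np.testing.assert_allclose(static_out, dy_out, rtol=1e-5)
```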

# Examples

```python
import numpy as np
import paddle.fluid as fluid


class MyLayer(fluid.imperative.Layer):
    def __init__(self):
        super(MyLayer, self).__init__()

    def forward(self, inputs):
        x = fluid.layers.relu(inputs)
        x = fluid.layers.elementwise_mul(x, x)
        x = fluid.layers.reduce_sum(x)
        return [x]
```


```python
class MyPyLayer(fluid.imperative.PyLayer):
    def __init__(self):
        super(MyPyLayer, self).__init__()

    @staticmethod
    def forward(inputs):
        return np.tanh(inputs[0])

    @staticmethod
    def backward(inputs):
        # `inputs` packs the forward input, the forward output, and the output grad.
        inp, out, dout = inputs
        return np.array(dout) * (1 - np.square(np.array(out)))


np_inp = np.ones([2, 2], np.float32)
with fluid.imperative.guard():
    var_inp = fluid.imperative.base.to_variable(np_inp)
    my_py_layer = MyPyLayer()
    outs = my_py_layer(var_inp)
    dy_out = np.sum(outs[0]._numpy())
    outs[0]._backward()
    dy_grad = var_inp._gradient()
```


```python
from paddle.fluid.dygraph import Linear


class MLP(fluid.Layer):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self._linear1 = Linear(input_size,
                               3,
                               fluid.ParamAttr(
                                   initializer=fluid.initializer.Constant(value=0.1)))
        self._linear2 = Linear(3,
                               4,
                               fluid.ParamAttr(
                                   initializer=fluid.initializer.Constant(value=0.1)))

    def forward(self, inputs):
        x = self._linear1(inputs)
        x = self._linear2(x)
        x = fluid.layers.reduce_sum(x)
        return x


np_inp = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
with fluid.dygraph.guard():
    var_inp = fluid.dygraph.base.to_variable(np_inp)
    mlp = MLP(input_size=2)
    out = mlp(var_inp)
    dy_out = out.numpy()
    out.backward()
```
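
The trace-based design also covers the "apply gradients with optimizers" step listed under Tracer. Below is a sketch that extends the MLP example with a training loop; the optimizer interface shown (`SGDOptimizer`, `parameter_list`, `minimize`, `clear_gradients`) follows `fluid`'s 1.x dygraph API and may differ between versions.

```python
np_inp = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
with fluid.dygraph.guard():
    mlp = MLP(input_size=2)
    sgd = fluid.optimizer.SGDOptimizer(learning_rate=1e-3,
                                       parameter_list=mlp.parameters())
    for _ in range(10):
        var_inp = fluid.dygraph.base.to_variable(np_inp)
        loss = mlp(var_inp)      # forward pass, traced op by op
        loss.backward()          # autodiff over the recorded trace
        sgd.minimize(loss)       # apply the accumulated gradients to the parameters
        mlp.clear_gradients()    # reset gradients before the next iteration
```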

# Plan

* 2.13, full time: can run a few simple models. (Currently, two engineers at 20%.)
* 4.1, 4 full time: can run 6 models at roughly 70% of Pytorch's performance. Release alpha.
* 6.1, 5 full time: performance close to Pytorch, can run on multiple devices. Release beta.
* 8.1, 5 full time: works in general; existing models updated; can compile to static graph and support more optimizations.
* 12.1: done.

# Discussion

TODO.