# Overview

Imperative programming makes it easier to learn, debug, and try new ideas.

# Related Works

## Pytorch

https://pytorch.org/

## TensorFlow Eager

https://www.tensorflow.org/guide/eager

# Design

## API

```python
class Layer(object):

  def __call__(self, inputs):
    # Build parameters once, on the first call.
    # ...
    return self.forward(inputs)

  def forward(self, inputs):
    # Forward logic written with paddle operators; backward is auto-generated.
    pass


class PyLayer(core.PyLayer):

  def __call__(self, inputs):
    # Trace the logic.
    pass

  @staticmethod
  def forward(inputs):
    # Any forward logic implemented with numpy I/O.
    pass

  @staticmethod
  def backward(inputs):
    # Any backward logic implemented with numpy I/O.
    pass
```

## Tracer

Current: Python Variable -> C++ VarBase -> C++ Variable -> C++ Tensor

Longer term:


```python
# Parent class.
class PyVarBase(object):
  pass


# Current python variable.
class Variable(PyVarBase):
  pass


class IVariable(PyVarBase):
  def __init__(self):
    self._ivar = core.VarBase()

  # Move var to a device.
  def to(self, device): pass
  # Get var value.
  def value(self): pass
  # Trigger backward.
  def backward(self): pass
  # Get var's gradient value.
  def gradient_value(self): pass
  # Operators to override.
```

```cpp
class Tracer {
 public:
  explicit Tracer(framework::BlockDesc* root_block) : root_block_(root_block) {}

  virtual ~Tracer() {}

  void Trace(OpBase* op,
             const std::map<std::string, std::vector<VarBase*>>& inputs,
             const std::map<std::string, std::vector<VarBase*>>& outputs,
             framework::BlockDesc* block, const bool stop_gradient = false);

  std::vector<VarBase*> PyTrace(OpBase* op, const std::vector<VarBase*>& inputs,
                                bool stop_gradient = false);

 private:
  framework::BlockDesc* root_block_;
};
```

The tracer will:

* Trace forward operations.
* Perform quick shape/type inference, push the kernel to the execution engine, and return to the user (see the sketch below).
* Perform autograd to generate gradients.
* Clear the trace.
* Apply gradients with optimizers.
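
Below is a minimal Python sketch of the trace-and-push flow described above. All names (`Tape`, `TracedOp`, `infer_shape_and_dtype`, `engine.push`) are illustrative assumptions, not Paddle's actual classes or functions.

```python
# Illustrative sketch only: hypothetical names, not Paddle's tracer.

def infer_shape_and_dtype(op_type, inputs, outputs):
    # Placeholder for the quick shape/dtype inference performed at trace time.
    pass


class TracedOp(object):
    def __init__(self, op_type, inputs, outputs):
        self.op_type = op_type
        self.inputs = inputs    # slot name -> list of variables
        self.outputs = outputs  # slot name -> list of variables


class Tape(object):
    """Records traced ops so autograd can later replay them in reverse."""

    def __init__(self, engine):
        self.ops = []
        self.engine = engine

    def trace(self, op_type, inputs, outputs, stop_gradient=False):
        # Quick shape/type inference so the outputs are immediately usable.
        infer_shape_and_dtype(op_type, inputs, outputs)
        # Remember the op for later autograd, unless gradients are not needed.
        if not stop_gradient:
            self.ops.append(TracedOp(op_type, inputs, outputs))
        # Push the kernel to the execution engine and return to the caller.
        self.engine.push(op_type, inputs, outputs)

    def clear(self):
        self.ops = []
```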

## Autodiff

There is already a lot of research on this topic: https://autodiff-workshop.github.io/ and https://en.wikipedia.org/wiki/Automatic_differentiation

Basically, we trace the forward execution and perform autodiff when needed.

* Can be triggered by `backward()`.
* Can select a block of code to trace and autodiff.
* Use `require_grad` to drop forward subgraphs that don't need autodiff (see the sketch below).
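
A correspondingly minimal sketch of reverse-mode autodiff over the recorded tape. `GRAD_FN`, a `require_grad` attribute, and the gradient-function signature are assumptions made for illustration, not Paddle's implementation.

```python
# Registry mapping an op type to a function that turns output grads into input grads,
# e.g. GRAD_FN['tanh'] = lambda ins, outs, dout: [dout[0] * (1 - outs[0] ** 2)].
GRAD_FN = {}


def backward(tape, loss_var):
    grads = {id(loss_var): 1.0}        # seed d(loss)/d(loss) = 1
    for op in reversed(tape.ops):      # replay the trace in reverse order
        out_vars = [v for vs in op.outputs.values() for v in vs]
        in_vars = [v for vs in op.inputs.values() for v in vs]
        dout = [grads.get(id(v), 0.0) for v in out_vars]
        if all(g == 0.0 for g in dout):
            continue                   # nothing downstream needs this op's gradient
        din = GRAD_FN[op.op_type](in_vars, out_vars, dout)
        for v, g in zip(in_vars, din):
            if getattr(v, 'require_grad', True):  # drop subgraphs that opted out
                grads[id(v)] = grads.get(id(v), 0.0) + g  # accumulate gradients
    return grads
```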

## Execution Engine

Lazy execution of pushed C++ operations.
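
A toy version of such an engine, for illustration only (the queue-and-flush structure, `run_kernel`, and `sync` are assumptions, not Paddle's execution engine):

```python
def run_kernel(op_type, inputs, outputs):
    # Placeholder for the real C++ kernel launch.
    pass


class LazyEngine(object):
    """Queues pushed ops and runs them in a batch when a value is actually needed."""

    def __init__(self):
        self.pending = []

    def push(self, op_type, inputs, outputs):
        # Queue the op instead of executing it immediately.
        self.pending.append((op_type, inputs, outputs))

    def sync(self):
        # Flush the queue, e.g. when the user asks for a tensor's value.
        for op_type, inputs, outputs in self.pending:
            run_kernel(op_type, inputs, outputs)
        self.pending = []
```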

## Device Placement

* An operator executes on its inputs' device.
* All inputs of an operator should live on the same device.
* Use `Var.to()` to explicitly move a var to a device (see the snippet below).
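
A small runnable illustration of these rules. `Var`, `run_op`, and the device strings are toy stand-ins for the proposed `to()` API above, not a released Paddle interface.

```python
class Var(object):
    def __init__(self, device='cpu'):
        self.device = device

    def to(self, device):
        # Explicitly move the var; returns a var that lives on `device`.
        return Var(device)


def run_op(op_type, *inputs):
    # An operator executes on its inputs' device; all inputs must agree.
    devices = {v.device for v in inputs}
    assert len(devices) == 1, "all inputs should live on the same device"
    print("running %s on %s" % (op_type, devices.pop()))


x = Var().to('gpu:0')
y = Var().to('gpu:0')
run_op('elementwise_add', x, y)  # runs on gpu:0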

## Save/Load Models

TODO

## I/O

TODO

## Refactor

* All function layers with parameters are converted to class `Layer`s.
* Existing models are converted to imperative mode.
* All op tests run once in static graph mode and once in imperative mode (see the sketch below).
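
A sketch of what such a dual-mode op test could look like. It uses the `fluid` 1.x API (`fluid.data`, `fluid.Executor`, `fluid.dygraph.guard`); exact names may differ between Paddle versions.

```python
import numpy as np
import paddle.fluid as fluid


def test_relu_in_both_modes():
    np_inp = np.random.random([2, 2]).astype('float32')

    # Static graph mode: build a program, then run it with an executor.
    main = fluid.Program()
    with fluid.program_guard(main):
        inp = fluid.data(name='inp', shape=[2, 2], dtype='float32')
        out = fluid.layers.relu(inp)
    exe = fluid.Executor(fluid.CPUPlace())
    static_out = exe.run(main, feed={'inp': np_inp}, fetch_list=[out])[0]

    # Imperative (dygraph) mode: run the same op eagerly.
    with fluid.dygraph.guard():
        dy_out = fluid.layers.relu(fluid.dygraph.to_variable(np_inp)).numpy()

    np.testing.assert_allclose(static_out, dy_out, rtol=1e-5)
```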

# Examples

```python
import numpy as np
import paddle.fluid as fluid


class MyLayer(fluid.imperative.Layer):
    def __init__(self):
        super(MyLayer, self).__init__()

    def forward(self, inputs):
        x = fluid.layers.relu(inputs)
        x = fluid.layers.elementwise_mul(x, x)
        x = fluid.layers.reduce_sum(x)
        return [x]
```


```python
class MyPyLayer(fluid.imperative.PyLayer):
    def __init__(self):
        super(MyPyLayer, self).__init__()

    @staticmethod
    def forward(inputs):
        return np.tanh(inputs[0])

    @staticmethod
    def backward(inputs):
        # `inputs` packs the forward input, the forward output, and the output grad.
        inp, out, dout = inputs
        return np.array(dout) * (1 - np.square(np.array(out)))


np_inp = np.ones([2, 2], np.float32)
with fluid.imperative.guard():
    var_inp = fluid.imperative.base.to_variable(np_inp)
    my_py_layer = MyPyLayer()
    outs = my_py_layer(var_inp)
    dy_out = np.sum(outs[0]._numpy())
    outs[0]._backward()
    dy_grad = var_inp._gradient()
```


```python
from paddle.fluid.dygraph import Linear


class MLP(fluid.Layer):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self._linear1 = Linear(input_size,
                               3,
                               fluid.ParamAttr(
                                   initializer=fluid.initializer.Constant(value=0.1)))
        self._linear2 = Linear(3,
                               4,
                               fluid.ParamAttr(
                                   initializer=fluid.initializer.Constant(value=0.1)))

    def forward(self, inputs):
        x = self._linear1(inputs)
        x = self._linear2(x)
        x = fluid.layers.reduce_sum(x)
        return x


np_inp = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
with fluid.dygraph.guard():
    var_inp = fluid.dygraph.base.to_variable(np_inp)
    mlp = MLP(input_size=2)
    out = mlp(var_inp)
    dy_out = out.numpy()
    out.backward()
```
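
The trace-based design also covers the "apply gradients with optimizers" step listed under Tracer. Below is a sketch that extends the MLP example with a training loop; the optimizer interface shown (`SGDOptimizer`, `parameter_list`, `minimize`, `clear_gradients`) follows `fluid`'s 1.x dygraph API and may differ between versions.

```python
np_inp = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
with fluid.dygraph.guard():
    mlp = MLP(input_size=2)
    sgd = fluid.optimizer.SGDOptimizer(learning_rate=1e-3,
                                       parameter_list=mlp.parameters())
    for _ in range(10):
        var_inp = fluid.dygraph.base.to_variable(np_inp)
        loss = mlp(var_inp)      # forward pass, traced op by op
        loss.backward()          # autodiff over the recorded trace
        sgd.minimize(loss)       # apply the accumulated gradients to the parameters
        mlp.clear_gradients()    # reset gradients before the next iteration
```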

# Plan

* 2.13, full time: can run a few simple models. (Currently, two engineers at 20%.)
* 4.1, 4 full time: can run 6 models at roughly 70% of Pytorch's performance. Release alpha.
* 6.1, 5 full time: performance close to Pytorch, can run on multiple devices. Release beta.
* 8.1, 5 full time: works in general; existing models updated; can compile to static graph and support more optimizations.
* 12.1: done.

# Discussion

TODO.