It will be used for the LoD information in LoDTensor, since LoD is a
copy-on-write field.
Copying LoD information between operators is quite slow; for ResNet it
costs roughly 10% of the total time, including data reading.
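A minimal sketch of the copy-on-write idea, assuming a simplified LoD type; the names here are illustrative, not the actual Paddle API:

```cpp
// Copy-on-write LoD field: readers share one underlying vector via a
// shared_ptr, and a private deep copy is made only on the first write.
#include <memory>
#include <vector>

using Vector = std::vector<size_t>;
using LoD = std::vector<Vector>;

class LoDHolder {
 public:
  // Read access never copies; it just dereferences the shared pointer.
  const LoD& Get() const { return *lod_; }

  // Write access detaches first if the data is shared with another holder.
  LoD* GetMutable() {
    if (lod_.use_count() > 1) {
      lod_ = std::make_shared<LoD>(*lod_);  // deep copy on first write only
    }
    return lod_.get();
  }

 private:
  std::shared_ptr<LoD> lod_ = std::make_shared<LoD>();
};
```

Copying a `LoDHolder` between operators then costs only a refcount bump instead of a deep copy of the whole offset table.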
* "add c++ side kernel selection"
* "add multiple kernel op test"
* "kernel selection only support cudnn"
* "better formatter"
* "small fix with UseCPU"
* "depends on change interface Get(Place, Library)"
* "fix CI"
* "fix python cudnn test"
* "leave the register cudnn op to another PR"
* "fix CI"
* "use all kernel by default"
* "fix CI"
* implement SelectedRows serialize and deserialize
* make serialize/deserialize global functions
* recover send_imp.cc
* delete unused brackets
* fix compile error
* serialize version in LoDTensor and SelectedRows
* fix ci
* fix ci
* implement a simple threadpool (see the sketch after this group of commits)
* unlock before cv.notify
* add done function
* add lock with GetAvailable function
* delete done_
* use call_once in GetInstance
* update by comment
* update comment
* enhance unit test for multi-threaded tasks
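A minimal sketch of the threadpool these commits describe, with illustrative names rather than the actual Paddle interface. It also shows the "unlock before cv.notify" point: the queue mutex is released before `notify_one`, so a woken worker is not immediately blocked on a lock the notifier still holds:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
 public:
  explicit ThreadPool(size_t num_threads) {
    for (size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
            if (done_ && tasks_.empty()) return;
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();  // run the task outside the lock
        }
      });
    }
  }

  void Run(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }  // unlock before notify, so the woken worker can take the mutex at once
    cv_.notify_one();
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

 private:
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};
```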
* Add LoDRankTable
LoD Rank Table stores the `level` of `lod`, ordered by sequence length in
descending order. It is useful when implementing dynamic RNN, and it is
shared by the dynamic RNN memory, dynamic RNN slice-input, and dynamic RNN
slice-output operators.
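A hypothetical sketch of how such a rank table could be built from one LoD level (the names are illustrative):

```cpp
// Build rank-table items (original index, sequence length) from one LoD
// level, sorted by length in descending order, so the dynamic RNN operators
// can process sequences longest-first while sharing the same ordering.
#include <algorithm>
#include <cstddef>
#include <vector>

struct RankItem {
  size_t index;   // position of the sequence in the original batch
  size_t length;  // number of time steps in the sequence
};

std::vector<RankItem> BuildRankTable(const std::vector<size_t>& level) {
  // `level` holds offsets: {0, 3, 5, 9} describes sequences of lengths
  // 3, 2, and 4.
  std::vector<RankItem> items;
  for (size_t i = 0; i + 1 < level.size(); ++i) {
    items.push_back({i, level[i + 1] - level[i]});
  }
  std::stable_sort(items.begin(), items.end(),
                   [](const RankItem& a, const RankItem& b) {
                     return a.length > b.length;  // descending by length
                   });
  return items;
}
```

For example, the level `{0, 3, 5, 9}` yields the items `(2, 4), (0, 3), (1, 2)`.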
* Add InferVarType
* add sparse support for sum op
* typo fix
* fix gpu build error
* fix unittest error
* typo fix
* infer var type and shape in op_test
* follow comments
* fix build error
* bypass some unittests that depend on NetOp
* Simplify Gradient Check
* Stash
* Extract apply_backward_pass to backward.py
Rename apply_backward_pass to append_backward_ops
* Use graph API to check gradient
* Fix ci
* Fix CI
* Fix backward for double precision
* Stash
* Fix CI
* Fix ci
* Ignore GRU test
* Ignore xe op
* Fix CI
* Fix softmax with xe gradient
The correct equation should be IG = OG * d_softmax_with_xe(), where IG is
the input gradient and OG is the output gradient.
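For reference, assuming `d_softmax_with_xe` denotes the combined derivative of softmax followed by cross-entropy, the standard result is:

```latex
% p = softmax(x) over the logits x; y is the one-hot label distribution.
L = -\sum_j y_j \log p_j, \qquad
p_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad
\frac{\partial L}{\partial x_i} = p_i - y_i
% so the input gradient is the output gradient scaled elementwise:
% IG_i = OG \cdot (p_i - y_i)
```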
* Fix typo
* Fix merge error
* Disable LRN
* "add model format design doc"
* "add restore function"
* "add parse protobuf"
* "move necessary information to saver.proto"
* "format code"
* "add gpu option"
* "add lod info"
* "add saveop python test wrapper"
* "checkpoint reuse save operator"
* "rewrite model format design doc"
* "async support needed"
* "fix run once"
* "fix doc based on comments"
* "refine based on comments"
* "fix based comments"
* "remove persistable flag from framework.proto"
* "add IndicateDataType to restore op"
* "add save test"
* "modify save restore code"
* "modified the restore logic"
* rm checkpoint_op.cc
* rm test_checkpoint_op.py
* "get inputs outputs name from execution context"
* Save each variable to an independent file
* Fix bugs
* Rewrite save_restore_op_test with new Python framework
* Move `SaveOp` and `RestoreOp` from OpWithKernel to OpBase
* Refine unit test of SaveOp and RestoreOp
* fix compile error
* Implement FC layer with helper
* Update LayerHelper
* Add debug string for Python ProtoBuf
and Rename `Sync` to `Flush`
* Add check of ProtoBuf initialization
* Layer wrapper for FC
* Fix unittest
* Fix CI
* Add code generator
* AttributeChecker: better error logs and specialize bool
Since many types can be implicitly cast to bool
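To illustrate why bool warrants special handling (a hedged sketch, not the actual AttributeChecker code): almost any arithmetic value converts to bool implicitly, so a purely generic check could silently accept values like `3` or `2.5` as a bool attribute instead of reporting a clear type error:

```cpp
// Generic getter rejects mismatched types; bool gets a full specialization
// so that values which merely *cast* to bool produce a clear error.
#include <stdexcept>
#include <string>
#include <variant>

using Attribute = std::variant<bool, int, float, std::string>;

template <typename T>
T GetAttr(const Attribute& attr, const std::string& name) {
  if (const T* v = std::get_if<T>(&attr)) return *v;
  throw std::invalid_argument("Attribute '" + name + "' has the wrong type");
}

template <>
bool GetAttr<bool>(const Attribute& attr, const std::string& name) {
  if (const bool* b = std::get_if<bool>(&attr)) return *b;
  // Accept a literal 0/1 int, but nothing else that merely casts to bool.
  if (const int* i = std::get_if<int>(&attr)) {
    if (*i == 0 || *i == 1) return *i == 1;
  }
  throw std::invalid_argument("Attribute '" + name + "' must be a bool");
}
```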
* Complete mlp, fit_a_line
* Expose get global scope
* Make global scope not thread-safe
1. There is no need to make the global scope thread-safe, since it will be
invoked only from the Python main thread.
2. Do not free the global scope when C++ exits. Let the OS free the memory;
otherwise, we would need to handle the destruction-order dependencies.
See
https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
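That style-guide rule leads to the usual "leaky singleton" shape, sketched here with illustrative names (`Scope` is a stand-in for the real class):

```cpp
// Constructed on first use and intentionally never destroyed: the OS
// reclaims the memory at process exit, so there is no destruction order to
// get wrong. No locking, since only the Python main thread calls it.
struct Scope { /* variable storage elided */ };

Scope& GetGlobalScope() {
  static Scope* g_scope = new Scope();  // intentionally leaked
  return *g_scope;
}
```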
* Fix
* Implementation of simple conv_2d layer
* Stash
* Remove private data members in OpRegister
* Fix bugs
* Stash
* Expose FeedFetchList as VarType
* Change ProgramDesc so it is not a global variable
* Polish code style
* Stash
* Correctly implement BlockDesc destructor
* Correctly implement BlockDesc destructor
* Unify `program` as the parameter name
* Fix bugs
* Add unittest
* Fix unit test error
* Remove unused functions
* Add clone for Python Program
* Working on executor
* Stash
* Add glog as a dependency of ops
* Use VLOG to log information that is helpful when debugging Paddle
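For readers unfamiliar with glog's verbose logging: `VLOG(n)` messages print only when the verbosity level is at least `n`, e.g. when running with `GLOG_v=3`. A hypothetical call site (the real Paddle code differs):

```cpp
#include <glog/logging.h>

void Compute() {
  VLOG(3) << "entering Compute";          // visible with GLOG_v >= 3
  VLOG(10) << "per-element detail here";  // only at very verbose levels
}
```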
* Expose VarDesc::persistable to Python
* Test executor
* Complete unittest
* Polish code
* Fix merge error
* Follow comment
* Polish Python Code
* Compare OpDescBind directly
* add target to Backward; generate vars in the block when calling backward
* modify backward_test
* fix executor_test
* set var desc default type to LOD_TENSOR
* update backward_test
* insert loss at the top level of backward
* create grad vars for all blocks in the current program
* optimize code
* update test_program.py
* only create vars for newly created blocks during backward