* add logs and fix a bug
* fix break buf
* modify path bugs
* fix by comments
* fix by comments
* add batch
* add float32tostring
* add pb support
* modify gopath
* compile ok
* add proto
* delete not need
* add proto
* add empty proto
* clean not need
* clean not need
* modify deps
* fix by comments and update depend
* fix compile error
* fix loop bugs
* add batch_norm_layer
* add img_conv_group layer and test
* add check to Tensor.type()
* forward can run
* with backward
* change label data type from int32 to int64
* refine code
* follow comment
* add sparse support for sum op
* typo fix
* fix gpu build error
* fix unittest error
* typo fix
* infer var type and shape in op_test
* follow comments
* fix build error
* bypass some unittests depend on NetOp
* support sparse output for lookup table grad op
* refine codes
* fix gpu build error
* fix lookup table grad gpu kernel
* fix ci
* fix ci
* fix ci
* fix bug in lookup_table_grad op
* fix bug in test_word2vec
* register double kernel for some operators
* set is_sparse=True in test_word2vec
* fix lookup table grad op CUDA kernel bug
* disable test_modified_huber_loss_op temporarily
* disable test_lstm_unit_op temporarily
* add sparse support for sum op
* typo fix
* fix gpu build error
* fix unittest error
* typo fix
* infer var type and shape in op_test
* follow comments
* fix build error
* bypass some unittests depend on NetOp
* Simplify Gradient Check
* Stash
* Extract apply_backward_pass to backward.py
Rename apply_backward_pass to append_backward_ops
* Use graph API to check gradient
* Fix ci
* Fix CI
* Fix backward for double precision
* Stash
* Fix CI
* Fix ci
* Ignore GRU test
* Ignore xe op
* Fix CI
* Fix softmax with xe gradient
The correct equation should be IG = OG * (d_softmax_with_xe())
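
A minimal sketch of the chain rule stated above, assuming a hard integer label and a single example. Here `IG` is the gradient with respect to the logits, `OG` is the upstream gradient of the loss output, and the local derivative of softmax with cross-entropy is taken to be `softmax(logits) - onehot(label)`. The function names are hypothetical and are not the operator's actual kernel code.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_xe(logits, label):
    # Cross-entropy loss of softmax(logits) against a hard integer label.
    return -math.log(softmax(logits)[label])


def softmax_with_xe_grad(logits, label, out_grad):
    # IG = OG * d_softmax_with_xe, where the local derivative with respect
    # to logit i is softmax_i - 1 for the labelled class and softmax_i otherwise.
    probs = softmax(logits)
    return [
        out_grad * (p - (1.0 if i == label else 0.0))
        for i, p in enumerate(probs)
    ]
```

Scaling by `out_grad` matters whenever the loss is not the final output of the graph, for example when it is averaged or weighted downstream.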
* Fix typo
* Fix merge error
* Disable LRN
* init batch norm op
* prepare input output
* compute mean_out var_out save_mean save_var on CPU
* active is test
* use eigen to do computation
* complete batch norm forward
* set default momentum to 0.9
* add batch norm grad op in CPU
* add tensor_format and NHWC support, add python test
* add test training
* add batch norm gradient test
* improve comment, fix forward Python UnitTest
* add gradient test
* fix eigen warning
* follow name style
* fix a bug
* change float to T
* add simple forward test
* test with different place
* add backward test
* refine python test
* remove old python test code
* code clean
* follow code style
* update comment
* "add model format design doc"
* "add restore function"
* "add parse protobuf"
* "move necessary information to saver.proto"
* "format code"
* "add gpu option"
* "add lod info"
* "add saveop python test wrapper"
* "checkpoint reuse save operator"
* "rewrite model format design doc"
* "async support needed"
* "fix run once"
* "fix doc based on comments"
* "refine based on comments"
* "fix based on comments"
* "remove persistable flag from framework.proto"
* "add IndicateDataType to restore op"
* "add save test"
* "modify save restore code"
* "modified the restore logic"
* rm checkpoint_op.cc
* rm test_checkpoint_op.py
* "get inputs outputs name from execution context"
* Saving each variable to an independent file
* Fix bugs
* Rewrite save_restore_op_test with new Python framework
* Move `SaveOp` and `RestoreOp` from OpWithKernel to OpBase
* Refine unit test of SaveOp and RestoreOp
* fix compile error
* add test_fit_a_line
* Update
* fix persistable bug
* fix elementwise add bug
* set correct attr for bias op in fc layer
* set correct attr for bias op in fc layer
* Update
1. Add init_program to hold initializers
2. bug fix
* add test_fit_a_line
* fix persistable bug
* fix elementwise add bug
* fix type
* add gitignore
* Complete fit_a_line test
* revert code
* Clean up
* Revert "revert code"
This reverts commit eb1aa015cda4fc12b6dc778ada6c3507b98134f5.
* Refine
* Fix unit test
* Implement FC layer with helper
* Update LayerHelper
* Add debug string for Python ProtoBuf
and Rename `Sync` to `Flush`
* Add check of ProtoBuf initialization
* Layer wrapper for FC
* Fix unittest
* Fix CI
* Add code generator
* AttributeChecker Better error log and specialize bool
Since lots of types can be cast to bool
* Complete mlp, fit_a_line
* Expose get global scope
* Make global scope not thread-safe
1. There is no need to make the global scope thread-safe, since it is
only invoked in the Python main thread.
2. Do not free the global scope when C++ exits. Let the OS free the memory;
otherwise, we would need to handle destruction-order dependencies.
See
https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
* Fix
* Implementation of simple conv_2d layer
* Stash
* Remove private data members in OpRegister
* Fix bugs
* Stash
* Expose FeedFetchList as VarType
* Make ProgramDesc not a global variable
* Polish code style
* Stash
* Correctly implement BlockDesc destructor
* Correctly implement BlockDesc destructor
* Unify program as parameter name
* Fix bugs
* Add unittest
* Fix unit test error
* Remove unused functions
* Add clone for Python Program
* Working on executor
* Stash
* Add glog as dependencies of ops
* Use VLOG to log information that is helpful when debugging Paddle
* Expose VarDesc::persistable to Python
* Test executor
* Complete unittest
* Polish code
* Fix merge error
* Follow comment
* Polish Python Code
* Implement FC layer with helper
* Update LayerHelper
* Add debug string for Python ProtoBuf
and Rename `Sync` to `Flush`
* Add check of ProtoBuf initialization
* Layer wrapper for FC
* Fix unittest
* Fix CI
* Add code generator
* AttributeChecker Better error log and specialize bool
Since lots of types can be cast to bool
* Complete mlp, fit_a_line
* Implementation of simple conv_2d layer
* Fix bugs
* Make ProgramDesc not a global variable
* Polish code style
* Stash
* Correctly implement BlockDesc destructor
* Correctly implement BlockDesc destructor
* Unify program as parameter name
* Fix bugs
* Add unittest
* Fix unit test error
* Remove unused functions
* Add clone for Python Program
* Compare OpDescBind directly
* Implement FC layer with helper
* Update LayerHelper
* Add debug string for Python ProtoBuf
and Rename `Sync` to `Flush`
* Add check of ProtoBuf initialization
* Layer wrapper for FC
* Fix unittest
* Fix CI
* Add code generator
* AttributeChecker Better error log and specialize bool
Since lots of types can be cast to bool
* Complete mlp, fit_a_line
* Implementation of simple conv_2d layer
* Fix bugs
* Correctly implement BlockDesc destructor
* Fix bugs
* Fix unit test error
* Follow comments
* Implement FC layer with helper
* Update LayerHelper
* Add debug string for Python ProtoBuf
and Rename `Sync` to `Flush`
* Add check of ProtoBuf initialization
* Layer wrapper for FC
* Fix unittest
* Fix CI
* Add code generator
* AttributeChecker Better error log and specialize bool
Since lots of types can be cast to bool
* Complete mlp, fit_a_line
* Implementation of simple conv_2d layer
* Fix bugs
* Remove debug code
* initial matmul operator
Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.
For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.
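
The serial fallback described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the operator's code: tensors are nested lists, only rank 2 and rank 3 are handled (rank-1 operands are omitted for brevity), a rank-2 operand is reused for every batch entry, and the keyword names `transpose_x` and `transpose_y` are lowercase stand-ins for the `transpose_X` and `transpose_Y` flags.

```python
def _rank(t):
    # Count nesting depth of a non-empty nested list.
    r = 0
    while isinstance(t, list):
        r += 1
        t = t[0]
    return r


def _transpose(m):
    # Swap rows and columns of a 2-D nested list.
    return [list(row) for row in zip(*m)]


def _matmul_2d(a, b):
    # Plain matrix product of two 2-D nested lists.
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def matmul(x, y, transpose_x=False, transpose_y=False):
    rx, ry = _rank(x), _rank(y)
    if rx not in (2, 3) or ry not in (2, 3):
        raise ValueError("this sketch only handles rank 2 and rank 3")

    def prep(m, flag):
        # Apply the transpose flag to a single 2-D matrix.
        return _transpose(m) if flag else m

    if rx == 2 and ry == 2:
        return _matmul_2d(prep(x, transpose_x), prep(y, transpose_y))
    if rx == 3 and ry == 3 and len(x) != len(y):
        raise ValueError("batch sizes do not match")

    batch = len(x) if rx == 3 else len(y)
    out = []
    # Serial loop over the batch dimension, one 2-D product per entry.
    for i in range(batch):
        xi = x[i] if rx == 3 else x
        yi = y[i] if ry == 3 else y
        out.append(_matmul_2d(prep(xi, transpose_x), prep(yi, transpose_y)))
    return out
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], transpose_y=True)` multiplies the first matrix by the transpose of the second. A batched BLAS routine would replace the Python loop with a single library call.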
* init parameter base class
* optimize the Comments of optimizer
* basic implementation of optimizer
* add test_optimizer
* add no_grad_set to interface
* update optimizer.py
* python code can run
* fix some problem
* add sync_with_cpp to Python Program and Block
* sync vars and ops in block from cpp
* optimize code and add some comment
* add more check for sync
* update optimizer with return value of Backward
* rm unused code
* infer shape when creating gradient variable
* update test_optimizer
* update test_program.py
* update backward test
* follow comment
* Implement FC layer with helper
* Update LayerHelper
* Add debug string for Python ProtoBuf
and Rename `Sync` to `Flush`
* Add check of ProtoBuf initialization
* Layer wrapper for FC
* Fix unittest
* Fix CI
* Add code generator
* AttributeChecker Better error log and specialize bool
Since lots of types can be cast to bool
* Complete mlp, fit_a_line
* add target to Backward, generate var in block when call backward
* modify backward_test
* fix executor_test
* set var desc default type to LOD_TENSOR
* update backward_test
* insert loss in the top level of backward
* create grad vars for all blocks in current program
* optimize code
* update test_program.py
* only create var for newly created blocks when backward