* Remove some superfluous std::move calls
The std:move triggered a build error (with -Werror):
```
[ 9%] Building CXX object paddle/fluid/memory/allocation/CMakeFiles/allocator_facade.dir/allocator_facade.cc.o
/home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: error: moving a temporary object prevents copy elision [-Werror,-Wpessimizing-move]
[this] { return std::move(CreateAllocatorWithChunk()); }, capacity);
^
/home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: note: remove std::move call here
[this] { return std::move(CreateAllocatorWithChunk()); }, capacity);
^~~~~~~~~~ ~
1 error generated.
```
See: https://reviews.llvm.org/D7633
* Remove a superfluous lambda capture from framework/operator.h
```
[ 10%] Building CXX object paddle/fluid/platform/CMakeFiles/device_context.dir/init.cc.o
In file included from /home/tej/code/gbuella_paddle/paddle/fluid/platform/init.cc:19:
/home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.h:229:21: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
[this](Variable* var) { return var; });
^~~~
1 error generated.
```
Changing it to `return it->second;`, as is in the function below.
* Rethrow an exception (instead of copying it)
```
[ 11%] Building CXX object paddle/fluid/framework/CMakeFiles/operator.dir/operator.cc.o
/home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: error: local variable 'exception' will be copied despite being thrown by name [-Werror,-Wreturn-std-move]
throw exception;
^~~~~~~~~
/home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: note: call 'std::move' explicitly to avoid copying
throw exception;
^~~~~~~~~
std::move(exception)
```
See https://reviews.llvm.org/D43322 for an explanation of this diagnostic message.
* Remove an unused variable
```
/home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:884:16: error: private field 'scope_' is not used [-Werror,-Wunused-private-field]
const Scope& scope_;
^
```
* struct ComputationOpHandle -> class ComputationOpHandle
```
[ 13%] Building CXX object paddle/fluid/framework/details/CMakeFiles/memory_early_delete_pass.dir/memory_early_delete_pass.cc.o
In file included from /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/memory_early_delete_pass.cc:21:
/home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: error: class 'ComputationOpHandle' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
class ComputationOpHandle;
^
/home/tej/code/gbuella_paddle/paddle/fluid/framework/details/computation_op_handle.h:29:8: note: previous use is here
struct ComputationOpHandle : public OpHandleBase {
^
/home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: note: did you mean struct here?
class ComputationOpHandle;
^~~~~
struct
1 error generated.
```
* Fix name() methods under fluid/operators
```
In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.cc:15:
In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.h:19:
/home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/jitcode.h:71:23: error: 'name' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
virtual const char* name() const = 0;
^
/home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen_base.h:31:23: note: overridden virtual function is here
virtual const char* name() const = 0;
^
```
test=develop
* AsyncExecutor: C++ side
* Google naming conventions
* Rename MultiExecutor to AsyncExecutor
* pybind with async_executor
* Naming convention
* remove some flags and unused code
* add refactored file of async_executor and data_feed
* clear async executor interface and add data feed factory
* split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto
* Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch
* refine async_executor_refactor.cc
* add some files about datafeed
* Revert "add some files about datafeed"
This reverts commit 8ee8133ab841196925a2812b76f18d2812a6701d.
* Interface rework
* add MultiSlotDataFeed
* Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side
* update data_feed for add MultiSlotDataFeed
* update datafeed and async_executor to run bow_net demo
* fix bug that finish_set_filelist failed in multithread
* delete finish_binding_memory_(flag), because it can not be marked under the current interface
* Fix bug
* update async_executor.py for support set_use_slots
* update async_executor.py for support set_use_slots and set set_dense_slots
* fix bug that when the number of files is less than the number of threads, it will fetch nan
* remove redundant code, and make executor exit when set a illegal queue size
* add batch_size check
* add MultiSlotDesc
* Revert "add MultiSlotDesc"
This reverts commit 2e72ebfad364ed6b5dcc75f38ffb2a1fdec83d8e.
* add some checkpoint in DataFeedDesc
* add CheckFile function in MultiSlotDataFeed
* update something error info
* fix deaded lock bug
* Fix fetch variable
* Merge error
* fix code style in async_executor
* using one lock blocking queue replace two lock blocking queue because of some bugs
* update code style
* add utest for data_feed
* Fix fetch var
* update utest for data_feed for multithread
* update SetFileList info
* fix bug in utest of data_feed
* Add comments for python
* Add comments for python code
* Fix pybind.cc with new pybind11 version
* add note for DataFeedDesc's set_use_slots function
* Add save_model
* update data_feed_test for multi-type
* add comment for executor_thread_worker
* Remove unused code
* update data_feed_test for generate test data file
* removed unnecessary interfaces and add comments
* c++ style check
* update data_feed.cc
* AsyncExecutor: C++ side
Google naming conventions
Rename MultiExecutor to AsyncExecutor
pybind with async_executor
Naming convention
remove some flags and unused code
add refactored file of async_executor and data_feed
clear async executor interface and add data feed factory
split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto
Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch
refine async_executor_refactor.cc
add some files about datafeed
Revert "add some files about datafeed"
This reverts commit 8ee8133ab841196925a2812b76f18d2812a6701d.
add MultiSlotDataFeed
Interface rework
Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side
update datafeed and async_executor to run bow_net demo
update async_executor.py for support set_use_slots
Fix bug
update async_executor.py for support set_use_slots and set set_dense_slots
fix bug that when the number of files is less than the number of threads, it will fetch nan
remove redundant code, and make executor exit when set a illegal queue size
add MultiSlotDesc
Revert "add MultiSlotDesc"
This reverts commit 2e72ebfad364ed6b5dcc75f38ffb2a1fdec83d8e.
add some checkpoint in DataFeedDesc
Fix fetch variable
fix code style in async_executor
Fix fetch var
add utest for data_feed
Add comments for python
update utest for data_feed for multithread
fix bug in utest of data_feed
Add comments for python code
Fix pybind.cc with new pybind11 version
add note for DataFeedDesc's set_use_slots function
update data_feed_test for multi-type
Add save_model
update data_feed_test for generate test data file
removed unnecessary interfaces and add comments
add comment for executor_thread_worker
Remove unused code
update data_feed.cc
c++ style check
* commit for code style
* commit for code style
* commit for code style
* commit for code style
* Comment away __init__ in async_executor.py
* clang-format fix test=develop
* use PADDLE_THROW instead of exit(-1); use unique_ptr to manage scope var in data_feed_test.cc
* commit for update code style
* commit for update code style
* Add async_executor demo; Remove some methods
test=develop
* commit for update code style
* commit for update code style
* commit for update code style
* update API.spec
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* Fix API.spec
test=develop
* Fix API.spec
test=develop
* Fix windows build error
test=develop
* FIx windows build error
test=develop
* FIx windows build error
test=develop
* FIx windows build error
test=develop
* Fix Windows Build
test=develop
* Fix Windows Build
test=develop
* Fix Windows Build
test=develop
* Fix code style
test=develop
* Fix code style
test=develop
* update datafeed
* Fix code style
test=develop
* update data_feed_test for test Tensor test=develop
* Fix code style
test=develop
* Fix windows build failure
test=develop
* Fix code style and windows build failure
test=develop
* Fix PYTHON3.5 build failure
test=develop
* AsyncExecutor API
test=develop
* HIP cmake.
Enable whole archieve build for pybind library.
Disable two warning.
Rollback to C++11.
Link RCCL to WA gpu kernel loading issue.
Update eigen to fix build failure.
Add more include directories.
Fix O3 build failure.
Update eigen.
fix tensor_util_test segment fault issue
add more macro check in hip.cmake.
we may consider refine hip.cmake to inherit all add_definitions() in parrent scope, in the future.
Fix rocRAND load.
Update eigen to fix gru_unit_op and reduce_op.
Add HIP support to testing.
Update eigen to support int16 and int8 in arg min and arg max.
* add rocprim as cub library used by nv implementation
* Reduce build time in rocprim.
* Add rocprim introduction, remove useless cmake code.
* Remove useless flags and format cmake file.
* add recordio support
* disable the openblas multi-thread on windows since no support
adjust the python script
* code style
* code style
test=develop
* add create_recordio_file_reader back
* fix code style
test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build
test=develop
* remove fused compile support on windows
test=develop
* add the jit support
test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back
fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows
test=develop
* fix the scripts error
test=develop
* disable avx on windows by default
test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style
test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop
1. Non utf-8 characters within comments of OPs may lead to protobuf fail to parse_from_string
2. comment out some ops which not supported on windows
3. cuda libs may not be correctly linked to target on windows
* wip
* clean up
* should fix running with memopt
* add ut
* mark lr schedule op role
* hide lr_schedule_guard
* use op_role_var instead of ufind
* unify dist test name
* wip for py3 support
* fix var deref
* fix python3 mem_opt order
* remove comments
* Add Python Callstacks when Op::Run error
* Skip op with sub-block
* refactor: refine callstack info's format
* Reshape only support matrix
* Polish Python code
* Fix UT
* Fix Py3
* add lod_tensor util and modify pybind
* refind pybind LoDTensor API and modify LoDTensor and DataFeeder test
* fix test error
* fix detection map op test
* fix reorder_lod_tensor test
* fix seq_concat_op
* fix chunk evel op test
* fix target assign op
* fix warp ctc op
* address comments step 1: reverse reset_lod op
* step 2: modify op test
* add warning message
* remove has_valid_lod
* add back has_valid_lod
* address comments
* add exception catching trial
* remove net op and cond_op
* fix cpplint
* fix dependency
* delete backward_test; fix compile
* disable batch_norm backward
* rm test_net.py
* make batchnorm test independent of backward.cc
* make test_layer_norm_op independent of backward.cc
* make test_layer_norm_op independent of backward.cc
* delete unused code
* clean up
1. Add option WITH_AMD_GPU.
2. Add cmake/hip.cmake for HIP toolchain.
3. Some external module such as eigen may need HIP port.
4. Add macro hip_library/hip_binary/hip_test to cmake/generic.cmake.
5. Add one HIP source concat.hip.cu as an example. Each .cu may have its corresponding .hip.cu.
* add fp16 mul op support
* small fix
* fix bug
* small fix
* fix PADDLE_WITH_CUDA compiling issue
* reorg code
* test for pybind
* treate as float16 as uint16_t in pybind
* bind np.float16 to paddle float16
* small fix
* clean code
* remove redundancy
* fix mul_op test
* address comments
* small fix
* add is_float16_supported func
* Adding set_capacity method support
* Adding Python for make_channel
* Updating notest_concurrency
* Write python for make_channel method
* Write python for make_channel method
* Fix make_channel and test
* Placeholder ops for channel send, recv and close
* Adding ToTypeIndex method to var_type.h
* Add var_type.h to channel:
* Added POD_Type to the method
* Add CHANNEL to executor
* Updated get and set DataType to accomodate Channels
* Updating get and set to incorporate channels
* Adding CHANNEL as supported VarType in protobuf
* Removing unecessary import
* Fixing VarDesc to adapt to Channel as VarType
* Add channel.h to executor
* Remove innclude from channel
* Updated var_type to support Channel as var type
* Adding get_channel to pybind
* Added ChannelHolder
* Adding make_channel as an op
* Adding ChannelHolder in channel
* Fixing typo
* Commenting out operators in concurrency
* Removing totypeid right now since we don't need it.
* Reverting python changes
* Fixing typo in framework.py
* Modify comments for ReaderHolder