* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix complie error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refoctor name to make it easier to understand
* test=develop, refoctor name to make it easier to understand
* test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* test=develop add a argument for softshrink python api
* test=develop fix doc format
test=develop fix doc format
* test=develop fix API.spec
test=develop fix API.spec
* test=develop
Fix the scatter op bug when use the add mode, and support the int64 data type of scatter_op Index(#18804).
* test=develop
Remove the PADDLE_ENFORCE and use PADDLE_ENFORCE_EQ
* test=develop
Remove the fix bug of scatter_add, and just add the support of int64 in scatter_add
* test=develop
Add the test case for scatter op, the test case just for index int64
* add to and detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add exception check, test=develop
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS.
* Add UT.
* More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
- Refactor step 1
- Compilation fix
- Yet another compilation fix
- Even more compilation fix
- Lint fixes
test=develop
- Removed deprectaed PADDLE_ENFORCE occurance
test=develop
- Candidate fix to BN forward
- Lint fixes
test=develop
- Refactoring in data_layout_transform
- compilation fix
- Another comppilation fix
- Step further into darkness
- Yet another compilation fix
- Yet another compilation fix
- missing header
- compilation fix
- Added MKLDNN -> Paddle conversion in fetch op
test=develop
- Compilation fix
test=develop
- Lint
test=develop
- Mul fix
- Fix to MKLDNN MUL op and Elementwise MUL UT
test=develop
- Workaround for diffrent weights with groups representation Paddle vs
MKL-DNN.
test=develop
- Candidate fix for 5D convolution with groups
- Refactor of fix for conv3d and conv2d in fetch op
test=develop
- Compilation fix
- Still same compilation fix
- Compilation fix
- Compilation fix
- Reverted refactoring of fixes
- Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format
not NHWC
test=develop
- minor fix in UT
test=develop
- Lint fixes
test=develop
* fix con2d transpose bias by create and init it in build_onee
* fix API spec
* test=develop, invoke ci
* fix bias_attr and act has no effect error on layer norm, conv2dTranpose, billinearTensorProduct, sequece_conv. fix original_mode not used error on GRUunit. fix sample_weight not set error on NCE. Add ut for all thoese layer
* test=develop, change success standard for conv2dTranspose
* test=develop, fix test_layers to invoke some error branch
* test=develop, fix sample code
* test=develop, fix BilinearTensorProduct failed in dygraph mode
* test=develop, fix test_layers segment fault error
* fix correctness of the communicator
* fix a bug in send thread when sending var context is empty, test=develop
* add lookup_table_prefetch_op and prefetch optimize, test=develop
* remove remote prefetch GPU supported
* word2vec force with CPU, test=develop
* test dist remote lookup table force with CPU, test=develop
* supports multiple NCCL communicators preserved in NCCLCommContext
test=develop
* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop
* support tensor input with padding for warpctc op
* merge with develop
* test=develop
* modified python API examples test=develop
* nn.py is modified for code coverage test=develop
* update documents info about warpctc op in API.spec test=develop
* add test_warpctc_with_padding in test_layers test=develop
* add warning log for cuda_version back to warpctc_op.cc
* modify API.spec for warpctc op test=develop
* modify API.spec
* update warpctc test to new CompiledProgram API test=develop
* modify code examples for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* add pybind interface to get all inplace ops, test=develop
* enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop
* handle corner cases in op_test, test=develop
* support outputs without tensor holder_, like XShape in reshape_op, test=develop
* fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop
* use reshape_grad instead of reshape in FlattenGradOp, test=develop
* fix error debug dims info for variables like XShape, test=develop
* change computational order in sum_op to relieve computation difference using inplace, test=develop
* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop
* follow sneaxiy's comments, test=develop
* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop
* increase test_parallel_executor_seresnext time limit
test=develop
* split test_parallel_executor_seresnext
test=develop
* temporally disable reduce_and_allreduce test because of the random failure.
test=develop
* split gpu and cpu
test=develop
add fl_listen_and_serv op for Federated_learning and fl_distribute_transpiler add this op to pserver program . This op just listen the endpoint and sum&scale.
* change the default value of summarize from -1 to 20 in Print op to improve ease of use, test=develop
* change the doc of API Print to make the document easier to understand, test=develop
* instag lod tensor impl
* First PR for instag
* First PR for instag
* Before adding Selection Rows.
* Change name from instag to filter_instag, add upgrade the impl of filter_instag
* Change name from instag to filter_instag, add upgrade the impl of filter_instag
* Fix yapf error in gradient_checker.py to pass Travis-CI
* Fix Filter Instag Grad test=develop
* Fix Filter Instag Grad test=develop
* 1) Fix API.spec, add filter_instag Op. 2) Add Vector Support for CUDA. test=develop
* Impl Loss_weight and empty output handler
* change Loss Weight datatype to Float32, and add Loss Weight as 2nd output
* 1) Support Tensor Input(without LOD) 2) Add Unit test
* Filter By Instag Final test=develop
* Update API.spec for filter_by_instag test=develop
* Update API.spec for filter_by_instag 2 test=develop
* Add Filter By Instag Coverage
* code format of test_layers.py
* code format test_layers.py test=develop
* Make API args more readable test=develop
* Make API args more readable and pass code format test=develop
* Filter By Instag Op, Rename Map to Index Map test=develop
* Filter By Instag Op, code format err in filter_by_instag_op.cc test=develop
* Filter by instag op: code format of cpp files test=develop
* Filter by instag Op: Api spec modification test=develop
* Filter by instag Op: Api spec doc id modification test=develop
* Filter by instag Op: Api spec and doc preview test=develop test=document_preview
* Filter By Instag Op, fix doc erro test=document_preview test=develop
* Filter By Instag Op, fix doc err and Api spec test=document_preview test=develop
* Filter By Instag Op, fix Api spec test=document_preview test=develop
* Filter By Instag Op, fix Paddle Encoforce deprecated warning test=document_preview test=develop
* Filter By Instag Op, fix Paddle Encoforce deprecated and code format warning test=document_preview test=develop
* add hard_swish activation op (new op)
test=develop
* remove redundancy files
* modify document content of HardSwish OP
* add API test in test_layers.py
* add dynamic_graph for test_hard_swish
* add a place field in DataFeed to denote which place it will feed data to.
* abstract the copy process in CopyToFeedTensor function
* add UT for float32 type and for CUDAPlace
* Add call stack info during runtime and compile time
test=develop
* Rename operator_call_stack
test=develop
* Add unit test
test=develop
* follow comment
test=develop
* add train demo for imdb text classification task
* make inference library release data_feed dataset dataset_factory data_feed_factory
* add String Data Generator
* new feature of train demo: save model params
* New feature of train demo: set training config using gflags
* change code style for CI
* add readme and dataset for imdb demo trainer
* fix warpctc.dll not found issue, test=develop
* revert the linux platform change, test=develop
* delete warpctc_lib_path.h.in, test=develop
* add SetPySitePackagePath function
* fix warpctc.dylib not found issue on Mac, test=develop
* improve the paddle lib path setting logic, test=develop
* fix mac ci issue caused by test_warpctc_op unittest, test=develop
* tweak code, test=develop
* open gc by default, test=develop
* fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop
* fix conditional_block op eager deletion bug, test=develop
* add some comments to reviewers, test=develop
* support filelist size < trainer num
* pull dense when stop, to make sure local dense params are same as pserver, so save paddle model will save dense model same as pserver
* enable QueueDataset train same filelist for serveral times