* test=develop
Fix the scatter op bug when using add mode, and support the int64 data type for the scatter_op Index (#18804).
* test=develop
Replace PADDLE_ENFORCE with PADDLE_ENFORCE_EQ
* test=develop
Remove the scatter_add bug fix, and only add int64 support in scatter_add
* test=develop
Add a test case for the scatter op; it covers only an int64 Index
* add to and detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add exception check, test=develop
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS.
* Add UT.
* For more details, please refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
- Refactor step 1
- Compilation fix
- Yet another compilation fix
- Even more compilation fix
- Lint fixes
test=develop
- Removed deprecated PADDLE_ENFORCE occurrence
test=develop
- Candidate fix to BN forward
- Lint fixes
test=develop
- Refactoring in data_layout_transform
- compilation fix
- Another compilation fix
- Step further into darkness
- Yet another compilation fix
- Yet another compilation fix
- missing header
- compilation fix
- Added MKLDNN -> Paddle conversion in fetch op
test=develop
- Compilation fix
test=develop
- Lint
test=develop
- Mul fix
- Fix to MKLDNN MUL op and Elementwise MUL UT
test=develop
- Workaround for the different representation of weights with groups in Paddle vs
MKL-DNN.
test=develop
- Candidate fix for 5D convolution with groups
- Refactor of fix for conv3d and conv2d in fetch op
test=develop
- Compilation fix
- Still same compilation fix
- Compilation fix
- Compilation fix
- Reverted refactoring of fixes
- Adapted test_conv2d_int8_mkldnn so it expects data in NCHW format,
not NHWC
test=develop
- minor fix in UT
test=develop
- Lint fixes
test=develop
* fix conv2d transpose bias by creating and initializing it in build_once
* fix API spec
* test=develop, invoke ci
* fix bias_attr and act having no effect on layer norm, conv2dTranspose, BilinearTensorProduct, sequence_conv. fix original_mode not being used on GRUUnit. fix sample_weight not being set on NCE. Add ut for all those layers
* test=develop, change success standard for conv2dTranspose
* test=develop, fix test_layers to invoke some error branch
* test=develop, fix sample code
* test=develop, fix BilinearTensorProduct failed in dygraph mode
* test=develop, fix test_layers segment fault error
* fix correctness of the communicator
* fix a bug in send thread when sending var context is empty, test=develop
* add lookup_table_prefetch_op and prefetch optimize, test=develop
* remove GPU support for remote prefetch
* force word2vec to run on CPU, test=develop
* force the dist remote lookup table test to run on CPU, test=develop
* support preserving multiple NCCL communicators in NCCLCommContext
test=develop
* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop
* support tensor input with padding for warpctc op
* merge with develop
* test=develop
* modified python API examples test=develop
* nn.py is modified for code coverage test=develop
* update documents info about warpctc op in API.spec test=develop
* add test_warpctc_with_padding in test_layers test=develop
* add warning log for cuda_version back to warpctc_op.cc
* modify API.spec for warpctc op test=develop
* modify API.spec
* update warpctc test to new CompiledProgram API test=develop
* modify code examples for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* Implement the operator with sparse matrix multiply
* Update the URL of mklml library.
test=develop
* Disable the MKLML implementation on non-Linux platforms.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* add pybind interface to get all inplace ops, test=develop
* enhance OpTest to check that an operator gives consistent results with and without inplace, test=develop
* handle corner cases in op_test, test=develop
* support outputs without tensor holder_, like XShape in reshape_op, test=develop
* fix bug: some ops have a GradOpMaker but actually no grad_op in OpInfoMap, test=develop
* use reshape_grad instead of reshape in FlattenGradOp, test=develop
* fix error debug dims info for variables like XShape, test=develop
* change computational order in sum_op to reduce the numerical difference when using inplace, test=develop
* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop
* follow sneaxiy's comments, test=develop
* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop
* increase test_parallel_executor_seresnext time limit
test=develop
* split test_parallel_executor_seresnext
test=develop
* temporarily disable reduce_and_allreduce test because of random failures.
test=develop
* split gpu and cpu
test=develop
add fl_listen_and_serv op for federated learning; fl_distribute_transpiler adds this op to the pserver program. This op just listens on the endpoint and performs sum and scale.
* change the default value of summarize from -1 to 20 in Print op to improve ease of use, test=develop
* change the doc of API Print to make the document easier to understand, test=develop
* instag lod tensor impl
* First PR for instag
* Before adding Selection Rows.
* Change name from instag to filter_instag, and upgrade the impl of filter_instag
* Fix yapf error in gradient_checker.py to pass Travis-CI
* Fix Filter Instag Grad test=develop
* 1) Fix API.spec, add filter_instag Op. 2) Add Vector Support for CUDA. test=develop
* Impl Loss_weight and empty output handler
* change Loss Weight datatype to Float32, and add Loss Weight as 2nd output
* 1) Support Tensor Input(without LOD) 2) Add Unit test
* Filter By Instag Final test=develop
* Update API.spec for filter_by_instag test=develop
* Update API.spec for filter_by_instag 2 test=develop
* Add Filter By Instag Coverage
* code format of test_layers.py
* code format test_layers.py test=develop
* Make API args more readable test=develop
* Make API args more readable and pass code format test=develop
* Filter By Instag Op, Rename Map to Index Map test=develop
* Filter By Instag Op, code format err in filter_by_instag_op.cc test=develop
* Filter by instag op: code format of cpp files test=develop
* Filter by instag Op: Api spec modification test=develop
* Filter by instag Op: Api spec doc id modification test=develop
* Filter by instag Op: Api spec and doc preview test=develop test=document_preview
* Filter By Instag Op, fix doc error test=document_preview test=develop
* Filter By Instag Op, fix doc error and Api spec test=document_preview test=develop
* Filter By Instag Op, fix Api spec test=document_preview test=develop
* Filter By Instag Op, fix PADDLE_ENFORCE deprecated warning test=document_preview test=develop
* Filter By Instag Op, fix PADDLE_ENFORCE deprecated and code format warnings test=document_preview test=develop
* add hard_swish activation op (new op)
test=develop
* remove redundant files
* modify document content of HardSwish OP
* add API test in test_layers.py
* add dynamic_graph for test_hard_swish
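For reference, a minimal sketch of the hard_swish activation added above, written in plain Python on scalars rather than as the actual operator kernel. The parameter names `threshold`, `scale`, and `offset` and their defaults (6, 6, 3) are assumptions based on the commonly used definition x * min(max(x + offset, 0), threshold) / scale.

```python
def hard_swish(x, threshold=6.0, scale=6.0, offset=3.0):
    """Scalar hard_swish sketch: x * min(max(x + offset, 0), threshold) / scale.

    The parameter names and defaults are assumptions for illustration.
    """
    # Shift by offset, then clamp the gate into [0, threshold].
    gate = min(max(x + offset, 0.0), threshold)
    return x * gate / scale


if __name__ == "__main__":
    for value in (-4.0, -1.0, 0.0, 1.0, 3.0, 10.0):
        print(value, hard_swish(value))
```

Inputs at or below -offset map to zero, and inputs at or above threshold - offset pass through unchanged, which is what makes the function a piecewise approximation of swish.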
* add a place field in DataFeed to denote which place it will feed data to.
* abstract the copy process in CopyToFeedTensor function
* add UT for float32 type and for CUDAPlace
* Add call stack info during runtime and compile time
test=develop
* Rename operator_call_stack
test=develop
* Add unit test
test=develop
* follow comment
test=develop
* add train demo for imdb text classification task
* make inference library release data_feed dataset dataset_factory data_feed_factory
* add String Data Generator
* new feature of train demo: save model params
* New feature of train demo: set training config using gflags
* change code style for CI
* add readme and dataset for imdb demo trainer
* fix warpctc.dll not found issue, test=develop
* revert the linux platform change, test=develop
* delete warpctc_lib_path.h.in, test=develop
* add SetPySitePackagePath function
* fix warpctc.dylib not found issue on Mac, test=develop
* improve the paddle lib path setting logic, test=develop
* fix mac ci issue caused by test_warpctc_op unittest, test=develop
* tweak code, test=develop
* open gc by default, test=develop
* fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop
* fix conditional_block op eager deletion bug, test=develop
* add some comments to reviewers, test=develop
* support filelist size < trainer num
* pull dense params on stop, so that local dense params match the pserver and the saved paddle model contains the same dense model as the pserver
* enable QueueDataset to train on the same filelist several times
* test=develop
Add the unique_with_counts op, which computes the unique elements of the input data and outputs the corresponding indices and counts.
* test=develop
Check the input and dtype in the unique_with_counts op
* test=develop
test=document_preview
Update the API.spec for `unique_with_counts`, and optimize the python api of the `unique_with_counts` op
* test=develop
test=document_preview
Fix some python api problems in the `unique_with_counts` op, and change the error message in this op.
* Fix some API problems in the `unique_with_counts` op
test=develop
test=document_preview
* test=develop
test=document_preview
Fix the api sample of op `unique_with_counts`, and update api.spec
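To illustrate what `unique_with_counts` is described as computing (unique elements, the index of each input element in the unique list, and per-element counts), here is a minimal plain-Python sketch. It is not the operator's implementation; keeping unique values in first-occurrence order is an assumption made for this sketch.

```python
def unique_with_counts(values):
    """Return (out, index, count) for a flat list of hashable values.

    out   : unique values, in first-occurrence order (an assumption here)
    index : for each input element, its position in `out`
    count : how many times each value in `out` appears in the input
    """
    positions = {}  # value -> position in `out`
    out, index, count = [], [], []
    for value in values:
        if value not in positions:
            positions[value] = len(out)
            out.append(value)
            count.append(0)
        pos = positions[value]
        index.append(pos)
        count[pos] += 1
    return out, index, count


if __name__ == "__main__":
    print(unique_with_counts([2, 3, 3, 1, 5, 3]))
    # ([2, 3, 1, 5], [0, 1, 1, 2, 3, 1], [1, 3, 1, 1])
```

The `index` output makes it possible to rebuild the input from `out`, since `out[index[i]]` equals the i-th input element.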
(1) set the default value of fleet_send_batch_num according to trainer num. The previous value was fixed at 80000; if trainer num is much smaller or larger than 100, global shuffle may hit a timeout error.
(2) fix load one table bug, add barrier
* support center loss
* change tensor copy api to high level api tensorcopy
* test=develop rewrite the center_loss cuda_kernel to make it faster
and add documentation for the center loss api, also update test function
* test=document_preview test=develop
update document of center loss
* test=document_preview test=develop
modify API.spec, modify test code, remove unused const_cast
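As background for the center loss commits above, a minimal plain-Python sketch of the standard per-sample center loss, 0.5 * ||x_i - c_{y_i}||^2. It assumes the textbook definition and does not reproduce the CUDA kernel or the center update step; the function and argument names are hypothetical.

```python
def center_loss(features, labels, centers):
    """Per-sample center loss: 0.5 * squared distance to the class center.

    features : list of feature vectors (lists of floats)
    labels   : list of integer class ids, one per feature vector
    centers  : list of class centers, indexed by class id
    All names here are hypothetical and chosen for illustration.
    """
    losses = []
    for feature, label in zip(features, labels):
        center = centers[label]
        squared_distance = sum((f - c) ** 2 for f, c in zip(feature, center))
        losses.append(0.5 * squared_distance)
    return losses


if __name__ == "__main__":
    feats = [[1.0, 2.0], [0.0, 0.0]]
    print(center_loss(feats, [0, 1], [[1.0, 0.0], [3.0, 4.0]]))  # [2.0, 12.5]
```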