improve pow op according to reviews:
1. Delete unnecessary judgement statements in PowGradOpDescMaker;
2. Improve test of test_api;
overload GetKernelTypeForVar
add stop_gradient=True when attr(factor) is tensor Variable, change examples in API pow.
test=develop,test=document_preview
add support parameter inference when argument shape is a list containing integer and tensor variable;
test=develop
fix reshape op according to reviews:
1. improve or message;
2. improve test of test_api.
test=develop,test=document_preview
fix reshape op: Add error message in nn.py, test=develop
add stop_gradient=True when attr(shape) is tensor Variable.
change examples in API reshape.
test=develop,test=document_preview
add support parameter inference when arguments starts or ends is a list containing integer and tensor variable;
test=develop,test=document_preview
improve slice op according to review(from hongyu). test=develop
fix slice op according to review: infer_flags, test=develop
fix slice op: improve overload operator __getitem__ to support attrs(starts and ends) are Variable.
test=develop,test=document_preview
fix test_slice_op: add TestSliceOp_decs_dim_6 to resolve conflict with test_slice_ngraph_op. test=develop
add stop_gradient=True when attr(starts) or attr(ends) is tensor Variable.
test=develop,test=document_preview
1. add tensor support for argument expand_times in expand op;
2. add support parameter inference when argument expand_times is a list containing integer and tensor variable;
improve expand op according to reviews:
1. add doc of ExpandTimes in expand_op.cc;
2. improve the test of test_api.
add stop_gradient=True when attr(expand_times) is tensor Variable, change code examples.
test=develop,test=document_preview
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* tmp add fused_emb_seq layer
* Add the support of padding_idx attribute.
test=develop
* add padding_idx support
test=develop
* implement grad refer lego
test=develop
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference.
test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op.
test=develop
* Remove the declaration of FCOpGrad back to the header file.
test=develop
* Set default value for newly added arguments in test_fc_op.
test=develop
* Enhance fc_fuse_pass to enable fusing relu.
* Allow print the shapes of var_desc in graph.
test=develop
* Enhance fc_fuse_pass_tester.
* Remove the use of PADDLE_ENFORCE.
test=develop
* Correct the number of ops after fusing.
test=develop
* Fix a typo.
test=develop
* Set activation_type to null when there is no relu in fc.
test=develop
* Refine fc_fuse_pass's codes.
* Enable the set of shape for tensor.
* Refine repeated_fc_relu_pass and add unittest.
test=develop
* add kernel for unstack_op, test=develop
* add kernel for unstack_op, test=develop
* add kernel for unstack_op, test=develop
* adjust the code format, test=develop
* modify some comment, test=develop
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.
We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.
Also added data_feed_proto to operator to fix CI in CPU compilation
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference.
test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op.
test=develop
* Remove the declaration of FCOpGrad back to the header file.
test=develop
* Set default value for newly added arguments in test_fc_op.
test=develop
* Remove constraint that last dimension is forced to be 1 in huber_loss
test=develop
* add y[rank-1] == 1 when x_rank=y_rank test=develop
* modify into contain_unknown_dim test=develop
* add extra error message hint in optimizer ops
* polish format & delete useless change, test=develop
* extract init judue from shape compare, test=develop
* test=develop
Fix the scatter op bug when use the add mode, and support the int64 data type of scatter_op Index(#18804).
* test=develop
Remove the PADDLE_ENFORCE and use PADDLE_ENFORCE_EQ
* test=develop
Remove the fix bug of scatter_add, and just add the support of int64 in scatter_add
* test=develop
Add the test case for scatter op, the test case just for index int64
* - First set of modifications
- Compilation fixes
- compilation fix
- Another compilation fix
- Moved AcquireSoftmaxPrimitiveDescriptor call into handler
- MKL-DNN Softmax PD refactor
test=develop
- Compilation fix
test=develop
- another compilation fix
- cosmetcis
test=develop
- Compilation fix
- Fix to crash when softmax backward is created
* - Fixes after review of softmax refactoring
test=develop
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS.
* Add UT.
* More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
- Refactor step 1
- Compilation fix
- Yet another compilation fix
- Even more compilation fix
- Lint fixes
test=develop
- Removed deprectaed PADDLE_ENFORCE occurance
test=develop
- Candidate fix to BN forward
- Lint fixes
test=develop
- Refactoring in data_layout_transform
- compilation fix
- Another comppilation fix
- Step further into darkness
- Yet another compilation fix
- Yet another compilation fix
- missing header
- compilation fix
- Added MKLDNN -> Paddle conversion in fetch op
test=develop
- Compilation fix
test=develop
- Lint
test=develop
- Mul fix
- Fix to MKLDNN MUL op and Elementwise MUL UT
test=develop
- Workaround for diffrent weights with groups representation Paddle vs
MKL-DNN.
test=develop
- Candidate fix for 5D convolution with groups
- Refactor of fix for conv3d and conv2d in fetch op
test=develop
- Compilation fix
- Still same compilation fix
- Compilation fix
- Compilation fix
- Reverted refactoring of fixes
- Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format
not NHWC
test=develop
- minor fix in UT
test=develop
- Lint fixes
test=develop
* fix correctness of the communicator
* fix a bug in send thread when sending var context is empty, test=develop
* add lookup_table_prefetch_op and prefetch optimize, test=develop
* remove remote prefetch GPU supported
* word2vec force with CPU, test=develop
* test dist remote lookup table force with CPU, test=develop
* supports multiple NCCL communicators preserved in NCCLCommContext
test=develop
* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop
* support tensor input with padding for warpctc op
* merge with develop
* test=develop
* modified python API examples test=develop
* nn.py is modified for code coverage test=develop
* update documents info about warpctc op in API.spec test=develop
* add test_warpctc_with_padding in test_layers test=develop
* add warning log for cuda_version back to warpctc_op.cc
* modify API.spec for warpctc op test=develop
* modify API.spec
* update warpctc test to new CompiledProgram API test=develop
* modify code examples for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* add pybind interface to get all inplace ops, test=develop
* enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop
* handle corner cases in op_test, test=develop
* support outputs without tensor holder_, like XShape in reshape_op, test=develop
* fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop
* use reshape_grad instead of reshape in FlattenGradOp, test=develop
* fix error debug dims info for variables like XShape, test=develop
* change computational order in sum_op to relieve computation difference using inplace, test=develop
* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop
* follow sneaxiy's comments, test=develop
* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* Ignore the deprecated status for windows
test=develop
add fl_listen_and_serv op for Federated_learning and fl_distribute_transpiler add this op to pserver program . This op just listen the endpoint and sum&scale.
* remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR(), test=develop
* remove SplitIdsOpGradMaker since it is buggy and not tested, update spec file, test=develop
* instag lod tensor impl
* First PR for instag
* First PR for instag
* Before adding Selection Rows.
* Change name from instag to filter_instag, add upgrade the impl of filter_instag
* Change name from instag to filter_instag, add upgrade the impl of filter_instag
* Fix yapf error in gradient_checker.py to pass Travis-CI
* Fix Filter Instag Grad test=develop
* Fix Filter Instag Grad test=develop
* 1) Fix API.spec, add filter_instag Op. 2) Add Vector Support for CUDA. test=develop
* Impl Loss_weight and empty output handler
* change Loss Weight datatype to Float32, and add Loss Weight as 2nd output
* 1) Support Tensor Input(without LOD) 2) Add Unit test
* Filter By Instag Final test=develop
* Update API.spec for filter_by_instag test=develop
* Update API.spec for filter_by_instag 2 test=develop
* Add Filter By Instag Coverage
* code format of test_layers.py
* code format test_layers.py test=develop
* Make API args more readable test=develop
* Make API args more readable and pass code format test=develop
* Filter By Instag Op, Rename Map to Index Map test=develop
* Filter By Instag Op, code format err in filter_by_instag_op.cc test=develop
* Filter by instag op: code format of cpp files test=develop
* Filter by instag Op: Api spec modification test=develop
* Filter by instag Op: Api spec doc id modification test=develop
* Filter by instag Op: Api spec and doc preview test=develop test=document_preview
* Filter By Instag Op, fix doc erro test=document_preview test=develop
* Filter By Instag Op, fix doc err and Api spec test=document_preview test=develop
* Filter By Instag Op, fix Api spec test=document_preview test=develop
* Filter By Instag Op, fix Paddle Encoforce deprecated warning test=document_preview test=develop
* Filter By Instag Op, fix Paddle Encoforce deprecated and code format warning test=document_preview test=develop
* add hard_swish activation op (new op)
test=develop
* remove redundancy files
* modify document content of HardSwish OP
* add API test in test_layers.py
* add dynamic_graph for test_hard_swish