improve pow op according to reviews:
1. Delete unnecessary judgement statements in PowGradOpDescMaker;
2. Improve test of test_api;
overload GetKernelTypeForVar
add stop_gradient=True when attr(factor) is tensor Variable, change examples in API pow.
test=develop,test=document_preview
add support parameter inference when argument shape is a list containing integer and tensor variable;
test=develop
fix reshape op according to reviews:
1. improve or message;
2. improve test of test_api.
test=develop,test=document_preview
fix reshape op: Add error message in nn.py, test=develop
add stop_gradient=True when attr(shape) is tensor Variable.
change examples in API reshape.
test=develop,test=document_preview
add support parameter inference when arguments starts or ends is a list containing integer and tensor variable;
test=develop,test=document_preview
improve slice op according to review(from hongyu). test=develop
fix slice op according to review: infer_flags, test=develop
fix slice op: improve overload operator __getitem__ to support attrs(starts and ends) are Variable.
test=develop,test=document_preview
fix test_slice_op: add TestSliceOp_decs_dim_6 to resolve conflict with test_slice_ngraph_op. test=develop
add stop_gradient=True when attr(starts) or attr(ends) is tensor Variable.
test=develop,test=document_preview
1. add tensor support for argument expand_times in expand op;
2. add support parameter inference when argument expand_times is a list containing integer and tensor variable;
improve expand op according to reviews:
1. add doc of ExpandTimes in expand_op.cc;
2. improve the test of test_api.
add stop_gradient=True when attr(expand_times) is tensor Variable, change code examples.
test=develop,test=document_preview
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix complie error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* add transform_data to dygraph
* test=develop, refoctor name to make it easier to understand
* test=develop, refoctor name to make it easier to understand
* add test and change input to const ref for safety
* test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ
* add ut for data transform
* refine ut for data_transform
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* add test_tracer on multiple devices
* test=develop, change place to mutable for data transform
* test=develop, add transform data on same place test and remove useless log
* test=develop, Add to do for data layout and and ut for conv2d with no bias
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* tmp add fused_emb_seq layer
* Add the support of padding_idx attribute.
test=develop
* add padding_idx support
test=develop
* implement grad refer lego
test=develop
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.
We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.
Also added data_feed_proto to operator to fix CI in CPU compilation
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference.
test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op.
test=develop
* Remove the declaration of FCOpGrad back to the header file.
test=develop
* Set default value for newly added arguments in test_fc_op.
test=develop
* Remove constraint that last dimension is forced to be 1 in huber_loss
test=develop
* add y[rank-1] == 1 when x_rank=y_rank test=develop
* modify into contain_unknown_dim test=develop
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix complie error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refoctor name to make it easier to understand
* test=develop, refoctor name to make it easier to understand
* test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* test=develop add a argument for softshrink python api
* test=develop fix doc format
test=develop fix doc format
* test=develop fix API.spec
test=develop fix API.spec
* test=develop
Fix the scatter op bug when use the add mode, and support the int64 data type of scatter_op Index(#18804).
* test=develop
Remove the PADDLE_ENFORCE and use PADDLE_ENFORCE_EQ
* test=develop
Remove the fix bug of scatter_add, and just add the support of int64 in scatter_add
* test=develop
Add the test case for scatter op, the test case just for index int64
* add to and detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add exception check, test=develop
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS.
* Add UT.
* More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
- Refactor step 1
- Compilation fix
- Yet another compilation fix
- Even more compilation fix
- Lint fixes
test=develop
- Removed deprectaed PADDLE_ENFORCE occurance
test=develop
- Candidate fix to BN forward
- Lint fixes
test=develop
- Refactoring in data_layout_transform
- compilation fix
- Another comppilation fix
- Step further into darkness
- Yet another compilation fix
- Yet another compilation fix
- missing header
- compilation fix
- Added MKLDNN -> Paddle conversion in fetch op
test=develop
- Compilation fix
test=develop
- Lint
test=develop
- Mul fix
- Fix to MKLDNN MUL op and Elementwise MUL UT
test=develop
- Workaround for diffrent weights with groups representation Paddle vs
MKL-DNN.
test=develop
- Candidate fix for 5D convolution with groups
- Refactor of fix for conv3d and conv2d in fetch op
test=develop
- Compilation fix
- Still same compilation fix
- Compilation fix
- Compilation fix
- Reverted refactoring of fixes
- Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format
not NHWC
test=develop
- minor fix in UT
test=develop
- Lint fixes
test=develop
* fix con2d transpose bias by create and init it in build_onee
* fix API spec
* test=develop, invoke ci
* fix bias_attr and act has no effect error on layer norm, conv2dTranpose, billinearTensorProduct, sequece_conv. fix original_mode not used error on GRUunit. fix sample_weight not set error on NCE. Add ut for all thoese layer
* test=develop, change success standard for conv2dTranspose
* test=develop, fix test_layers to invoke some error branch
* test=develop, fix sample code
* test=develop, fix BilinearTensorProduct failed in dygraph mode
* test=develop, fix test_layers segment fault error
* fix correctness of the communicator
* fix a bug in send thread when sending var context is empty, test=develop
* add lookup_table_prefetch_op and prefetch optimize, test=develop
* remove remote prefetch GPU supported
* word2vec force with CPU, test=develop
* test dist remote lookup table force with CPU, test=develop
* supports multiple NCCL communicators preserved in NCCLCommContext
test=develop
* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop
* support tensor input with padding for warpctc op
* merge with develop
* test=develop
* modified python API examples test=develop
* nn.py is modified for code coverage test=develop
* update documents info about warpctc op in API.spec test=develop
* add test_warpctc_with_padding in test_layers test=develop
* add warning log for cuda_version back to warpctc_op.cc
* modify API.spec for warpctc op test=develop
* modify API.spec
* update warpctc test to new CompiledProgram API test=develop
* modify code examples for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* modify API.spec for warpctc op test=develop