* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial value of master weights are same with the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* not modify codes about imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
* add compile option WITH_TENSORRT
* add WITH_TENSORRT to ci paddle_buils.sh
* add WITH_TENSORRT to paddle_build.sh
* change FATAL to WARNING when TensorRT is not found and WITN_TENSORRT=ON, just to pass ci-py3 temporarily
* add complex64 and complex128 type; add +-*/@ and slice opreator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix CI unittest
* fix unitest
* empty tensor does’t need inner_var_
* fix some error message
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation inn dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performane.
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* test for diff python file, test=develop
* fix no python diff report, test=develop
* add cc test file, test=develop
* fix bug in generic.cmake, test=develop
* for debug no cc report, test=develp
* modify compire branch form test_pr to test, test=develop
* fix bug, test=develop
* test for h file changed, test=develop
* debug for redefinition of argument optimize error, test=develop
* close -o3 for test, test=develop
* remove -o3 for test, test=develop
* remove coverage option for nvcc, test=develop
* use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop
* reopen -o3, test=develop
* remove debug code, test=develop
* remove unused code, test=develop
* add reducer
* refine envent for memorycopy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the untest, compile problem and ddp initialize problem
* fix untest for mac & add some comments & solve the repeated param in sublayers
* fix untest for windows & fix document
* add lstm, simple rnn op kernel
* fix the test_lstm for the rnn op
* change func name
* fix forward postprocess bug
* add gru forward, backward code
* remove unittest.skipIf; use a big rnn op instead of combination op
* fix input doesn't have gradient bug
* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
* Fix gradients with ignore_idx in softmax_with_cross_entropy.
test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu.
Remove softmax_with_cross_entropy from op_threshold_white_list.
test=develop
* Fix test_softmax_cross_entropy_op.py.
test=develop
* add monitoring for executive ut at night
* fix some error for paddle_build.bat
* fix some error
* fix some error in windows
* fix some error on windows
* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
* test for limiting ut exec time as 15S
* fix an error caused by cannot find ut
* fix some error
* can not find test_transformer
* fix error caused by ut not run in windows
* fix error caused by Compiler Options
* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
* setting timeout value to 120s for old ut
* add the timeout value setting
* fix error caused by ut only run in coverage_ci
* add analyzer_transformer_profile_tester
* fix some error
* fix some error
* fix error with inference option
* fix error with inference option setting as ON_INFER
* add some ut to set timeout
* modified some option
* fix error
* fix some timeout error
* fix error
* fix error
* fix timeout for test_analyzer_bfloat16_resnet50
* fix error
* setting timeout properity for some ut
* first pr for new ut timeout as 15S
* retry will not be executed when the number of failed ut is greater than 20
* add log display
* fix some error
* fix some error
* fix some error
* fix some error
* fp16 result ok
* change -DWITH_NVINFER_PLUGIN toconfig.EnableTensorRtOSS
* auto detect special slice op converter for ernie with trt oss
* ernie oss only support fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code
This PR is follow up of #28213. On that PR we tried to decrease GPU usage, however the CI still randomly failed. So I added retry logic for the initialization of nccl and cusolver. If the initialization failed, we can retry to avoid the random failure.
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
* fix bug of fetch_async_op_handle
* revert some changes of test_buffer_shared_memory_reuse_pass
* revert some changes of test_buffer_shared_memory_reuse_pass
* Still has bugs.
* Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs.
* improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in cuda kernel which cannot deal with large dimension input, and added an unittest for it.
* Add a test case for float32 input.
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in WindowsOPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add full point for error message and add manual_seed in CPU environment
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
* test=kunlun;
Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast):
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description;test=kunlun
1. support channel last in BatchNorm*d (#27875)
2. fix a bug in batch_norm_op cuda kernel by extracting ResizeToChannelFist(Last), TransToChannelFirst(Last) to operators/layer_utils.h
* disable ut quickly
* fix some error
* fix some error
* install urllib2 package
* use requests package instead of urllib2
* fix error caused by windows regular parameter
* fix error on windows
* fix some error
* fix with format error
* show disable ut in log
* fix some error
* fix some error
* add the handling of error in executing get_quickly_disable_ut
* modify cond while_loop to paddle.static.nn.cond
* modify crop_tensor to paddle.crop
* modify Variable to paddle.static.Variable
* remove nn.beam_search, nn.beam_search_decode, nn.gather_tree
* remove bpr_loss, center_loss, rank_loss, smooth_l1, teacher_student_sigmoid_loss, edit_distance, sampled_softmax_with_cross_entropy in nn.functional
* remove apis in nn.functional.learn_rate.py
* remove pool2d, pool3d, adaptive_pool2d, adaptive_pool3d in nn.functional
* remove apis in nn.functional.vision
* remove erf, soft_relu in nn.functional.activation
* remove apis in nn.functional.extension
* remove nn.functional.rnn
* remove hash from nn.functional.lod
* remove row_conv from nn.functional.extension
* remove one_hot, pad2d, pad_constant_like from nn.functional.common
* remove nn.gather_tree, nn.BilinearTensorProduct, nn.Pool2D, nn.Pad2D
* remove apis from optimizer.__init
* remove tensor.creation.fill_constant
* remove elementwise_mul in nn.functional.common and modify to paddle.multiply
* remove tensor.stat.reduce_mean
* remove reduce_all, reduce_any in tensor.logic
* remove apis in tensor.math
* remove apis in tensor.__init__
* remove has_inf, has_nan in tensor.search
* remove apis in framework.__init__
* remove apis in paddle.__init__
* remove apis in nn.functional.__init__
* modify removed alias apis to raw api in doc and unittests
* fix remove grid_sample bug
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* delete alias api relastions in doc
* reserve paddle.compat, paddle.sysconfig
* remove unittest for paddle.reduce_all, paddle.reduce_any
* modify removed alias apis to raw api in doc and unittests
* recover paddle.save and paddle.load
* resolve conflicts
* fix sample code missing paddle.enable_static() bug
* fix sample code missing paddle.enable_static() bug
* fix to_string sample code error