* Release 2.0rc cherry pick api rename #28108 (#28184)
* rename count_include_pad-->exclusive, return_indices-->return_mask
* remove track_running_stats
* fix typo.
* rename xxxd-->xxxxD
* solve conflicts
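The `exclusive` flag is the inverse of the old `count_include_pad`: with `exclusive=True`, padded positions are left out of the averaging divisor. The following pure-Python sketch of 1-D average pooling shows only that flag's effect. It is not Paddle's pooling kernel, and the function name `avg_pool1d` is hypothetical.

```python
def avg_pool1d(x, kernel_size, stride, padding=0, exclusive=True):
    """1-D average pooling over a Python list with symmetric zero padding.

    exclusive=True  -> divide by the number of real (non-padding) elements
                       in the window (old spelling: count_include_pad=False).
    exclusive=False -> divide by kernel_size, so padded zeros are counted
                       (old spelling: count_include_pad=True).
    """
    if padding >= kernel_size:
        raise ValueError("padding must be smaller than kernel_size")
    n = len(x)
    out = []
    start = -padding  # window start, in unpadded coordinates
    while start + kernel_size <= n + padding:
        lo, hi = max(start, 0), min(start + kernel_size, n)
        window_sum = sum(x[lo:hi])  # padded positions contribute zero
        count = (hi - lo) if exclusive else kernel_size
        out.append(window_sum / count)
        start += stride
    return out
```

For `[1, 2, 3, 4]` with kernel 2, stride 2, and padding 1, the edge windows give `1.0` and `4.0` when padding is excluded, but `0.5` and `2.0` when padding is included.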
* 2.0rc api add all any (#28199)
* reduce trt warning message (#28011)
add paddle.enable_static() to sample code
alias reduce_all-->all, reduce_any-->any
add imports of reduce_all and reduce_any in python/paddle/tensor/math.py
import all and any in python/paddle/tensor/__init__.py
remove the all and any OPs from python/paddle/tensor/logic.py and add them to python/paddle/tensor/math.py
fix import error
temporarily remove TestAllAPI
* fix doc of reduce_all and reduce_any, test=document_fix
* fix typo
* fix unittest for all and any API
Co-authored-by: Pei Yang <peiyang@baidu.com>
* rename conv_transposeXd-->convXd_transpose (#28198)
* fix sample code of reduce_all and reduce_any
Co-authored-by: Pei Yang <peiyang@baidu.com>
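`all` and `any` reduce a boolean tensor, either over every element or along an `axis`, with `keepdim` keeping the reduced dimension as size 1. The following sketch restricts these semantics to a 2-D nested list. The helper names `all_2d` and `any_2d` are hypothetical and stand in for the real tensor APIs.

```python
from typing import List, Optional

Matrix = List[List[bool]]


def _reduce(x: Matrix, fn, axis: Optional[int], keepdim: bool):
    if axis is None:  # reduce every element to a single bool
        result = fn(v for row in x for v in row)
        return [[result]] if keepdim else result
    if axis in (0, -2):  # reduce down each column
        cols = [fn(row[j] for row in x) for j in range(len(x[0]))]
        return [cols] if keepdim else cols
    if axis in (1, -1):  # reduce across each row
        rows = [fn(row) for row in x]
        return [[r] for r in rows] if keepdim else rows
    raise ValueError(f"axis must be None, 0, 1, -1 or -2, got {axis}")


def all_2d(x: Matrix, axis: Optional[int] = None, keepdim: bool = False):
    return _reduce(x, all, axis, keepdim)


def any_2d(x: Matrix, axis: Optional[int] = None, keepdim: bool = False):
    return _reduce(x, any, axis, keepdim)
```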
Recently, test_parallel_executor_test_while_train has been failing randomly on CI. In every CI log, the failure was an NCCL or cusolver initialization error. Reports I found online say these failures are usually caused by a shortage of GPU resources. Those libraries call CUDA APIs directly, so the allocator shouldn't be the problem; something in PaddlePaddle may be increasing GPU usage.
However, I ran this test 1000 times on both my machine and the CI machine, and neither reproduced the random failure. There may be something environment-specific that only happens in the test environment.
To verify my assumption that something in PaddlePaddle increases GPU usage, and to fix this CI, I decreased the batch_size to see whether the random failure disappears in the test environment.
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
* fix bug of fetch_async_op_handle
* revert some changes of test_buffer_shared_memory_reuse_pass
* move paddle.fluid.layers.assign() into creation.py, test=develop
* fix UT failure, add support for paddle.assign, test=develop
* fix, test=develop
* fix UT coverage, test=coverage
* fix UT failure, test=coverage
* fix doc,test=develop
* Still has bugs.
* Fixed an allclose_op bug where some cases of fp64 inputs could not be handled.
* Improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug where the CUDA kernel could not handle large-dimension inputs, and added a unittest for it.
* Add a test case for float32 input.
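For reference, `allclose` checks the element-wise condition `|x - y| <= atol + rtol * |y|`, the same rule NumPy uses. The following sketch applies that rule in double precision, with explicit NaN and infinity handling so that `inf` and `-inf` are never reported as close. It illustrates the semantics only and is not the CPU or CUDA kernel. The defaults shown (`rtol=1e-05`, `atol=1e-08`, `equal_nan=False`) follow the usual convention.

```python
import math


def allclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Return True if every pair satisfies |a - b| <= atol + rtol * |b|."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            # NaN only matches NaN, and only when equal_nan is set
            if equal_nan and math.isnan(a) and math.isnan(b):
                continue
            return False
        if math.isinf(a) or math.isinf(b):
            # infinities must match exactly (same sign); the formula alone
            # would compare inf > inf and wrongly accept inf vs -inf
            if a != b:
                return False
            continue
        if abs(a - b) > atol + rtol * abs(b):
            return False
    return True
```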
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in Windows OPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add a full stop to error messages and add manual_seed in the CPU environment
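Multinomial sampling draws category indices with probability proportional to non-negative weights. Without replacement, a category that has been drawn cannot be drawn again. The following pure-Python sketch rejects negative, inf, and NaN weights, which mirrors the kind of input checks the error-message commits above refer to. It does not reproduce Paddle's kernel. The `seed` parameter is an assumption standing in for setting a global seed in the sample code.

```python
import math
import random


def multinomial(probs, num_samples=1, replacement=False, seed=None):
    """Sample category indices proportionally to the weights in `probs`."""
    rng = random.Random(seed)  # private generator: same seed, same draws
    weights = [float(w) for w in probs]
    if any(w < 0 or math.isinf(w) or math.isnan(w) for w in weights):
        raise ValueError("probs must be finite and non-negative")
    if not replacement and num_samples > sum(w > 0 for w in weights):
        raise ValueError("num_samples exceeds the number of non-zero categories")
    indices = range(len(weights))
    if replacement:
        return rng.choices(indices, weights=weights, k=num_samples)
    out = []
    for _ in range(num_samples):
        idx = rng.choices(indices, weights=weights, k=1)[0]
        out.append(idx)
        weights[idx] = 0.0  # a drawn category cannot be drawn again
    return out
```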
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
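A truncated Gaussian keeps only samples within a fixed number of standard deviations of the mean. The following sketch assumes a cutoff of two standard deviations, which is common for truncated-normal initializers but is not confirmed by this log. It uses rejection sampling, a swapped-in technique that the XPU kernel may not use. The seeded `random.Random` object plays the role of a dedicated generator, as opposed to hidden global state.

```python
import random


def truncated_gaussian(numel, mean=0.0, std=1.0, seed=0, bound=2.0):
    """Draw `numel` samples from N(mean, std^2) restricted to
    [mean - bound*std, mean + bound*std] by rejection sampling."""
    if std <= 0:
        raise ValueError("std must be positive")
    rng = random.Random(seed)  # dedicated generator: same seed, same numbers
    lo, hi = mean - bound * std, mean + bound * std
    out = []
    while len(out) < numel:
        v = rng.gauss(mean, std)
        if lo <= v <= hi:  # anything outside the bounds is discarded and redrawn
            out.append(v)
    return out
```

With a two-sigma cutoff, roughly 95% of draws are accepted, so the loop rarely needs many retries.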
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
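The following sketch shows a seed convention for a uniform random op: `seed == 0` falls back to a shared default generator, and a non-zero seed gives a private, reproducible stream. This matches how `paddle.uniform` documents its `seed` argument, but whether the Kunlun kernel follows it exactly is an assumption. `_default_rng` and `uniform_random` are hypothetical names, and the element count comes from the product of `shape`.

```python
import math
import random

# Hypothetical stand-in for a framework-wide default generator.
_default_rng = random.Random(2020)


def uniform_random(shape, low=-1.0, high=1.0, seed=0):
    """Fill a tensor of `shape` (returned flat) with samples in [low, high).

    seed == 0 -> draw from the shared default generator, so results change
                 from call to call as that generator advances.
    seed != 0 -> use a private generator seeded with `seed`, so every call
                 with the same seed returns the same values.
    """
    # Python ints are unbounded; in C++ a 64-bit shape attribute (int64_t)
    # avoids overflow for large dimensions.
    numel = math.prod(shape)
    rng = _default_rng if seed == 0 else random.Random(seed)
    return [low + (high - low) * rng.random() for _ in range(numel)]
```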
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for the RNN cudnn implementation.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjusting the ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
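Making alignment optional in `coalesce_tensor` comes down to how byte offsets are assigned when several tensors are packed into one contiguous buffer. With alignment on, each tensor's size is rounded up to a boundary; with it off, tensors are packed back to back. The following sketch shows only that offset arithmetic. The 256-byte boundary and the name `coalesce_offsets` are assumptions, not taken from the operator.

```python
def coalesce_offsets(numels, dtype_size, use_align=True, alignment=256):
    """Return (byte offset of each tensor, total buffer size in bytes)."""
    offsets = []
    cursor = 0
    for n in numels:
        offsets.append(cursor)
        nbytes = n * dtype_size
        if use_align:
            # round up to the next multiple of `alignment`
            nbytes = (nbytes + alignment - 1) // alignment * alignment
        cursor += nbytes
    return offsets, cursor
```

For two float32 tensors with 10 and 3 elements, the aligned layout places them at offsets 0 and 256 (512 bytes in total). The unaligned layout packs them at offsets 0 and 40 (52 bytes in total).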
* test=kunlun;
Add elementwise XPU OP kernels for the KUNLUN core, including the following (common broadcasting is not supported yet):
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description;test=kunlun
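Without broadcasting, both inputs must have the same shape. In that case the gradients of `elementwise_mul` and `elementwise_sub` follow directly from the product and difference rules: for `out = x * y`, `dx = dout * y` and `dy = dout * x`; for `out = x - y`, `dx = dout` and `dy = -dout`. The following flat-list sketch illustrates the math only and is not the XPU kernel code.

```python
import operator


def elementwise(op, x, y):
    """Apply a binary op pairwise; shapes must match (no broadcasting)."""
    if len(x) != len(y):
        raise ValueError("broadcasting is not supported: shapes must match")
    return [op(a, b) for a, b in zip(x, y)]


def elementwise_mul_grad(x, y, dout):
    dx = [g * b for g, b in zip(dout, y)]  # d(x*y)/dx = y
    dy = [g * a for g, a in zip(dout, x)]  # d(x*y)/dy = x
    return dx, dy


def elementwise_sub_grad(dout):
    return list(dout), [-g for g in dout]  # d(x-y)/dx = 1, d(x-y)/dy = -1


# div, max, mul and sub forward passes:
div = elementwise(operator.truediv, [4.0, 9.0], [2.0, 3.0])
```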
* Make dynamic_decode support dygraph and expose to API 2.0
test=develop
* update info about BeamSearchDecoder and dynamic_decode
* remove all APIs in paddle.text, expose BeamSearchDecoder and dynamic_decode
* update example code
* delete test_text.py, decode.py, update some doc, fix example code float64
* delete decode import from paddle.nn
* fix unittest bugs
* use dygraph.Embedding instead of nn.Embedding, add paddle.enable_static()
* update, correct doc
* move dynamic_decode, BeamSearchDecoder API to paddle.nn
* fix code style
* update unittest param, delete import of text.py
* set dtype of beamsearchtest to float64
* update example code of BeamSearchDecoder, dynamic_decode
Co-authored-by: LiuChiaChi <709153940@qq.com>
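`BeamSearchDecoder` together with `dynamic_decode` performs beam search: at each step, every live hypothesis is extended, and only the `beam_size` highest-scoring ones survive. The following generic, unbatched sketch is not Paddle's implementation, which works on batched tensors and RNN cell states. `beam_search`, `toy_step`, and the toy probability table are hypothetical.

```python
import math


def beam_search(step_log_probs, start, end, beam_size, max_len):
    """Return up to `beam_size` (tokens, score) pairs, best first.

    step_log_probs(tokens) -> {next_token: log_prob} for the given prefix.
    A hypothesis that emits `end` is finished and no longer expanded; the
    score is the sum of log-probabilities.
    """
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in step_log_probs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses still open at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]


# Toy next-token model: start token 0, end token 3.
TOY = {
    0: {1: math.log(0.6), 2: math.log(0.4)},
    1: {3: math.log(0.9), 2: math.log(0.1)},
    2: {3: math.log(1.0)},
}


def toy_step(tokens):
    return TOY[tokens[-1]]
```

With `beam_size=2`, the search keeps `[0, 1, 3]` (probability 0.54) ahead of `[0, 2, 3]` (probability 0.4), and it prunes `[0, 1, 2]`.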
1. support channel last in BatchNorm*d (#27875)
2. fix a bug in the batch_norm_op CUDA kernel by extracting ResizeToChannelFirst(Last) and TransToChannelFirst(Last) to operators/layer_utils.h
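Supporting channel-last input means that the normalization has to find each element's channel differently. In `NCHW` layout, the channel index is `(i // (H*W)) % C`. In `NHWC` layout, it is simply `i % C`. The following inference-mode sketch uses given running statistics on a flat list to show that difference. It is not the CUDA kernel, and `batch_norm_infer` is a hypothetical name.

```python
import math


def batch_norm_infer(x, shape, mean, var, scale, bias,
                     data_format="NCHW", epsilon=1e-5):
    """Inference-mode batch norm on a flat list holding a 4-D tensor."""
    n, d1, d2, d3 = shape
    if len(x) != n * d1 * d2 * d3:
        raise ValueError("len(x) does not match shape")
    if data_format == "NCHW":  # channel first: [N, C, H, W]
        hw = d2 * d3
        num_channels = d1

        def channel_of(i):
            return (i // hw) % num_channels
    elif data_format == "NHWC":  # channel last: [N, H, W, C]
        num_channels = d3

        def channel_of(i):
            return i % num_channels
    else:
        raise ValueError(f"unsupported data_format: {data_format}")
    out = []
    for i, v in enumerate(x):
        c = channel_of(i)
        x_hat = (v - mean[c]) / math.sqrt(var[c] + epsilon)
        out.append(scale[c] * x_hat + bias[c])
    return out
```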
* modify cond while_loop to paddle.static.nn.cond
* modify crop_tensor to paddle.crop
* modify Variable to paddle.static.Variable
* remove nn.beam_search, nn.beam_search_decode, nn.gather_tree
* remove bpr_loss, center_loss, rank_loss, smooth_l1, teacher_student_sigmoid_loss, edit_distance, sampled_softmax_with_cross_entropy in nn.functional
* remove apis in nn.functional.learn_rate.py
* remove pool2d, pool3d, adaptive_pool2d, adaptive_pool3d in nn.functional
* remove apis in nn.functional.vision
* remove erf, soft_relu in nn.functional.activation
* remove apis in nn.functional.extension
* remove nn.functional.rnn
* remove hash from nn.functional.lod
* remove row_conv from nn.functional.extension
* remove one_hot, pad2d, pad_constant_like from nn.functional.common
* remove nn.gather_tree, nn.BilinearTensorProduct, nn.Pool2D, nn.Pad2D
* remove apis from optimizer.__init
* remove tensor.creation.fill_constant
* remove elementwise_mul in nn.functional.common and modify to paddle.multiply
* remove tensor.stat.reduce_mean
* remove reduce_all, reduce_any in tensor.logic
* remove apis in tensor.math
* remove apis in tensor.__init__
* remove has_inf, has_nan in tensor.search
* remove apis in framework.__init__
* remove apis in paddle.__init__
* remove apis in nn.functional.__init__
* modify removed alias apis to raw api in doc and unittests
* fix remove grid_sample bug
* modify removed alias apis to raw api in doc and unittests
* delete alias api relations in doc
* reserve paddle.compat, paddle.sysconfig
* remove unittest for paddle.reduce_all, paddle.reduce_any
* modify removed alias apis to raw api in doc and unittests
* recover paddle.save and paddle.load
* resolve conflicts
* fix sample code missing paddle.enable_static() bug
* fix to_string sample code error
* del the DEFINE_ALIAS of sigmoid_cross_entropy_with_logits
* del sigmoid_cross_entropy_with_logits in python/paddle/nn/functional/loss.py, test=develop
* call paddle.fluid.layers.sigmoid_cross_entropy_with_logits in bce_with_logits_loss, test=develop
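`sigmoid_cross_entropy_with_logits` is usually computed in the numerically stable form `max(x, 0) - x*z + log(1 + exp(-|x|))`. This form equals `-z*log(sigmoid(x)) - (1-z)*log(1 - sigmoid(x))` but never overflows for large logits. The following scalar sketch shows that formula and a `mean`/`sum`/`none` reduction on top of it. Weight and pos_weight arguments are omitted, and the function bodies are illustrations rather than the fluid implementation.

```python
import math


def sigmoid_cross_entropy_with_logits(x, z):
    """Stable per-element loss for logit x and label z in [0, 1]."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def bce_with_logits_loss(logits, labels, reduction="mean"):
    losses = [sigmoid_cross_entropy_with_logits(x, z)
              for x, z in zip(logits, labels)]
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "none":
        return losses
    raise ValueError(f"unknown reduction: {reduction}")
```

A logit of `1000.0` with label `0` yields a loss of `1000.0` instead of overflowing inside `exp`.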
* Add api of constant in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* test=document_fix
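Kaiming (He) initialization scales weights by the layer's fan-in. `KaimingUniform` samples from `U(-sqrt(6/fan_in), sqrt(6/fan_in))`, and `KaimingNormal` samples from `N(0, 2/fan_in)`. The following sketch shows those formulas. It assumes that 2-D weights are laid out as `[in_features, out_features]` and conv weights as `[out_channels, in_channels, *kernel_size]`. This layout convention and the helper names are assumptions for illustration, not taken from `paddle.nn.initializer`.

```python
import math
import random


def compute_fan_in(shape):
    """Fan-in under the assumed layouts described above."""
    if len(shape) < 2:
        raise ValueError("need at least a 2-D weight shape")
    if len(shape) == 2:
        return shape[0]  # [in_features, out_features]
    return shape[1] * math.prod(shape[2:])  # in_channels * kernel area


def kaiming_uniform(shape, rng):
    limit = math.sqrt(6.0 / compute_fan_in(shape))  # U(-limit, limit)
    return [rng.uniform(-limit, limit) for _ in range(math.prod(shape))]


def kaiming_normal(shape, rng):
    std = math.sqrt(2.0 / compute_fan_in(shape))  # N(0, std^2)
    return [rng.gauss(0.0, std) for _ in range(math.prod(shape))]
```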