* Join break cond with while cond
* remove useless code
* refine the if code
* Split into BreakTransformOptimizer
* add BreakTransformOptimizer in ast_transformer
* add more comments
* Release 2.0rc cherry pick api rename #28108 (#28184)
* rename count_include_pad-->exclusive, return_indices-->return_mask
* remove track_running_stats
* fix typo.
* rename xxxd-->xxxxD
* solve conflicts
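For the count_include_pad-->exclusive rename above, the flag controls how average pooling treats zero padding: exclusive=True leaves padded zeros out of the averaging divisor (roughly the inverse of the old count_include_pad flag). A minimal pure-Python sketch of the semantics, not Paddle's implementation (the helper name and the stride-equals-kernel convention are made up for illustration):

```python
def avg_pool1d(x, kernel, pad, exclusive=True):
    """1-D average pooling with zero padding, stride == kernel.

    exclusive=True divides each window sum by the number of real
    (non-padded) elements it covers; exclusive=False always divides
    by the full kernel size, so padded zeros drag the average down.
    """
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = []
    for start in range(0, len(padded) - kernel + 1, kernel):
        window = padded[start:start + kernel]
        # count how many positions in this window are real input
        real = sum(1 for i in range(start, start + kernel)
                   if pad <= i < pad + len(x))
        divisor = real if exclusive else kernel
        out.append(sum(window) / divisor)
    return out
```

With input [2.0, 4.0], kernel 2 and one padded zero on each side, exclusive mode averages only the real element per window, while inclusive mode halves it.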
* 2.0rc api add all any (#28199)
* reduce trt warning message (#28011)
add paddle.enable_static() in sample code
alias reduce_all-->all, reduce_any-->any
add import reduce_all and reduce_any in python/paddle/tensor/math.py
import all and any in python/paddle/tensor/__init__.py
remove all and any OP in python/paddle/tensor/logic.py, add all and any OP in python/paddle/tensor/math.py
fix import error
remove TestAllAPI temporary
* fix doc of reduce_all and reduce_any, test=document_fix
* fix typo
* fix unittest for all and any API
Co-authored-by: Pei Yang <peiyang@baidu.com>
* rename conv_transposeXd-->convXd_transpose (#28198)
* fix sample code of reduce_all and reduce_any
Co-authored-by: Pei Yang <peiyang@baidu.com>
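The renamed all/any reductions keep the reduce_all/reduce_any semantics: a logical AND/OR over every element, or along one axis. A pure-Python sketch of what the reduction computes over a 2-D nested list (illustrative only; the function names mirror the old API but this is not Paddle code):

```python
def reduce_all(mat, axis=None):
    """Logical AND reduction over a 2-D nested list.

    axis=None reduces every element to one bool; axis=0 reduces
    down columns; axis=1 reduces across rows.
    """
    if axis is None:
        return all(all(row) for row in mat)
    if axis == 0:
        return [all(col) for col in zip(*mat)]
    return [all(row) for row in mat]

def reduce_any(mat, axis=None):
    """Logical OR reduction, same layout conventions."""
    if axis is None:
        return any(any(row) for row in mat)
    if axis == 0:
        return [any(col) for col in zip(*mat)]
    return [any(row) for row in mat]
```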
Recently, test_parallel_executor_test_while_train failed randomly on CI. All CI logs showed that either NCCL initialization or cusolver initialization failed. I found online that such failures are usually caused by a shortage of GPU memory. Those APIs call CUDA APIs directly, so the allocator should not be the problem; something elsewhere in PaddlePaddle may be increasing GPU usage.
However, I ran this test 1000 times on both my machine and the CI machine, and neither could reproduce the random failure. Perhaps something specific to the environment happens only in the test env.
To verify my assumption that something in PaddlePaddle increases GPU usage, and also to fix this CI, I decreased the batch_size to see whether the random failure disappears in the test env.
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
* fix bug of fetch_async_op_handle
* revert some changes of test_buffer_shared_memory_reuse_pass
* transfer paddle.fluid.layers.assign() into creation.py, test=develop
* fix UT failure, add support for paddle.assign, test=develop
* fix,test=develop
* fix UT coverage,test=coverage
* fix UT failure, test=coverage
* fix doc,test=develop
* Still has bugs.
* Fixed an allclose_op bug where it could not handle some cases of fp64 inputs.
* improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in the cuda kernel which could not deal with large-dimension input, and added a unittest for it.
* Add a test case for float32 input.
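For context on the allclose_op fixes above: the conventional element-wise criterion is |a - b| <= atol + rtol * |b|, which gets delicate for fp64 inputs of large magnitude. A pure-Python sketch of the check (illustrative only; the equal_nan flag is included here as an assumption about the op's interface, not a statement of Paddle's exact signature):

```python
import math

def allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Element-wise closeness check over two equal-length lists."""
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            # NaN never compares close unless equal_nan says so
            if equal_nan and math.isnan(x) and math.isnan(y):
                continue
            return False
        # infinities are only "close" when they are identical
        if math.isinf(x) or math.isinf(y):
            if x != y:
                return False
            continue
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True
```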
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in Windows OPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add full stops to error messages and add manual_seed in CPU environment
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
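coalesce_tensor fuses many small tensors into one contiguous buffer, and use_align rounds each chunk up to an alignment boundary so every chunk starts at an aligned offset. A sketch of that offset arithmetic (illustrative only; the 256-byte alignment is a hypothetical default, not necessarily Paddle's):

```python
def coalesce_offsets(sizes, alignment=256, use_align=True):
    """Return (offsets, total) for packing chunks into one buffer.

    With use_align, each chunk's size is rounded up to a multiple
    of `alignment`, so every chunk begins on an aligned address.
    """
    offsets, total = [], 0
    for size in sizes:
        offsets.append(total)
        if use_align:
            # round up to the next multiple of `alignment`
            size = (size + alignment - 1) // alignment * alignment
        total += size
    return offsets, total
```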
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
* test=kunlun;
Add elementwise XPU OP kernels for the KUNLUN core (common broadcasting is still not supported), including:
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description; test=kunlun
* Make dynamic_decode support dygraph and expose to API 2.0
test=develop
* update info about BeamSearchDecoder and dynamic_decode
* remove all APIs in paddle.text, expose BeamSearchDecoder and dynamic_decode
* update example code
* delete test_text.py, decode.py, update some doc, fix example code float64
* delete decode import from paddle.nn
* fix unittest bugs
* use dygraph.Embedding instead of nn.Embedding, add paddle.enable_static()
* update, correct doc
* move dynamic_decode, BeamSearchDecoder API to paddle.nn
* fix code style
* update unittest param, delete import of text.py
* set dtype of beam search test to float64
* update example code of BeamSearchDecoder, dynamic_decode
Co-authored-by: LiuChiaChi <709153940@qq.com>
1. support channel last in BatchNorm*d (#27875)
2. fix a bug in batch_norm_op cuda kernel by extracting ResizeToChannelFirst(Last), TransToChannelFirst(Last) to operators/layer_utils.h
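The channel-last fix above works by converting NHWC input to channel-first NCHW before the kernel runs and converting back afterwards, which is what the extracted helpers do in spirit. A NumPy sketch of the layout round-trip (illustrative only, not the actual operators/layer_utils.h code):

```python
import numpy as np

def to_channel_first(x):
    """NHWC -> NCHW: move the channel axis next to the batch axis."""
    return np.transpose(x, (0, 3, 1, 2))

def to_channel_last(x):
    """NCHW -> NHWC: the inverse permutation of to_channel_first."""
    return np.transpose(x, (0, 2, 3, 1))
```

Because the two permutations are inverses, a round-trip leaves the data unchanged, which is what makes the convert-compute-convert-back pattern safe.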
* modify cond while_loop to paddle.static.nn.cond
* modify crop_tensor to paddle.crop
* modify Variable to paddle.static.Variable
* remove nn.beam_search, nn.beam_search_decode, nn.gather_tree
* remove bpr_loss, center_loss, rank_loss, smooth_l1, teacher_student_sigmoid_loss, edit_distance, sampled_softmax_with_cross_entropy in nn.functional
* remove apis in nn.functional.learn_rate.py
* remove pool2d, pool3d, adaptive_pool2d, adaptive_pool3d in nn.functional
* remove apis in nn.functional.vision
* remove erf, soft_relu in nn.functional.activation
* remove apis in nn.functional.extension
* remove nn.functional.rnn
* remove hash from nn.functional.lod
* remove row_conv from nn.functional.extension
* remove one_hot, pad2d, pad_constant_like from nn.functional.common
* remove nn.gather_tree, nn.BilinearTensorProduct, nn.Pool2D, nn.Pad2D
* remove apis from optimizer.__init
* remove tensor.creation.fill_constant
* remove elementwise_mul in nn.functional.common and modify to paddle.multiply
* remove tensor.stat.reduce_mean
* remove reduce_all, reduce_any in tensor.logic
* remove apis in tensor.math
* remove apis in tensor.__init__
* remove has_inf, has_nan in tensor.search
* remove apis in framework.__init__
* remove apis in paddle.__init__
* remove apis in nn.functional.__init__
* modify removed alias apis to raw api in doc and unittests
* fix remove grid_sample bug
* delete alias api relations in doc
* reserve paddle.compat, paddle.sysconfig
* remove unittest for paddle.reduce_all, paddle.reduce_any
* recover paddle.save and paddle.load
* resolve conflicts
* fix sample code missing paddle.enable_static() bug
* fix to_string sample code error
* Add api of constant in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* test=document_fix
* fix error in gpu version of paddle when there is no CUDA device
* optimize format and add new unittest
* fix coverage problem
* fix unittest format
* change static mode to dygraph mode
* use subprocess in unittest
* 1. remove paddle.unique_with_counts api, whose functionality is covered by the unique api
2. add paddle.math.increment(x, value=1.0, name=None) api
3. replace paddle.sums with paddle.add_n api
4. update paddle.metric.accuracy api (add name parameter)
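For the replacements in items 2 and 3 above, the semantics carry over under the new names: add_n sums a list of same-shape tensors element-wise, and increment adds a scalar to a tensor (in Paddle it is typically applied to a single-element counter). A pure-Python sketch over 1-D lists (illustrative only, not Paddle's operators):

```python
def add_n(tensors):
    """Element-wise sum of a list of equal-length 1-D lists."""
    return [sum(vals) for vals in zip(*tensors)]

def increment(x, value=1.0):
    """Add `value` to every element of a 1-D list."""
    return [v + value for v in x]
```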
* fix doc and unittest of 2.0 lr_scheduler
* fix doc of 2.0 lr_scheduler
* fix unittest
* fix english doc of lr_scheduler
* fix api name of lr scheduler
* add load_op_xpu for Baidu Kunlun, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
* replace config by kwargs
* change save path from dir to prefix
* fix failed unittests
* revert unittest name change
* polish en docs
* add more tests for coverage
* Add api of constant in paddle.nn.initializer
* modify doc for Layer, test=develop
* modify doc for Layer2, test=develop
* set dtype default value to float32, test=develop
* add example code for paddle.nn.Layer, test=develop
* set create_parameter and create_variable dtype default value to None, test=develop
* modify some example code, test=develop
* refine, test=develop
* delete unused code, test=develop
* modify doc, example code, args, test=develop
* modify doc, test=develop
* fix huber_loss and npair_loss doc and example code, test=document_fix
* remove disable_static in example code, test=document_fix
* remove huber_loss and refine npair_loss example code, test=document_fix
* remove huber_loss in functional/__init__.py, test=document_fix
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add cuda kernel for the case num_distribution == 1; replacement=False is not supported
* add multinomial python api
* support num_distribution different multinomial distributions
* add categorical class
* fix test_distribution enable_static error
* add unittest for different setting of Categorical
* optimize format
* little change
* add raise error if shape not match, optimize format
* fix windows CI dtype error in concat
* little changes
* little changes2
* change values type to int64
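The multinomial commits above implement categorical sampling from unnormalized probabilities; with replacement=False a drawn category must not repeat, and asking for more samples than there are nonzero categories is an error. A pure-Python sketch of the sampling loop (illustrative only, not the CPU/CUDA kernel):

```python
import random

def multinomial(probs, num_samples, replacement=True, rng=random):
    """Draw category indices from unnormalized probabilities."""
    if not replacement and num_samples > sum(p > 0 for p in probs):
        raise ValueError("not enough nonzero categories to sample "
                         "without replacement")
    probs = list(probs)
    out = []
    for _ in range(num_samples):
        # inverse-CDF draw over the (remaining) probability mass
        total = sum(probs)
        r = rng.random() * total
        acc = 0.0
        for idx, p in enumerate(probs):
            acc += p
            if r < acc:
                break
        out.append(idx)
        if not replacement:
            probs[idx] = 0.0  # a drawn category cannot repeat
    return out
```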
* add multinomial python api unittest
* change output dtype to int64
* fix coverage problem
* optimize format
* fix dtype of output error, should be int64_t
* increase tolerance
* increase the difference between low and high
* change tolerance of Normal log_prob method
* change probs tolerance to 1e-4
* change tolerance of Normal kl method
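The tolerance changes above affect numerical tests of Normal's log_prob and KL divergence, whose closed forms are standard. A pure-Python sketch of the formulas being checked (illustrative only; argument names are hypothetical, not the distribution API's):

```python
import math

def normal_log_prob(x, loc, scale):
    """Log density of N(loc, scale**2) at x."""
    var = scale * scale
    return (-((x - loc) ** 2) / (2.0 * var)
            - math.log(scale) - 0.5 * math.log(2.0 * math.pi))

def normal_kl(loc1, scale1, loc2, scale2):
    """KL( N(loc1, scale1^2) || N(loc2, scale2^2) ), closed form:
    0.5 * (s1^2/s2^2 + (m1-m2)^2/s2^2 - 1 - log(s1^2/s2^2))
    """
    var_ratio = (scale1 / scale2) ** 2
    t1 = ((loc1 - loc2) / scale2) ** 2
    return 0.5 * (var_ratio + t1 - 1.0 - math.log(var_ratio))
```

Comparing such closed forms against sampled or kernel-computed values is exactly where the loosened tolerances (e.g. 1e-4 for probs) come into play.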