* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix CI unittest
* fix unittest
* empty tensor doesn't need inner_var_
* fix some error message
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
* Fix a variable name error
* Add comments
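The version-counting mechanism above can be sketched in plain Python (class and method names here are illustrative, not Paddle's actual implementation):

```python
class Tensor:
    """Minimal sketch of inplace-version tracking (illustrative only)."""
    def __init__(self, data):
        self.data = list(data)
        self.inplace_version = 0  # bumped on every inplace modification

    def add_(self, value):
        # An inplace op modifies the data and bumps the version counter.
        self.data = [x + value for x in self.data]
        self.inplace_version += 1
        return self

class GradNode:
    """Snapshots a saved tensor's version when the backward graph is built."""
    def __init__(self, tensor):
        self.tensor = tensor
        self.snapshot_version = tensor.inplace_version

    def backward(self):
        # If the tensor was modified inplace after being saved, the
        # gradient would be computed from the wrong values: raise.
        if self.tensor.inplace_version != self.snapshot_version:
            raise RuntimeError(
                "tensor modified by an inplace operation; "
                "gradient computation would be incorrect")
        return self.tensor.data

t = Tensor([1.0, 2.0])
node = GradNode(t)
t.add_(1.0)  # inplace modification after the graph saved the tensor
try:
    node.backward()
except RuntimeError as e:
    print("caught:", e)
```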
* Move member functions of TranslatedLayer out of function
* edit code according to review
* Edit input argument of '_run_static_graph'
* reset due to Segmentation fault
* rename variables when stitching graph
* modify code according to CI
* Add comments to '__i_m_p_l__'
* remove blanks before 'Get...'
* edit code according to review
* Add a comment to '_execution_method_creator'
* Edit a comment to '_execution_method_creator'
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* test for diff python file, test=develop
* fix no python diff report, test=develop
* add cc test file, test=develop
* fix bug in generic.cmake, test=develop
* for debugging no cc report, test=develop
* modify compare branch from test_pr to test, test=develop
* fix bug, test=develop
* test for h file changed, test=develop
* debug for redefinition of argument optimize error, test=develop
* disable -O3 for test, test=develop
* remove -O3 for test, test=develop
* remove coverage option for nvcc, test=develop
* use CMAKE_CXX_FLAGS to enable the coverage option when a header file changed, test=develop
* re-enable -O3, test=develop
* remove debug code, test=develop
* remove unused code, test=develop
test_mnist failed on CUDA11. After debugging, we found that it is due to PaddleInference IR optimization. We disable it in this PR and will re-enable it after PaddleInference fixes it.
GridGenerator model failed because the output shape of `linspace` is (-1). The reason is that C++ InferShape fixes the shape to (-1):
5da3d514eb/paddle/fluid/operators/linspace_op.cc (L49)
We cannot set the shape in C++ InferShape because this Tensor may not be initialized at compile time, but when the input `num` of `linspace` is an integer, we do know the shape at compile time. This PR simply sets the shape in Python and adds GridGenerator as a unittest.
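A shape-level sketch of the fix (hypothetical helper, not the actual Paddle code): when `num` is a plain Python integer the output shape is known at compile time, otherwise it has to stay -1 until run time:

```python
def infer_linspace_shape(num):
    """Sketch of the Python-side shape inference described above.

    If `num` is a plain integer we know the output shape at compile
    time; if it is a not-yet-initialized tensor we must fall back to
    the unknown shape [-1].
    """
    if isinstance(num, int):
        return [num]
    return [-1]  # shape only known at run time

print(infer_linspace_shape(5))         # [5]
print(infer_linspace_shape(object()))  # [-1]
```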
* add reducer
* refine event for memory copy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and ddp initialize problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
* add lars to fleet meta optimizer
* add lamb to proto
* add lamb to fleet meta optimizer
* fixed syntax bug
* fixed syntax bug
* fixed syntax error in lamb, add config setter of lamb in distributed_strategy
* trigger unittest to rerun
* add new unittest func for lamb
* revise unittest for lars and lamb
* revise dgc meta unittest
* revise lars document in distribute_strategy
* revise lars lamb document in distributed_strategy.py
* revise lars lamb document in distributed_strategy.py
* add weight decay exclude logic to lars
* restore optimizer.py
* restore optimizer.py as develop except lars
* add epsilon and exclude fn to distributed_strategy
* add lars epsilon
* revise unittest for fleet lars and lamb
* revise lars lamb unittest for CI coverage
* revise lars argument api
* revise lars argument api
* revise lars argument api
* revise api doc of lars
* fix op role
* add sharding save and add_sync_comm_for_test function
* add comm_analyse to utils
* revise sharding_utils
* add sharding saving unittest
* revise sharding utils for unittest
* revise sharding en doc
* update sharding utils api
* add doc for sharding
* fixed bug in sharding var size count
* update varsize count in sharding
* fix sharding num_nccl_comm
* Revert "fix sharding num_nccl_comm"
This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.
* add static_only for static api
* add static_only for class init
* remove static_only for default_main_program
* remove create_parameter & startup_program
* remove failed apis
* revert py_func import
* remove global scope
* remove some api
* remove cuda pinned place
* add hapi api flops
* fix bug
* fix some bug
* add unit test
* fix unit test
* solve ci coverage
* fix doc
* fix doc
* fix static flops
* delete the comment
* fix some grammar problem in doc
* fix some bug
* fix some doc
* fix some doc
* Rename variables when use 'jit.load'
* Check whether the original graph contains the variable with the same name
* add comment
* rename output/input of op and edit unittest
* modify the code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* rewrite the sigmoid_focal_loss code example. test=develop
* fix spelling mistake in comments of code example.test=develop
* change print([.*].numpy()) to print([.*]) in example codes of sigmoid_focal_loss. test=document_fix
* save one name in cross_entropy and softmax_cross_entropy, test=develop
* change used function in CrossEntropy from softmax_cross_entropy to cross_entropy, test=develop
* Implement 2.0 API version Conv2d and Linear layer quantization in imperative mode.
* use cudnn softmax in static Lenet
* Modified ChannelwiseQAT Unittest for 2.0 API.
* For CI python coverage.
* fix some docs test=develop;test=document_fix
* add code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
1) The operands are executed sequentially according to the running logic of Python.
2) If the left hand operand is True (for convert_logical_or) / False (for convert_logical_and), the right hand operand should not be executed.
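The short-circuit rule above can be sketched with callables (a simplified sketch; the real convert_logical_or/convert_logical_and signatures may differ):

```python
def convert_logical_or(lhs_fn, rhs_fn):
    # Operands are evaluated sequentially, like Python's `or`:
    # a truthy left operand means the right one is never executed.
    lhs = lhs_fn()
    return lhs if lhs else rhs_fn()

def convert_logical_and(lhs_fn, rhs_fn):
    # Like Python's `and`: a falsy left operand skips the right one.
    lhs = lhs_fn()
    return rhs_fn() if lhs else lhs

calls = []
def record(name, value):
    # Wrap an operand so we can observe whether it was executed.
    def fn():
        calls.append(name)
        return value
    return fn

result = convert_logical_or(record("lhs", True), record("rhs", True))
print(result, calls)  # True ['lhs'] -- the right operand was skipped
```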
* fix eng doc, test=develop
* add import deprecated for layers, test=develop
* add block line for doc generate, test=develop
* remove todo for create_variable, test=develop
* add blank line for doc generate, test=develop
* add blank line for doc generate, test=develop
* add lstm, simple rnn op kernel
* fix the test_lstm for the rnn op
* change func name
* fix forward postprocess bug
* add gru forward, backward code
* remove unittest.skipIf; use a big rnn op instead of combination op
* fix input doesn't have gradient bug
* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
* Support dy2stat error message when call jit.save;
* Polish dy2stat error message:
(1) the original dygraph code is marked with (* user code *) ;
(2) "In user code:" -> "In transformed code:"
* add two apis: paddle.static.io.save_inference_model and paddle.static.io.load_inference_model, which are compatible with paddle.fluid.io.save_inference_model and paddle.fluid.io.load_inference_model respectively.
* add unittest for new save_inference_model and load_inference_model. test=develop
* enhance doc. test=develop
* add paddle.enable_static() to test_inference_model_io.py. test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy.
test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu.
Remove softmax_with_cross_entropy from op_threshold_white_list.
test=develop
* Fix test_softmax_cross_entropy_op.py.
test=develop
* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
* test for limiting ut exec time to 15s
* fix an error caused by cannot find ut
* fix some error
* can not find test_transformer
* fix error caused by ut not run in windows
* fix error caused by Compiler Options
* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
* setting timeout value to 120s for old ut
* add the timeout value setting
* fix error caused by ut only run in coverage_ci
* add analyzer_transformer_profile_tester
* fix some error
* fix some error
* fix error with inference option
* fix error with inference option setting as ON_INFER
* add some ut to set timeout
* modified some option
* fix error
* fix some timeout error
* fix error
* fix error
* fix timeout for test_analyzer_bfloat16_resnet50
* fix error
* set timeout property for some ut
* first pr for new ut timeout as 15S
* refine jit.save/load to add support for other methods, not only forward
* refine the code based on unit tests
* Add unit test for the code
* Add unit test for the code
* Modify the code according to the unit test
* Delete useless comments, save only one info file, etc.
* remove static_mode_white_list.pyc
* edit the code that generate 'extra_var_info'
* fp16 result ok
* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
* auto detect special slice op converter for ernie with trt oss
* ernie oss only support fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code
This PR is a follow-up of #28213. In that PR we tried to decrease GPU usage; however, the CI still randomly failed. So I added retry logic for the initialization of NCCL and cusolver. If the initialization fails, we retry to avoid the random failure.
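The retry idea can be sketched with a generic helper (illustrative, not the actual PaddlePaddle code):

```python
import time

def init_with_retry(init_fn, retries=3, delay=0.5):
    """Retry a flaky initialization a few times before giving up."""
    last_exc = None
    for _ in range(retries):
        try:
            return init_fn()
        except RuntimeError as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc  # all retries exhausted

attempts = []
def flaky_init():
    # Fails on the first call, succeeds on the second, mimicking a
    # transient NCCL/cusolver initialization failure on a busy GPU.
    attempts.append(1)
    if len(attempts) < 2:
        raise RuntimeError("init failed")
    return "handle"

handle = init_with_retry(flaky_init, retries=3, delay=0.0)
print(handle)  # handle
```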
* add + - * / @ [] operator to ComplexVariable, also add unittest
* fix circular reference bug
* fit for py2.7
* remove reversed operators which are not supported now
* Join break cond with while cond
* remove useless code
* refine the if code
* Split into BreakTransformOptimizer
* add BreakTransformOptimizer in ast_transformer
* add more comment
* Release 2.0rc cherry pick api rename #28108 (#28184)
* rename count_include_pad-->exclusive, return_indices-->return_mask
* remove track_running_stats
* fix typo.
* rename xxxd-->xxxxD
* solve conflicts
* 2.0rc api add all any (#28199)
* reduce trt warning message (#28011)
add paddle.enable_static() on sample code
alias reduce_all-->all, reduce_any-->any
add import reduce_all and reduce_any in python/paddle/tensor/math.py
import all and any in python/paddle/tensor/__init__.py
remove all and any OP in python/paddle/tensor/logic.py, add all and any OP in python/paddle/tensor/math.py
fix import error
remove TestAllAPI temporary
* fix doc of reduce_all and reduce_any, test=document_fix
* fix typo
* fix unittest for all and any API
Co-authored-by: Pei Yang <peiyang@baidu.com>
* rename conv_transposeXd-->convXd_transpose (#28198)
* fix sample code of reduce_all and reduce_any
Co-authored-by: Pei Yang <peiyang@baidu.com>
Recently, test_parallel_executor_test_while_train randomly failed on CI. On all CI logs, it showed that NCCL initialization or cusolver initialization failed. I found online that those failures are usually caused by GPU shortage. Those APIs call CUDA APIs directly, so it shouldn't be a problem of the allocator. Something in PaddlePaddle may be increasing GPU usage.
However, I ran this test 1000 times on my machine and on the CI machine, and neither of them could reproduce the random failure. Maybe there is something related to the environment that only happens in the test env.
To verify my assumption that somewhere PaddlePaddle increases GPU usage, and also to fix this CI, I decreased the batch_size to see whether the random failure disappears in the test env.
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
* fix bug of fetch_async_op_handle
* revert some changes of test_buffer_shared_memory_reuse_pass
* revert some changes of test_buffer_shared_memory_reuse_pass
* transfer from paddle.fluid.layers.assign() into creation.py,test=develop
* fix ut fail,add support for paddle.assign,test=develop
* fix,test=develop
* fix UT coverage,test=coverage
* fix UT fail,test=coverage
* fix doc,test=develop
* Still has bugs.
* Fixed allclose_op bug, which could not deal with some cases of fp64 inputs.
* improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in cuda kernel which cannot deal with large dimension input, and added an unittest for it.
* Add a test case for float32 input.
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in WindowsOPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add full point for error message and add manual_seed in CPU environment
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
* test=kunlun;
Add elementwise XPU OP kernels for KUNLUN core (common broadcast is still not supported), including:
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description;test=kunlun
* Make dynamic_decode support dygraph and expose to API 2.0
test=develop
* update info about BeamSearchDecoder and dynamic_decode
* remove all APIs in paddle.text, expose BeamSearchDecoder and dynamic_decode
* update example code
* delete test_text.py, decode.py, update some doc, fix example code float64
* delete decode import from paddle.nn
* fix unittest bugs
* use dygraph.Embedding instead of nn.Embedding, add paddle.enable_static()
* update, correct doc
* move dynamic_decode, BeamSearchDecoder API to paddle.nn
* fix code style
* update unittest param, delete import of text.py
* set dtype of beamsearchtest float64
* update example code of BeamSearchDecoder, dynamic_decode
Co-authored-by: LiuChiaChi <709153940@qq.com>
1. support channel last in BatchNorm*d (#27875)
2. fix a bug in batch_norm_op cuda kernel by extracting ResizeToChannelFist(Last), TransToChannelFirst(Last) to operators/layer_utils.h
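The layout handling above can be sketched at the shape level (illustrative helpers mirroring the names in operators/layer_utils.h, not the actual implementations, which also transpose the underlying data):

```python
def trans_to_channel_first(shape):
    # NHWC -> NCHW: move the channel dimension right after batch.
    n, h, w, c = shape
    return [n, c, h, w]

def trans_to_channel_last(shape):
    # NCHW -> NHWC: the inverse conversion.
    n, c, h, w = shape
    return [n, h, w, c]

nhwc = [8, 32, 32, 3]
nchw = trans_to_channel_first(nhwc)
print(nchw)                         # [8, 3, 32, 32]
print(trans_to_channel_last(nchw))  # [8, 32, 32, 3]
```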
* modify cond while_loop to paddle.static.nn.cond
* modify crop_tensor to paddle.crop
* modify Variable to paddle.static.Variable
* remove nn.beam_search, nn.beam_search_decode, nn.gather_tree
* remove bpr_loss, center_loss, rank_loss, smooth_l1, teacher_student_sigmoid_loss, edit_distance, sampled_softmax_with_cross_entropy in nn.functional
* remove apis in nn.functional.learn_rate.py
* remove pool2d, pool3d, adaptive_pool2d, adaptive_pool3d in nn.functional
* remove apis in nn.functional.vision
* remove erf, soft_relu in nn.functional.activation
* remove apis in nn.functional.extension
* remove nn.functional.rnn
* remove hash from nn.functional.lod
* remove row_conv from nn.functional.extension
* remove one_hot, pad2d, pad_constant_like from nn.functional.common
* remove nn.gather_tree, nn.BilinearTensorProduct, nn.Pool2D, nn.Pad2D
* remove apis from optimizer.__init
* remove tensor.creation.fill_constant
* remove elementwise_mul in nn.functional.common and modify to paddle.multiply
* remove tensor.stat.reduce_mean
* remove reduce_all, reduce_any in tensor.logic
* remove apis in tensor.math
* remove apis in tensor.__init__
* remove has_inf, has_nan in tensor.search
* remove apis in framework.__init__
* remove apis in paddle.__init__
* remove apis in nn.functional.__init__
* modify removed alias apis to raw api in doc and unittests
* fix remove grid_sample bug
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* delete alias api relations in doc
* reserve paddle.compat, paddle.sysconfig
* remove unittest for paddle.reduce_all, paddle.reduce_any
* modify removed alias apis to raw api in doc and unittests
* recover paddle.save and paddle.load
* resolve conflicts
* fix sample code missing paddle.enable_static() bug
* fix sample code missing paddle.enable_static() bug
* fix to_string sample code error
* del the DEFINE_ALIAS of sigmoid_cross_entropy_with_logits
* del sigmoid_cross_entropy_with_logits in python/paddle/nn/functional/loss.py, test=develop
* call paddle.fluid.layers.sigmoid_cross_entropy_with_logits in bce_with_logits_loss, test=develop
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer
* test=document_fix
* test=document_fix
* fix gpu version paddle error when there is no CUDA device
* optimize format and add new unittest
* fix coverage problem
* fix unittest format
* change static mode to dygraph mode
* use subprocess in unittest