* fix expand and concat/transpose to the new API
* update xpu_header
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* add nearest_interp on kunlun
* update error message
* newly added UTs should not exceed 15s
* fix error
* the 15s UT time limit check is executed first
* fix error
* fix error with CI_SKIP_CPP_TEST
* modified timeout setting
* fix error
1. Fix error in _build_cond_stmt of for-range stmts.
2. Support that step value is negative in for-range stmts
3. Fix code because of the diff between Py2 and Py3
Fix 3 Windows Unittests
test_fuse_all_reduce_pass: Paddle cannot run multi-GPU on Windows, so set the single visible GPU flag
test_feed_data_check_shape_type: Paddle cannot run multi-GPU on Windows, so set the single visible GPU flag
test_tsm: Windows GPU memory is not enough, so decrease the batch size and data size.
* Fix a bug when running on an operating system without "bash."
* add execution condition
* for ci-coverage
* get cpu information to check the precision problem
* Update compilation environment for musl version
* update dependencies
* remove test code
check cpu info
remove test code
review
* update alpine and third_party dependencies
* add newline for ci Code format
* Fix api docs in RNN, Transformer, layer_norm, WeightNormParamAttr.
test=develop
* Fix api doc for print in label_smooth.
test=develop
* Update api docs according to review comments.
Add name argument in RNN back.
test=develop
* add complex64 and complex128 types; add +-*/@ and slice operators for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* kron, reshape, transpose support complex types
* sum and trace op support complex types
* add test case of sum and trace op
* fix the bug of imag part of complex not initialized
* format file
* format code style
* kron support type promotion; modify test cases
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
Usage scenarios: the function should be able to run successfully in static mode; you can use it to decorate a function in the following cases:
1. An unknown error occurs in the dynamic-to-static conversion process of the function;
2. In the internal implementation of the function, it has two branches: dynamic branch and static branch;
3. Users don't want to convert the function in the process of dynamic to static.
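The opt-out cases above can be sketched with a marker decorator. This is an illustrative sketch only; `skip_convert`, `__skip_dy2stat__`, and `maybe_convert` are hypothetical names, not the actual Paddle API.

```python
# Hypothetical "do not convert" marker; not the actual Paddle API.
def skip_convert(func):
    """Mark `func` so the dynamic-to-static converter leaves it unchanged."""
    func.__skip_dy2stat__ = True
    return func

@skip_convert
def branch_impl(x):
    # runs as-is in both dynamic and static mode (cases 2/3 above)
    return x * 2

def maybe_convert(func):
    """The converter checks the mark before transforming a function."""
    if getattr(func, "__skip_dy2stat__", False):
        return func  # user opted out: keep the original function
    raise NotImplementedError("AST transformation is not part of this sketch")

assert maybe_convert(branch_impl)(3) == 6
```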
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial value of master weights are same with the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* not modify codes about imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
This PR fixes several problems in dy2stat for the Deoldify model in PaddleGan.
1. In the model, the engineer wrote `if x.shape == y.shape`. A Tensor shape is a tuple in dygraph, so `==` returns True/False, but in static graph `==` becomes an element-wise comparison, which is different behavior. In this PR we reduce the element-wise comparison result.
2. If the engineer writes computations that use parameters in hooks, the static graph can lose the parameter variable, because we put param_guard only at the forward of a Layer. In this PR we made param_guard cover the pre-hook and post-hook as well.
3. In PaddleGan, the engineer calculated some parameter values in __init__ by running some dygraph code. That code also runs during dy2stat, so some variables may be assigned as a VarBase (Tensor) first and then as a Variable, which raised an error. We fixed the bug in this PR by handling this case.
TODO: we only added a test case for 1 (the shape comparison). We should add test cases for 2 and 3, but since we are chasing the 2.0 RC, I will do that in a near-future PR.
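The shape-comparison difference described above can be reproduced in plain Python; list-based shapes stand in for static-graph element-wise behavior in this minimal sketch.

```python
# Pure-Python sketch of the dygraph vs. static-graph `==` difference.
x_shape = [2, 3]
y_shape = [2, 3]

# Dygraph: shapes are tuples, so `==` yields a single bool.
assert (tuple(x_shape) == tuple(y_shape)) is True

# Static graph: `==` becomes an element-wise comparison per dimension.
elementwise = [a == b for a, b in zip(x_shape, y_shape)]
assert elementwise == [True, True]

# The fix reduces the element-wise result back to a single bool.
assert all(elementwise) is True
```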
* add complex64 and complex128 types; add +-*/@ and slice operators for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix CI unittest
* fix unitest
* empty tensor doesn't need inner_var_
* fix some error message
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
* Fix a wrong variable name
* Add comments
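The bullets above can be condensed into a toy model of the version bookkeeping. Names mirror the bullets, but this is illustrative only; the real Paddle implementation lives in C++ and differs in detail.

```python
# Toy model of inplace-version tracking; not the actual Paddle implementation.
class TensorInplaceVersion:
    """Counts how many times a tensor was modified in place."""
    def __init__(self):
        self.inplace_version = 0

    def bump(self):
        self.inplace_version += 1


class VarBase:
    def __init__(self, data):
        self.data = data
        self._counter = TensorInplaceVersion()
        self._snapshot_version = 0  # version recorded when autograd captured it

    @property
    def _inplace_version(self):
        return self._counter.inplace_version

    def _bump_inplace_version(self):
        self._counter.bump()

    def assign_(self, data):
        """An inplace operation: mutate the data, then bump the version."""
        self.data = data
        self._bump_inplace_version()

    def check_before_backward(self):
        """Raise if an inplace op would make the gradient incorrect."""
        if self._inplace_version != self._snapshot_version:
            raise RuntimeError("tensor was modified by an inplace operation")


v = VarBase([1, 2])
v._snapshot_version = v._inplace_version  # autograd takes a snapshot
v.assign_([3, 4])                         # inplace modification afterwards
try:
    v.check_before_backward()
    raised = False
except RuntimeError:
    raised = True
assert raised and v._inplace_version == 1
```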
* Move member functions of TranslatedLayer out of function
* edit code according to review
* Edit input argument of '_run_static_graph'
* reset due to Segmentation fault
* rename variables when stitching graph
* modify code according CI
* Add comments to '__i_m_p_l__'
* remove blanks before 'Get...'
* edit code according to review
* Add a comment to '_execution_method_creator'
* Edit a comment to '_execution_method_creator'
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* test for diff python file, test=develop
* fix no python diff report, test=develop
* add cc test file, test=develop
* fix bug in generic.cmake, test=develop
* for debug no cc report, test=develop
* modify compare branch from test_pr to test, test=develop
* fix bug, test=develop
* test for h file changed, test=develop
* debug for redefinition of argument optimize error, test=develop
* disable -O3 for test, test=develop
* remove -O3 for test, test=develop
* remove coverage option for nvcc, test=develop
* use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop
* re-enable -O3, test=develop
* remove debug code, test=develop
* remove unused code, test=develop
test_mnist failed on CUDA 11. After debugging, we found that it is due to PaddleInference IR optimization. We disable it in this PR and will re-enable it after PaddleInference fixes it.
GridGenerator model failed because the output shape of `linspace` is (-1). The reason is that C++ InferShape fixes the shape to (-1):
5da3d514eb/paddle/fluid/operators/linspace_op.cc (L49)
We cannot set the shape in C++ InferShape because this Tensor may not be initialized during compile time, but when the input `num` of `linspace` is an integer, we know the shape at compile time. This PR simply sets the shape in Python and adds GridGenerator as a unittest.
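The compile-time rule can be sketched in a few lines. `infer_linspace_shape` is a hypothetical helper for illustration; the actual Python-side inference lives inside the `linspace` API.

```python
# Hedged sketch: when `num` is a plain Python int, the output shape is known
# at compile time; otherwise (e.g. a Tensor) it stays unknown (-1).
def infer_linspace_shape(num):
    if isinstance(num, int):
        return [num]      # shape known at compile time
    return [-1]           # value only known at runtime

assert infer_linspace_shape(5) == [5]

class FakeNumTensor:      # stand-in for a Tensor `num` input
    pass

assert infer_linspace_shape(FakeNumTensor()) == [-1]
```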
* add reducer
* refine event for memory copy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and DDP initialization problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
* add lars to fleet meta optimizer
* add lamb to proto
* add lamb to fleet meta optimizer
* fixed syntax bug
* fixed syntax bug
* fixed syntax error in lamb, add config setter of lamb in distributed_strategy
* trigger unittest to rerun
* add new unittest func for lamb
* revise unittest for lars and lamb
* revise dgc meta unittest
* revise lars document in distributed_strategy
* revise lars lamb document in distributed_strategy.py
* revise lars lamb document in distributed_strategy.py
* add weight decay exclude logic to lars
* restore optimizer.py
* restore optimizer.py as develop except lars
* add epsilon and exclude fn to distributed_strategy
* add lars epsilon
* revise unittest for fleet lars and lamb
* revise lars lamb unittest for CI coverage
* revise lars argument api
* revise lars argument api
* revise lars argument api
* revise api doc of lars
* fix op role
* add sharding save and add_sync_comm_for_test function
* add comm_analyse to utils
* revise sharding_utils
* add sharding saving unittest
* revise sharding utils for unittest
* revise sharding en doc
* update sharding utils api
* add doc for sharding
* fixed bug in sharding var size count
* update varsize count in sharding
* fix sharding num_nccl_comm
* Revert "fix sharding num_nccl_comm"
This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.
* add static_only for static api
* add static_only for class init
* remove static_only for default_main_program
* remove create_parameter & startup_program
* remove failed apis
* revert py_func import
* remove global scope
* remove some api
* remove cuda pinned place
* add hapi api flops
* fix bug
* fix some bug
* add unit test
* fix unit test
* solve ci coverage
* fix doc
* fix doc
* fix static flops
* delete the comment
* fix some grammar problem in doc
* fix some bug
* fix some doc
* fix some doc
* Rename variables when use 'jit.load'
* Check whether the original graph contains the variable with the same name
* add comment
* rename output/input of op and edit unittest
* modify the code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* edit code according to CI
* rewrite the sigmoid_focal_loss code example. test=develop
* fix spelling mistake in comments of code example.test=develop
* change print([.*].numpy()) to print([.*]) in example codes of sigmoid_focal_loss. test=document_fix
* save one name in cross_entropy and softmax_cross_entropy, test=develop
* change used function in CrossEntropy from softmax_cross_entropy to cross_entropy, test=develop
* Implement 2.0 API version Conv2d and Linear layer quantization in imperative mode.
* use cudnn softmax in static Lenet
* Modified ChannelwiseQAT Unittest for 2.0 API.
* For CI python coverage.
* fix some docs test=develop;test=document_fix
* add code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
1) The operands are executed sequentially according to the running logic of Python.
2) If the left hand operand is True (for convert_logical_or) / False (for convert_logical_and), the right hand operand should not be executed.
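The two rules can be sketched with callable operands so that evaluation stays lazy. This is modeled on the converters named above; the actual Paddle signatures may differ.

```python
# Sketch of short-circuit-preserving conversion: both operands are passed as
# zero-argument callables so the right one is evaluated only when needed.
def convert_logical_and(lhs_fn, rhs_fn):
    left = lhs_fn()
    if not left:
        return left          # left is False: skip the right operand
    return rhs_fn()

def convert_logical_or(lhs_fn, rhs_fn):
    left = lhs_fn()
    if left:
        return left          # left is True: skip the right operand
    return rhs_fn()

calls = []
def right():
    calls.append("right")
    return True

assert convert_logical_and(lambda: False, right) is False
assert calls == []           # right operand never ran
assert convert_logical_or(lambda: False, right) is True
assert calls == ["right"]    # right operand ran exactly once
```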
* fix eng doc, test=develop
* add import deprecated for layers, test=develop
* add block line for doc generate, test=develop
* remove todo for create_variable, test=develop
* add blank line for doc generate, test=develop
* add blank line for doc generate, test=develop
* add lstm, simple rnn op kernel
* fix the test_lstm for the rnn op
* change func name
* fix forward postprocess bug
* add gru forward, backward code
* remove unittest.skipIf; use a big rnn op instead of combination op
* fix input doesn't have gradient bug
* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
* Support dy2stat error message when call jit.save;
* Polish dy2stat error message:
(1) the original dygraph code is marked with (* user code *);
(2) "In user code:" -> "In transformed code:"
* add lars to fleet meta optimizer
* add lamb to proto
* add lamb to fleet meta optimizer
* fixed syntax bug
* fixed syntax bug
* fixed syntax error in lamb, add config setter of lamb in distributed_strategy
* trigger unittest to rerun
* add new unittest func for lamb
* revise unittest for lars and lamb
* revise dgc meta unittest
* revise lars document in distributed_strategy
* revise lars lamb document in distributed_strategy.py
* revise lars lamb document in distributed_strategy.py
* add weight decay exclude logic to lars
* restore optimizer.py
* restore optimizer.py as develop except lars
* add epsilon and exclude fn to distributed_strategy
* add lars epsilon
* revise unittest for fleet lars and lamb
* revise lars lamb unittest for CI coverage
* revise lars argument api
* revise lars argument api
* revise lars argument api
* revise api doc of lars
* fix op role
* add sharding save and add_sync_comm_for_test function
* add comm_analyse to utils
* revise sharding_utils
* add sharding saving unittest
* revise sharding utils for unittest
* add two APIs: paddle.static.io.save_inference_model and paddle.static.io.load_inference_model, which are compatible with paddle.fluid.io.save_inference_model and paddle.fluid.io.load_inference_model respectively.
* add unittest for new save_inference_model and load_inference_model. test=develop
* enhance doc. test=develop
* add paddle.enable_static() to test_inference_model_io.py. test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy.
test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu.
Remove softmax_with_cross_entropy from op_threshold_white_list.
test=develop
* Fix test_softmax_cross_entropy_op.py.
test=develop
* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
* test for limiting ut exec time as 15S
* fix an error caused by cannot find ut
* fix some error
* can not find test_transformer
* fix error caused by ut not run in windows
* fix error caused by Compiler Options
* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
* setting timeout value to 120s for old ut
* add the timeout value setting
* fix error caused by ut only run in coverage_ci
* add analyzer_transformer_profile_tester
* fix some error
* fix some error
* fix error with inference option
* fix error with inference option setting as ON_INFER
* add some ut to set timeout
* modified some option
* fix error
* fix some timeout error
* fix error
* fix error
* fix timeout for test_analyzer_bfloat16_resnet50
* fix error
* set timeout property for some ut
* first pr for new ut timeout as 15S
* refine jit.save/load to add support for other method, not only forward
* refine the code based on unit tests
* Add unit test for the code
* Add unit test for the code
* Modify the code according to the unit test
* Delete useless comments, save only one info file, etc.
* remove static_mode_white_list.pyc
* edit the code that generate 'extra_var_info'
* fp16 result ok
* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
* auto detect special slice op converter for ernie with trt oss
* ernie oss only support fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code
This PR is a follow-up of #28213. In that PR we tried to decrease GPU usage; however, the CI still randomly failed. So I added retry logic for the initialization of NCCL and cusolver. If the initialization fails, we retry to avoid the random failure.
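The retry idea is generic; here is a minimal sketch, with `init_fn` standing in for the real NCCL/cusolver initialization calls (which are not shown).

```python
import time

# Retry a flaky initialization a few times so a transient failure
# (e.g. temporary GPU shortage) does not fail the whole run.
def init_with_retry(init_fn, retries=3, delay=0.0):
    last_err = None
    for _ in range(retries):
        try:
            return init_fn()
        except RuntimeError as err:
            last_err = err
            if delay:
                time.sleep(delay)
    raise last_err

attempts = []
def flaky_init():
    attempts.append(1)
    if len(attempts) < 2:
        raise RuntimeError("init failed")   # first call fails
    return "ok"

assert init_with_retry(flaky_init) == "ok"
assert len(attempts) == 2                   # failed once, succeeded on retry
```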
* add + - * / @ [] operators to ComplexVariable, also add unittest
* fix circular reference bug
* fit for py2.7
* remove reversed operators which are not supported now
* Join break cond with while cond
* remove useless code
* refine the if code
* Split into BreakTransformOptimizer
* add BreakTransformOptimizer in ast_transformer
* add more comment
* Release 2.0rc cherry pick api rename #28108 (#28184)
* rename count_include_pad-->exclusive, return_indices-->return_mask
* remove track_running_stats
* fix typo.
* rename xxxd-->xxxxD
* solve conflicts
* 2.0rc api add all any (#28199)
* reduce trt warning message (#28011)
add paddle.enable_static() on sample code
alias reduce_all-->all, reduce_any-->any
add import reduce_all and reduce_any in python/paddle/tensor/math.py
import all and any in python/paddle/tensor/__init__.py
remove all and any OP in python/paddle/tensor/logic.py, add all and any OP in python/paddle/tensor/math.py
fix import error
remove TestAllAPI temporary
* fix doc of reduce_all and reduce_any, test=document_fix
* fix typo
* fix unittest for all and any API
Co-authored-by: Pei Yang <peiyang@baidu.com>
* rename conv_transposeXd-->convXd_transpose (#28198)
* fix sample code of reduce_all and reduce_any
Co-authored-by: Pei Yang <peiyang@baidu.com>
Recently, test_parallel_executor_test_while_train randomly failed on CI. All CI logs showed that NCCL initialization or cusolver initialization failed. I found online that those failures are usually caused by GPU memory shortage. Those APIs call CUDA APIs directly, so it shouldn't be a problem of the allocator; something in PaddlePaddle may be increasing GPU usage.
However, I ran this test 1000 times on my machine and on the CI machine, and neither could reproduce the random failure. Maybe something related to the environment happens only in the test env.
To verify my assumption that something in PaddlePaddle increases GPU usage, and also to fix this CI, I decreased the batch_size to see whether the random failure disappears in the test env.
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
* fix bug of fetch_async_op_handle
* revert some changes of test_buffer_shared_memory_reuse_pass
* revert some changes of test_buffer_shared_memory_reuse_pass
* transfer from paddle.fluid.layers.assign() into creation.py,test=develop
* fix ut fail,add support for paddle.assign,test=develop
* fix,test=develop
* fix UT coverage,test=coverage
* fix UT fail,test=coverage
* fix doc,test=develop
* Still has bugs.
* Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs.
* improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in the CUDA kernel which could not deal with large-dimension input, and added a unittest for it.
* Add a test case for float32 input.
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in Windows OpenBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add a full stop to error messages and add manual_seed in the CPU environment
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop