* add conj op for complex types
* add conj for complex types
* add more test cases
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* use user-defined grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
* add complex real op & api & unittest
* add imag op & api & unittest
* refactor op impl
* revert simplified writing due to compile failure
* polish details
* polish grad op code
* fix expand && concat/transpose to new api
* update xpu_header
* update activation op on kunlun
* add nearest_interp on kunlun
* update error message
* newly added UTs should not exceed 15s
* fix error
* run the 15s UT time-limit check first
* fix error
* fix error with CI_SKIP_CPP_TEST
* modified timeout setting
* fix error
1. Fix an error in _build_cond_stmt of for-range statements.
2. Support negative step values in for-range statements.
3. Fix code differences between Py2 and Py3.
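The negative-step case can be sketched as follows; the function name and structure are illustrative only, not Paddle's actual transform, but they show why the while-loop condition must branch on the sign of the step:

```python
# Illustrative sketch: converting `for i in range(start, stop, step)` into a
# while loop. The loop condition depends on the sign of `step`: `i < stop`
# for positive steps, `i > stop` for negative ones.
def range_as_while(start, stop, step):
    out = []
    i = start
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        out.append(i)
        i += step
    return out

print(range_as_while(5, 0, -2))  # [5, 3, 1], matches list(range(5, 0, -2))
```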
Fix 3 Windows unittests:
test_fuse_all_reduce_pass: Paddle cannot run multi-GPU on Windows, so set the single visible GPU flag.
test_feed_data_check_shape_type: Paddle cannot run multi-GPU on Windows, so set the single visible GPU flag.
test_tsm: Windows GPU memory is not enough, so decrease the batch size and data size.
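A minimal sketch of the single-visible-GPU workaround, assuming the standard CUDA environment variable is read before the framework initializes:

```python
import os

# Restrict the process to one visible GPU. Paddle cannot run multi-GPU on
# Windows, so the tests pin device 0 before any CUDA context is created.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0
```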
* Fix a bug when running on an operating system without "bash."
* add execution condition
* for ci-coverage
* get cpu information to check the precision problem
* Update compilation environment for musl version
* update dependencies
* remove test code
check cpu info
remove test code
review
* update alpine and third_party dependencies
* add newline for ci Code format
* Fix api docs in RNN, Transformer, layer_norm, WeightNormParamAttr.
test=develop
* Fix api doc for print in label_smooth.
test=develop
* Update api docs according to review comments.
Add the name argument back in RNN.
test=develop
* add complex64 and complex128 types; add +-*/@ and slice operators for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* kron, reshape, transpose support complex types
* sum and trace op support complex types
* add test case of sum and trace op
* fix the bug that the imaginary part of a complex tensor was not initialized
* format file
* format code style
* kron support type promotion; modify test cases
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
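The "only promote complex type" rule can be illustrated with NumPy (used here purely as a stand-in for the tensor library): mixing a real operand with a complex operand promotes the result to the complex type.

```python
import numpy as np

x = np.ones(3, dtype=np.float32)            # real operand
y = np.full(3, 1 + 2j, dtype=np.complex64)  # complex operand
z = x + y                                   # result is promoted to complex
print(z.dtype)  # complex64
```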
Usage scenarios: if a function can already run successfully in static mode, you can use it to decorate a function in the following cases:
1. An unknown error occurs during the function's dynamic-to-static conversion;
2. The function's internal implementation has two branches: a dynamic branch and a static branch;
3. Users don't want the function to be converted during dynamic-to-static transformation.
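A minimal sketch of such a decorator, assuming only a marker attribute that a hypothetical converter checks (`not_to_static` and `_skip_dy2stat` are illustrative names, not the exact API):

```python
# Hypothetical "do not convert" decorator: it tags the function so a
# dynamic-to-static transformer can skip it (attribute name is illustrative).
def not_to_static(func):
    func._skip_dy2stat = True
    return func

@not_to_static
def custom_branch(x, in_static_mode):
    # Internal dynamic/static branches that should not be transformed.
    return x * 2 if in_static_mode else x + 1

print(getattr(custom_branch, "_skip_dy2stat", False))  # True
```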
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial value of the master weights is the same as the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* not modify codes about imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
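The master-weight idea above can be sketched in plain NumPy (a simplified model, not Paddle's implementation): the fp16 weight is paired with an fp32 master copy, the momentum update happens in fp32, and the result is cast back to fp16.

```python
import numpy as np

def momentum_step(master, vel, grad16, lr=0.1, mu=0.9):
    g = grad16.astype(np.float32)   # upcast the fp16 gradient to fp32
    vel = mu * vel + g              # fp32 momentum (velocity) update
    master = master - lr * vel      # fp32 master-weight update
    return master.astype(np.float16), master, vel

w16 = np.ones(4, dtype=np.float16)
master = w16.astype(np.float32)     # master weights start equal to fp16 weights
vel = np.zeros(4, dtype=np.float32)
grad = np.full(4, 0.5, dtype=np.float16)
w16, master, vel = momentum_step(master, vel, grad)
print(w16.dtype, master.dtype)  # float16 float32
```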
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
This PR fixes several problems in dy2stat for the Deoldify model in PaddleGan.
1. In the model, the engineer wrote `if x.shape == y.shape`. The Tensor shape is a tuple in dygraph, so `==` returns True/False, but in static graph `==` becomes an element-wise comparison, which is a different behavior. In this PR we reduce the element-wise comparison result to a single boolean.
2. If an engineer writes computations that use parameters in hooks, the static graph can lose the parameter variable, because we put param_guard only around the forward of a Layer. In this PR we made param_guard cover pre-hooks and post-hooks as well.
3. In PaddleGan, some parameter values were calculated in __init__ by running dygraph code. That code also runs during dy2stat, so a variable may be assigned as a VarBase (Tensor) first and then as a Variable, which raised an error. We fixed the bug in this PR by handling this case.
TODO: we only added a test case for 1 (the shape comparison); test cases for 2 and 3 should be added, but since we are chasing the 2.0 RC, I will do that in a near-future PR.
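The shape-comparison fix can be illustrated with NumPy (standing in for static-graph tensors): when `==` becomes element-wise, reducing with `all()` restores the single-bool behavior that dygraph's tuple comparison has.

```python
import numpy as np

shape_x = np.array([2, 3])
shape_y = np.array([2, 3])
elementwise = shape_x == shape_y     # array([ True,  True]) -- element-wise
reduced = bool(np.all(elementwise))  # single bool, matching dygraph behavior
print(reduced)  # True
```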
* Expose the leaf Tensor concept and support gradient accumulation for leaf Tensors
* fix coverage
* fix api doc
* fix CI unittest
* fix unittest
* an empty tensor doesn't need inner_var_
* fix some error messages
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
* Fix a variable naming error
* Add comments
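The version-counting mechanism can be sketched as follows (a simplified Python model, not the actual C++ framework::Tensor code): every inplace modification bumps a counter, and autograd can snapshot the version to detect later mutations that would make a saved value stale.

```python
class TensorInplaceVersion:
    """Counts how many times a tensor has been modified inplace."""
    def __init__(self):
        self.version = 0

    def bump(self):
        self.version += 1

class Tensor:
    def __init__(self, data):
        self.data = data
        self.inplace_version = TensorInplaceVersion()

    def add_(self, v):  # trailing underscore: inplace op
        self.data += v
        self.inplace_version.bump()

t = Tensor(1.0)
snapshot = t.inplace_version.version   # taken when the value is saved for backward
t.add_(2.0)                            # inplace mutation bumps the version
assert t.inplace_version.version != snapshot  # mutation detected
```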
* Move member functions of TranslatedLayer out of function
* edit code according to review
* Edit input argument of '_run_static_graph'
* reset due to Segmentation fault
* rename variables when stitching graph
* modify code according CI
* Add comments to '__i_m_p_l__'
* remove blanks before 'Get...'
* edit code according to review
* Add a comment to '_execution_method_creator'
* Edit a comment to '_execution_method_creator'
* Generate code coverage reports only for incremental files, test=develop
* test for diff python file, test=develop
* fix no python diff report, test=develop
* add cc test file, test=develop
* fix bug in generic.cmake, test=develop
* for debugging the missing cc report, test=develop
* modify compile branch from test_pr to test, test=develop
* fix bug, test=develop
* test for h file changed, test=develop
* debug for redefinition of argument optimize error, test=develop
* disable -O3 for test, test=develop
* remove -O3 for test, test=develop
* remove coverage option for nvcc, test=develop
* use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop
* re-enable -O3, test=develop
* remove debug code, test=develop
* remove unused code, test=develop
test_mnist failed on CUDA 11. After debugging, we found that it is due to PaddleInference IR optimization. We disable it in this PR and will re-enable it after PaddleInference fixes it.
The GridGenerator model failed because the output shape of `linspace` is (-1). The reason is that C++ InferShape fixes the shape to (-1):
5da3d514eb/paddle/fluid/operators/linspace_op.cc (L49)
We cannot set the shape in C++ InferShape because this Tensor may not be initialized at compile time, but when the input `num` of `linspace` is a Python integer, we do know the shape at compile time. This PR simply sets the shape on the Python side and adds GridGenerator as a unittest.
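The Python-side rule can be sketched like this (the function name is hypothetical, for illustration only): when `num` is a Python int the static output shape is known; otherwise it stays (-1) until runtime.

```python
def infer_linspace_shape(num):
    # If num is a Python int, the static shape is (num,); if it is a runtime
    # tensor, the shape is unknown at compile time, conventionally -1.
    return (num,) if isinstance(num, int) else (-1,)

print(infer_linspace_shape(5))     # (5,)
print(infer_linspace_shape(None))  # (-1,) -- stands in for a runtime tensor
```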
* add reducer
* refine event for memory copy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and DDP initialization problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
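The concat/split fusion for allreduce can be sketched in NumPy (a simplified model of the reducer's buffer fusion): flattened gradients are concatenated into one buffer so a single allreduce suffices, then split back to their original shapes.

```python
import numpy as np

grads = [np.ones((2, 2)), np.arange(3.0)]
shapes = [g.shape for g in grads]
sizes = [g.size for g in grads]

fused = np.concatenate([g.ravel() for g in grads])  # concat into one buffer
# ... a single allreduce over `fused` would happen here ...
splits = np.split(fused, np.cumsum(sizes)[:-1])     # split back
restored = [s.reshape(sh) for s, sh in zip(splits, shapes)]
assert all(np.array_equal(a, b) for a, b in zip(grads, restored))
```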
* add lars to fleet meta optimizer
* add lamb to proto
* add lamb to fleet meta optimizer
* fixed syntax bug
* fixed syntax error in lamb, add config setter of lamb in distributed_strategy
* trigger unittest to rerun
* add new unittest func for lamb
* revise unittest for lars and lamb
* revise dgc meta unittest
* revise lars document in distribute_strategy
* revise lars lamb document in distributed_strategy.py
* revise lars lamb document in distributed_strategy.py
* add weight decay exclude logic to lars
* restore optimizer.py
* restore optimizer.py as develop except lars
* add epsilon and exclude fn to distributed_strategy
* add lars epsilon
* revise unittest for fleet lars and lamb
* revise lars lamb unittest for CI coverage
* revise lars argument api
* revise api doc of lars
* fix op role
* add sharding save and add_sync_comm_for_test function
* add comm_analyse to utils
* revise sharding_utils
* add sharding saving unittest
* revise sharding utils for unittest
* revise sharding en doc
* update sharding utils api
* add doc for sharding
* fixed bug in sharding var size count
* update varsize count in sharding
* fix sharding num_nccl_comm
* Revert "fix sharding num_nccl_comm"
This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.
* add static_only for static api
* add static_only for class init
* remove static_only for default_main_program
* remove create_parameter & startup_program
* remove failed apis
* revert py_func import
* remove global scope
* remove some api
* remove cuda pinned place
* add hapi api flops
* fix bug
* fix some bug
* add unit test
* fix unit test
* solve ci coverage
* fix doc
* fix static flops
* delete the comment
* fix some grammar problem in doc
* fix some bug
* fix some doc
* Rename variables when use 'jit.load'
* Check whether the original graph contains the variable with the same name
* add comment
* rename output/input of op and edit unittest
* modify the code according to CI
* edit code according to CI
* edit code according to CI
* rewrite the sigmoid_focal_loss code example. test=develop
* fix spelling mistake in comments of code example.test=develop
* change print([.*].numpy()) to print([.*]) in example codes of sigmoid_focal_loss. test=document_fix
* save one name in cross_entropy and softmax_cross_entropy, test=develop
* change used function in CrossEntropy from softmax_cross_entropy to cross_entropy, test=develop
* Implement 2.0 API version Conv2d and Linear layer quantization in imperative mode.
* use cudnn softmax in static Lenet
* Modified ChannelwiseQAT Unittest for 2.0 API.
* For CI python coverage.
* fix some docs test=develop;test=document_fix
* add code example test=develop;test=document_fix
* fix code example test=develop;test=document_fix
1) The operands are executed sequentially according to Python's evaluation logic.
2) If the left hand operand is False (for convert_logical_or) / True (for convert_logical_and), the right hand operand is executed; otherwise it is skipped, preserving short-circuit semantics.
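A minimal sketch of such short-circuit-preserving conversion (simplified signatures, for illustration only): each operand is passed as a callable so the right-hand side only runs when it must.

```python
def convert_logical_or(lhs_fn, rhs_fn):
    lhs = lhs_fn()
    return lhs if lhs else rhs_fn()  # rhs runs only when lhs is falsy

def convert_logical_and(lhs_fn, rhs_fn):
    lhs = lhs_fn()
    return rhs_fn() if lhs else lhs  # rhs runs only when lhs is truthy

calls = []
def tracked(v):
    def f():
        calls.append(v)
        return v
    return f

assert convert_logical_or(tracked(True), tracked("rhs")) is True
assert "rhs" not in calls  # right operand skipped: short-circuit preserved
```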
* fix eng doc, test=develop
* add import deprecated for layers, test=develop
* add block line for doc generate, test=develop
* remove todo for create_variable, test=develop
* add blank line for doc generate, test=develop