1. Type of index: int, slice (step must be 1).
2. Type of value:
(1) int32, int64, float32, bool;
(2) numpy.array (int32, int64, float32, bool); note: float64 is not supported;
(3) paddle.Tensor (int32, int64, float32, float64, bool);
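A minimal Python sketch of the index/value type rules listed above. The dictionary, the `check_setitem` name, and the string dtype labels are hypothetical. They illustrate the rules and are not Paddle's actual code.

```python
# Hypothetical checker for the supported index/value combinations; illustrative only.
SUPPORTED_VALUE_DTYPES = {
    "scalar": {"int32", "int64", "float32", "bool"},
    "numpy.array": {"int32", "int64", "float32", "bool"},  # float64 not supported
    "paddle.Tensor": {"int32", "int64", "float32", "float64", "bool"},
}


def check_setitem(index, value_kind, value_dtype):
    """Raise if an (index, value) pair is outside the supported combinations."""
    if isinstance(index, slice):
        # Only step-1 slices are supported (an omitted step counts as 1).
        if index.step not in (None, 1):
            raise ValueError(f"slice step must be 1, got {index.step}")
    elif not isinstance(index, int):
        raise TypeError(f"index must be int or slice, got {type(index).__name__}")
    allowed = SUPPORTED_VALUE_DTYPES.get(value_kind)
    if allowed is None:
        raise TypeError(f"unsupported value kind: {value_kind}")
    if value_dtype not in allowed:
        raise TypeError(f"{value_kind} of dtype {value_dtype} is not supported")
```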
* add heter box
* add trainer, worker, wrapper...
* format
* for ci
* format
* remove boost get
* boost & copyright
* rename
* format
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* use a user-defined grad in test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless code
* conj support real types
* add conj test case for real number
* delete inputs that need not be calculated in dygraph op_test
* modify grad of mul for complex types
* fix the bug where the order of the input args' grads did not match
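A minimal Python sketch of the conj behavior described above: complex values are conjugated and real values pass through unchanged. The `mul_grad` helper assumes the convention dx = dout * conj(y), dy = dout * conj(x) for complex elementwise multiplication. Both function names are hypothetical, and this is not Paddle's kernel.

```python
def conj(values):
    """Elementwise conjugate; real inputs come back unchanged."""
    # int, float and complex all define .conjugate(); for reals it returns the value itself.
    return [v.conjugate() for v in values]


def mul_grad(dout, x, y):
    """Gradients of out = x * y (elementwise), assuming the conj-based convention."""
    dx = [g * b.conjugate() for g, b in zip(dout, y)]
    dy = [g * a.conjugate() for g, a in zip(dout, x)]
    return dx, dy
```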
* Test compilation time with fewer parallel jobs, notest, test=windows_ci
* optimize rules of Unity Build, notest, test=windows_ci, test=windows_op
* limit parallel counts used only on GPU, test=develop
* remove limit of argument /m:8 on Windows, test=develop
Change CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be due to insufficient GPU memory. We fixed a similar failure this way in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here as well.
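The retry-instead-of-fail idea can be sketched in Python as below. `retry_call`, its default retry count, and the use of `MemoryError` to stand in for a CUDA allocation failure are all assumptions for illustration. None of them describes how PADDLE_RETRY_CUDA_SUCCESS is actually implemented.

```python
import time


def retry_call(fn, max_retries=3, delay_s=0.0, retryable=(MemoryError,)):
    """Call fn(), retrying on retryable errors; re-raise after the last attempt.

    Hypothetical helper: MemoryError stands in for a transient CUDA allocation error.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            time.sleep(delay_s)  # give other work a chance to free memory
```

A transient failure then costs a few retries instead of failing the test outright. A persistent failure is still reported after the last attempt.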
* add complex real op & api & unittest
* add imag op & api & unittest
* refactor op impl
* revert simplified writing due to compile failure
* polish details
* polish grad op code
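A small Python sketch of real/imag and their gradients. It assumes that the upstream gradient of `real` flows back into the real part only and that the upstream gradient of `imag` flows back into the imaginary part only. The helper names are hypothetical.

```python
def real(values):
    """Real part of each element (real inputs are treated as complex with 0j)."""
    return [complex(v).real for v in values]


def imag(values):
    """Imaginary part of each element."""
    return [complex(v).imag for v in values]


def real_grad(dout):
    # Assumed convention: the gradient flows back into the real part only.
    return [complex(g, 0.0) for g in dout]


def imag_grad(dout):
    # Assumed convention: the gradient flows back into the imaginary part only.
    return [complex(0.0, g) for g in dout]
```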
* fix expand && concat/transpose to new api
* update xpu_header
* update activation op on kunlun
* add nearest_interp on kunlun
* update error message
* Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop
* fix the if-condition error with CI_SKIP_TEST, test=develop
* fix the error when adding properties to tests on Linux/Mac, test=develop
* fix the error when setting test properties of test_code_generator, test=develop
* remove test code and move the file-modification check earlier on Linux, test=develop
* rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix
* Add branch judgement on Linux, test=develop
* Compiling operator libraries with Unity Build on Windows CPU.
* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
* Add option in windows ci script, no_test, test=windows_ci
* Optimize parallel compiling, test=develop
* remove limit of parallel compile and skip some ops in UB, test=develop
* remove changes of header file, test=develop
* fix failing test_eye_op unittest, test=develop
* Compiling operator libraries with Unity Build on Linux, test=develop
* set default WITH_UNITY_BUILD=OFF, test=develop
* Move unity build rules into a single file and add comment, test=develop
* optimize parallel compilation, test=develop
* fix undefined reference error on coverage ci, test=develop
* add complex64 and complex128 types; add +, -, *, /, @ and slice operators for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* kron, reshape, transpose support complex types
* sum and trace op support complex types
* add test case of sum and trace op
* fix the bug that the imaginary part of complex values was not initialized
* format file
* format code style
* kron support type promotion; modify test cases
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support type promotion for Python ops
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
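A sketch of a promotion rule that only promotes when a complex operand is involved. It assumes the result takes the wider component precision of the two operands, so float64 with complex64 gives complex128. The table and the `promote_types` name are hypothetical. Real/real mismatches are simply rejected here and left to the op's own dtype checks.

```python
# Hypothetical tables: component precision in bits for each dtype.
_COMPONENT_BITS = {"float32": 32, "float64": 64, "complex64": 32, "complex128": 64}
_COMPLEX_OF = {32: "complex64", 64: "complex128"}


def promote_types(a, b):
    """Result dtype of a binary op between dtypes a and b."""
    if not (a.startswith("complex") or b.startswith("complex")):
        # No complex operand: no promotion; dtypes must already match.
        if a != b:
            raise TypeError(f"mismatched real dtypes: {a} vs {b}")
        return a
    # A complex operand is present: result is complex with the wider precision.
    bits = max(_COMPONENT_BITS[a], _COMPONENT_BITS[b])
    return _COMPLEX_OF[bits]
```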
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial values of master weights are the same as the fp16 weights.
* add static loss scaling.
* add the rescale_grad function for pure fp16 training.
* use the original momentum updating method.
* Polish some code, such as variable names.
* add docstrings for APIs.
* update the var creation details of _create_master_weight.
* do not modify the imperative momentum updating code.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparison in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
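The master-weight idea behind multi_precision can be sketched in Python, using `struct` to round values to fp16/fp32. `momentum_step`, `fp16_only_step`, and the exact placement of rescale_grad and weight decay are assumptions for illustration. This is not Paddle's momentum kernel.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def momentum_step(master, velocity, grad16, lr, mu=0.9,
                  rescale_grad=1.0, weight_decay=0.0):
    """One momentum update on an fp32 master weight; also returns the fp16 copy.

    Hypothetical sketch: the grad is rescaled, L2 decay uses the fp32 master
    weight, and the fp16 parameter is re-derived from the master each step.
    """
    grad = to_fp32(grad16 * rescale_grad + weight_decay * master)
    velocity = to_fp32(mu * velocity + grad)
    master = to_fp32(master - lr * velocity)
    return master, velocity, to_fp16(master)


def fp16_only_step(param16, velocity16, grad16, lr, mu=0.9):
    """The same update carried out entirely in fp16, for comparison."""
    velocity16 = to_fp16(mu * velocity16 + grad16)
    param16 = to_fp16(param16 - to_fp16(lr * velocity16))
    return param16, velocity16
```

Take lr=1e-3 and a constant gradient of 0.01. Each update is then at most about 1e-4. That is less than half the gap between 1.0 and the next fp16 value below it (2^-11), so a pure fp16 parameter starting at 1.0 never moves. The fp32 master weight keeps decreasing.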
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
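A Python sketch of the fp16 layer norm pattern: inputs and outputs are stored as fp16, while the mean and variance are accumulated in higher precision. This assumes that `U` in static_cast<U> denotes that compute type. Python floats (double precision) stand in for U here, and `layer_norm_fp16` is a hypothetical name.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def layer_norm_fp16(x16, gamma=1.0, beta=0.0, eps=1e-5):
    """Layer norm over one row of fp16 values (hypothetical sketch)."""
    n = len(x16)
    # Statistics are accumulated in Python floats (standing in for U), not fp16.
    mean = sum(x16) / n
    var = sum((v - mean) ** 2 for v in x16) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Only the normalized outputs are rounded back to fp16 (the storage type).
    return [to_fp16((v - mean) * inv_std * gamma + beta) for v in x16]
```

Accumulating in fp16 instead would lose precision quickly. For example, 2048 + 1 rounds back to 2048 in half precision.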
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix unittest
* empty tensor doesn't need inner_var_
* fix some error messages
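A toy scalar autograd in Python that shows what gradient accumulation on leaf tensors means. Leaves add into `.grad` on every backward call instead of overwriting it, and non-leaf nodes keep no gradient. `Var` is hypothetical and much simpler than Paddle's imperative engine.

```python
class Var:
    """Toy scalar autograd node; leaf vars accumulate .grad across backward calls."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # input Vars
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = None

    @property
    def is_leaf(self):
        return not self.parents

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other), (other.value, self.value))

    def __add__(self, other):
        return Var(self.value + other.value, (self, other), (1.0, 1.0))

    def backward(self, upstream=1.0):
        if self.is_leaf:
            # Accumulate instead of overwrite, so repeated backward calls sum up.
            self.grad = upstream if self.grad is None else self.grad + upstream
            return
        # Chain rule: push the upstream gradient to every input along each path.
        for parent, local in zip(self.parents, self.local_grads):
            parent.backward(upstream * local)
```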
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For the assign API, call _bump_inplace_version() when it is an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
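A toy Python version of the inplace-version check. Every inplace write bumps a counter, values saved for backward remember the counter, and unpacking them after a change raises. The attribute and method names mimic `_inplace_version` and `_bump_inplace_version()`, but the classes themselves are hypothetical.

```python
class Tensor:
    """Toy tensor carrying an inplace version counter (hypothetical sketch)."""

    def __init__(self, data):
        self.data = list(data)
        self._inplace_version = 0

    def _bump_inplace_version(self):
        self._inplace_version += 1

    def assign_(self, values):
        # Any inplace write must bump the version.
        self.data[:] = values
        self._bump_inplace_version()


class SavedTensor:
    """A tensor saved for backward, remembering the version it was saved at."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.saved_version = tensor._inplace_version

    def unpack(self):
        if self.tensor._inplace_version != self.saved_version:
            raise RuntimeError(
                "tensor saved for backward was modified inplace: "
                f"version {self.saved_version} -> {self.tensor._inplace_version}")
        return self.tensor.data
```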
* add reducer
* refine event for memory copy
* add concat & split for allreduce
* apply concat & split for fused tensors
* fix nccl dep
* fix the unittest, compile problem, and DDP initialization problem
* fix unittest for Mac, add some comments, and handle repeated params in sublayers
* fix unittest for Windows & fix documentation
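The concat/split fusion used by the reducer can be sketched in Python. Each rank flattens a bucket of gradients into one buffer, a single allreduce runs on that buffer, and the result is split back per parameter. The allreduce is simulated in-process, averaging (rather than summing) is an assumption, and all function names are hypothetical.

```python
def concat(grads):
    """Flatten a bucket of gradients into one buffer, remembering each length."""
    flat, sizes = [], []
    for g in grads:
        flat.extend(g)
        sizes.append(len(g))
    return flat, sizes


def split(flat, sizes):
    """Inverse of concat: slice the fused buffer back into per-parameter grads."""
    out, offset = [], 0
    for n in sizes:
        out.append(flat[offset:offset + n])
        offset += n
    return out


def allreduce_mean(buffers):
    """Simulated allreduce: average the fused buffers from all ranks elementwise."""
    world = len(buffers)
    return [sum(vals) / world for vals in zip(*buffers)]


def fused_allreduce(per_rank_grads):
    """One allreduce per bucket instead of one per parameter."""
    fused, sizes = [], None
    for grads in per_rank_grads:
        flat, sizes = concat(grads)
        fused.append(flat)
    reduced = allreduce_mean(fused)
    return split(reduced, sizes)
```

Fusing trades one extra copy on each side for far fewer collective calls. That is usually the point of bucketing many small gradients.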