* add more unittests for ABI compatibility
* add more unittest
* refine warning style
* support compiling multiple custom ops at the same time
* fix missing paddle import in unittest
* fix typo
* add more unittest
* add comment for details
* Add conv transpose BF16
* Share function GetWeightsTz
* Adjust to review and fix op compatibility
* Add bias to unique handler name
* Remove errors related to paddle enforce
* Add conv2d_transpose to bf16 list and kernel refactor
Dy2stat didn't support tuple as iteration variable in the past. This PR adds three main cases:
1). Non-enumerate case: for var1, var2 in var|var.numpy() will be re-written as:
for FOR_ITER_TUPLE_PREFIX_x in var | var.numpy():
var1 = FOR_ITER_TUPLE_PREFIX_x[0]
var2 = FOR_ITER_TUPLE_PREFIX_x[1]
2). Enumerate out tuple case: for t in enumerate(var|var.numpy()) will be rewritten as:
for FOR_ITER_TUPLE_INDEX_PREFIX_x, FOR_ITER_TUPLE_PREFIX_x in enumerate(var|var.numpy()):
t = (FOR_ITER_TUPLE_INDEX_PREFIX_x, FOR_ITER_TUPLE_PREFIX_x)
3). Enumerate inner tuple case: for i, (var1, (var2, var3)) in enumerate(var|var.numpy()) will
be re-written as:
for i, FOR_ITER_TUPLE_PREFIX_x in enumerate(var | var.numpy()):
var1 = FOR_ITER_TUPLE_PREFIX_x[0]
var2 = FOR_ITER_TUPLE_PREFIX_x[1][0]
var3 = FOR_ITER_TUPLE_PREFIX_x[1][1]
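As an illustration, the non-enumerate rewrite (case 1) preserves the loop semantics. The sketch below uses plain Python tuples rather than Paddle internals; `FOR_ITER_TUPLE_PREFIX_x` stands in for the generated temporary variable:

```python
# Illustrative sketch of the case-1 rewrite (plain Python, not Paddle internals).
var = [(1, 2), (3, 4)]

# Original loop: tuple unpacking directly in the for target.
result_before = []
for var1, var2 in var:
    result_before.append(var1 + var2)

# After the Dy2stat rewrite: iterate a single temporary, then index-unpack.
result_after = []
for FOR_ITER_TUPLE_PREFIX_x in var:
    var1 = FOR_ITER_TUPLE_PREFIX_x[0]
    var2 = FOR_ITER_TUPLE_PREFIX_x[1]
    result_after.append(var1 + var2)
```

Both loops produce the same results, which is what lets the transformer substitute one form for the other.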
* support setup.py to compile custom op
* move file into paddle.utils.cpp_extension
* support python setup.py install
* refine code style
* Enrich code and add unittest
* initial commit: simple demo
* polish copyright format
* add grad op simple demo
* adapt to variable number of arguments
* change trait macro name
* add place & dtype support for add kernel
* add dispatch and infershape func
* polish code & add notes
* add dynamic_loader dep for paddle_framework
* add new custom op test dir
* polish impl details
* add unittest for new custom op
* fix failed unittest
* Custom op (#1)
* fix compile error
* wrap framework tensor with LoDTensor
* fix compile error
* add CustomTensor default constructor
* add size() for CustomTensor
* make size const for CustomTensor
* refactor place related api to circle the concept
* fix compile error
* make place const
* make Tensor copy
* debug CustomTensor core
* remove additional head of framework
* use back to shared ptr for custom tensor
* add gpu test
* merge latest cwh code in
* adjust ut code of custom op
* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2)
* hid share data from and to
* rename CustomTensor to Tensor
* refactor register design & add test
* change op_function to op_meta_info
* split op meta info into .h and .cc
* move get methods into friend class
* move OpMetaInfoHelper into framework space
* move CustomTensorUtils into framework space
* change pybind api name
* move PD C API into op meta info
* add register custom op api
* remove inference cmake change
* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3)
* support multi dtype
* remove lod, make reshape lowercase, add copy test and refactor copy api
* fix copy to error
* add more test
* polish detail & error message
* polish test details
* Add cast api && Change copy related api to copy_to && add more test (#4)
* add type cast
* add cast and make copy to api
* merge cwh code
* add more error log
* polish code
* used for test
* remove test comment
* fix uint8 type error
* fix lost uint8 type error
* add test for coverage
* polish details by reviewer comments
* add prefix for DISABLE_COPY_AND_ASSIGN
Co-authored-by: Jiabin Yang <360788950@qq.com>
* support xpu inference with analysis predictor, test=develop
* merge the cmake of the xpu toolchain, test=develop
* add c-apis, test=develop
* fix a bug in extern_xpu, test=develop
* Polish code and api doc
* fix cpp_extension not include in package
* fix relative import
* fix os.makedirs exist_ok param compatibility PY2
* add compile flags in test_jit_load
* rewrite abs op
* rewrite abs op and remove abs in activation
* remove abs register in old codes
* fix abs_grad type error
* fix abs double_grad output name error
* modify abs_grad, abs_grad_grad functor for windows building
* format code style
* fix the bug that the result is NaN when the divisor is zero
* add missing abs attr and add abs for float16
* Avoid a bug on macOS with Python 3.5/3.6.
* Choose the saving method according to the OS.
* smaller length of '_unpack_saved_dict' for macOS.
* add version information of Python.
* Edit comment.
* add view strategy on squeeze, unsqueeze, reshape, flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* add inplace strategy
* add elementwise_add sub
* let backward op not use inplace
* grad op do not use inplace
* fix memory increase error and add leaf error message
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
* add unittest and leaf error message
* merge view error
* optimize op_function_generator format and support sum inplace op
* fix format of basic_engine
* fix format for framework
* little change of variable wrapper
* add reshape, squeeze, unsqueeze, scatter api
* add relu elu tanh softmax inplace api
* fix test_squeeze_op unittest
* fix test_relu_op unittest
* fix comment problems
* delete sample code of inplace api
* add reference of grad_pending_nodes in basic_engine
* fix unittest name
* add inplace apis into wlist
* fix error message
* add PADDLE_ENFORCE for set grad op twice
* fix head file error
* set expected place in child thread for dataloader
* set device id when set tensor from numpy
* revert tensor_py change
* add compile guard
* fix ci
* fix bug
* Implemented AddQuantDequantPass in imperative quantization.
* Supported LeakyReLU Quantization
* For meeting coverage rate.
* Changed the file name of test of AddQuantDequant
* Implemented more Quantized NoWeightLayers.
* Fix the problem that the loss cannot be aligned between static and dynamic model quantization; add swish as a supported quantized layer in imperative quantization.
* remove noweight_list
* support 2.0 API such as Pool2D and ReLU
* upgrade oneDNN version to 2.0 master branch
* Added workarounds for new oneDNN lib changes
* fix regex
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
* fix bug of using ignore_index and reduction, test=develop
* fix bug of celoss when using ignore_index and reduction, test=develop
* improve performance when ignore_index=-100, test=develop
* add test in test_cross_entropy_loss.py for coverage rate, test=develop
* rm comment in test_cross_entropy_loss.py, test=develop
* del hard code of "float64" in python/paddle/nn/functional/loss.py, test=develop
* change mask to a more simplified implementation, test=develop
* del comment in python/paddle/nn/functional/loss.py, test=develop
* del hard code and change mask to a more simplified implementation, test=develop
* add cast ops before and after unsupported fp16 ops.
* Keep partial net in FP32 pattern.
* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
* Add fp16 support for adam op.
* add multi precision attr for adam.
* Fix the bug of test_multi_precision_fp16_train UT.
* Code format for CI.
* Fix the redefine error about MPTypeTrait on windows.
* fix bugs of the _create_accumulators func in Momentum.
* fix bug when inserting post cast op.
* Add the update_loss_scaling op in allow_set of UnusedVarCheck.
* Update for ci coverage.
* Add some doc for OptimizerWithMixedPrecision.
* Fix the code style.
* Improve the doc of `amp_init`.
* Change for fp16 testing if users have the infer program defined in separate way.
1. When x is a Variable, call nn.shape(x) only in the following cases:
1) The shape of x is used in a control flow condition.
2) The dim to be used is negative.
2. When x is a Variable but x.shape or x.shape[idx] doesn't contain a negative value, don't convert to paddle.shape().
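The decision rule above can be sketched as a small helper. This is a hypothetical illustration, not Paddle's actual implementation; `shape` is assumed to be the statically known shape list, where a negative entry marks an unknown dim:

```python
def should_use_dynamic_shape(shape, idx=None, used_in_control_flow=False):
    """Hypothetical sketch of the rule above; not Paddle's real code.

    shape: statically known shape, where a negative entry means "unknown".
    idx:   the specific dim being accessed, if any.
    """
    if used_in_control_flow:
        # Case 1.1: the shape feeds a control flow condition -> always dynamic.
        return True
    if idx is not None:
        # Case 1.2: only convert when the accessed dim is unknown (negative).
        return shape[idx] < 0
    # Case 2: keep the static shape unless some dim is unknown.
    return any(d < 0 for d in shape)
```

The point of the rule is to keep cheap static shapes whenever they are fully known, and fall back to a runtime shape op only when necessary.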
* change to tensor copy sync
* make copy_to safe when use TensorCopy
* refine code
* add ut
* add cudapinned garbagecollector
* add testcase: cpu place -> cuda pinned place
1. when slice_item is a slice:
1) the start of __getitem__ should be std::max(start, 0)
2) the end of __getitem__ should be std::min(end, dim)
2. when slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
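A minimal sketch of the clamping rules above, written as a hypothetical Python helper (the actual fix lives in C++; the function name and return shape here are made up for illustration):

```python
def normalize_getitem(item, dim_len):
    """Hypothetical sketch of the __getitem__ rules described above."""
    if isinstance(item, slice):
        # Rule 1.1: clamp the start to at least 0.
        start = 0 if item.start is None else max(item.start, 0)
        # Rule 1.2: clamp the end to at most dim_len.
        end = dim_len if item.stop is None else min(item.stop, dim_len)
        return start, end
    # Rule 2: an integer index must lie in [-dim_len, dim_len).
    if not (-dim_len <= item < dim_len):
        raise IndexError(
            "index %d is out of range for a dim of length %d" % (item, dim_len))
    return item
```

For example, slicing a length-5 dim with `[-3:100]` clamps to `(0, 5)`, and an out-of-range integer index raises with the accurate bounds in the message.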
* Support storage of large parameters
* Reduce the complexity of the unittest
* Reduce the complexity of the unittest, commented out unittest for
* add unittest for static.save/load
* Increase the timeout threshold of 'test_static_save_load'
* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
* dot op support complex types
* matmul support complex types
* add test case
* matmul broadcast gradient support complex
* move conjFunctor to complex_functor.h