* remove remove_unsupport_dtype
* remove test dtype
* add more include
* change dtype.h's enum as enum class to avoid conflict with inference lib
* make enum as enum class
* remove additional test
* merge develop
* polish code
* add simple attr support and test
* add int, float attr support
* support other attribute
* add custom attrs test in cmake
* polish details
* fix test failed
* add backward test
* update test flags
* add group norm plugin
* fix compile problems
* move concat axis check to trt op teller
* add nbDims for scale and bias nv dims
* add group norm unit test
* fix unittest
* add trt version restriction for group norm op teller
* fix unittest
* fix entry
* fix distributed lookup table fuse case
* fix entry bug at first time
* move entry from paddle.fluid -> paddle.distributed
* fix ut with paddle.enable_static()
Co-authored-by: malin10 <malin10@baidu.com>
* add error msg when dtypes of operator are not same
* change error msg to warning msg when dtypes of operator are not same
* modify test case to fit for python2
* add default argument for paddle.save/static.save
* edit documentation of
* Add comments for special processing for protocol=2 and protocol=3.
* Update python/paddle/fluid/io.py
Co-authored-by: lanxianghit <47554610+lanxianghit@users.noreply.github.com>
**Problem**
In our old shape transformer logic, if the user writes:
```
s = tensor.shape
...
y = paddle.some_api(s)
```
Dy2stat will change it to
```
...
y = paddle.some_api(convert_var_shape(tensor))
```
However, this causes a fatal bug if the user changes the shape of `tensor` after the assignment. For example:
```
s = tensor.shape
...
tensor = paddle.some_change_shape_api(tensor)
...
y = paddle.some_api(s)
```
Then Dy2stat gets a wrong result because the code is translated into:
```
tensor = paddle.some_change_shape_api(tensor)
...
y = paddle.some_api(convert_var_shape(tensor)) # tensor's shape has changed; this is no longer the original `s` value
```
**Solution Logic**
This cannot be solved within the old logic, so I refactored the tensor_shape_transformer logic. Now we use `s` to store the shape attribute and generate a variable `s__STATIC_CONVERT_VAR_SHAPE_SUFFIX` to store the result of the static shape API `shape(tensor)`. Given:
```
s = tensor.shape
...
y = paddle.some_api(s)
```
Dy2stat will change it to
```
s = tensor.shape
s__STATIC_CONVERT_VAR_SHAPE_SUFFIX = shape(tensor)
...
y = paddle.some_api(choose_shape_attr_or_api(s, s__STATIC_CONVERT_VAR_SHAPE_SUFFIX))
```
In this case, the translated code is consistent with the original dygraph meaning, which fixes the change-after-assign bug.
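The difference between the two translations can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyTensor`, `toy_change_shape`, and `toy_choose_shape` are hypothetical stand-ins, not Paddle APIs, and the selection rule inside `toy_choose_shape` (fall back to the API result when the attribute has unknown dimensions) is an assumption about what `choose_shape_attr_or_api` might do.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor; it only tracks its shape."""

    def __init__(self, shape):
        self.shape = list(shape)


def toy_change_shape(tensor):
    # Stand-in for paddle.some_change_shape_api: flattens to one dimension.
    size = 1
    for dim in tensor.shape:
        size *= dim
    return ToyTensor([size])


def toy_choose_shape(attr_shape, api_shape):
    # Assumed rule: prefer the attribute unless it has unknown (negative) dims.
    if any(dim < 0 for dim in attr_shape):
        return api_shape
    return attr_shape


def old_translation(tensor):
    # Old logic: `s = tensor.shape` is not kept; the shape is re-read at the use site.
    tensor = toy_change_shape(tensor)
    return list(tensor.shape)


def new_translation(tensor):
    s = list(tensor.shape)         # shape attribute, captured at assignment
    s_static = list(tensor.shape)  # stands in for shape(tensor), captured at the same point
    tensor = toy_change_shape(tensor)
    return toy_choose_shape(s, s_static)


print(old_translation(ToyTensor([2, 3])))  # [6]
print(new_translation(ToyTensor([2, 3])))  # [2, 3]
```

The old translation reports the shape after the change, while the new one keeps the value that `s` had when it was assigned.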
**Code Key Note**
To help reviewers: the key change of this PR is changing `self.name_to_var_shape` from "mapping a name to its shape node" to "mapping a name to its STATIC_CONVERT_VAR_SHAPE_SUFFIX name". If a variable name has the SUFFIX, we can then choose between the shape attribute and the shape API. The other changes follow from this key change.
**Consideration**
The concern with this PR is that we store an extra static `shape` API result: will it harm the speed of Dy2stat? In some cases it will, but we argue that the benefit outweighs the cost.
1. The extra calls to the static `shape` API happen when the coder assigns among shape variables. Take the following dygraph code as an example:
```
s1 = tensor.shape
s2 = s1
s3 = s2
...
```
Here we call the extra static `shape` API again and again; however, users seldom write code like this.
2. If the shape variable is used a lot, for example:
```
s = tensor.shape
y1 = paddle.some_api1(s)
y2 = paddle.some_api2(s)
y3 = paddle.some_api3(s)
```
Our old logic would create 3 shape API calls, but now there is just 1. This is a more common user code pattern. In fact, if reviewers look at the current unit tests in this PR, they can see that the op numbers decrease after this PR. So we argue that this PR can also improve speed for this code pattern.
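The call counts in the two cases above can be illustrated with a toy counter. This is a sketch only: `ShapeCounter` and the three helper functions are hypothetical, and the assumption that each chained assignment triggers one extra static `shape` call is taken from the description in point 1 rather than from the actual transformer code.

```python
class ShapeCounter:
    """Hypothetical stand-in that counts calls to a static shape API."""

    def __init__(self):
        self.calls = 0

    def shape(self, tensor_shape):
        self.calls += 1
        return list(tensor_shape)


def old_logic_repeated_use(counter, tensor_shape):
    # Old logic: every use of `s` is replaced by a fresh shape query.
    return [counter.shape(tensor_shape) for _ in range(3)]


def new_logic_repeated_use(counter, tensor_shape):
    # New logic: one query at assignment time, reused by all three uses.
    s_static = counter.shape(tensor_shape)
    return [s_static for _ in range(3)]


def new_logic_chained_assign(counter, tensor_shape):
    # s1 = tensor.shape; s2 = s1; s3 = s2
    # Assumed: each assignment generates its own static shape query.
    s1_static = counter.shape(tensor_shape)
    s2_static = counter.shape(tensor_shape)
    s3_static = counter.shape(tensor_shape)
    return s3_static


for fn in (old_logic_repeated_use, new_logic_repeated_use, new_logic_chained_assign):
    counter = ShapeCounter()
    fn(counter, [2, 3])
    print(fn.__name__, counter.calls)
```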
* add more dispatch macro
* add more tests
* revert unneeded change
* add timeout for test dispatch
* add float and complex test
* remove and macro
* [static setitem] support the index step > 1. Eg: tensor_a[::3] = value
* [static setitem] support the index step < 0. Eg: tensor_a[::-3] = value
* [static setitem] support a Tensor as the index. Eg: tensor_a[tensor_3:0:-1] = value
* Add op version.
As the title says, when a slice node such as `1:3` is passed as the `idx` argument of `convert_var_shape`, it causes a syntax error because a function call cannot take a bare slice as an argument. This PR fixes it.
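The syntax error can be reproduced with the standard `ast` module, which only parses the source and does not need `convert_var_shape` to be defined. The helper `parses` is hypothetical, and the two well-formed alternatives shown are possible shapes of valid generated code, not necessarily what this PR emits.

```python
import ast


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# A bare slice is only legal inside subscript brackets, not in a call.
bad = "y = convert_var_shape(x, 1:3)"

# Two syntactically valid alternatives (assumptions for illustration).
with_slice_object = "y = convert_var_shape(x, slice(1, 3))"
with_subscript = "y = convert_var_shape(x)[1:3]"

print(parses(bad))                # False
print(parses(with_slice_object))  # True
print(parses(with_subscript))     # True
```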
* add more unittest for ABI compatibility
* add more unittest
* refine warning style
* support compile multi custom ops in same time
* fix not import paddle in unittest
* fix typo
* add more unittest
* add comment for details
* Add conv transpose BF16
* Share function GetWeightsTz
* Adjust to review and fix op compatibility
* Add bias to unique handler name
* Remove errors related to paddle enforce
* Add conv2d_transpose to bf16 list and kernel refactor
Dy2stat did not support tuples as iteration variables in the past. This PR adds three main cases:
1). Non-enumerate case: for var1, var2 in var|var.numpy() will be rewritten as:
    for FOR_ITER_TUPLE_PREFIX_x in var | var.numpy():
        var1 = FOR_ITER_TUPLE_PREFIX_x[0]
        var2 = FOR_ITER_TUPLE_PREFIX_x[1]
2). Enumerate outer tuple case: for t in enumerate(var|var.numpy()) will be rewritten as:
    for FOR_ITER_TUPLE_INDEX_PREFIX_x, FOR_ITER_TUPLE_PREFIX_x in enumerate(var|var.numpy()):
        t = (FOR_ITER_TUPLE_INDEX_PREFIX_x, FOR_ITER_TUPLE_PREFIX_x)
3). Enumerate inner tuple case: for i, (var1, (var2, var3)) in enumerate(var|var.numpy()) will be rewritten as:
    for i, FOR_ITER_TUPLE_PREFIX_x in enumerate(var|var.numpy()):
        var1 = FOR_ITER_TUPLE_PREFIX_x[0]
        var2 = FOR_ITER_TUPLE_PREFIX_x[1][0]
        var3 = FOR_ITER_TUPLE_PREFIX_x[1][1]
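As a sanity check of the enumerate inner tuple case, the following plain-Python sketch compares a nested-tuple loop with its index-based rewrite. It uses an ordinary list instead of a tensor, and `iter_tuple` is a hypothetical name standing in for the generated `FOR_ITER_TUPLE_PREFIX_x` variable.

```python
data = [(1, (2, 3)), (4, (5, 6))]


def original(items):
    # Nested tuple unpacking directly in the loop target.
    out = []
    for i, (var1, (var2, var3)) in enumerate(items):
        out.append((i, var1, var2, var3))
    return out


def rewritten(items):
    # Same loop with a single iteration variable and explicit indexing.
    out = []
    for i, iter_tuple in enumerate(items):
        var1 = iter_tuple[0]
        var2 = iter_tuple[1][0]
        var3 = iter_tuple[1][1]
        out.append((i, var1, var2, var3))
    return out


print(original(data) == rewritten(data))  # True
```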
* support setup.py to compile custom op
* move file into paddle.utils.cpp_extension
* support python setup.py install
* refine code style
* Enrich code and add unittest
* initial commit: simple demo
* polish copyright format
* add grad op simple demo
* adapt uncertain number of argument
* change trait macro name
* add place & dtype support for add kernel
* add dispatch and infershape func
* polish code & add notes
* add dynamic_loader dep for paddle_framework
* add new custom op test dir
* polish impl details
* add unittest for new custom op
* fix failed unittest
* Custom op (#1)
* fix compile error
* wrap framework tensor with LoDTensor
* fix compile error
* add CustomTensor default constructor
* add size() for CustomTensor
* make size const for CustomTensor
* refactor place related api to circle the concept
* fix compile error
* make place const
* make Tensor copy
* debug CustomTensor core
* remove additional head of framework
* use back to shared ptr for custom tensor
* add gpu test
* merge latest cwh code in
* adjust ut code of custom op
* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2)
* fix compile error
* wrap framework tensor with LoDTensor
* fix compile error
* add CustomTensor default constructor
* add size() for CustomTensor
* make size const for CustomTensor
* refactor place related api to circle the concept
* fix compile error
* make place const
* make Tensor copy
* debug CustomTensor core
* remove additional head of framework
* use back to shared ptr for custom tensor
* add gpu test
* merge latest cwh code in
* adjust ut code of custom op
* hide share data from and to
* rename CustomTensor to Tensor
* refactor register design & add test
* change op_function to op_meta_info
* split op meta info into .h and .cc
* move get methods into friend class
* move OpMetaInfoHelper into framework space
* move CustomTensorUtils into framework space
* change pybind api name
* move PD C API into op meta info
* add register custom op api
* remove inference cmake change
* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3)
* fix compile error
* wrap framework tensor with LoDTensor
* fix compile error
* add CustomTensor default constructor
* add size() for CustomTensor
* make size const for CustomTensor
* refactor place related api to circle the concept
* fix compile error
* make place const
* make Tensor copy
* debug CustomTensor core
* remove additional head of framework
* use back to shared ptr for custom tensor
* add gpu test
* merge latest cwh code in
* adjust ut code of custom op
* hide share data from and to
* rename CustomTensor to Tensor
* support multi dtype
* remove lod, make reshape lowercase, add copy test and refactor copy api
* fix copy to error
* add more test
* polish detail & error message
* polish test details
* Add cast api && Change copy related api to copy_to && add more test (#4)
* fix compile error
* wrap framework tensor with LoDTensor
* fix compile error
* add CustomTensor default constructor
* add size() for CustomTensor
* make size const for CustomTensor
* refactor place related api to circle the concept
* fix compile error
* make place const
* make Tensor copy
* debug CustomTensor core
* remove additional head of framework
* use back to shared ptr for custom tensor
* add gpu test
* merge latest cwh code in
* adjust ut code of custom op
* hide share data from and to
* rename CustomTensor to Tensor
* support multi dtype
* remove lod, make reshape lowercase, add copy test and refactor copy api
* fix copy to error
* add more test
* add type cast
* add cast and make copy to api
* merge cwh code
* add more error log
* polish code
* used for test
* remove test comment
* fix uint8 type error
* fix lost uint8 type error
* add test for coverage
* polish details by reviewer comments
* add prefix for DISABLE_COPY_AND_ASSIGN
Co-authored-by: Jiabin Yang <360788950@qq.com>
* support xpu inference with analysis predictor, test=develop
* merge the cmake of the xpu toolchain, test=develop
* add c-apis, test=develop
* fix a bug in extern_xpu, test=develop
* support setup.py to compile custom op
* move file into paddle.utils.cpp_extension
* support python setup.py install
* refine code style
* Enrich code and add unittest
* Polish code and api doc
* fix cpp_extension not include in package
* fix relative import
* fix os.makedirs exist_ok param compatibility PY2
* add compile flags in test_jit_load
* rewrite abs op
* rewrite abs op and remove abs in activation
* remove abs register in old codes
* fix abs_grad type error
* fix abs double_grad output name error
* modify abs_grad, abs_grad_grad functor for windows building
* format code style
* fix the bug of result is nan when the divisor is zero
* add missing abs attr and add abs for float16
* Avoid bug on 'MAC python3.5/6'.
* Choose the saving method according to the OS.
* smaller length of '_unpack_saved_dict' for MAC OS.
* add version information of Python.
* Edit comment.
* add view strategy on squeeze,unsqueeze,reshape,flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* add inplace strategy
* add elementwise_add sub
* let backward op not use inplace
* grad op do not use inplace
* fix memory increase error and add leaf error message
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
* add unittest and leaf error message
* merge view error
* optimize op_function_generator format and support sum inplace op
* fix format of basic_engine
* fix format for framework
* little change of variable wrapper
* add reshape, squeeze, unsqueeze, scatter api
* add relu elu tanh softmax inplace api
* fix test_squeeze_op unittest
* fix test_relu_op unittest
* fix comment problems
* delete sample code of inplace api
* add reference of grad_pending_nodes in basic_engine
* fix unittest name
* add inplace apis into wlist
* fix error message
* add PADDLE_ENFORCE for set grad op twice
* fix header file error