WeiXin
404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t ( #30161 )
4 years ago
Wilber
91a8a25721
enhance error info for py_func ( #30138 )
...
* enhance error info for py_func
* update
4 years ago
weihaoji
b8207af6bc
[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop ( #30122 )
4 years ago
liuyuhui
15fac5e7fa
fix assign_op_xpu concat_op_xpu warning ( #30120 )
4 years ago
Jack Zhou
f5428eca4f
fix enforce msg of sum xpu op ( #30113 )
4 years ago
123malin
198fbdfb60
Add Lookahead and ModelAverage Optimizer ( #30004 )
...
* test=develop, add model_average and lookahead
4 years ago
Leo Chen
adac38c506
add dispensable input for core.ops.reshape2/expand/slice ( #30072 )
...
* add dispensable input 'shape' for core.ops.reshape2
* add dispensable inputs for core.ops.reshape2/expand/slice
* add ut
4 years ago
ShenLiang
becf99d2e8
fix error message ( #30135 )
4 years ago
Zhou Wei
30888ca343
Polish and Optimize the print/repr information of Layer ( #29998 )
...
* Polish and Optimize the print/repr message of all layer
* fix some code format
4 years ago
wangguanzhong
69839f8a9a
fix error message for distribute_fpn_proposals_op ( #30116 )
4 years ago
QingshuChen
8e1c3ddf15
add aarch64 and sunway kunlun lib ( #30027 )
...
* add aarch64 and sunway kunlun lib
* minor
* optimize elementwise_add for kunlun
* update kunlun dependence
* minor
* minor
4 years ago
Shang Zhizhou
05b27695f1
add inference api: DisableTensorRtOps ( #30109 )
...
* snap
* add inference api: DisableTensorRtOPs
* fix code style
* update api to experimental
* update variable name
4 years ago
石晓伟
53bb126510
fix a bug in op_version_registry, test=develop, test=op_version ( #29994 )
4 years ago
xiemoyuan
3e0c492910
Optimize the error message of framework. ( #30134 )
4 years ago
liym27
9922bd4125
Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result ( #30003 )
...
1. when slice_item is a slice:
1) the start of __getitem__ should be std::max(start, 0) if slice
2) the end of __getitem__ should be std::min(end, dim)
2. when slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
4 years ago
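The clamping rules listed in the `__getitem__` fix above ( #30003 ) can be illustrated with a minimal Python sketch. This is not Paddle's implementation (the message refers to `std::max`/`std::min`, so the real change is presumably in C++). The names `normalize_slice`, `normalize_index`, and `take` are hypothetical, and wrapping negative bounds by adding the dimension length before clamping is an assumption borrowed from ordinary Python indexing.

```python
def normalize_slice(start, end, dim):
    """Return (begin, length) for a step-1 slice over an axis of size dim."""
    # Assumption: negative bounds count from the end of the axis.
    if start < 0:
        start += dim
    if end < 0:
        end += dim
    start = max(start, 0)  # rule 1.1: clamp the start from below
    end = min(end, dim)    # rule 1.2: clamp the end from above
    return start, max(end - start, 0)


def normalize_index(index, dim_len):
    """Check an integer index against [-dim_len, dim_len) and wrap it."""
    if not -dim_len <= index < dim_len:
        raise IndexError(
            f"index {index} is out of range [{-dim_len}, {dim_len})")
    return index + dim_len if index < 0 else index


def take(data, start, end):
    """Apply a step-1 slice to a list using the normalized bounds."""
    begin, length = normalize_slice(start, end, len(data))
    return [data[i] for i in range(begin, begin + length)]
```

With these rules, `take(data, s, e)` agrees with Python's own `data[s:e]` for any integer bounds, which is a convenient reference behaviour to test against.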
chentianyu03
666e665132
change the kron gradient when complex types ( #29995 )
4 years ago
chentianyu03
a5e422c85d
add trace op_register_version and fix version bug; test=op_version ( #30000 )
...
* add trace op_register_version and fix default bug; test=op_version
* add trace op_register_version; test=op_version
* add trace op_register_version; test=op_version
* add trace op_register_version; test=op_version
* fix missing the template bug of vector; test=op_version
4 years ago
cc
9f34374b48
Fix the format of raising error in randperm op ( #30108 )
...
* fix the format of raising error in randperm op
4 years ago
liuyuhui
254ad61959
fix xpu pe sync, test=notest ( #30095 )
4 years ago
Thunderbrook
0b8e1fadc5
add topo-aware in heter-ps ( #30087 )
...
* add topo aware
* resource.h
* topo aware
* format
4 years ago
hong
297fff1a79
support dygraph in xpu place ( #30051 )
...
* support dygraph in xpu place; test=develop
* fix cpu/gpu compile error; test=develop
* fix compile error; test=develop
* fix xpu compile error; testd=develop
4 years ago
wangchaochaohu
d0a5620575
fix the compiler error when gcc4 cuda9.0 ( #29997 )
4 years ago
WangXi
ee16006b5d
Optimize grad merge performance ( #29784 )
4 years ago
yongqiangma
e891f4da1b
Add p_norm op version info ( #30042 )
...
* p_norm fix op version info. test=develop
4 years ago
tangwei12
7d1c149e09
for inference checkpoint ( #30081 )
...
* for inference checkpoint
Change-Id: I36c979240ffa55bf1ef0c9315402960762af6be4
* for inference checkpoint
Change-Id: I82025365d5b792cbea1ead506df685aecc8ac198
4 years ago
tangwei12
7d4bdff07d
fix large scale memory ( #30035 )
...
* memory holder optimize
Change-Id: Ic91af8ac6f2853336d28a9fbbc5e8d0c57b5d05e
* memory holder optimize
Change-Id: I2fd1c14ecc17f5d5ce88b87890381ea801e6367f
* fix large scale memory holder
Change-Id: Ief0992b02b00220e16c72cc637a56e7b5788140f
* fix large scale memory holder
Change-Id: I910142a3952ead643a5604f8f80955f3e6efe655
4 years ago
Shang Zhizhou
08dc5bc27e
fix op version checker of pass bug ( #30028 )
...
* fix op version checker of pass bug
* fix code style
* update pass version
4 years ago
cc
68398abce9
[Inference] zero_copy_tensor supports int8_t ( #30053 )
...
* zero_copy_tensor supports int8_t
4 years ago
whs
1b999d2b5d
Add version checking ( #30040 )
4 years ago
ceci3
85b2f05ab0
register ModifyAttr for instance_norm, test=op_version ( #30065 )
...
* register instance norm, test=op_version
4 years ago
channings
ddcff254db
fix op_register_version for compare ops, test=op_version ( #30007 )
...
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
4 years ago
Wilber
66e16b7e99
update lite subgraph. ( #30056 )
4 years ago
GaoWei8
a64822589f
add REGISTER_OP_VERSION for LSTM ( #30038 )
4 years ago
yinhaofeng
6e93fb92f9
Register op version for linspace,test=op_version ( #30025 )
...
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
4 years ago
123malin
d0056c324d
test=develop, add op_register_version for roll_op ( #30023 )
...
* test=develop, add op_register_version for roll_op
4 years ago
chentianyu03
e012930aa3
complex gradient matmul ( #29966 )
...
* dot op support complex types
* matmul support complex types
* add test case
* matmul broadcast gradient support complex
* move conjFunctor to complex_functor.h
4 years ago
ShenLiang
893d37e5c6
Fix rank_attention op_version, test=op_version ( #30006 )
...
* fix rank_attention, test=op_version
4 years ago
Adam Osewski
13aef97043
operator checkpoints for new attributes. ( #29832 )
...
* Add operator checkpoints for new attributes.
* Fix adding subsequent checkpoint to quantize op.
4 years ago
wangguanzhong
844d8e0c2c
add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version ( #30034 )
4 years ago
cc
c3c064a8fc
Add mkldnn nearest_interp and bilinear_interp op ( #30016 )
...
* Add mkldnn nearest_interp and bilinear_interp op
* don't run mkldnn interpolate by default
* add interpolate_mkldnn_pass
4 years ago
chalsliu
c053bf2a57
Revert "register ModifyAttr for instance_norm, test=op_version ( #29938 )"
4 years ago
wawltor
cc2f94620c
add support for the op version check for matmul, test=op_version ( #30011 )
...
* add support for the op version check for matmul, test=op_version
4 years ago
wawltor
b33aaea86c
add the op version check for the elementwise ops, test=op_version ( #30010 )
...
* add the op version check for the elementwise ops, test=op_version
* add the support check for elementwise_ops, test=op_version
4 years ago
Chengmo
4cbcc9b6da
fix momentum op register ( #29941 )
...
* fix momentum op register
4 years ago
hutuxian
7c1f69bdf0
add op_version for flip op [test=op_version] ( #30019 )
4 years ago
ceci3
77c1684397
register ModifyAttr for instance_norm, test=op_version ( #29938 )
...
* upgrade instance_norm, test=op_version
* fix
4 years ago
Leo Chen
47d10c55d5
Enhance debugging ( #30001 )
...
* add debug code
* add place info
* fix compile problem
* add place for output
4 years ago
FlyingQianMM
d42f93e504
add op_register_version for allclose op; test=op_version ( #29968 )
4 years ago
wawltor
8f49f9d5c9
change the elementwise ops version check, test=op_version
...
change the elementwise ops version check, test=op_version
4 years ago
guofei
b23faf37be
Add moving_average_abs_max_scale op_register_version test=develop ( #29957 )
...
Add moving_average_abs_max_scale op_register_version
4 years ago
Thunderbrook
0ca6de171f
add include ( #29952 )
4 years ago
Pei Yang
6206b9bc71
fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test ( #29977 )
4 years ago
wangxinxin08
be8b5fd18a
register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version ( #29937 )
4 years ago
石晓伟
958612231f
compile the denormal.cc on aarch64, test=develop ( #29956 )
4 years ago
Guo Sheng
6ac4f0af6a
Register op version for coalesce_tensor. ( #29940 )
...
test=develop
test=op_version
4 years ago
Chen Weihang
a1d9a14e89
support grad accumulated across batch ( #29942 )
4 years ago
cc
6a0102b038
map matmul/squeeze2+matmul/reshape2+matmul to mul ( #29911 )
...
* map matmul/squeeze2+matmul/reshape2+matmul to mul
4 years ago
Huihuang Zheng
d038746e1c
Fix Unix Sleep for Wrong Time. test=develop ( #29953 )
...
PADDLE_RETRY_CUDA_SUCCESS used the wrong sleep time, which could cause timeouts in unit tests. This PR fixes it.
According to https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html , sleep in unistd.h takes seconds and usleep takes microseconds, while Sleep in windows.h takes milliseconds.
4 years ago
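The unit mismatch described in the sleep fix above ( #29953 ) can be made concrete with a small conversion table. This is only an illustrative Python sketch: the helper `sleep_argument` and the table `MICROSECONDS_PER_UNIT` are hypothetical names, and rounding up to a whole unit is an assumption made here so that a call never waits less than requested.

```python
import math

# Size of one unit of each API's argument, expressed in microseconds.
MICROSECONDS_PER_UNIT = {
    "sleep": 1_000_000,  # unistd.h sleep(): seconds
    "usleep": 1,         # unistd.h usleep(): microseconds
    "Sleep": 1_000,      # windows.h Sleep(): milliseconds
}


def sleep_argument(delay_ms, api):
    """Convert a delay in milliseconds to the integer argument for `api`."""
    delay_us = delay_ms * 1_000
    # Round up so the resulting wait is never shorter than delay_ms.
    return math.ceil(delay_us / MICROSECONDS_PER_UNIT[api])
```

Passing the same number to all three functions gives waits that differ by factors of a thousand, which is the kind of error the commit describes.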
Jack Zhou
5a4e42ca9a
add gru op_register_version; test=op_version; ( #29931 )
...
* add gru op_register_version; test=op_version;
* Update fc,mul version;test=op_version;
4 years ago
Wilber
2b1d796cd0
[Inference] Fix the TRT performance regression in 2.0 compared with 1.8. ( #29925 )
4 years ago
Qi Li
913f77a4b7
Register op version for print, test=op_version ( #29945 )
4 years ago
石晓伟
181ea1870b
flush denormals to zero, test=develop ( #29924 )
...
* flush denormals to zero, test=develop
* add comments, test=develop
4 years ago
cc
7667e59bf7
add op version for fake_quant and fake_dequant ops, test=op_version ( #29923 )
...
* add op version for fake_quant and fake_dequant ops, test=op_version, test=develop
4 years ago
石晓伟
acb5e86363
fix a bug in reset_tensor_array, test=develop ( #29620 )
...
* fix a bug in reset_tensor_array, test=develop
* ci coverage, test=develop
4 years ago
liuyuhui
3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor ( #29926 )
4 years ago
Wilber
332da133a1
Support mips arch ( #29903 )
...
* Support MIPS arch.
4 years ago
LielinJiang
eab0b60e16
Register op version for grid_sampler, test=op_version ( #29916 )
...
* register op version for grid_sampler
4 years ago
liym27
9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor ( #29842 )
...
* Revert "[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267 )"
This reverts commit b10ecd9d3a.
* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
4 years ago
liuyuhui
4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor ( #29574 )
4 years ago
LielinJiang
0f4b218640
Enable bilateral_slice unittest on windows platform ( #29896 )
...
* enable bilateral_slice unittest on windows platform
* reduce max threads
4 years ago
YUNSHEN XIE
2a01756bf3
remove duplicate ut names ( #29809 )
4 years ago
Chen Weihang
a6072055be
[Complex] Handle complex to real after type promotion ( #29855 )
...
* try to add fwd op input dtypes
* refactor base impl
* return tmp_ins after dygraph prepare data
* fix typo found in debug
* polish comment & add complex net test
* revert detail change
* fix unittest failed
* add complex kernel condition control
* fix xpu test failed & polish comment
* polish details by review comments
4 years ago
Chen Weihang
1a304e6c06
[Complex] Add support for complex grad accumulated ( #29889 )
...
* add support for complex grad accumulated
* add unittest for coverage
* update test dtype
* remove useless blank line
4 years ago
taixiurong
c7acad9f2f
support some shape for matmul and cast in xpu place ( #29900 )
...
* support some shape in matmul and cast
* modify matmul
4 years ago
Leo Chen
6b258317cb
fix TransferInplaceBack ( #29830 )
4 years ago
QingshuChen
59b47f3b32
feat: support check_nan_inf for kunlun/xpu device ( #29694 )
...
* feat: support check_nan_inf for kunlun device
* support kunlun stack
* minor
4 years ago
tangwei12
032414ca2a
[Feature] one ps (3/4) ( #29604 )
...
* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>
4 years ago
jakpiase
edc06c6a1b
Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) ( #29772 )
4 years ago
Wilber
2c0a4a3470
call_stack is turned on by default when ON_INFER=ON ( #29798 )
4 years ago
Wilber
ad0b01ffe2
lod operator should not be reused in memory_optimize pass. ( #29828 )
4 years ago
liym27
97e75ad0f5
[setitem] Support Tensor setitem in static mode ( #29708 )
...
1. Type of index: int, slice(step must be 1).
2. Type of value:
(1) int32, int64, float32, bool;
(2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported>
(3) paddle.Tensor(int32, int64, float32, float64, bool);
4 years ago
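The constraints listed in the static-mode setitem commit above ( #29708 ) can be summarized as a small lookup. The Python sketch below only encodes the rules as they are written in the commit message; it does not call Paddle, and the names `SUPPORTED_VALUE_DTYPES`, `is_supported_index`, and `is_supported_value` are hypothetical.

```python
# Value kinds and dtypes copied from the commit message.
SUPPORTED_VALUE_DTYPES = {
    "scalar": {"int32", "int64", "float32", "bool"},
    "numpy.array": {"int32", "int64", "float32", "bool"},
    "paddle.Tensor": {"int32", "int64", "float32", "float64", "bool"},
}


def is_supported_index(index):
    """An index may be an int or a slice whose step is 1 (or unspecified)."""
    if isinstance(index, int):
        return True
    if isinstance(index, slice):
        return index.step in (None, 1)
    return False


def is_supported_value(kind, dtype):
    """Check a (value kind, dtype name) pair against the listed combinations."""
    return dtype in SUPPORTED_VALUE_DTYPES.get(kind, set())
```

For example, `is_supported_value("numpy.array", "float64")` is false while the same dtype on a `paddle.Tensor` is accepted, matching the note in the message.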
YUNSHEN XIE
24ce051a84
remove duplicate ut reload ( #29810 )
...
* remove duplicate ut reload
* remove duplicate ut define in cmakelist
4 years ago
Jacek Czaja
c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching ( #29606 )
4 years ago
Thunderbrook
09b6e71928
heter box ( #29734 )
...
* add heter box
* add trainer, worker, wrapper...
* format
* for ci
* format
* remove boost get
* boost & copyright
* rename
* rename
* format
* format
* format
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
4 years ago
Jacek Czaja
7b33720c90
[oneDNN] Tensor copy fix to oneDNN tensors ( #29771 )
...
* - Tensor copy fix to oneDNN tensors
* - Fixes after review
4 years ago
123malin
a400b76db7
Roll cuda kernel ( #29655 )
...
* test=develop, optimize roll_op_cuda_kernel
4 years ago
wuhuanzhou
e7ac74c85b
optimize compilation time of argmin/argmax op ( #29595 )
...
* Using VisitDataTypeTiny and put CastOP after ReduceOP, test=develop
* remove changes of reduce_op.h, test=develop
4 years ago
chentianyu03
ddfc3d2c2f
change grad elementwise_mul for complex types ( #29757 )
...
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* user define grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
* delete no need to calculate inputs in dygraph op_test
* delete no need to calculate inputs in dygraph op_test
* modify grad of mul for complex types
* fix the grads of inputs args order not match bug
4 years ago
chentianyu03
2a260d9b0e
change the grad of div when complex types ( #29804 )
...
* change the grad of div when complex types
* fix the grads of inputs args order not match bug
4 years ago
ShenLiang
f65f1caad3
opt sparse allreduce using ncclgather ( #29819 )
4 years ago
TTerror
82aa01c373
add nearest_interp_v2 on kunlun ( #29725 )
...
* add nearest_interp_v2 on kunlun
* add nearest_interp_v2 on kunlun
4 years ago
wangchaochaohu
01c37c8e02
refine the compiler error for half2 operation ( #29816 )
4 years ago
whs
82630408b4
Support double backward rsqrt ( #29589 )
4 years ago
Zhang Ting
b76f5a8489
fix the bug of dropout_grad ( #29813 )
4 years ago
LielinJiang
a94c3cbbf3
register cudnn conv double grad for depthwise conv ( #29807 )
4 years ago
ShenLiang
01e2874a0e
Support multi-stream communication for dynamic graph distributed ( #29525 )
...
* fix fleet for multi-stream
* fix memcpy for ncclid
* use sync to solve move operation
4 years ago
wangchaochaohu
f350aa59ff
Fix the compiler error for half type ( #29799 )
4 years ago
Huihuang Zheng
1cbb282d77
Add Retry Logic to CublasHandlerHolder
...
Add Retry Logic to CublasHandlerHolder to avoid random unittest failure.
4 years ago
LielinJiang
e5af650b71
Add double grad for conv_transpose ( #29706 )
...
* add double grad for conv_transpose
4 years ago
Leo Chen
224f3bcbb1
format code ( #29714 )
4 years ago
LoveAn
2e5b4a216c
Optimize compilation time with Unity Build ( #29733 )
...
* Test compilation time with less parallel count, notest, test=windows_ci
* optimize rules of Unity Build, notest, test=windows_ci, test=windows_op
* limit parallel counts used only on GPU, test=develop
* remove limit of argument /m:8 on Windows, test=develop
4 years ago
Zhang Jun
0c23ba95d8
enable MakeCiper api for inference;test=develop ( #29692 )
4 years ago
wangchaochaohu
7b2dc4e6b1
optimization for fp16 elementwise add ( #29744 )
4 years ago
Jacek Czaja
07790ba13e
[oneDNN] Reimplemented elementwise_add grad ( #29747 )
...
* - Reimplemented elementwise_add grad
- lint
* - fix after review
* - Fix to fix after review
4 years ago
Aurelius84
17c8e3adfe
Polish code in gpu_launch_config.h ( #29730 )
4 years ago
wangchaochaohu
068d905e1e
fix the shape choose of vectorize for cuda
4 years ago
syyxsxx
7c2affaa26
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug ( #29626 )
...
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug
4 years ago
石晓伟
8bd2879ef7
update the operator registration for incompatible upgrade, test=develop ( #29720 )
4 years ago
chentianyu03
71063b8137
add conj op for complex types ( #29527 )
...
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* user define grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
4 years ago
Wilber
b593d588aa
[Inference] EnableUseGpu has higher priority than flags ( #29697 )
...
* enable_use_gpu has higher priority than FLAGS
* update.
4 years ago
WangXi
9cbcc6cadc
fleet sync build strategy, test=develop ( #29732 )
4 years ago
wanghuancoder
0c59ad2a1a
Windows generate pdb and dump, for debug ( #29628 )
...
* Windows generate pdb and dump, for debug
* fix code style, test=develop
* modify cmakelist
4 years ago
Huihuang Zheng
4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop ( #29617 )
...
Modify CublasHandleHolder from using PADDLE_ENFORCE_CUDA_SUCCESS to PADDLE_RETRY_CUDA_SUCCESS to fix random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be due to insufficient GPU resources. We fixed a similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.
4 years ago
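The idea behind switching from an enforce-style check to a retry-style check, as in the commit above ( #29617 ), can be sketched in Python. This is not the `PADDLE_RETRY_CUDA_SUCCESS` macro itself: the function `retry_call`, the attempt count, the delay, and the use of `RuntimeError` as the transient failure are all assumptions chosen for illustration. The pause between attempts mirrors the later "Add Sleep Time for CUDA Retry" commit ( #29442 ) in this log.

```python
import time


def retry_call(func, max_attempts=3, delay_seconds=0.1):
    """Call func(), retrying on RuntimeError up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RuntimeError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error, as an enforce would.
    raise last_error
```

A transient failure such as a momentarily exhausted allocator then costs a short wait instead of failing the whole test, while a persistent failure is still reported.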
Chen Weihang
6cfa59de1b
[Complex] Add real & imag op and api for complex tensor ( #29672 )
...
* add complex real op & api & unittest
* add imag op & api & unittest
* refactor op impl
* revert simplified writing due to compile failure
* polish details
* polish grad op code
4 years ago
Jacek Czaja
9eff1a674f
Added missing format of oneDNN ( #29670 )
4 years ago
wangchaochaohu
2e0d1ed00f
delete the code for fp16 optimization because it is not faster than common template code ( #29715 )
4 years ago
TTerror
af8ded773a
update activation op on kunlun ( #29577 )
...
* fix expand && concat/transpose to new api
* update xpu_header
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* add nearest_interp on kunlun
* update error message
4 years ago
ceci3
cc387159f3
add pad and concat double grad ( #29549 )
...
* add constant pad double grad
4 years ago
liuyuhui
f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor ( #29337 )
4 years ago
Y_Xuan
76738504ad
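Add ROCm platform support code ( #29342 )
...
* Add ROCm platform support code
* Fix some issues
* Resolve some ambiguities and add comments
* Fix code format
* Code changes after resolving conflicts
* Modify operators.cmake
* Fix formatting
* Fix errors
* Unify interfaces
* Update dates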
4 years ago
Zhang Ting
1e9127f688
improve dropout grad ( #29605 )
...
* improve grad perf
4 years ago
wangchaochaohu
eab44e1f32
refine ( #29622 )
4 years ago
WangXi
613c46bc07
fix gen_nccl_id_op_helper compile failed, test=develop ( #29614 )
4 years ago
Chen Weihang
f02aece1f0
Add complex dtype op (add) test example ( #29603 )
...
* add op test case for complex
* polish code details
* add xpu set constant support
* fix argument error
* remove useless pyc file
4 years ago
AshburnLee
efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS ( #28732 )
4 years ago
lijianshe02
7779768b53
add transpose double grad test=develop ( #29600 )
...
* add transpose double grad test=develop
4 years ago
wangchaochaohu
1b69e528d3
optimize for long width for elementwise ( #29602 )
4 years ago
Wilber
78dad78610
fix non-contiguous bug for python api. ( #29615 )
4 years ago
ShenLiang
1efef8baed
Fix bug of matmul_v2 for broadcast case ( #29599 )
...
* fix bug of matmul_v2 for broadcast
4 years ago
qingqing01
8d549fc85d
Add clip double grad ( #29590 )
4 years ago
wangchaochaohu
ac4bae8ee9
elementwise_add_grad Op optimization ( #29575 )
4 years ago
arlesniak
62d4483649
Added verbose oneDNN lib version ( #29378 )
4 years ago
lilong12
ff6a145011
update, test=develop ( #29559 )
4 years ago
WangXi
467c716963
gen nccl id use socket ( #29431 )
4 years ago
tangwei12
0034273b7e
add service ( #29560 )
...
* add service, remove ut on mac
* fix heter_profiler & add heter stop method
* fix code style
4 years ago
Leo Chen
c0163837a5
Fix compile problem when cuda_arch < 6000 ( #29576 )
...
* fix compile problem when cuda_arch < 6000
* refine code
* refine code
4 years ago
QingshuChen
79a41a9ed6
support roi_align & affine_channel for kunlun ( #29561 )
...
* support roi_align & affine_channel for kunlun
* minor
4 years ago
Jacek Czaja
f6cca62575
[oneDNN] Making ThreadID info in caching key optional ( #29272 )
4 years ago
Wilber
740c0d58c3
update for xpu ci. ( #29568 )
4 years ago
JZ-LIANG
d33d468f02
[Sharding] add hybrid-dp feature ( #29518 )
...
* Sharding add hybrid-dp feature
* update sharding in distributed_strategy
* update sharding unittest
* revise code format for sharding
4 years ago
Leo Chen
1e72e03217
remove duplicated macro ( #29563 )
4 years ago
Zhang Ting
6702040e94
improve dropout ( #29465 )
...
* improve drop out
* add VectorizedRandomGeneratorWithGenerator
* fix bug
* modify according to comments
4 years ago
Zhang Ting
30d9589afe
add cast cuda kernel ( #29352 )
4 years ago
LoveAn
b5d4a1f33d
Add the strategy of skipping cc/cu test compilation and execution in CI ( #29499 )
...
* Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop
* fix if error with CI_SKIP_TEST, test=develop
* fix add properties to test error on Linux/MAC, test=develop
* fix set test properties of test_code_generator error, test=develop
* remove test codes and advance judgment of file modification on Linux, test=develop
* rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix
* Add branch judgement on Linux, test=develop
4 years ago
Aurelius84
2a42250699
Polish hash function of executor cache key ( #29556 )
...
* Add more value to calculate hash key
* fix size_t
* polish code
4 years ago
taixiurong
760d015c14
add xpu ops for training transformer in kunlun ( #29539 )
...
* 1.fix matmul bug 2. add one hot
* add xpu error msg
4 years ago
Jacek Czaja
83a693ee55
[oneDNN] Added Unit Test for Multiple instances prediction ( #29501 )
...
* - Added infrastructure for new test
- Added UT for Multiple models prediction
- cosmetic fixes
- lint
- lint fixes
* - Removed timeout for MMP test
4 years ago
Zhong Hui
60bfd308ab
fix p_norm with empty shape ( #29500 )
...
fix p_norm with empty shape (#29500 )
4 years ago
Leo Chen
9f926eb720
Layernorm opt ( #29522 )
...
* layernorm fw opt
* layernorm bw opt
* fix typo, test=develop
* remove const dim3 for windows CI compatibility
* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
4 years ago
tangwei12
ae3f7a7100
add ps table ( #29463 )
...
* add ps table
Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178
4 years ago
ShenLiang
d8391a1983
fix error message of gather nd ( #29521 )
4 years ago
Zhen Wang
5ac71b36fb
Remove tensor copy in the update_loss_scaling op. ( #29426 )
...
* remove tensor copy in the update_loss_scaling op
* not use thrust.
* fix some cuda memory access error.
4 years ago
Zhou Wei
e74e1a226c
support deepcopy for Layer/Tensor/ParamBase ( #29387 )
...
* support deepcopy for Layer/Tensor/ParamBase
* fix some code
4 years ago
joejiong
87e75a77c2
Add tangent operator ( #29207 )
...
As the title
4 years ago
zlsh80826
95e334810a
Softmax vectorization ( #29404 )
...
* vec softmax fw
* vec softmax bw
* add a message argument for compiler compatibility
4 years ago
ShenLiang
2ef9e0e23c
Rebuild group automatically in dynamic graph distributed ( #29255 )
...
* add tensor_indices in AssignGroupBySize
* add rebuild group in reducer
4 years ago
procr
3a0558339d
support mobilenet for kunlun ( #29458 )
4 years ago
Huihuang Zheng
a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry ( #29442 )
...
Add Sleep Time for CUDA Retry, which is similar to our GPU retry logic. This is an attempt to avoid random GPU allocation failures at initialization in unit tests.
4 years ago
Leo Chen
e5e522493d
make gelu fp16 computing more robust ( #29484 )
4 years ago
Zhang Ting
560b432349
Revert "improve elementwise_add_grad perf ( #29277 )" ( #29464 )
...
This reverts commit befd6d5338.
4 years ago
jakpiase
57a4f16d9e
added internal and external reorders to profiler ( #29443 )
...
* added external reorder to profiler
* added external and internal reorders to profiler
* added internal and external reorder to profiler
* added formatting to int/ext reorder commit
* removed unnecessary comment
4 years ago
Pei Yang
2480bdef6c
change hard_swish from plugin to layer ( #29177 )
...
* change hard_swish from plugin to layer
* add ut when threshold != scale
4 years ago
taixiurong
ecca6585cd
1. fix elementwise ops' bug 2. fix softmax_with_cross_entropy_op 3. add bilinear_interp_op ( #29448 )
...
Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>
4 years ago
LoveAn
03b42d9fa7
fix unittest on windows, test=develop ( #29365 )
4 years ago
TTerror
a5fcc4b545
update reduce_sum op on xpu ( #29367 )
...
* update reduce_sum op on xpu
* update reduce_sum op on xpu
* support running on xpu
4 years ago
Jack Zhou
c7cada8571
Fix gru performance decline in 1.8.5 ( #29455 )
4 years ago
Zhang Ting
6296f4ed09
revert cast eigen kernel ( #29427 )
4 years ago
Leo Chen
a040c055a5
fix layer_norm accuracy ( #29434 )
4 years ago
Zhou Wei
24ba9ed436
fix that parameters' grad has grad var ( #29408 )
4 years ago
Leo Chen
4e19ce1df5
refine reshape grad and double grad kernel, use tensor copy async ( #29128 )
4 years ago
Shang Zhizhou
225a9c4ed8
Fix unittest ( #29412 )
...
* fix tensorrt unittest precision error
* fix unittest precision error. test_trt_subgraph_pass && test_trt_dynamic_shape_transformer_prune
4 years ago
Pei Yang
f860de4af7
support clip op trt converter ( #29411 )
4 years ago
Jack Zhou
1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 ( #29406 )
4 years ago
LoveAn
671555ed32
Compiling operator libraries with Unity build ( #29130 )
...
* Compiling operator libraries with Unity Build on Windows CPU.
* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
* Add option in windows ci script, no_test, test=windows_ci
* Optimize parallel compiling, test=develop
* remove limit of parallel compile and skip some ops in UB, test=develop
* remove changes of header file, test=develop
* remove changes of header file, test=develop
* fix test_eye_op unittest failed, test=develop
* Compiling operator libraries with Unity Build on Linux, test=develop
* set default WITH_UNITY_BUILD=OFF, test=develop
* Move unity build rules into a single file and add comment, test=develop
* optimize parallel compilation, test=develop
* fix undefined reference error on coverage ci, test=develop
4 years ago
cc
a623ce044f
Use different name_scope for different conv type, test=develop ( #29355 )
4 years ago
yongqiangma
7c508d8668
update unbind norm add CUDAPlace api doc information ( #29322 )
...
* enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop
* fix format. test=develop
* format fix. test=develop
* add lod_rank_table. test=develop
* fix format. test=develop
* fix doc info. test=develop
* fix np error
* add unbind dygraph api. test=develop
* fix unbind doc.test=develop
4 years ago
chentianyu03
879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type ( #29321 )
...
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* kron, reshape, transpose support complex types
* sum and trace op support complex types
* add test case of sum and trace op
* fix the bug of imag part of complex not initialized
* format file
* format code style
* kron support type promotion; modify test cases
4 years ago
卖鱼的哲学
074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu ( #29280 )
...
* fix expand && concat/transpose to new api
* update uniform_random_op
* update xpu_header
4 years ago
lilong12
1decf4ada6
update, test=develop ( #29331 )
4 years ago
QingshuChen
74bf3bed36
support global pooling for kunlun ( #29293 )
...
* test=kunlun
4 years ago
liym27
b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows ( #29267 )
4 years ago
Chen Weihang
9ad800ebb2
Support type promote for basic math ops (quantum required) ( #29265 )
...
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
4 years ago
tangwei12
8358791607
fix gpu outofrange ( #29238 )
...
* fix gpu emb out of range
Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf
* fix doc
Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf
4 years ago
Leo Chen
b58cfff89d
use has_grad instead of train_mode ( #29309 )
...
* use has_grad instead of train_mode
* add vlog for debug
* fix ut
* fix ut
4 years ago
Zhang Ting
befd6d5338
improve elementwise_add_grad perf ( #29277 )
...
* improve performance of elementwise_sum_grad
4 years ago
Shang Zhizhou
ebf689197d
fix tensorrt output shape error ( #29308 )
...
* fix tensorrt output shape error
* fix unittest tensorrt_engine_op_test
* fix code style for unittest
4 years ago
Aurelius84
67c700b479
[Dy2Stat] Add cache for Executor and Context in run_program_op ( #28421 )
4 years ago
ShenLiang
696dc4bb13
fix the warning of reducer ( #29323 )
4 years ago
wangchaochaohu
c4be80f402
polish the code of cumsum and remove some unused code ( #29303 )
4 years ago
Wilber
d68af02c04
fix analysis_config bug. ( #29304 )
4 years ago
ShenLiang
0fb18bc214
enforce the matmul_v2 error message ( #29297 )
4 years ago
Zhen Wang
9b59a589b1
Remove some useless log. ( #29300 )
4 years ago
Leo Chen
13a22a3752
fix shape of tile_grad op ( #29289 )
4 years ago
Zhen Wang
be3777a50a
Add pure fp16 training with master weights. ( #27712 )
...
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial values of the master weights are the same as the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* not modify codes about imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
4 years ago
Wojciech Uss
6673fb0565
change import math.h to cmath ( #29260 )
4 years ago
furnace
7584bb5096
Layer norm fp16 ( #29169 )
...
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
4 years ago
Shang Zhizhou
c59b4f28a2
fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF ( #29275 )
4 years ago
Leo Chen
116305ea4b
Improve performance of elementwise_add grad op ( #29187 )
...
* pass stop_gradient for cast op
* improve performance of elementwise_add grad
* use tensor copy async
* dygraph branch
* fix dygraph branch
* add ut
4 years ago
卖鱼的哲学
07c67d5a8b
add deformable_conv op on xpu ( #29234 )
...
* rebase develop
* update deformable_conv op on xpu
* update deformable_conv op on xpu
4 years ago
Chen Weihang
1de32f823d
Hot fix compile failure in gcc4.8 caused by complex impl ( #29254 )
...
* hot fix compile failure in gcc4.8
* fix failed unittest
4 years ago
GeminiCarrie
642abe2a48
Fix a bug when running on an operating system without "bash." ( #29131 )
...
* Fix a bug when running on an operating system without "bash."
* add execution condition
* for ci-coverage
4 years ago
ShenLiang
46b73e6cd9
Change the api of DataParallel and Fleet ( #29224 )
4 years ago
QingshuChen
64f29fbb70
update kunlun conv2d/softmax/elementwise implementation ( #29229 )
...
* update conv2d & softmax to new xpu api
* test=kunlun
* remove useless comments
* test=kunlun
* remove softmax xpu op
* test=kunlun
* update kunlun softmax
* test=kunlun
* update xpu unittest
* test=kunlun
* fix elementwise_grad bug for kunlun
* test=kunlun
4 years ago
chentianyu03
8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… ( #29199 )
...
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
4 years ago
Zhou Wei
c0a991c874
accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept ( #28429 )
...
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix CI unittest
* fix unittest
* empty tensor doesn't need inner_var_
* fix some error message
4 years ago
Wilber
74c43ac638
fix lite unit test. ( #29233 )
4 years ago
Adam Osewski
4096ff94dc
Small optimizations for conv2d kernel subroutines. ( #29188 )
...
- Make sure that oneDNN memory descriptors are created only once, at the first iteration.
4 years ago
joanna.wozna.intel
5c61eeef61
Enable all image classification models ( #29155 )
4 years ago
Wilber
4fec182d24
[Lite-Subgraph] Fix compile error for lite subgraph. ( #29146 )
4 years ago
123malin
b5c6342336
Update ps gpu ( #29209 )
...
* fix parameter prefetch & device guard
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: chengmo <chengmo@baidu.com>
4 years ago
liym27
865a45984f
Check whether there is any inplace operation affecting gradient calculation. ( #27901 )
...
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
4 years ago
123malin
03d4665f44
prefetch optimize ( #29095 )
...
* test=develop, optimize async prefetch
4 years ago
WangXi
0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute ( #28957 )
4 years ago
Chen Weihang
0b032faeee
Polish unittests details and execution conditions to adapt to MUSL ( #29044 )
...
* fix failed tests in yingchun's given list
* add unittests into static_mode_white_list
* add enable static
* fix dist unittest
* skip test_sigmoid_focal_loss_op & add gym
* revert no need skip unittests
* remove gym
4 years ago
Wojciech Uss
4fd4095d1b
Add quantization of multi_gru op and tests ( #28615 )
4 years ago
Jack Zhou
bc6033f86b
fix gru gcc7.4 bug for the gru compile
...
fix gru gcc7.4 bug for the gru compile
4 years ago
wangchaochaohu
b818429ae7
optimize cumsum OP ( #29193 )
4 years ago
ShenLiang
e2d01eb650
Support dynamic graph distributed ( #28997 )
...
* add reducer
* refine event for memorycopy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and ddp initialize problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
4 years ago
lilong12
7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. ( #29020 )
...
* update, test=develop
4 years ago
Zhou Wei
e668cb07fb
fix CUDA 11 error on windows ( #29101 )
4 years ago
Jack Zhou
085260f3de
Add eigen gru and fix the dropout bug in the rnn
...
Add eigen gru and fix the dropout bug in the rnn
4 years ago
yaoxuefeng
545df287fc
add user_define_dump ( #28596 )
4 years ago
arlesniak
bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes ( #28988 )
4 years ago
Shang Zhizhou
b9e76a0103
detect tensorRT plugin fp16 in runtime ( #27933 )
...
* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
* compile with cuda9
* add some unittest
* notest;test=coverage
* add unittest for trt plugin swish && split
* update ernie unittest
* fix some error message
* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
* fix compile error when CUDA_ARCH_NAME < Pascal
* fix compile error
* update unittest timeout
* compile with cuda9
* update error msg
* fix code style
* add some comments
* add define IF_CUDA_ARCH_SUPPORT_FP16
* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
4 years ago
Leo Chen
fd3fcb051a
fix typo of flag name ( #29154 )
4 years ago
Noel
da71173bc9
Fix ops doc for some ops
...
Fix ops doc for some ops
4 years ago
Leo Chen
770395cb93
Split train_mode and has_grad for tracer ( #29064 )
...
* split train_mode and has_grad
* fix format
* fix ci problems
* fix sample code
4 years ago
Aurelius84
7ae3cb554a
Polish CUDA Information stdout ( #29109 )
4 years ago
WangXi
173c22aec2
optimize fast graph executor ( #28962 )
4 years ago
Shang Zhizhou
562ded1041
fix unittest trt_dynamic_shape_transformer_prune_test error ( #29122 )
4 years ago
Shibo Tao
db41258501
add API serialize_program, serialize_persistables, save_to_file, deserialize_program, deserialize_persistables, load_from_file. ( #29034 )
4 years ago
joanna.wozna.intel
b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests ( #29039 )
...
* Add bf16 pool2d and unify bf16 unit tests
* Add change default ops test
4 years ago
joanna.wozna.intel
fddea67445
Fix cpu_bfloat16_pass ( #28730 )
...
* Fix cpu_bfloat16_pass
* Add output_format
* Fix incorrect SetOutput
* Change formatting
4 years ago
Qi Li
2fd16cf6fc
fix win ci failure, test=develop ( #29089 )
...
* fix win ci failure, test=develop
* add ci test, test=develop
4 years ago
Chen Weihang
fea0e294ee
Hide the C++ stack by default and add hints ( #29042 )
...
* default not show cpp stack & add hint
* fix failed unittest
* fix failed unittests
4 years ago
joejiong
582c0a0468
add uint8 for reshape op ( #28996 )
...
add uint8 for reshape operator
4 years ago
Zhou Wei
8ca0a8a859
fix tensor detach to zero copy ( #27921 )
...
* fix tensor detach to zero copy
* fix tensor detach to zero copy
4 years ago
taixiurong
a5aa4dc7a9
add xpu elementwise ops ( #29031 )
4 years ago
joejiong
b04c78ef5e
Update pow ( #29000 )
...
Simple code clean up
4 years ago
wawltor
b2c8a00745
remove eigen threadpool for the speed up
...
remove eigen threadpool for the speed up
4 years ago
Wojciech Uss
7b5a8e46de
Add multi_gru_fuse_pass and tests ( #28601 )
...
* Add multi_gru_fuse_pass and tests
* fix date
* cleaned up headers
4 years ago
lilong12
767d0ba267
update, test=develop ( #28700 )
4 years ago
Wojciech Uss
991345b368
Add multi_gru_seq_fuse_pass and tests ( #28604 )
...
* Add multi_gru_seq_fuse_pass and tests
* fix date
* removed unused functions
4 years ago
123malin
fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode ( #28442 )
...
* test=develop, optimize global_step
4 years ago
lilong12
f77a78cdee
enable pipeline to run with Executor.run() ( #28373 )
...
* update, test=develop
4 years ago
Thunderbrook
0073f9bdb0
support ps-gpu ( #28752 )
...
* ps gpu transpile
* ps gpu
* remove op
* gps trainer
* local ps
* add macro
* HeterBox
* def cuda
* tab
* code style
* style
Co-authored-by: Thunderbrook <a754913769#163.com>
4 years ago
Chen Weihang
768dab441e
polish two api doc detail, test=document_fix ( #28971 )
4 years ago
furnace
8ff3550658
refactor momentum op to combine weight ( #27414 )
...
* refactor momentum op to combine weight_decay (scale op and sum op)
4 years ago
Jacek Czaja
bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor ( #28758 )
4 years ago
Pei Yang
994673bf4f
change avg pooling and global pooling to trt layer in dynamic shape mode ( #28702 )
...
* change avg pooling and global pooling to trt layer
* add support for static shape global pooling
* modify trt errmsg
4 years ago