Commit Graph

18556 Commits (52b05baca349d1bbfcbb6ed78b289d6c66dbec3e)

Author SHA1 Message Date
zlsh80826 50cafa0b0c
remove redundant sync, set collect/dist kernel to context stream, sub_lod memcpy opt (#31641)
4 years ago
ronnywang 420527f0d9
[ROCM] fix layer_norm, norm, p_norm, test_sequence_softmax_op, test_math_op_patch_var_base (#31709)
4 years ago
Chen Weihang 87852616aa
[CustomOp] Support complex dtype in custom op (#31657)
4 years ago
zlsh80826 fe241fd02f
[Paddle-TRT] gather converter (#31640)
4 years ago
zlsh80826 4ea3427865
[Paddle-TRT] support batch axis concatenation when using dynamic shape (#31627)
4 years ago
Zhang Ting 7f50bb7ec1
support NHWC for temporal_shift op (#31642)
4 years ago
Chen Weihang 2fbe9b097a
[CustomOp] Remove Eigen dependencies of float16 (#31669)
4 years ago
Qi Li d9b50f664f
[ROCM] update ci scripts and dockerfile, test=develop (#31551)
4 years ago
YUNSHEN XIE 1a6e3b04cd
Second optimization of retry method (#31646)
4 years ago
yiak c1b1ccfbf5
Update tinyformat.h (#31612)
4 years ago
ronnywang da10c5cf8b
[ROCM] fix softmax_with_cross_entropy_op, test=develop (#31629)
4 years ago
Chen Weihang 027b574a0e
[CustomOp] Remove the dependence of the underlying data types on eigen (#31602)
4 years ago
WangXi 9066b74f58
c_gen_nccl_id add SocketServer to persist server (#31589)
4 years ago
Kaipeng Deng a32e8bf1e7
DataLoader support dict str (#31481)
4 years ago
Chen Weihang 30a627aaf3
Normalized function parameter writing (#31588)
4 years ago
Pei Yang cac9635a67
[Paddle-TRT] Fix engine key in trt int8 calibration (#31513)
4 years ago
Shang Zhizhou 50ac7dbfd0
Trt elementwise plugin serialize (#31587)
4 years ago
whs da9dda5c9b
Make CreateProgramDesc more robust (#31543)
4 years ago
Qi Li 3d5aa9d10a
[ROCM] fix conv2d and conv3d op, test=develop (#31553)
4 years ago
YUNSHEN XIE f302bb4f8b
help timeout ut debug (#31500)
4 years ago
Chen Weihang 95cceb2dd7
[CustomOp] Support duplicable op input and output (#31535)
4 years ago
YUNSHEN XIE 49c3d2a97b
modified show_ut_retry_result (#31528)
4 years ago
lidanqing 0f1e7e3d52
[Bug fix] Different machine generate different binary file, remove md5 check (#31482)
4 years ago
jiangcheng 9ed6c895f1
optimize range op by place parameters on cpu rather than gpu, test=develop (#30811)
4 years ago
Thunderbrook 3789a69923
solve bug in heter mode (#31531)
4 years ago
chajchaj 6148b87f9d
add softmax_switch for softmax_with_cross_entropy_op, test=develop (#31428)
4 years ago
WangXi 83a2fb1f08
Add collective async wait op (#31463)
4 years ago
lilong12 0205e9f84e
remove the send/recv of tensor size (#31460)
4 years ago
furnace 910f377fa5
Bugfix rocm (#31490)
4 years ago
Qi Li 416e47edef
[ROCM] fix softmax with loss nan in HIP platform, test=develop (#31491)
4 years ago
Shang Zhizhou f57739be35
fix ernie_varlen when cutting head (#31497)
4 years ago
JamesLim 45c7d90564
Optimization of elementwise CUDA kernel (#30801)
4 years ago
YUNSHEN XIE 0b3c229606
Prec on mac (#31382)
4 years ago
Jacek Czaja 23d96cf221
[oneDNN] bumpup onednn 2.2 fixup version (#31473)
4 years ago
YUNSHEN XIE 390cebee15
Prec on windows exclude check_added_ut (#31372)
4 years ago
wangguanzhong 43d6abf0a5
update conv2d, test=develop (#31480)
4 years ago
wangguanzhong 50af0c2cbb
fix roi_align, test=develop (#31479)
4 years ago
ronnywang e03e46730c
[ROCM] fix gather_op, sigmoid_cross_entropy_with_logits_op, test=develop (#31467)
4 years ago
Qi Li b85c8e03be
[ROCM] fix reduce op, test=develop (#31478)
4 years ago
Jacek Czaja 39a5424ed1
[oneDNN] elementwise add bf16 grad kernel with broadcasting (#31385)
4 years ago
石晓伟 5f6213217b
update zero_copy_tensor_test.cc for build of gcc485, test=develop (#31470)
4 years ago
Qi Li 133a914bd0
[ROCM] fix test_dist_op ci test, test=develop (#31468)
4 years ago
Qi Li f9377965c4
[ROCM] fix dropout and remove hipcub, test=develop (#31455)
4 years ago
石晓伟 bc7632be73
upgrade inference tensor apis, test=develop (#31402)
4 years ago
JamesLim 8491ae9a02
Creating a CUDA function to find the minimum value in warp or block (#31191)
4 years ago
Pei Yang 30717a6cbc
fix trt serialization on windows (#31438)
4 years ago
Pei Yang 1321c47950
add more info in trt engine serialization (#31434)
4 years ago
liuyuhui 9ebf05b003
[Kunlun] Multi xpu dygraph performance optimization, add distributed.spawn support for multi xpu and some bug-fixes (#31130)
4 years ago
Qi Li 4d647ec137
[ROCM] update fluid platform for rocm (part5), test=develop (#31315)
4 years ago
Wilber c9a7bfec89
prepare remove grad script and update PADDLE_CI_INFERENCE pipeline (#31149)
4 years ago
Zhang Ting 7d95e598c1
support float16 for temporal_shift op (#31432)
4 years ago
wuhuanzhou 4d6d2db812
Windows system supports Ninja compilation (#31161)
4 years ago
liym27 0fff930667
Fix bug for set_value op when input dtype is not float32 (#31411)
4 years ago
jakpiase 5b4f8aac82
Added LSTM BF16 and fixed GRU BF16 (#31234)
4 years ago
Qi Li 7cdf6ea770
[ROCM] update fluid elementwise op for rocm (part10), test=develop (#31361)
4 years ago
Qi Li 84639b6193
[ROCM] update fluid operators for rocm (part3), test=develop (#31213)
4 years ago
Qi Li 3b9db17199
[ROCM] update fluid operators for rocm (part7), test=develop (#31307)
4 years ago
Qi Li db50fb6766
[ROCM] fix softmax with loss and update python scripts, test=develop (#31373)
4 years ago
Pei Yang 32211fe9c4
TRT conv2d converter support SAME padding (#31379)
4 years ago
Qi Li e312a1ff6e
[ROCM] update fluid operators for rocm (part9), test=develop (#31338)
4 years ago
Qi Li 6626c6a6ad
fix bert cu file compiler error, test=develop (#31389)
4 years ago
Zhou Wei 13e4280f82
[Custom OP]polish doc of custom OP (#31369)
4 years ago
Qi Li 946dbdae8c
[ROCM] update fluid operators for rocm (part6), test=develop (#31301)
4 years ago
Shang Zhizhou 77c44e2f1b
change prelu plugin to tensorRT layer (#30210)
4 years ago
Qi Li 59940cb383
[ROCM] update fluid operators for rocm (part8), test=develop (#31309)
4 years ago
tangwei12 5d7a8b05f8
fix sync training error (#31357)
4 years ago
Qi Li ec72f5b235
fix ELU output for nan, test=develop (#31132)
4 years ago
Qi Li 65bcaeb004
[ROCM] update fluid operators for rocm (part5), test=develop (#31258)
4 years ago
YUNSHEN XIE 2111d912d4
Decrease threshold for failed ut retry (#30903)
4 years ago
Pei Yang 2e9e3fad15
add n-d input support for trt scale converter (#31316)
4 years ago
Shang Zhizhou 6404c43814
support trt serialize when load model from memory (#31342)
4 years ago
Gradie d79fdc3d62
lamb_op_xpu;test=kunlun (#31012)
4 years ago
danleifeng d1075df2e8
topo and memory performance for heterps (#30440)
4 years ago
Qi Li 72d99c5dcd
[ROCM] update fluid operators for rocm (part4), test=develop (#31225)
4 years ago
cucuzg 91635de390
opt matmul and matmul_v2 on kunlun, *test=kunlun (#31326)
4 years ago
Wilber e20234094c
Fix xpu compile and cipher symbol problem. (#31271)
4 years ago
wuhuanzhou 30858d8974
fix compilation errors for missing brpc header files, test=develop (#31325)
4 years ago
石晓伟 625482f752
inference modification for custom operator, test=develop (#31312)
4 years ago
wuhuanzhou a13f1d6930
optimize unity build (#31119)
4 years ago
jiangcheng 8f4ac6b525
optimize topk op through limit SortTopK kernel entrance, test=develop (#30403)
4 years ago
alncat bfb8a64234
updated conv bn fuse pass to make it compatible with latest batch_norm op (#31272)
4 years ago
Chen Weihang 5610c1717e
fix dtype unmatched (#31305)
4 years ago
Qi Li 9b016c7cb7
[ROCM] update fluid operators for rocm (part2), test=develop (#31211)
4 years ago
niuliling123 2fd999d979
Optimized the adaptive_avg_pool2d op when output_size == 1 (#31197)
4 years ago
石晓伟 1da3280660
inference modification for custom operator, test=develop (#31283)
4 years ago
Zhou Wei af9066e89c
[Custom OP]add PD_THROW and PD_CHECK for User Error message (#31253)
4 years ago
石晓伟 8c94d8cb4c
[Custom OP] change the user header file format, test=develop (#31274)
4 years ago
Jiabin Yang 038ce70d69
[Custom OP] Support stream set on Custom Op (#31257)
4 years ago
Jiabin Yang 0c38708a90
[Custom Op] Remove unsupported dtypes (#31232)
4 years ago
WangXi b8bce682e0
xpu support fuse allreduce (#31104)
4 years ago
Chen Weihang 126633c50f
[CustomOp] Split build op macro & polish details (#31229)
4 years ago
tangwei12 903235945b
loglevel adjustment for distributed training (#31205)
4 years ago
Qi Li 28b356b9a2
[ROCM] update fluid framework for rocm (part6), test=develop (#31015)
4 years ago
Qi Li c8fac5ee30
[ROCM] update fluid framework for rocm (part5), test=develop (#31014)
4 years ago
Qi Li 580447d019
[ROCM] update fluid framework for rocm (part4), test=develop (#31013)
4 years ago
Wilber 7d91974c91
enable lite ut. (#30890)
4 years ago
Guanghua Yu d18c5e47f3
fix ignore_index check in softmax_with_cross_entropy (#31201)
4 years ago
chentianyu03 ca3b6bcf78
add cache for VariableWrapper (#30880)
4 years ago
wangchaochaohu f114c3f8ca
fix the branch of code choose (#31200)
4 years ago
joanna.wozna.intel d11602481c
Add bf16 gru model test (#31158)
4 years ago