18556 Commits (52b05baca349d1bbfcbb6ed78b289d6c66dbec3e)

Author SHA1 Message Date
Zhang Ting 7d95e598c1
support float16 for temporal_shift op (#31432)
4 years ago
wuhuanzhou 4d6d2db812
Windows system supports Ninja compilation (#31161)
4 years ago
liym27 0fff930667
Fix bug for set_value op when input dtype is not float32 (#31411)
4 years ago
jakpiase 5b4f8aac82
Added LSTM BF16 and fixed GRU BF16 (#31234)
4 years ago
Qi Li 7cdf6ea770
[ROCM] update fluid elementwise op for rocm (part10), test=develop (#31361)
4 years ago
Qi Li 84639b6193
[ROCM] update fluid operators for rocm (part3), test=develop (#31213)
4 years ago
Qi Li 3b9db17199
[ROCM] update fluid operators for rocm (part7), test=develop (#31307)
4 years ago
Qi Li db50fb6766
[ROCM] fix softmax with loss and update python scripts, test=develop (#31373)
4 years ago
Pei Yang 32211fe9c4
TRT conv2d converter support SAME padding (#31379)
4 years ago
Qi Li e312a1ff6e
[ROCM] update fluid operators for rocm (part9), test=develop (#31338)
4 years ago
Qi Li 6626c6a6ad
fix bert cu file compiler error, test=develop (#31389)
4 years ago
Zhou Wei 13e4280f82
[Custom OP]polish doc of custom OP (#31369)
4 years ago
Qi Li 946dbdae8c
[ROCM] update fluid operators for rocm (part6), test=develop (#31301)
4 years ago
Shang Zhizhou 77c44e2f1b
change prelu plugin to tensorRT layer (#30210)
4 years ago
Qi Li 59940cb383
[ROCM] update fluid operators for rocm (part8), test=develop (#31309)
4 years ago
tangwei12 5d7a8b05f8
fix sycn training error (#31357)
4 years ago
Qi Li ec72f5b235
fix ELU output for nan, test=develop (#31132)
4 years ago
Qi Li 65bcaeb004
[ROCM] update fluid operators for rocm (part5), test=develop (#31258)
4 years ago
YUNSHEN XIE 2111d912d4
Decrease threshold for failed ut retry (#30903)
4 years ago
Pei Yang 2e9e3fad15
add n-d input support for trt scale converter (#31316)
4 years ago
Shang Zhizhou 6404c43814
support trt serialize when load model from memory (#31342)
4 years ago
Gradie d79fdc3d62
lamb_op_xpu;test=kunlun (#31012)
4 years ago
danleifeng d1075df2e8
topo and memory performance for heterps (#30440)
4 years ago
Qi Li 72d99c5dcd
[ROCM] update fluid operators for rocm (part4), test=develop (#31225)
4 years ago
cucuzg 91635de390
opt matmul and matmul_v2 on kunlun, *test=kunlun (#31326)
4 years ago
Wilber e20234094c
Fix xpu compile and cipher symbol problem. (#31271)
4 years ago
wuhuanzhou 30858d8974
fix compilation errors for missing brpc header files, test=develop (#31325)
4 years ago
石晓伟 625482f752
inference modification for custom operator, test=develop (#31312)
4 years ago
wuhuanzhou a13f1d6930
optimize unity build (#31119)
4 years ago
jiangcheng 8f4ac6b525
optimize topk op through limit SortTopK kernel entrance, test=develop (#30403)
4 years ago
alncat bfb8a64234
updated conv bn fuse pass to make it compatible with latest batch_norm op (#31272)
4 years ago
Chen Weihang 5610c1717e
fix dtype unmatched (#31305)
4 years ago
Qi Li 9b016c7cb7
[ROCM] update fluid operators for rocm (part2), test=develop (#31211)
4 years ago
niuliling123 2fd999d979
Optimized the adaptive_avg_pool2d op when output_size == 1 (#31197)
4 years ago
石晓伟 1da3280660
inference modification for custom operator, test=develop (#31283)
4 years ago
Zhou Wei af9066e89c
[Custom OP]add PD_THROW and PD_CHECK for User Error message (#31253)
4 years ago
石晓伟 8c94d8cb4c
[Custom OP] change the user header file format, test=develop (#31274)
4 years ago
Jiabin Yang 038ce70d69
[Custom OP] Support stream set on Custom Op (#31257)
4 years ago
Jiabin Yang 0c38708a90
[Custom Op] Remove unsupport dtypes (#31232)
4 years ago
WangXi b8bce682e0
xpu support fuse allreduce (#31104)
4 years ago
Chen Weihang 126633c50f
[CustomOp] Split build op marco & polish details (#31229)
4 years ago
tangwei12 903235945b
loglevel adjustment for distributed training (#31205)
4 years ago
Qi Li 28b356b9a2
[ROCM] update fluid framework for rocm (part6), test=develop (#31015)
4 years ago
Qi Li c8fac5ee30
[ROCM] update fluid framework for rocm (part5), test=develop (#31014)
4 years ago
Qi Li 580447d019
[ROCM] update fluid framework for rocm (part4), test=develop (#31013)
4 years ago
Wilber 7d91974c91
enable lite ut. (#30890)
4 years ago
Guanghua Yu d18c5e47f3
fix ignore_index check in softmax_with_cross_entropy (#31201)
4 years ago
chentianyu03 ca3b6bcf78
add cache for VariableWrapper (#30880)
4 years ago
wangchaochaohu f114c3f8ca
fix the branch of code choose (#31200)
4 years ago
joanna.wozna.intel d11602481c
Add bf16 gru model test (#31158)
4 years ago
jakpiase 2f1165342b
OneDNN hardswish integration (#30211)
4 years ago
Chen Weihang e8cdb49aa9
[CustomOp] Support attributes as func input in custom op (#31128)
4 years ago
Zhou Wei ffbf71359a
modify custom op dependent from paddle_framework to paddle_custom_op (#31195)
4 years ago
Leo Chen 0f1fde5102
fix the modification of set_expected_place (#31177)
4 years ago
lilong12 dc8dfba35b
align the default value of some configuration for fleet to that of single cards (#30740)
4 years ago
lilong12 a373aa7645
fix the bug in expand_v2 op (#30984)
4 years ago
Thunderbrook c4f279fe8d
support multi node in heterps (#31102)
4 years ago
liu zhengxi ae2be49f40
Add cublas_handle() to expose cublas_handle to ops (#31157)
4 years ago
Pei Yang 00b09e86ac
[Paddle-TRT] support group_norm (#31040)
4 years ago
Chen Weihang 1ce96fa118
[CustomOp] Add new paddle custom op so (#31141)
4 years ago
tangwei12 ebbdf52557
fix entry (#31079)
4 years ago
Qi Li ee76ea72de
[ROCM] update fluid collective op for rocm, test=develop (#31075)
4 years ago
yaoxuefeng d8fa65a3a8
fix heter compile (#30518)
4 years ago
Zhou Wei 4b220550ef
[Custom OP]Fix problem of custom op unitests on Windows CI (#31114)
4 years ago
Zhou Wei be61c2d06b
support build whl and inference library nightly,test=windows3 (#30616)
4 years ago
alncat 5d6a8c7b73
added support for fake_quantize_dequantize_abs_max op in quantization… (#30896)
4 years ago
Jacek Czaja d3f09ad702
Update of onednn to 2.2 (#31067)
4 years ago
Guanghua Yu 24ba5ee05c
merge develop conflict (#31122)
4 years ago
Qi Li cced930b61
[ROCM] update fluid operators for rocm (part1), test=develop (#31077)
4 years ago
wangchaochaohu 364cfa2686
fix windows for optimization of elementwise_add Op (#31068)
4 years ago
joanna.wozna.intel 781df300d0
Unification of BF16 enablement process (#31034)
4 years ago
Zhong Hui 16fe11d71e
fix softmax cross entropy integer overflow (#30590)
4 years ago
Zhou Wei 44ee251fde
fix UNIX cmake problem (#31113)
4 years ago
Qi Li a60d93fb77
[ROCM] update fluid framework for rocm (part2), test=develop (#31010)
4 years ago
Thunderbrook 565354f676
support save multi sparse table in one path (#31108)
4 years ago
Qi Li 50967135a5
[ROCM] update fluid framework for rocm (part3), test=develop (#31011)
4 years ago
Qi Li 8fe09faf14
[ROCM] update fluid framework for rocm (part1), test=develop (#31009)
4 years ago
Qi Li 334296306c
[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936)
4 years ago
Shang Zhizhou a5c56d83a1
update trt int8 calibrator to IEntropyCalibratorV2 (#31060)
4 years ago
Zhou Wei adaec0073d
[2.0Custom OP]Support New Custom OP on Windows (#31063)
4 years ago
Qi Li 1d996637e6
[ROCM] update fluid imperative for rocm (part1), test=develop (#31017)
4 years ago
JamesLim b95eb38b8a
fix the bug in backward OP of index_sample. (#31026)
4 years ago
Chengmo 6b3371e0c7
Remove PE special profiler (#30886)
4 years ago
Chen Weihang 6beeafe797
[CustomOp] Add more dispatch marco for users (#31058)
4 years ago
TTerror d5323dab41
add squeeze_op/unsqueeze_op on kunlun;fix conv op and parallel executor;optimize lookup_table op (#31056)
4 years ago
123malin 16b4260b2f
test=develop, save/load, shrink (#30625)
4 years ago
Jiabin Yang 628451af06
hide useless headers and add complex support (#31074)
4 years ago
Wilber 463eae0383
update paddle_fluid.so to paddle_inference.so (#30850)
4 years ago
liym27 5b367dab44
[static setitem] Support the index is Tensor; step>1; step<0 .(#30949)
4 years ago
Qi Li eb3050fa9a
[ROCM] update fluid inference for rocm (part1), test=develop (#31018)
4 years ago
Jacek Czaja f7465641c3
Added reshape grad bf16 (#31035)
4 years ago
Wojciech Uss 615d8a2264
Modify relu native implementation 2 (#30996)
4 years ago
ShenLiang 9401173e3a
Remove scale loss before reduce in dygraph (#30807)
4 years ago
Wilber 0020d91506
fix python pass builder error. (#30946)
4 years ago
Wilber 39aeaa160e
fix jetson problem (#30939)
4 years ago
Wilber 01ccfbcde9
update trt error message when input height or width is -1 (#31019)
4 years ago
Wilber cf8b8f9c5e
resolve memory leak in cudnn8.0 (#31029)
4 years ago
Guanghua Yu 5b267474a9
add offset parameter in roi_align,generate_proposals.etc ops (#30864)
4 years ago
Chen Weihang 75f81233ae
fix regex error & simplify marco name (#31031)
4 years ago
Zhang Ting f0ee159280
enable exhaustive_search for forward and backward algos when dtype is float16 (#30959)
4 years ago
Pei Yang 9b54fe4154
add trt transpose and flatten converter (#31022)
4 years ago
joanna.wozna.intel caf9d39839
Add Conv Transpose BF16 (#30877)
4 years ago
Chen Weihang f649442ddd
New custom operator extension mechanism (#30690)
4 years ago
Zhou Wei 5c0332714f
fix bug of Linux UT parallel level (#30971)
4 years ago
wuhuanzhou 9b3c80c8ab
update eigen version on Windows (#30573)
4 years ago
ShenLiang dae3e1f337
Solve inconsistent order in each card in dynamic graph (#30931)
4 years ago
WangXi 14d039e4a1
Fix the problem that the number of ops executed by xpu is wrong (#30961)
4 years ago
Chen Weihang 010f2caa23
try to fix reader and signal test failed (#30960)
4 years ago
Adam Osewski 3ba69809bf
Fix LayerNorm tester for gcc4.8 (#30962)
4 years ago
Qi Li 93c1d9e761
[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913)
4 years ago
QingshuChen 15297a065c
fix depends of kunlun bkcl (#30945)
4 years ago
liym27 97f7a70c01
Add error message for slice op(#30851)
4 years ago
liuyuhui 87197f8c2e
[kunlun]fix sync in multi kunlun xpu dygraph training. (#30943)
4 years ago
石晓伟 99bd16eb4e
bug fix of xpu lite engine, test=develop (#30918)
4 years ago
tianshuo78520a 2e93233899
Add WITH_XPU_BKCL in Kunlun-CI (#30919)
4 years ago
Qi Li 34f1628ce8
[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774)
4 years ago
Jacek Czaja 9e527d9956
[oneDNN] Added basic changes for elementwise_add_grad bf16 (#30925)
4 years ago
Chengmo c98f144fbc
add truncated gaussian random (#30922)
4 years ago
liuyuhui 4a8b8b4547
[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858)
4 years ago
liym27 39f41cb47f
Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)
4 years ago
liuyuhui bef46ccfc8
[Kunlun]fix include files of gen_comm_id_helper.cc (#30917)
4 years ago
wanghuancoder aab3a3012e
add include for heterbox_trainer.cc, develop=test (#30910)
4 years ago
taixiurong 24873f4f77
dyngraph (#30892)
4 years ago
Adam Osewski 092a2b1413
More UT for LayerNormFuse pass (#30891)
4 years ago
tianshuo78520a a80fe67f84
Change cmake/third_party files for CI (#30833)
4 years ago
Jacek Czaja abfa822650
[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
Shang Zhizhou e6095bc2ce
fix split trt plugin initialize (#30875)
4 years ago
WangXi 6e3856d3fb
fix xpu dygraph place (#30868)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
cucuzg ac2e2e6b7f
add clip_by_norm on kunlun, *test=kunlun (#30862)
4 years ago
wawltor b7560a59ab
fix the broadcast for the large second input (#30818)
4 years ago
JamesLim 6e1e036a75
Implement cuda kernel for index_sample. (#30380)
4 years ago
AshburnLee 666efc2336
Call new cudnn batch norm API regardless of data type and data layout (#30157)
4 years ago
QingshuChen 5c8455d6ea
try again if kunlun memory malloc failed (#30855)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
liuyuhui 2cb55eff57
fix WITH_XPU_BKCL in CMakeLists.txt (#30854)
4 years ago
Adam Osewski 4f066e316e
Layer normalization fuse pass. (#30721)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
joanna.wozna.intel 04532b8a83
Update Xbyak to v5.81 (#30809)
4 years ago
Shang Zhizhou b909450994
fix trt plugin clone and initialize bugs in TRT7.1+ (#30709)
4 years ago
Wilber b08ae368bb
ci compilation depends on a stable release (#30755)
4 years ago
Thunderbrook cb66c53c2d
dump to cpu (#30750)
4 years ago
Chengmo d3fac0ea85
fix int64 bug (#30780)
4 years ago
Qi Li 69875dc42c
[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758)
4 years ago
QingshuChen c35a9880f9
fix malloc L3 failed bug for kunlun (#30745)
4 years ago
WangXi 31ed9c9eed
Fleet distributed strategy support pure fp16 (#30754)
4 years ago
Zhen Wang 53d01afed6
Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777)
4 years ago
ShenLiang 3858f458ea
rm Singleton of reducer (#30775)
4 years ago
Qi Li f89da4ab45
[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639)
4 years ago