1073 Commits (a5c56d83a1b16482dcaae1db6e0543b1cf355f3f)

Author SHA1 Message Date
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
4 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
4 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
4 years ago
Huihuang Zheng a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry (#29442)
4 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
4 years ago
Jack Zhou 1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 (#29406)
4 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
4 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
4 years ago
lilong12 1decf4ada6
update, test=develop (#29331)
4 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
4 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implementation (#29229)
4 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199)
4 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
4 years ago
Zhou Wei e668cb07fb
fix CUDA 11 error on windows (#29101)
4 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
4 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
4 years ago
Leo Chen fd3fcb051a
fix typo of flag name (#29154)
4 years ago
Aurelius84 7ae3cb554a
Polish CUDA Information stdout (#29109)
4 years ago
Chen Weihang fea0e294ee
Hide the C++ stack by default and add hints (#29042)
4 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
4 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
4 years ago
Pei Yang 994673bf4f
change avg pooling and global pooling to trt layer in dynamic shape mode (#28702)
4 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
4 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
4 years ago
Jacek Czaja 6d8d3d4c22
[oneDNN] Layer norm bf16 kernel (#28619)
4 years ago
lilong12 80d2024644
bug fix, test=develop (#28674)
4 years ago
Zhou Wei 849467b5aa
fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547)
4 years ago
Chen Weihang 23439b1688
show cpp stack when catch signal (#28415)
4 years ago
Shang Zhizhou ea851796e5
Optimize ernie model inference performance in TensorRT and support variable-length input (#28367)
4 years ago
Jacek Czaja 84cc61b2cd
[oneDNN] sum op refactor (#28318)
4 years ago
Wilber 09fd2b2aab
Paddle support compile on sw (#27858)
4 years ago
Guo Sheng 9a600df373
Add rnn_op (#28197)
4 years ago
wangchaochaohu 0f4b6247c8
refine the gpu config for performance optimization (#28291)
4 years ago
Huihuang Zheng acc11c2a62
Retry CUDA Initialization to Fix Random Failure, test=develop (#28323)
4 years ago
Leo Chen 18c86fb2fb
hide some logs of p2p (#28307)
4 years ago
Jacek Czaja c11d9b3035
[oneDNN] conv2d fwd&bwd optimization (#27871)
4 years ago
Chen Weihang 813b2ade34
Enrich the python error types of paddle & polish format (#28124)
4 years ago
Chen Weihang 2babd6ff67
Add compile limit for PADDLE_ENFORCE without error message (#28221)
4 years ago
Zhou Wei 5d7000215a
fix dynamic_loader more safe and error message on windows (#28117)
4 years ago
wangchaochaohu 463c72c2d9
refine gpu kernel config for Paddle (#28085)
4 years ago
Pei Yang a0b2f93689
reduce trt warning message (#28011)
4 years ago
lidanqing 7cb4a8b8f2
[oneDNN] Conv dilation support (#27914)
4 years ago
Zhang Ting d5cc144c60
tune backward filter algorithm for float16 (#27529)
4 years ago
Jacek Czaja 55e63763ec
[oneDNN] adaptive pool support (#27747)
4 years ago
chen zhiyu 6335e6a0a6
add musl option (#27798)
4 years ago
Jacek Czaja b9fda2ff09
Fix to issue #25537 (#27546)
4 years ago
joanna.wozna.intel 0cd4907eba
Add avx512 core instructions check (#27732)
4 years ago
123malin cc780b1977
test=develop, optimize geo communicator (#26857)
4 years ago
lilong12 bbc2add703
Initialize gloo for low level collective apis (#27672)
4 years ago
arlesniak 0ecf441af1
Add support for mkldnn ops types selection with FLAGS in dygraph (#27482)
4 years ago
lilong12 36c0410223
Revert "Initialize gloo for low level collective apis (#27356)", test=document_fix (#27665)
4 years ago
lilong12 5218b7af6b
add ncclSend and ncclRecv (#27621)
4 years ago
lilong12 fa73e4a284
Initialize gloo for low level collective apis (#27356)
4 years ago
Li Fuchen 1501a80f74
add support to float64 input of warpctc op. (#27399)
4 years ago
QingshuChen 6b727e08b1
support elementwise add, activation, matmul on Baidu Kunlun (#27143)
4 years ago
Zhong Hui a85592bcbf
fix cpplint error for the atomic max/min
4 years ago
Zhong Hui 597345d17b
fix cuda atomic for ARCH<350 for the atomic_max
4 years ago
Shibo Tao 8f7bb52bd2
fix tensorrt 6 build error. test=develop (#27511)
4 years ago
wanghuancoder df43905f12
use iwyu clean include (#27267)
4 years ago
Zhong Hui 4a9d21de49
Add GPU Kernels of Segment Ops, support sum, max, min, mean
4 years ago
Shang Zhizhou c17f9cf25f
[bug fix]:Memory increases after adapting the cudnn version to cudnn8 (#27436)
4 years ago
Chen Weihang 765064476b
Polish some lost invalid error message (#27445)
4 years ago
Leo Chen aba759ba16
[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112)
4 years ago
GaoWei8 1a7559718e
fix cudnn dyload (#27308)
4 years ago
Jack Zhou 63203c4abc
enhance reduce op which can reduce tensor with arbitrary rank
4 years ago
GaoWei8 ee1ed42c99
change sequence length attribute to input (#27193)
5 years ago
joanna.wozna.intel 1483ea2304
Add bfloat16 passes (#26999)
5 years ago
GaoWei8 4ff16eb201
Add padding cudnn interface (#26370)
5 years ago
wangchaochaohu 3eacced950
[cuda11 support] add support for cublas load of same function name (parameter diff) (#26963)
5 years ago
joanna.wozna.intel 95e1434bb2
Add bfloat16 data type (#25402)
5 years ago
Zhen Wang f9066e6a6f
Update the demo code and the doc of varbase.backward. (#26506)
5 years ago
lilong12 1c68138327
[api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552)
5 years ago
joanna.wozna.intel 559e43eee4
Small change in conv2d and quantize pass (#26671)
5 years ago
Adam f3909020de
Add mechanism for blocking oneDNN cache clearing (#26502)
5 years ago
QingshuChen 138ecf24aa
support Baidu Kunlun AI Accelerator (#25959)
5 years ago
GaoWei8 1fbee267d4
remove scope in cudnn lstm (#25188)
5 years ago
Leo Chen 672578a797
Print user-friendly error message in core.ops (#26261)
5 years ago
wangchaochaohu 0b81d76310
[API2.0] add op for cudnn version query test=develop (#26180)
5 years ago
joanna.wozna.intel 734cf1c3e9
Change use_quantizer attribute name and data type (#25838)
5 years ago
Leo Chen 751305ecf0
Add flags to control call stack of error message (#25997)
5 years ago
Pei Yang beb0ca5fab
Fix TRT plugin registry without TRT lib (#25982)
5 years ago
Adam 68c6160e63
Add oneDNN fusion_gru kernel (#25594)
5 years ago
Zhaolong Xing 358bc06c72
[CUDNN8 support] : support CUDNN8 (#25664)
5 years ago
Pei Yang b717895f64
Fix registering trt plugin (#25744)
5 years ago
Chen Weihang 9b5a65b819
refine init signal handler meg dumper (#25911)
5 years ago
Chen Weihang d47304e6d9
Refine paddle error stack format (#25790)
5 years ago
Chen Weihang 2469b578f5
Unified paddle error format when catch system signal (#25765)
5 years ago
Chen Weihang 1b3081b1b4
Simplify BufferedReader to improve DataLoader performance (#25648)
5 years ago
arlesniak e52df3b125
Added DNNL cache management for DyGraph (#25624)
5 years ago
joanna.wozna.intel e5bbffa84c
Add NOMINMAX define due to windows.h max/min macro conflict (#25637)
5 years ago
Chen Weihang a6abd92dfd
Polish install error hint message (#25531)
5 years ago
Jacek Czaja 7dbc441eab
[oneDNN] cache cosmetics improvement (#25576)
5 years ago
LielinJiang 7129f544f0
Add bilateral_slice op (#25401)
5 years ago
GaoWei8 c10dcff12d
refine PADDLE_ENFORCE (#25456)
5 years ago
Chen Weihang 0b54d54fd8
Fix index overflow bug of the CUDA kernel loop increment (#25435)
5 years ago
Chen Weihang 7be285a66f
remove useless property, test=develop (#25461)
5 years ago
Jacek Czaja a5d1592f6c
Added missing oneDNN format (#25450)
5 years ago
Chen Weihang 172d4ecb6c
remove WITH_DSO compile option (#25444)
5 years ago
Zhen Wang bb45af02ac
add the c++ part of Imperative QAT. test=develop (#25446)
5 years ago