Commit Graph

1041 Commits (d0a5620575a3ce94e0a7a5a20192e9307b0b9c93)

Author SHA1 Message Date
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
5 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
5 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
5 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
5 years ago
Wilber 332da133a1
Support mips arch (#29903)
5 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
5 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
5 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
5 years ago
Wilber 2c0a4a3470
call_statck is turned on default when ON_INFER=ON (#29798)
5 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
5 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
5 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
5 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
5 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
5 years ago
Huihuang Zheng 4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
5 years ago
Jacek Czaja 9eff1a674f
Added missing format of oneDNN (#29670)
5 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
5 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
5 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
5 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
5 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
5 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
5 years ago
Huihuang Zheng a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry (#29442)
5 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
5 years ago
Jack Zhou 1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 (#29406)
5 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
5 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
5 years ago
lilong12 1decf4ada6
update, test=develop (#29331)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implemetation (#29229)
5 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice opreator for c… (#29199)
5 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
5 years ago
Zhou Wei e668cb07fb
fix CUDA 11 error on windows (#29101)
5 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
5 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
5 years ago
Leo Chen fd3fcb051a
fix typo of flag name (#29154)
5 years ago
Aurelius84 7ae3cb554a
Polish CUDA Information stdout (#29109)
5 years ago
Chen Weihang fea0e294ee
Hide the C++ stack by default and add hints (#29042)
5 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
5 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
5 years ago
Pei Yang 994673bf4f
change avg pooling and global pooling to trt layer in dynamic shape mode (#28702)
5 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
5 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
5 years ago
Jacek Czaja 6d8d3d4c22
[oneDNN] Layer norm bf16 kernel (#28619)
5 years ago
lilong12 80d2024644
bug fix, test=develop (#28674)
5 years ago
Zhou Wei 849467b5aa
fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547)
5 years ago
Chen Weihang 23439b1688
show cpp stack when catch signal (#28415)
5 years ago
Shang Zhizhou ea851796e5
Optimize ERNIE model inference performance in TensorRT and support variable-length input (#28367)
5 years ago
Jacek Czaja 84cc61b2cd
[oneDNN] sum op refactor (#28318)
5 years ago
Wilber 09fd2b2aab
Paddle support compile on sw (#27858)
5 years ago