Commit Graph

1051 Commits (81138239db4dbb37cf659ec5688d24ce33f7ab57)

Author SHA1 Message Date
Leo Chen 81138239db
[feature] support npu allocator (#30840)
4 years ago
gongweibao f9c97dd728
Add distribution supported (#30578)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
WeiXin 404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)
4 years ago
Zhou Wei 30888ca343
Polish and Optimize the print/repr information of Layer (#29998)
4 years ago
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
4 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
4 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
4 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
4 years ago
Wilber 332da133a1
Support mips arch (#29903)
4 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
4 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
4 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
4 years ago
Wilber 2c0a4a3470
call_stack is turned on by default when ON_INFER=ON (#29798)
4 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
4 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
4 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
4 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
4 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
4 years ago
Huihuang Zheng 4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
4 years ago
Jacek Czaja 9eff1a674f
Added missing format of oneDNN (#29670)
4 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
4 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
4 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
4 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
4 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
4 years ago
Huihuang Zheng a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry (#29442)
5 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
5 years ago
Jack Zhou 1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 (#29406)
5 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
5 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
5 years ago
lilong12 1decf4ada6
update, test=develop (#29331)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implementation (#29229)
5 years ago
chentianyu03 8f45d14263
add complex64 and complex128 types; add +-*/@ and slice operators for c… (#29199)
5 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
5 years ago
Zhou Wei e668cb07fb
fix CUDA 11 error on windows (#29101)
5 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
5 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
5 years ago
Leo Chen fd3fcb051a
fix typo of flag name (#29154)
5 years ago
Aurelius84 7ae3cb554a
Polish CUDA Information stdout (#29109)
5 years ago
Chen Weihang fea0e294ee
Hide the C++ stack by default and add hints (#29042)
5 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
5 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
5 years ago