Commit Graph

1059 Commits (46989e889b023bdb5434e4139ca13f5c4cbc57cf)

Author SHA1 Message Date
Jacek Czaja 173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)
4 years ago
chentianyu03 fb7fbc7a5d
fix abs bug and add abs test case (#30637)
4 years ago
Jacek Czaja dfdb0359ea
- Disabling oneDNN inplace pass (#30588)
4 years ago
wanghuancoder 90773473a0
use nvtx push pop in timeline (#30567)
4 years ago
chentianyu03 358106fcb0
make abs op support complex types (#30375)
4 years ago
Wilber 2d5758c456
update. (#30585)
4 years ago
WangXi 572c466d19
[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer (#30455)
4 years ago
Leo Chen 81217a94d8
unify calling cudaSetDevice (#30470)
4 years ago
liuyuhui 843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)
4 years ago
QingshuChen 8489d4f76f
optimize batch_norm & pool op for kunlun (#30490)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
WeiXin 404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)
4 years ago
Zhou Wei 30888ca343
Polish and Optimize the print/repr information of Layer (#29998)
4 years ago
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
5 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
5 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
5 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
5 years ago
Wilber 332da133a1
Support mips arch (#29903)
5 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
5 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
5 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
5 years ago
Wilber 2c0a4a3470
call_statck is turned on default when ON_INFER=ON (#29798)
5 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
5 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
5 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
5 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
5 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
5 years ago
Huihuang Zheng 4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
5 years ago
Jacek Czaja 9eff1a674f
Added missing format of oneDNN (#29670)
5 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
5 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
5 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
5 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
5 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
5 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
5 years ago
Huihuang Zheng a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry (#29442)
5 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
5 years ago
Jack Zhou 1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 (#29406)
5 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
5 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
5 years ago
lilong12 1decf4ada6
update, test=develop (#29331)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implementation (#29229)
5 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199)
5 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
5 years ago