1077 Commits (32211fe9c4c22168dfb73f19763b17ac9191341a)

Author SHA1 Message Date
WangXi b8bce682e0
xpu support fuse allreduce (#31104)
4 years ago
liu zhengxi ae2be49f40
Add cublas_handle() to expose cublas_handle to ops (#31157)
4 years ago
Zhong Hui 16fe11d71e
fix softmax cross entropy integer overflow (#30590)
4 years ago
Qi Li 334296306c
[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936)
4 years ago
Zhou Wei adaec0073d
[2.0Custom OP]Support New Custom OP on Windows (#31063)
4 years ago
joanna.wozna.intel caf9d39839
Add Conv Transpose BF16 (#30877)
4 years ago
wuhuanzhou 9b3c80c8ab
update eigen version on Windows (#30573)
4 years ago
Qi Li 93c1d9e761
[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913)
4 years ago
QingshuChen 15297a065c
fix depends of kunlun bkcl (#30945)
4 years ago
Qi Li 34f1628ce8
[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774)
4 years ago
liuyuhui bef46ccfc8
[Kunlun]fix include files of gen_comm_id_helper.cc (#30917)
4 years ago
Jacek Czaja abfa822650
[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
QingshuChen c35a9880f9
fix malloc L3 failed bug for kunlun (#30745)
4 years ago
Qi Li f89da4ab45
[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639)
4 years ago
Jacek Czaja 173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)
4 years ago
chentianyu03 fb7fbc7a5d
fix abs bug and add abs test case (#30637)
4 years ago
Jacek Czaja dfdb0359ea
- Disabling oneDNN inplace pass (#30588)
4 years ago
wanghuancoder 90773473a0
use nvtx push pop in timeline (#30567)
4 years ago
chentianyu03 358106fcb0
make abs op support complex types (#30375)
4 years ago
Wilber 2d5758c456
update. (#30585)
4 years ago
WangXi 572c466d19
[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer (#30455)
4 years ago
Leo Chen 81217a94d8
unify calling cudaSetDevice (#30470)
4 years ago
liuyuhui 843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)
4 years ago
QingshuChen 8489d4f76f
optimize batch_norm & pool op for kunlun (#30490)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
WeiXin 404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)
4 years ago
Zhou Wei 30888ca343
Polish and Optimize the print/repr information of Layer (#29998)
4 years ago
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
4 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
4 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
4 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
4 years ago
Wilber 332da133a1
Support mips arch (#29903)
4 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
4 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
4 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
4 years ago
Wilber 2c0a4a3470
call_statck is turned on default when ON_INFER=ON (#29798)
4 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
4 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
4 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
4 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
4 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
4 years ago