Commit Graph

1088 Commits (2fbe9b097a41bff2b8c73296bf52e387ec88842a)

Author SHA1 Message Date
Chen Weihang 2fbe9b097a
[CustomOp] Remove Eigen dependencies of float16 (#31669)
4 years ago
Chen Weihang 027b574a0e
[CustomOp] Remove the dependence of the underlying data types on eigen (#31602)
4 years ago
WangXi 9066b74f58
c_gen_nccl_id add SocketServer to persist server (#31589)
4 years ago
Qi Li 3d5aa9d10a
[ROCM] fix conv2d and conv3d op, test=develop (#31553)
4 years ago
WangXi 83a2fb1f08
Add collective async wait op (#31463)
4 years ago
ronnywang e03e46730c
[ROCM] fix gather_op, sigmoid_cross_entropy_with_logits_op, test=develop (#31467)
4 years ago
Qi Li b85c8e03be
[ROCM] fix reduce op, test=develop (#31478)
4 years ago
Jacek Czaja 39a5424ed1
[oneDNN] elementwise add bf16 grad kernel with broadcasting (#31385)
4 years ago
Qi Li f9377965c4
[ROCM] fix dropout and remove hipcub, test=develop (#31455)
4 years ago
Qi Li 4d647ec137
[ROCM] update fluid platform for rocm (part5), test=develop (#31315)
4 years ago
Qi Li db50fb6766
[ROCM] fix softmax with loss and update python scripts, test=develop (#31373)
4 years ago
WangXi b8bce682e0
xpu support fuse allreduce (#31104)
4 years ago
liu zhengxi ae2be49f40
Add cublas_handle() to expose cublas_handle to ops (#31157)
4 years ago
Zhong Hui 16fe11d71e
fix softmax cross entropy integer overflow (#30590)
4 years ago
Qi Li 334296306c
[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936)
4 years ago
Zhou Wei adaec0073d
[2.0Custom OP]Support New Custom OP on Windows (#31063)
4 years ago
joanna.wozna.intel caf9d39839
Add Conv Transpose BF16 (#30877)
4 years ago
wuhuanzhou 9b3c80c8ab
update eigen version on Windows (#30573)
4 years ago
Qi Li 93c1d9e761
[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913)
4 years ago
QingshuChen 15297a065c
fix depends of kunlun bkcl (#30945)
4 years ago
Qi Li 34f1628ce8
[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774)
4 years ago
liuyuhui bef46ccfc8
[Kunlun]fix include files of gen_comm_id_helper.cc (#30917)
4 years ago
Jacek Czaja abfa822650
[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
QingshuChen c35a9880f9
fix malloc L3 failed bug for kunlun (#30745)
4 years ago
Qi Li f89da4ab45
[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639)
4 years ago
Jacek Czaja 173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)
4 years ago
chentianyu03 fb7fbc7a5d
fix abs bug and add abs test case (#30637)
4 years ago
Jacek Czaja dfdb0359ea
- Disabling oneDNN inplace pass (#30588)
4 years ago
wanghuancoder 90773473a0
use nvtx push pop in timeline (#30567)
4 years ago
chentianyu03 358106fcb0
make abs op support complex types (#30375)
4 years ago
Wilber 2d5758c456
update. (#30585)
4 years ago
WangXi 572c466d19
[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer (#30455)
4 years ago
Leo Chen 81217a94d8
unify calling cudaSetDevice (#30470)
4 years ago
liuyuhui 843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)
4 years ago
QingshuChen 8489d4f76f
optimize batch_norm & pool op for kunlun (#30490)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
WeiXin 404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)
4 years ago
Zhou Wei 30888ca343
Polish and Optimize the print/repr information of Layer (#29998)
4 years ago
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
5 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
5 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
5 years ago