133 Commits (32211fe9c4c22168dfb73f19763b17ac9191341a)

Author SHA1 Message Date
liu zhengxi ae2be49f40
Add cublas_handle() to expose cublas_handle to ops (#31157)
4 years ago
wuhuanzhou 9b3c80c8ab
update eigen version on Windows (#30573)
4 years ago
Qi Li 93c1d9e761
[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
QingshuChen c35a9880f9
fix malloc L3 failed bug for kunlun (#30745)
4 years ago
Jacek Czaja 173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)
4 years ago
liuyuhui 843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)
4 years ago
QingshuChen 8489d4f76f
optimize batch_norm & pool op for kunlun (#30490)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
4 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
4 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
4 years ago
Aurelius84 7ae3cb554a
Polish CUDA Information stdout (#29109)
4 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
4 years ago
123malin cc780b1977
test=develop, optimize geo communicator (#26857)
4 years ago
Jack Zhou 63203c4abc
enhance reduce op which can reduce tensor with arbitrary rank
4 years ago
Adam f3909020de
Add mechanism for blocking oneDNN cache clearing (#26502)
5 years ago
QingshuChen 138ecf24aa
support Baidu Kunlun AI Accelerator (#25959)
5 years ago
GaoWei8 c10dcff12d
refine PADDLE_ENFORCE (#25456)
5 years ago
GaoWei8 ea7e532598
Refine PADDLE_ENFORCE (#25369)
5 years ago
Chen Weihang d1062d5278
Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759)
5 years ago
pawelpiotrowicz db2b6b6568
Hide globals & redesign restore PR (#24279)
5 years ago
Chen Weihang aa0f254fbe
Add macro BOOST_GET to enrich the error information of boost :: get (#24175)
5 years ago
Sylwester Fraczek e1a7a88057
added reshape transpose matmul fuse pass (#23754)
5 years ago
Guo Sheng a8c0fb4e86
Add cholesky_op (#23543)
5 years ago
石晓伟 34d7d6aef0
declare the stream::Priority as enum class, test=develop (#24013)
5 years ago
Zhang Ting b89dd86fb6
Update eigen (#23203)
5 years ago
石晓伟 2d01cc85c4
DeviceContext Split, test=develop (#23737)
5 years ago
石晓伟 5c59d2139e
reverts the commit 23177, test=develop (#23363)
5 years ago
Yi Liu 0471476a18
fix nccl comm double free bug (#23344)
5 years ago
石晓伟 75ebb48a91
supports thread-binding stream, test=develop (#23177)
5 years ago
Wilber 7bc4b09500
add WITH_NCCL option for cmake. (#22384)
5 years ago
zhaoyuchen2018 3d4f2aa689
Refine stack op to improve xlnet performance, test=develop (#22142)
5 years ago
Adam e81f0228df
MKL-DNN 1.0 Update (#20162)
5 years ago
Zeng Jinle 97e76cb96d
refine dev_ctx.Wait() exception throw, test=develop (#21600)
5 years ago
Jacek Czaja cd43c4440e
[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375)
5 years ago
liuwei1031 d8b6cf2bcd
fix sporadically hang issue on windows(#21201)
5 years ago
zhaoyuchen2018 b93870e696
Improve topk performance. (#21087)
5 years ago
Zeng Jinle 37f76407b0
fix cuda dev_ctx allocator cmake deps, test=develop (#19953)
5 years ago
Zeng Jinle c7f36e7c00
Add lock to cudnn handle calls (#19845)
5 years ago
Huihuang Zheng 12542320c5
Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
gongweibao 29d8781240
Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)
6 years ago
Tao Luo 076f833110
add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580)
6 years ago
Tao Luo fe32879d2a
add mkldnn shapeblob cache clear strategy (#18513)
6 years ago
Tao Luo 3f3112ceb0
add shape_blob for cache mkldnn primitive (#18454)
6 years ago
Leo Zhao 8f5fffca0a
rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453)
6 years ago
Michał Gallus 8409693272
Reset DeviceContext after quantization warmup (#18182)
6 years ago