103 Commits (17030ff28b9a54bb57779e9b8448a6d222110ec5)

Author SHA1 Message Date
Qi Li 4d647ec137
[ROCM] update fluid platform for rocm (part5), test=develop (#31315)
4 years ago
liu zhengxi ae2be49f40
Add cublas_handle() to expose cublas_handle to ops (#31157)
4 years ago
Qi Li 93c1d9e761
[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913)
4 years ago
Jacek Czaja 173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
4 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
4 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
4 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
4 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
4 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
4 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
4 years ago
Huihuang Zheng acc11c2a62
Retry CUDA Initialization to Fix Random Failure, test=develop (#28323)
4 years ago
wanghuancoder df43905f12
use iwyu clean include (#27267)
4 years ago
Jack Zhou 63203c4abc
enhance reduce op which can reduce tensor with arbitrary rank
4 years ago
Adam f3909020de
Add mechanism for blocking oneDNN cache clearing (#26502)
5 years ago
QingshuChen 138ecf24aa
support Baidu Kunlun AI Accelerator (#25959)
5 years ago
GaoWei8 fb70682f00
fix PADDLE_ENFORCE (#25297)
5 years ago
pawelpiotrowicz db2b6b6568
Hide globals & redesign restore PR (#24279)
5 years ago
Guo Sheng a8c0fb4e86
Add cholesky_op (#23543)
5 years ago
石晓伟 34d7d6aef0
declare the stream::Priority as enum class, test=develop (#24013)
5 years ago
Zhou Wei 7817003795
Optimize the error messages of paddle CUDA API (#23816)
5 years ago
石晓伟 2d01cc85c4
DeviceContext Split, test=develop (#23737)
5 years ago
石晓伟 5c59d2139e
reverts the commit 23177, test=develop (#23363)
5 years ago
石晓伟 75ebb48a91
supports thread-binding stream, test=develop (#23177)
5 years ago
Wilber 7bc4b09500
add WITH_NCCL option for cmake. (#22384)
5 years ago
zhaoyuchen2018 3d4f2aa689
Refine stack op to improve xlnet performance, test=develop (#22142)
5 years ago
Jacek Czaja cd43c4440e
[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375)
5 years ago
Zeng Jinle cdb3d27985
Fix warn of gcc8 (#21205)
5 years ago
zhaoyuchen2018 b93870e696
Improve topk performance. (#21087)
5 years ago
qingqing01 1a3eef026c
Enable users to create custom cpp op outside framework. (#19256)
5 years ago
Zeng Jinle 37f76407b0
fix cuda dev_ctx allocator cmake deps, test=develop (#19953)
5 years ago
Zeng Jinle c7f36e7c00
Add lock to cudnn handle calls (#19845)
5 years ago
Zeng Jinle 5eb381a3e2
refine reallocate of workspace size, test=develop (#19843)
5 years ago
Huihuang Zheng 12542320c5
Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
Tao Luo 75d1571995
refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603)
6 years ago
Tao Luo fe32879d2a
add mkldnn shapeblob cache clear strategy (#18513)
6 years ago
Tao Luo 3f3112ceb0
add shape_blob for cache mkldnn primitive (#18454)
6 years ago
Leo Zhao 8f5fffca0a
rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453)
6 years ago
Michał Gallus 8409693272
Reset DeviceContext after quantization warmup (#18182)
6 years ago
Huihuang Zheng b9494058b3
Use CudnnWorkspaceHandle in exhaustive search (#17082)
6 years ago
Zeng Jinle 1202d3fc74
Refine model gpu memory (#16993)
6 years ago
nhzlx a1d11bb175
fix ci bug: cudnn handler in multi card
6 years ago
nhzlx 3df7b98a0f
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
6 years ago
Wu Yi b7baeed7bb
fix win gpu build test=develop (#16334)
6 years ago
nhzlx 07dcf2856c
git cherry-pick from feature/anakin-engine: update anakin subgraph #16278
6 years ago
Wu Yi 6382b62f6b
Collective ops (#15572)
6 years ago
qingqing01 86e912c544
Fix windows compiling (#16230)
6 years ago
qingqing01 8ad672a287
Support sync batch norm. (#16121)
6 years ago