Commit Graph

10922 Commits (79fa8fb0df524cc5efbe5cd7a91acac7b721e5cf)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Adam Osewski | 092a2b1413 | More UT for LayerNormFuse pass (#30891) | 4 years ago |
| Jacek Czaja | abfa822650 | [oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757) | 4 years ago |
| joanna.wozna.intel | 73cdea01d4 | Add bf16 fast performance verification (#30551) | 4 years ago |
| Shang Zhizhou | e6095bc2ce | fix split trt plugin initialize (#30875) | 4 years ago |
| WangXi | 6e3856d3fb | fix xpu dygraph place (#30868) | 4 years ago |
| wanghuancoder | 35c5b23f68 | use iwyu clean include second time, test=develop (#30829) | 4 years ago |
| cucuzg | ac2e2e6b7f | add clip_by_norm on kunlun, *test=kunlun (#30862) | 4 years ago |
| wawltor | b7560a59ab | fix the broadcast for the large second input (#30818) | 4 years ago |
| JamesLim | 6e1e036a75 | Implement cuda kernel for index_sample. (#30380) | 4 years ago |
| AshburnLee | 666efc2336 | Call new cudnn batch norm API regardless of data type and data layout (#30157) | 4 years ago |
| QingshuChen | 5c8455d6ea | try again if kunlun memory malloc failed (#30855) | 4 years ago |
| 石晓伟 | 2ac4143b6c | support xpu with analysis predictor, test=develop (#30832) | 4 years ago |
| liuyuhui | 2cb55eff57 | fix WITH_XPU_BKCL in CMakeLists.txt (#30854) | 4 years ago |
| Adam Osewski | 4f066e316e | Layer normalization fuse pass. (#30721) | 4 years ago |
| WangXi | b1026f64af | 【kunlun】dygraph supports multi xpu card training (#30671) | 4 years ago |
| joanna.wozna.intel | 04532b8a83 | Update Xbyak to v5.81 (#30809) | 4 years ago |
| Shang Zhizhou | b909450994 | fix trt plugin clone and initialize bugs in TRT7.1+ (#30709) | 4 years ago |
| Wilber | b08ae368bb | ci compilation depends on a stable release (#30755) | 4 years ago |
| Thunderbrook | cb66c53c2d | dump to cpu (#30750) | 4 years ago |
| Chengmo | d3fac0ea85 | fix int64 bug (#30780) | 4 years ago |
| Qi Li | 69875dc42c | [ROCM] update fluid memory for rocm35 (part1), test=develop (#30758) | 4 years ago |
| QingshuChen | c35a9880f9 | fix malloc L3 failed bug for kunlun (#30745) | 4 years ago |
| WangXi | 31ed9c9eed | Fleet distributed strategy support pure fp16 (#30754) | 4 years ago |
| Zhen Wang | 53d01afed6 | Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777) | 4 years ago |
| ShenLiang | 3858f458ea | rm Singleton of reducer (#30775) | 4 years ago |
| Qi Li | f89da4ab45 | [ROCM] update fluid platform for rocm35 (part1), test=develop (#30639) | 4 years ago |
| Wojciech Uss | fc00240575 | A fix for oneDNN matmul kernel. Fixes issue #30309 (#30723) | 4 years ago |
| lidanqing | 46989e889b | Fix python3 incompatibility issues (#30698) | 4 years ago |
| alncat | 5b59499e57 | fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733) | 4 years ago |
| Chengmo | 78d37c3f75 | 【Paddle.Fleet】Fix brpc get hostname (#30703) | 4 years ago |
| taixiurong | caf3680bbc | fix bugs in transformer predict in xpu place (#30730) | 4 years ago |
| jakpiase | f8da5536ed | REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) | 4 years ago |
| liuyuhui | 67abfc1588 | [Kunlun] fix dead lock for exec_op_count_ (#30718) | 4 years ago |
| alncat | 5ace20fc3f | modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704) | 4 years ago |
| Tao Luo | 824a79d383 | Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708) | 4 years ago |
| lilong12 | 7fbc68a2c0 | update, test=develop (#30692) | 4 years ago |
| jakpiase | d834f4e6e8 | Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661) | 4 years ago |
| arlesniak | 5bf25d1e8b | More precise mkldnn kernel rules in GetExpectedKernelType (#29840) | 4 years ago |
| Jacek Czaja | 173660be7b | [oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358) | 4 years ago |
| Shang Zhizhou | ae0f88a988 | add DLA support:C++&&Python api (#30165) | 4 years ago |
| chentianyu03 | fb7fbc7a5d | fix abs bug and add abs test case (#30637) | 4 years ago |
| ShenLiang | 9514b4aa5f | Fix scatter grad bug (#30604) | 4 years ago |
| Pei Yang | cf9bdb9404 | extend trt ut timeout threshold (#30537) | 4 years ago |
| Thunderbrook | 1bebc09253 | solve build gpu task core (#30626) | 4 years ago |
| 石晓伟 | 33bf6eb753 | revert external gflags, test=develop (#30623) | 4 years ago |
| Jacek Czaja | dfdb0359ea | - Disabling oneDNN inplace pass (#30588) | 4 years ago |
| TTerror | 10271ddfc4 | support reduce_max op on kunlun (#30581) | 4 years ago |
| QingshuChen | 5013c67644 | fix softmax bug for multi_card in kunlun (#30600) | 4 years ago |
| wuhuanzhou | 7e671c07b6 | optimize unity build (#30195) | 4 years ago |
| liuyuhui | e5b0d9e1fc | [Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor (#30586) | 4 years ago |