
29896 Commits (2e9323389988150e4cda62b7d372989248683ca6)

Author SHA1 Message Date
tianshuo78520a 2e93233899
Add WITH_XPU_BKCL in Kunlun-CI (#30919)
4 years ago
wanghuancoder 823f499a8a
fix a bug of Sequential::__getitem__ (#30899)
4 years ago
Qi Li 34f1628ce8
[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774)
4 years ago
wanghuancoder 5ded39f226
fix cpplint cfg, test=develop (#30924)
4 years ago
Jacek Czaja 9e527d9956
[oneDNN] Added basic changes for elementwise_add_grad bf16 (#30925)
4 years ago
Chengmo c98f144fbc
add truncated gaussian random (#30922)
4 years ago
liuyuhui 4a8b8b4547
[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858)
4 years ago
liym27 39f41cb47f
Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)
4 years ago
liuyuhui bef46ccfc8
[Kunlun]fix include files of gen_comm_id_helper.cc (#30917)
4 years ago
wanghuancoder 90d92111cf
let LayerList could add [None], test=develop (#30911)
4 years ago
wanghuancoder aab3a3012e
add include for heterbox_trainer.cc, develop=test (#30910)
4 years ago
taixiurong 24873f4f77
dyngraph (#30892)
4 years ago
Zhen Wang 71acde9afc
Use correct master weights in AdamW. (#30895)
4 years ago
LielinJiang 79fa8fb0df
rm test_datasets from file parallel_UT_relu.py (#30907)
4 years ago
Adam Osewski 092a2b1413
More UT for LayerNormFuse pass (#30891)
4 years ago
tianshuo78520a a80fe67f84
Change cmake/third_party files for CI (#30833)
4 years ago
Jacek Czaja abfa822650
[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
Shang Zhizhou e6095bc2ce
fix split trt plugin initialize (#30875)
4 years ago
WangXi 6e3856d3fb
fix xpu dygraph place (#30868)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
Zhang Ting e97905c5fa
improve performance of momentum (#30881)
4 years ago
GT-Zhang 4b2d52a001
Update README.md (#30873)
4 years ago
fluffyrita 635e168c22
Update README_cn.md (#30867)
4 years ago
cucuzg ac2e2e6b7f
add clip_by_norm on kunlun, *test=kunlun (#30862)
4 years ago
Kaipeng Deng 302427170f
remove numpy array check in single-process dataloader. test=develop (#30861)
4 years ago
wawltor b7560a59ab
fix the broadcast for the large second input (#30818)
4 years ago
JamesLim 6e1e036a75
Implement cuda kernel for index_sample. (#30380)
4 years ago
AshburnLee 666efc2336
Call new cudnn batch norm API regardless of data type and data layout (#30157)
4 years ago
QingshuChen 5c8455d6ea
try again if kunlun memory malloc failed (#30855)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
joejiong 05d2b7a37f
Update paddle.static.Print with paddle2.0 api (#30846)
4 years ago
Aurelius84 e49d0746dd
[CustomOp] Support install as Package and Add load interface (#30798)
4 years ago
liuyuhui 2cb55eff57
fix WITH_XPU_BKCL in CMakeLists.txt (#30854)
4 years ago
Adam Osewski 4f066e316e
Layer normalization fuse pass. (#30721)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
LielinJiang 3a3ff75c52
Fix unittest random failed of test_datasets (#30804)
4 years ago
joanna.wozna.intel 04532b8a83
Update Xbyak to v5.81 (#30809)
4 years ago
Shang Zhizhou b909450994
fix trt plugin clone and initialize bugs in TRT7.1+ (#30709)
4 years ago
Wilber b08ae368bb
ci compilation depends on a stable release (#30755)
4 years ago
Shang Zhizhou 200ee33df8
fix unittest random error (#30808)
4 years ago
xiemoyuan db87087283
Optimize the encoder of Transformer. (#30439)
4 years ago
Thunderbrook cb66c53c2d
dump to cpu (#30750)
4 years ago
Chengmo d3fac0ea85
fix int64 bug (#30780)
4 years ago
Qi Li 69875dc42c
[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758)
4 years ago
tianshuo78520a 5b1ab51ca4
Change PR-CI-PY3 cc version (#30771)
4 years ago
QingshuChen c35a9880f9
fix malloc L3 failed bug for kunlun (#30745)
4 years ago
WangXi 31ed9c9eed
Fleet distributed strategy support pure fp16 (#30754)
4 years ago
Zhen Wang 53d01afed6
Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777)
4 years ago
ShenLiang 3858f458ea
rm Singleton of reducer (#30775)
4 years ago