Commit Graph

889 Commits (a32e8bf1e7fb45b9bae85e80fe7742eae8739fac)

Author SHA1 Message Date
Kaipeng Deng a32e8bf1e7
DataLoader support dict str (#31481)
4 years ago
liuyuhui 9ebf05b003
[Kunlun]Multi xpu dygraph performance optimization, add distributed.spawn support for multi xpu and some bug-fixes (#31130)
4 years ago
Qi Li 4d647ec137
[ROCM] update fluid platform for rocm (part5), test=develop (#31315)
4 years ago
wuhuanzhou 4d6d2db812
Windows system supports Ninja compilation (#31161)
4 years ago
Qi Li 28b356b9a2
[ROCM] update fluid framework for rocm (part6), test=develop (#31015)
4 years ago
Thunderbrook 565354f676
support save multi sparse table in one path (#31108)
4 years ago
123malin 16b4260b2f
test=develop, save/load, shrink (#30625)
4 years ago
liym27 5b367dab44
[static setitem] Support the index is Tensor; step>1; step<0 .(#30949)
4 years ago
Wilber 0020d91506
fix python pass builder error. (#30946)
4 years ago
Chen Weihang f649442ddd
New custom operator extension mechanism (#30690)
4 years ago
liym27 39f41cb47f
Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
WangXi 6e3856d3fb
fix xpu dygraph place (#30868)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
Thunderbrook cb66c53c2d
dump to cpu (#30750)
4 years ago
ShenLiang 3858f458ea
rm Singleton of reducer (#30775)
4 years ago
Shang Zhizhou ae0f88a988
add DLA support:C++&&Python api (#30165)
4 years ago
Thunderbrook 1bebc09253
solve build gpu task core (#30626)
4 years ago
wanghuancoder 90773473a0
use nvtx push pop in timeline (#30567)
4 years ago
wanghuancoder d1b25ed9d7
add some RecordEvent, for dygraph timeline (#30299)
4 years ago
Leo Chen 7043b8cfc6
support layer_norm fp16 in dygraph amp (#30430)
4 years ago
wanghuancoder 59ad6ff3e3
delete empty line of pybind.cc, test=develop (#30529)
4 years ago
hutuxian e207fe6385
Ascend Framework Part2: pybind files (#30410)
4 years ago
wanghuancoder bd97192274
if pybind.cc changed, generate total report, test=develop (#30514)
4 years ago
guofei 11e78ebaa3
Modify the calculation logic of LambOptimizer (#29313)
4 years ago
pangyoki 13d757362c
Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103)
4 years ago
yaoxuefeng 6e0da01c61
Heter ps new (#30198)
4 years ago
Leo Chen 3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)
4 years ago
ShenLiang a60f17b89d
Support unused parameters in dynamic graph distributed (#30224)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
Chengmo d479ae1725
【Paddle.Fleet】Support local save sparse param (#30175)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
pangyoki da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913)
4 years ago
Leo Chen 8696335f86
Fix dtype of ungenerated grad var (#28511)
4 years ago
Leo Chen 1f97d61c68
Add callback after TensorCopy (#30123)
4 years ago
Chengmo 528e03fc08
【Paddle.Fleet】Fix tensor table (#30075)
4 years ago
123malin 198fbdfb60
Add Lookahead and ModelAverage Optimizer (#30004)
4 years ago
Leo Chen adac38c506
add dispenable input for core.ops.reshape2/expand/slice (#30072)
4 years ago
liym27 9922bd4125
Fix bug: In dynamic mode, if start or end is negative, __getitem__ return wrong result(#30003)
4 years ago
Thunderbrook 0b8e1fadc5
add topo-aware in heter-ps (#30087)
4 years ago
cc 68398abce9
[Inference] zero_copy_tensor supports int8_t (#30053)
4 years ago
liym27 9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842)
5 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
5 years ago
tangwei12 032414ca2a
[Feature] one ps (3/4) (#29604)
5 years ago
Thunderbrook 09b6e71928
heter box (#29734)
5 years ago
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
5 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
5 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
5 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
5 years ago