Commit Graph

879 Commits (010f2caa23d92337bf055a54e00ad088ed360f01)

Author SHA1 Message Date
liym27 39f41cb47f
Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
WangXi 6e3856d3fb
fix xpu dygraph place (#30868)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
Thunderbrook cb66c53c2d
dump to cpu (#30750)
4 years ago
ShenLiang 3858f458ea
rm Singleton of reducer (#30775)
4 years ago
Shang Zhizhou ae0f88a988
add DLA support:C++&&Python api (#30165)
4 years ago
Thunderbrook 1bebc09253
solve build gpu task core (#30626)
4 years ago
wanghuancoder 90773473a0
use nvtx push pop in timeline (#30567)
4 years ago
wanghuancoder d1b25ed9d7
add some RecordEvent, for dygraph timeline (#30299)
4 years ago
Leo Chen 7043b8cfc6
support layer_norm fp16 in dygraph amp (#30430)
4 years ago
wanghuancoder 59ad6ff3e3
delete empty line of pybing.cc, test=develop (#30529)
4 years ago
hutuxian e207fe6385
Ascend Framework Part2: pybind files (#30410)
4 years ago
wanghuancoder bd97192274
if pybind.cc changed, generate total report, test=develop (#30514)
4 years ago
guofei 11e78ebaa3
Modify the calculation logic of LambOptimizer (#29313)
4 years ago
pangyoki 13d757362c
Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103)
4 years ago
yaoxuefeng 6e0da01c61
Heter ps new (#30198)
4 years ago
Leo Chen 3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)
4 years ago
ShenLiang a60f17b89d
Support unused parameters in dynamic graph distributed (#30224)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
Chengmo d479ae1725
【Paddle.Fleet】Support local save sparse param (#30175)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
pangyoki da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913)
4 years ago
Leo Chen 8696335f86
Fix dtype of ungenerated grad var (#28511)
4 years ago
Leo Chen 1f97d61c68
Add callback after TensorCopy (#30123)
4 years ago
Chengmo 528e03fc08
【Paddle.Fleet】Fix tensor table (#30075)
4 years ago
123malin 198fbdfb60
Add Lookahead and ModelAverage Optimizer (#30004)
4 years ago
Leo Chen adac38c506
add dispenable input for core.ops.reshape2/expand/slice (#30072)
4 years ago
liym27 9922bd4125
Fix bug: In dynamic mode, if start or end is negetive, __getitem__ return wrong result(#30003)
4 years ago
Thunderbrook 0b8e1fadc5
add topo-aware in heter-ps (#30087)
4 years ago
cc 68398abce9
[Inference] zero_copy_tensor supports int8_t (#30053)
4 years ago
liym27 9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842)
5 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
5 years ago
tangwei12 032414ca2a
[Feature] one ps (3/4) (#29604)
5 years ago
Thunderbrook 09b6e71928
heter box (#29734)
5 years ago
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
5 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
5 years ago
Y_Xuan 76738504ad
添加rocm平台支持代码 (#29342)
5 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
5 years ago
Wilber 78dad78610
fix none-contiguous bug for python api. (#29615)
5 years ago
Zhou Wei e74e1a226c
support deepcopy for Layer/Tensor/Paramerbase (#29387)
5 years ago
ShenLiang 2ef9e0e23c
Rebuild group automatically in dynamic graph distributed (#29255)
5 years ago
yongqiangma 7c508d8668
update unbind norm add CUDAPlace api doc information (#29322)
5 years ago
liym27 b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
Zhen Wang be3777a50a
Add pure fp16 training with master weights. (#27712)
5 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice opreator for c… (#29199)
5 years ago
Zhou Wei c0a991c874
accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429)
5 years ago
liym27 865a45984f
Check whether there is any inplace operation affecting gradient calculation. (#27901)
5 years ago