Commit Graph

861 Commits (96784ed6c8c5c9f4ef0c56534c613eac1793ebe6)

Author SHA1 Message Date
Leo Chen 3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)
4 years ago
ShenLiang a60f17b89d
Support unused parameters in dynamic graph distributed (#30224)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
Chengmo d479ae1725
【Paddle.Fleet】Support local save sparse param (#30175)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
pangyoki da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913)
4 years ago
Leo Chen 8696335f86
Fix dtype of ungenerated grad var (#28511)
4 years ago
Leo Chen 1f97d61c68
Add callback after TensorCopy (#30123)
4 years ago
Chengmo 528e03fc08
【Paddle.Fleet】Fix tensor table (#30075)
4 years ago
123malin 198fbdfb60
Add Lookahead and ModelAverage Optimizer (#30004)
4 years ago
Leo Chen adac38c506
add dispensable input for core.ops.reshape2/expand/slice (#30072)
4 years ago
liym27 9922bd4125
Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003)
4 years ago
Thunderbrook 0b8e1fadc5
add topo-aware in heter-ps (#30087)
4 years ago
cc 68398abce9
[Inference] zero_copy_tensor supports int8_t (#30053)
4 years ago
liym27 9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842)
4 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
4 years ago
tangwei12 032414ca2a
[Feature] one ps (3/4) (#29604)
4 years ago
Thunderbrook 09b6e71928
heter box (#29734)
4 years ago
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
4 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
4 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
4 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
Wilber 78dad78610
fix non-contiguous bug for python api. (#29615)
4 years ago
Zhou Wei e74e1a226c
support deepcopy for Layer/Tensor/Paramerbase (#29387)
4 years ago
ShenLiang 2ef9e0e23c
Rebuild group automatically in dynamic graph distributed (#29255)
4 years ago
yongqiangma 7c508d8668
update unbind norm add CUDAPlace api doc information (#29322)
4 years ago
liym27 b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)
4 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
4 years ago
Zhen Wang be3777a50a
Add pure fp16 training with master weights. (#27712)
4 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199)
4 years ago
Zhou Wei c0a991c874
accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429)
4 years ago
liym27 865a45984f
Check whether there is any inplace operation affecting gradient calculation. (#27901)
4 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
4 years ago
Leo Chen 770395cb93
Split train_mode and has_grad for tracer (#29064)
4 years ago
Zhou Wei 8ca0a8a859
fix tensor detach to zero copy (#27921)
4 years ago
Chen Weihang 768dab441e
polish two api doc detail, test=document_fix (#28971)
4 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
4 years ago
Zhou Wei 3b0dd5f620
fix bug that to_tensor not support paddle.Place (#28717)
4 years ago
Leo Chen 3d09929b1f
Add check for non-dispensable input (#28666)
4 years ago
Zhou Wei bf6e7cba7a
update 2.0 API English doc (#28525)
4 years ago
Wilber 1bf4836580
[Inference] Add TryShrinkMemory interface. (#28409)
4 years ago
石晓伟 c41fd033e5
check op_version_registry in CI test, test=develop (#28402)
4 years ago
Leo Chen 8b2436a776
Add broadcast_shape api (#28257)
4 years ago
石晓伟 21a63f6f90
enhance the op_version_registry, test=develop (#28347)
4 years ago
Shang Zhizhou ea851796e5
Optimize ERNIE model inference performance in TensorRT and support variable-length input (#28367)
4 years ago
Wilber 6f0f45f69c
copy_to_cpu support uint8 (#28372)
4 years ago
wangguanzhong 5262b02585
add generate_proposals_v2 op (#28214)
4 years ago
石晓伟 d9b5f1261c
update the version of pybind, test=develop (#28284)
4 years ago
wangguanzhong 1c385e26f9
add op_function_generator for box_coder (#28303)
4 years ago
Guanghua Yu e8f2614da5
Enhance multiclass_nms op to support LoD for dygraph mode (#28276)
4 years ago