45 Commits (f1ae017fa9294d8bd024aefbea0678e0fee59dfc)

Author SHA1 Message Date
Yi Liu 2024ef69a2
【paddle.fleet】add comments about localsgd in DistributedStrategy (#26709)
5 years ago
lilong12 1c68138327
[api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552)
5 years ago
JZ-LIANG 958d7212c7
【paddle.fleet】Document refine lars & lamb (#26533)
5 years ago
Dong Daxiang 08d736ad78
【paddle.fleet】add cudnn related strategies to DistributedStrategy (#26598)
5 years ago
WangXi 7ff197d3ba
Add fleet dgc amp doc, test=document_fix (#26608)
5 years ago
liuyuhui 66596bd20f
【paddle.fleet】solve the initial configuration about fleet and rolemaker (#26368)
5 years ago
Dong Daxiang 7d3e46e1d5
【paddle.fleet】Document refine (#26526)
5 years ago
Dong Daxiang 83cd185947
【paddle.fleet】Meta from optimizer (#26392)
5 years ago
123malin 57d434df5d
add save/load for parameter server (#26235)
5 years ago
mapingshuo cd48bdad31
add feature to fleet2.0 role_maker, distribute_strategy, test=develop (#26267)
5 years ago
Dong Daxiang 4ec51e0205
【paddle.fleet】Clear disable (#26334)
5 years ago
Yi Liu 3b2c580a66
【paddle.fleet】make fleet_localsgd_meta_optimizer work (#26213)
5 years ago
Qinghe JING d549a9b1fe
【paddle.fleet】Set default value to strategy in distributed_optimizer (#26246)
5 years ago
liuyuhui 935da32d25
【paddle.fleet】upgrade fleet: modify role_maker (#26038)
5 years ago
Dong Daxiang 50a5bcfc9d
【paddle.fleet】paddle.fleet -> paddle.distributed.fleet. (#26186)
5 years ago
Yi Liu f45f8363eb
records the offset of log when creating by paddle.distributed.launch (#25725)
5 years ago
gongweibao 80f1c50738
Fix typo in interface. (#24779)
5 years ago
Yi Liu b0f18947f3
fix the compatibility of PY2 and PY3 in paddle.distributed.launch (#25304)
5 years ago
Yi Liu 5209b9a510
cat log to stdout when setting log_dir in launch (#25147)
5 years ago
mapingshuo 9388a6381c
fix popen error (#24767)
5 years ago
zhangchunle f62dfc6238
fs_wrapper add __all__ (#24335)
5 years ago
Kaipeng Deng 80cf3c3c4d
Refine DataLoader support multi-processing (#23107)
5 years ago
gongweibao 63bfe0b946
Fix the default value bug of started port in launch.py. (#23531)
5 years ago
gongweibao 24a063f6ac
Add fleet checkpoint on local fs and remote fs(such as hdfs) for EDL (#22586)
5 years ago
gongweibao 4b40edf359
Use available ports instead of static ports. (#22553)
5 years ago
tianshuo78520a d2ba91aad1
fix typo words (#22653)
5 years ago
gongweibao ad2bc0c364
Fix a distribution bug and cleanup some not need logs. (#22381)
5 years ago
danleifeng f5262865c0
change select_gpus into absolute values in launch.py (#22031)
5 years ago
danleifeng 3fe63d6780
add store_true to use_paddlecloud argument in launch.py (#21168)
5 years ago
Dong Daxiang a6747a6ef1
add launch_ps module so that we can launch a parameter server trainin… (#20936)
5 years ago
WangXi 9d8ec42353
launch.py remove setting for nccl sync, test=develop (#20909)
5 years ago
WangXi e78d7f57bb
Print the rank which trainer is error in launch.py, test=develop (#20838)
5 years ago
WangXi 8c2c8dc626
distribute.launch use poll to query subprocess (#19853)
5 years ago
danleifeng 0865b5a9a0
distribute launch : add use_paddlecloud argument (#19273)
6 years ago
gongweibao 86f0591175
Remove node_num function. (#19167)
6 years ago
guru4elephant 30562e371b
refine launch_ps and role_maker (#18795)
6 years ago
guru4elephant 70b03760fd
add parameter server launch (#18687)
6 years ago
gongweibao c0a82748cf
Polish backwards optimizer dependency codes and use more default values. (#18255)
6 years ago
gongweibao da9143c1cc
Polish codes of old prs. (#17938)
6 years ago
gongweibao f3e5a5cf67
Unset https_proxy and http_proxy in our launch.py (#17915)
6 years ago
gongweibao 6a1df46991
Fine tuning launch.py (#17223)
6 years ago
chengduo ca03f4989a
fix distributed launch.py (#17571)
6 years ago
Yan Xu 266444b8af
fix dist launch script test=develop (#17404)
6 years ago
Yan Xu b4c3a6aa0b
[Imperative] implement imperative NCCLParallelContext (#16477)
6 years ago
Yan Xu d424e5b4c9
add launch mp distributed job py module test=develop (#15620)
6 years ago