128 Commits (b6a26749dc1747d2378e4976366d18268841b74c)

Author SHA1 Message Date
tangwei12 bc5f0246a8
large scale kv speedup (#26510)
5 years ago
danleifeng 0721767ba9
fix server_num bug;test=develop (#27442)
5 years ago
danleifeng 905e2346ac
add endpoints log;test=develop (#27439)
5 years ago
danleifeng fc61efd736
fix port env bug(int);test=develop (#27405)
5 years ago
tangwei12 d6b54de467
【paddle.fleet】Fix/role maker api fix (#27326)
5 years ago
tangwei12 99626502f7
【paddle.fleet】gloo and util (#27213)
5 years ago
123malin f36b9a7f79
【Fleet2.0 Util】 add documents (#26698)
5 years ago
danleifeng 8d05c00c67
fix paddle.fleet en-doc for apis in dynamic mode (#27354)
5 years ago
ShenLiang 746a8ded29
fix comment of adaptive lsgd (#27362)
5 years ago
gongweibao 11bcf0e21c
Cleanup redundant code files (#27319)
5 years ago
ShenLiang 54b81fa32c
add adaptivelsgd in meta_optimizer (#27289)
5 years ago
yaoxuefeng c67c391682
refine fleet dataset class api (#27133)
5 years ago
danleifeng 389a9a7e0e
fix ports conflict when use paddlecloud to launch analogue multi-nodes (#26191)
5 years ago
mapingshuo 9dedafa0df
fix strategy, test=develop (#27323)
5 years ago
ShenLiang 2b6a5793fe
remove auto mode from localsgd optimizer (#27237)
5 years ago
123malin 60c3ef3ab8
【paddle.fleet】parameter_server_optimizer support auto_strategy (#27181)
5 years ago
JZ-LIANG 5d039f4086
modified the implement of Lars optimizer (#26733)
5 years ago
Dong Daxiang f7d08b7db8
【paddle.fleet】refine launch and distributed repr string for print (#27093)
5 years ago
123malin f2d68d3ed5
【paddle.fleet】parameter_server_optimizer support auto_strategy (#26838)
5 years ago
ShenLiang aca450f6fb
fix the localsgd optimizer (#27094)
5 years ago
Dong Daxiang 0443b480b8
【paddle.fleet】add auto parallel L1 implementations (#27090)
5 years ago
Chengmo a72752263b
support heter-xpu-ps (#27018)
5 years ago
mapingshuo 9e4fe92303
fix strategy example (#26856)
5 years ago
danleifeng 6b4ca0d7f1
【paddle.fleet】distributed_optimizer supports dygraph (#26541)
5 years ago
danleifeng 3a2a711681
【paddle.fleet】simplify fleetrun log infos (#26888)
5 years ago
danleifeng e35ad3eee8
【paddle.fleet】support running python train.py for fleet tasks (#26249)
5 years ago
lilong12 030b298e82
fix sample codes in collective.py (#26787)
5 years ago
Chengmo d0962abd20
supplement bug fix of parameter server (#26217)
5 years ago
Chen Weihang 28cb653145
Remove backend argument of init_parallel_env (#26773)
5 years ago
tangwei12 9ded7565ec
【paddle.fleet】FleetAPI 2.0 (#26772)
5 years ago
Chengmo 7f2aa2db3c
【paddle.fleet】Support Heter Parameter Server (#25998)
5 years ago
Dong Daxiang 994217ea05
【paddle.fleet】fix api documents (#26777)
5 years ago
Chen Weihang 31f422ae5e
Add interface to launch parallel dygraph by multiprocessing (#26044)
5 years ago
Yi Liu 2024ef69a2
【paddle.fleet】add comments about localsgd in DistributedStrategy (#26709)
5 years ago
lilong12 1c68138327
[api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552)
5 years ago
JZ-LIANG 958d7212c7
【paddle.fleet】Document refine lars & lamb (#26533)
5 years ago
Dong Daxiang 08d736ad78
【paddle.fleet】add cudnn related strategies to DistributedStrategy (#26598)
5 years ago
WangXi 7ff197d3ba
Add fleet dgc amp doc, test=document_fix (#26608)
5 years ago
liuyuhui 66596bd20f
【paddle.fleet】solve the initial configuration about fleet and rolemaker (#26368)
5 years ago
Dong Daxiang 7d3e46e1d5
【paddle.fleet】Document refine (#26526)
5 years ago
Dong Daxiang 83cd185947
【paddle.fleet】Meta from optimizer (#26392)
5 years ago
123malin 57d434df5d
add save/load for parameter server (#26235)
5 years ago
mapingshuo cd48bdad31
add feature to fleet2.0 role_maker, distribute_strategy, test=develop (#26267)
5 years ago
Dong Daxiang 4ec51e0205
【paddle.fleet】Clear disable (#26334)
5 years ago
Yi Liu 3b2c580a66
【paddle.fleet】make fleet_localsgd_meta_optimizer work (#26213)
5 years ago
Qinghe JING d549a9b1fe
【paddle.fleet】Set default value to strategy in distributed_optimizer (#26246)
5 years ago
liuyuhui 935da32d25
【paddle.fleet】upgrade fleet: modify role_maker (#26038)
5 years ago
Dong Daxiang 50a5bcfc9d
【paddle.fleet】paddle.fleet -> paddle.distributed.fleet. (#26186)
5 years ago
Yi Liu f45f8363eb
records the offset of log when creating by paddle.distributed.launch (#25725)
5 years ago
gongweibao 80f1c50738
Fix typo in interface. (#24779)
5 years ago
Yi Liu b0f18947f3
fix the compatibility of PY2 and PY3 in paddle.distributed.launch (#25304)
5 years ago
Yi Liu 5209b9a510
cat log to stdout when setting log_dir in launch (#25147)
5 years ago
mapingshuo 9388a6381c
fix popen error (#24767)
5 years ago
zhangchunle f62dfc6238
fs_wrapper add __all__ (#24335)
5 years ago
Kaipeng Deng 80cf3c3c4d
Refine DataLoader support multi-processing (#23107)
5 years ago
gongweibao 63bfe0b946
Fix the default value bug of started port in launch.py. (#23531)
5 years ago
gongweibao 24a063f6ac
Add fleet checkpoint on local fs and remote fs(such as hdfs) for EDL (#22586)
5 years ago
gongweibao 4b40edf359
Use available ports instead of static ports. (#22553)
5 years ago
tianshuo78520a d2ba91aad1
fix typo words (#22653)
5 years ago
gongweibao ad2bc0c364
Fix a distribution bug and cleanup some not need logs. (#22381)
6 years ago
danleifeng f5262865c0
change select_gpus into absolute values in launch.py (#22031)
6 years ago
danleifeng 3fe63d6780
add store_true to use_paddlecloud argument in launch.py (#21168)
6 years ago
Dong Daxiang a6747a6ef1
add launch_ps module so that we can launch a parameter server trainin… (#20936)
6 years ago
WangXi 9d8ec42353
launch.py remove setting for nccl sync, test=develop (#20909)
6 years ago
WangXi e78d7f57bb
Print the rank which trainer is error in launch.py, test=develop (#20838)
6 years ago
WangXi 8c2c8dc626
distribute.launch use poll to query subprocess (#19853)
6 years ago
danleifeng 0865b5a9a0
distribute launch : add use_paddlecloud argument (#19273)
6 years ago
gongweibao 86f0591175
Remove node_num function. (#19167)
6 years ago
guru4elephant 30562e371b
refine launch_ps and role_maker (#18795)
6 years ago
guru4elephant 70b03760fd
add parameter server launch (#18687)
6 years ago
gongweibao c0a82748cf
Polish backwards optimizer dependency codes and use more default values. (#18255)
6 years ago
gongweibao da9143c1cc
Polish codes of old prs. (#17938)
6 years ago
gongweibao f3e5a5cf67
Unset https_proxy and http_proxy in our launch.py (#17915)
6 years ago
gongweibao 6a1df46991
Fine tuning launch.py (#17223)
6 years ago
chengduo ca03f4989a
fix distributed launch.py (#17571)
6 years ago
Yan Xu 266444b8af
fix dist launch script test=develop (#17404)
6 years ago
Yan Xu b4c3a6aa0b
[Imperative] implement imperative NCCLParallelContext (#16477)
6 years ago
Yan Xu d424e5b4c9
add launch mp distributed job py module test=develop (#15620)
6 years ago