175 Commits (develop)

Author SHA1 Message Date
Chen Long d576d6ddeb
fix some docs test=develop;test=document_fix (#29159)
5 years ago
lilong12 216e085605
update, test=develop (#29139)
5 years ago
lilong12 a1add716bc
Add a flag to control whether to initialize gloo (#29150)
5 years ago
ShenLiang cddc70964d
fix InMemoryDataset doc (#28688)
5 years ago
JZ-LIANG 0dadacc4eb
[sharding] doc, api, bug fixed (#28983)
5 years ago
lilong12 2a864c70c4
fix the bug in gloo (#29112)
5 years ago
WangXi e931c7baf9
Fix multi nccl comm & wait server ready (#28663)
5 years ago
gongweibao 1358397e97
Clean up the redundant files and unify the launch interface. (#28928)
5 years ago
Chen Weihang bb16c2515d
Polish parallel api impl & doc details (#28980)
5 years ago
Leo Chen 3815d7aa40
Upgrade string literals to raw string (#28989)
5 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
5 years ago
lilong12 f77a78cdee
enable pipeline to run with Executor.run() (#28373)
5 years ago
Chen Weihang bff4179cc7
lazily init global group in collective (#28780)
5 years ago
JZ-LIANG 5a9f6889c1
[Sharding] add new features (#28568)
5 years ago
lilong12 e4f9415338
update doc, test=document_fix (#28498)
5 years ago
danleifeng a24d186814
fix nccl init failed in parallel dygraph mode (#28497)
5 years ago
Chengmo 4dc8c44ba1
【Paddle.Fleet】Fix fleetrun heter (#28252)
5 years ago
mapingshuo 81244fbfab
add sharding strategy in fleet(#27900)
6 years ago
WangXi 11acbfae06
refine auto strategy, test=document_fix (#28211)
6 years ago
MRXLT 55098b975e
fleet support paddle.optimzier (#28026)
6 years ago
lilong12 5bb348a1c2
add doc for ReduceOp (#28051)
6 years ago
WangXi fb641c915e
【paddle.fleet】fleet add _get_applied_meta_list and _get_applied_graph_list (#27952)
6 years ago
lilong12 ff0ebefc1e
put gloo initialization log to file (#27969)
6 years ago
tangwei12 202bfab1be
Feature/large scale kv save base/delta (#27470)
6 years ago
123malin aa3b4ed717
【paddle.fleet】geo send sparse optimize (#27719)
6 years ago
danleifeng 8d7908f3fd
【paddle.fleet】raise error when using multi-cards in fleet non_distributed mode (#27854)
6 years ago
chentianyu03 d05058d268
Remove and reorganize the alias of APIs (#27717)
6 years ago
Chengmo 328cb289ed
【paddle.fleet】fix sparse load (#27680)
6 years ago
123malin a4f850748a
【paddle.fleet】bug fix for parameter_recv (#27838)
6 years ago
Chen Weihang ed31dac6eb
remove scale loss and coll grads, test=document_fix (#27874)
6 years ago
WangXi 50619cd842
use floyd algorithm to find meta optimizer max path, test=develop (#27867)
6 years ago
mapingshuo 8d2cb14f98
support gradient merge with recompute, test=develop (#27834)
6 years ago
Chengmo c5f2802d56
【paddle.fleet】Update fleetrun & ps-heter (#27472)
6 years ago
WangXi 0a1862d1d2
fleet combine amp dgc recompute meta optimizer (#27643)
6 years ago
danleifeng a01bc6b31d
【paddle.fleet】fleet support non_distributed training in dygraph mode (#27714)
6 years ago
lilong12 742cbe6660
[bug fix] avoiding multiple initialization of gloo for fleet in dygraph mode (#27706)
6 years ago
lilong12 5132f5129d
terminate http server used by gloo for fleet after init (#27698)
6 years ago
lilong12 bbc2add703
Initialize gloo for low level collective apis (#27672)
6 years ago
Qinghe JING 1539a23822
Fix bugs in hdfs download (#27344)
6 years ago
yaoxuefeng 780140599f
【paddle.distributed.fleet】add data_generator in distributed.fleet.dataset (#27345)
6 years ago
lilong12 36c0410223
Revert "Initialize gloo for low level collective apis (#27356)", test=document_fix (#27665)
6 years ago
123malin 6822307745
test=develop, rm netifaces (#27581)
6 years ago
lilong12 fa73e4a284
Initialize gloo for low level collective apis (#27356)
6 years ago
Dong Daxiang 4e8f18ab25
Get final strategy (#27602)
6 years ago
Chengmo 0e101c4f6f
Fix test dist fleet heter ctr (#27513)
6 years ago
WangXi e550fc02ae
fleet2.0 add fp16 grad compression (#27480)
6 years ago
123malin 32ad4f90a4
【paddle.fleet】 Usages Change: from fleet.util() to fleet.util (#27468)
6 years ago
tangwei12 bc5f0246a8
large scale kv speedup (#26510)
6 years ago
danleifeng 0721767ba9
fix server_num bug;test=develop (#27442)
6 years ago
danleifeng 905e2346ac
add endpoints log;test=develop (#27439)
6 years ago
danleifeng fc61efd736
fix port env bug(int);test=develop (#27405)
6 years ago
tangwei12 d6b54de467
【paddle.fleet】Fix/role maker api fix (#27326)
6 years ago
tangwei12 99626502f7
【paddle.fleet】gloo and util (#27213)
6 years ago
123malin f36b9a7f79
【Fleet2.0 Util】 add documents (#26698)
6 years ago
danleifeng 8d05c00c67
fix paddle.fleet en-doc for apis in dynamic mode (#27354)
6 years ago
ShenLiang 746a8ded29
fix comment of adaptive lsgd (#27362)
6 years ago
gongweibao 11bcf0e21c
Cleanup redundant code files (#27319)
6 years ago
ShenLiang 54b81fa32c
add adaptivelsgd in meta_optimizer (#27289)
6 years ago
yaoxuefeng c67c391682
refine fleet dataset class api (#27133)
6 years ago
danleifeng 389a9a7e0e
fix ports conflict when use paddlecloud to launch analogue multi-nodes (#26191)
6 years ago
mapingshuo 9dedafa0df
fix strategy, test=develop (#27323)
6 years ago
ShenLiang 2b6a5793fe
remove auto mode from localsgd optimizer (#27237)
6 years ago
123malin 60c3ef3ab8
【paddle.fleet】parameter_server_optimizer support auto_strategy (#27181)
6 years ago
JZ-LIANG 5d039f4086
modified the implement of Lars optimizer (#26733)
6 years ago
Dong Daxiang f7d08b7db8
【paddle.fleet】refine launch and distributed repr string for print (#27093)
6 years ago
123malin f2d68d3ed5
【paddle.fleet】parameter_server_optimizer support auto_strategy (#26838)
6 years ago
ShenLiang aca450f6fb
fix the localsgd optimizer (#27094)
6 years ago
Dong Daxiang 0443b480b8
【paddle.fleet】add auto parallel L1 implementations (#27090)
6 years ago
Chengmo a72752263b
support heter-xpu-ps (#27018)
6 years ago
mapingshuo 9e4fe92303
fix strategy example (#26856)
6 years ago
danleifeng 6b4ca0d7f1
【paddle.fleet】distributed_optimizer supports dygraph (#26541)
6 years ago
danleifeng 3a2a711681
【paddle.fleet】simplify fleetrun log infos (#26888)
6 years ago
danleifeng e35ad3eee8
【paddle.fleet】support running python train.py for fleet tasks (#26249)
6 years ago
lilong12 030b298e82
fix sample codes in collective.py (#26787)
6 years ago
Chengmo d0962abd20
supplement bug fix of parameter server (#26217)
6 years ago
Chen Weihang 28cb653145
Remove backend argument of init_parallel_env (#26773)
6 years ago
tangwei12 9ded7565ec
【paddle.fleet】FleetAPI 2.0 (#26772)
6 years ago
Chengmo 7f2aa2db3c
【paddle.fleet】Support Heter Parameter Server (#25998)
6 years ago
Dong Daxiang 994217ea05
【paddle.fleet】fix api documents (#26777)
6 years ago
Chen Weihang 31f422ae5e
Add interface to launch parallel dygraph by multiprocessing (#26044)
6 years ago
Yi Liu 2024ef69a2
【paddle.fleet】add comments about localsgd in DistributedStrategy (#26709)
6 years ago
lilong12 1c68138327
[api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552)
6 years ago
JZ-LIANG 958d7212c7
【paddle.fleet】Document refine lars & lamb (#26533)
6 years ago
Dong Daxiang 08d736ad78
【paddle.fleet】add cudnn related strategies to DistributedStrategy (#26598)
6 years ago
WangXi 7ff197d3ba
Add fleet dgc amp doc, test=document_fix (#26608)
6 years ago
liuyuhui 66596bd20f
【paddle.fleet】solve the initial configuration about fleet and rolemaker (#26368)
6 years ago
Dong Daxiang 7d3e46e1d5
【paddle.fleet】Document refine (#26526)
6 years ago
Dong Daxiang 83cd185947
【paddle.fleet】Meta from optimizer (#26392)
6 years ago
123malin 57d434df5d
add save/load for parameter server (#26235)
6 years ago
mapingshuo cd48bdad31
add feature to fleet2.0 role_maker, distribute_strategy, test=develop (#26267)
6 years ago
Dong Daxiang 4ec51e0205
【paddle.fleet】Clear disable (#26334)
6 years ago
Yi Liu 3b2c580a66
【paddle.fleet】make fleet_localsgd_meta_optimizer work (#26213)
6 years ago
Qinghe JING d549a9b1fe
【paddle.fleet】Set default value to strategy in distributed_optimizer (#26246)
6 years ago
liuyuhui 935da32d25
【paddle.fleet】upgrade fleet: modify role_maker (#26038)
6 years ago
Dong Daxiang 50a5bcfc9d
【paddle.fleet】paddle.fleet -> paddle.distributed.fleet. (#26186)
6 years ago
Yi Liu f45f8363eb
records the offset of log when creating by paddle.distributed.launch (#25725)
6 years ago
gongweibao 80f1c50738
Fix typo in interface. (#24779)
6 years ago
Yi Liu b0f18947f3
fix the compatibility of PY2 and PY3 in paddle.distributed.launch (#25304)
6 years ago
Yi Liu 5209b9a510
cat log to stdout when setting log_dir in launch (#25147)
6 years ago
mapingshuo 9388a6381c
fix popen error (#24767)
6 years ago