139 Commits (e6177072295dd9e54b7968d82327a0cbee68d332)

Author SHA1 Message Date
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
4 years ago
WangXi 9cbcc6cadc
fleet sync build strategy, test=develop (#29732)
4 years ago
JZ-LIANG d33d468f02
[Sharding] add hybrid-dp feature (#29518)
4 years ago
ShenLiang 2ef9e0e23c
Rebuild group automatically in dynamic graph distributed (#29255)
4 years ago
lilong12 b122d0bb76
Fix bug in gloo that gloo initialization hangs (#29447)
4 years ago
ShenLiang 4064354a01
support dp run single card (#29358)
4 years ago
gongweibao 96de8b008f
cleanup enum test=develop (#29294)
4 years ago
ShenLiang 2d6aa1a5bb
fix warning of fleet (#29317)
4 years ago
ShenLiang 2cd0bf5764
Fix doc of fleet api (#29282)
4 years ago
ShenLiang 46b73e6cd9
Change the api of DataParallel and Fleet (#29224)
4 years ago
123malin cc9c619679
test=develop, fix doc (#29200)
4 years ago
WangXi 0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute (#28957)
4 years ago
123malin 92817f8005
test=develop, rm pathlib (#28658)
4 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
4 years ago
Chen Long d576d6ddeb
fix some docs test=develop;test=document_fix (#29159)
4 years ago
lilong12 216e085605
update, test=develop (#29139)
4 years ago
lilong12 a1add716bc
Add a flag to control whether to initialize gloo (#29150)
4 years ago
ShenLiang cddc70964d
fix InMemoryDataset doc (#28688)
4 years ago
JZ-LIANG 0dadacc4eb
[sharding] doc, api, bug fixed (#28983)
4 years ago
lilong12 2a864c70c4
fix the bug in gloo (#29112)
4 years ago
WangXi e931c7baf9
Fix multi nccl comm & wait server ready (#28663)
4 years ago
gongweibao 1358397e97
Clean up the redundant files and unify the launch interface. (#28928)
4 years ago
Chen Weihang bb16c2515d
Polish parallel api impl & doc details (#28980)
4 years ago
Leo Chen 3815d7aa40
Upgrade string literals to raw string (#28989)
4 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
4 years ago
lilong12 f77a78cdee
enable pipeline to run with Executor.run() (#28373)
4 years ago
Chen Weihang bff4179cc7
lazily init global group in collective (#28780)
4 years ago
JZ-LIANG 5a9f6889c1
[Sharding] add new features (#28568)
4 years ago
lilong12 e4f9415338
update doc, test=document_fix (#28498)
4 years ago
danleifeng a24d186814
fix nccl init failed in parallel dygraph mode (#28497)
4 years ago
Chengmo 4dc8c44ba1
【Paddle.Fleet】Fix fleetrun heter (#28252)
4 years ago
mapingshuo 81244fbfab
add sharding strategy in fleet(#27900)
4 years ago
WangXi 11acbfae06
refine auto strategy, test=document_fix (#28211)
4 years ago
MRXLT 55098b975e
fleet support paddle.optimzier (#28026)
4 years ago
lilong12 5bb348a1c2
add doc for ReduceOp (#28051)
4 years ago
WangXi fb641c915e
【paddle.fleet】fleet add _get_applied_meta_list and _get_applied_graph_list (#27952)
4 years ago
lilong12 ff0ebefc1e
put gloo initialization log to file (#27969)
4 years ago
tangwei12 202bfab1be
Feature/large scale kv save base/delta (#27470)
4 years ago
123malin aa3b4ed717
【paddle.fleet】geo send sparse optimize (#27719)
4 years ago
danleifeng 8d7908f3fd
【paddle.fleet】raise error when using multi-cards in fleet non_distributed mode (#27854)
4 years ago
chentianyu03 d05058d268
Remove and reorganize the alias of APIs (#27717)
4 years ago
Chengmo 328cb289ed
【paddle.fleet】fix sparse load (#27680)
4 years ago
123malin a4f850748a
【paddle.fleet】bug fix for parameter_recv (#27838)
4 years ago
Chen Weihang ed31dac6eb
remove scale loss and coll grads, test=document_fix (#27874)
4 years ago
WangXi 50619cd842
use floyd algorithm to find meta optimizer max path, test=develop (#27867)
4 years ago
mapingshuo 8d2cb14f98
support gradient merge with recompute, test=develop (#27834)
4 years ago
Chengmo c5f2802d56
【paddle.fleet】Update fleetrun & ps-heter (#27472)
4 years ago
WangXi 0a1862d1d2
fleet combine amp dgc recompute meta optimizer (#27643)
4 years ago
danleifeng a01bc6b31d
【paddle.fleet】fleet support non_distributed training in dygraph mode (#27714)
4 years ago
lilong12 742cbe6660
[bug fix] avoiding multiple initialization of gloo for fleet in dygraph mode (#27706)
4 years ago