Commit Graph

71 Commits (670937e11dca65253c5693a0328284cd4003cb2f)

Author SHA1 Message Date
gongweibao 89c4b3ddcf Add bash_test_modules function to capture the timeout or failed context. () 6 years ago
tangwei12 8f0b3c0516 the integrated communicator () 6 years ago
Yi Liu 4ef6b8457a adapte fleet api for localsgd and support nccl comm configuration in executor () 6 years ago
chengduo 5a579df9ba [Speedup] Make dygraph data parallel faster () 6 years ago
kh2se2013 27e85625b8 add python coverage launch when WITH_COVERAGE=ON () 6 years ago
gongweibao 29d8781240 Polish fleet API to support cuda collective mode and nccl2 mode. () 6 years ago
Zeng Jinle c194b0c835 Try to deprecate unstable python memory optimize () 6 years ago
chengduo 17d62ab220 Enhance fuse optimization op pass () 6 years ago
gongweibao c0a82748cf Polish backwards optimizer dependency codes and use more default values. () 6 years ago
guru4elephant 7d76e34ec2 add more print function for timeout issue, make timeout value larger () 6 years ago
guru4elephant 0941e3e013 add class name and timeline for test_dist_base.py () 6 years ago
guru4elephant b2cfdc3891 Refine unittest log () 6 years ago
gongweibao f5caf3443c Fix reinitialized ncclid error! () 6 years ago
gongweibao fbbdc9ccad Add backward and optimizer operator dependency pass. () 6 years ago
gongweibao 65bbf950ee Add multi-ncclcomm and 2D ncclallreduce support. () 6 years ago
Yan Xu 0217555530 polish parallel dygraph code () 6 years ago
Yan Xu 0b07eef118 ParallelDyGraph with GPU collective mode () 6 years ago
tangwei12 1a4a51db2b Fleet unify distributed training () 6 years ago
Qiao Longfei 61912e879d test_dist_base set runtime_split_send_recv to false test=develop 6 years ago
Qiao Longfei d8974e6da0 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator 6 years ago
gongweibao eb83abeac3 Add DGC(Deep Gradient Compression) interface. () 6 years ago
Qiao Longfei 30618409db Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator 6 years ago
Wu Yi 8bebfe5640 add resnet nccl2 dist training, mp training unit test () 6 years ago
Wu Yi 6382b62f6b Collective ops () 6 years ago
liuwei1031 caadd0581d add IfElse test case for ir memory optimize () 6 years ago
Qiao Longfei 5cf0092825 add more log and fix test_dist_base in multi_batch_merge_pass 6 years ago
Qiao Longfei 4356f186b4 complete parameter_send 6 years ago
Zeng Jinle dec89bd7ed Merge pull request from sneaxiy/try_to_turn_on_remove_unnecessary_lock 6 years ago
tangwei12 8b50ad80ff checkpoint at distributed training () 6 years ago
sneaxiy ef788603d4 merge develop 6 years ago
WangZhen bac08c4a26 Fix some bugs caused by set functions of the Pass class. test=develop 6 years ago
sneaxiy d8568acd19 turn on remove_unnecessary_lock 6 years ago
Xin Pan 7526ac14e3 add comments 6 years ago
Xin Pan beaae61a16 polish 6 years ago
Xin Pan 5e928e579a try unify Executor and ParallelExecutor 6 years ago
Yancey1989 8cad371a60 fix nccl unittest acc test=develop 6 years ago
Yan Xu 5384206aec Merge pull request from Yancey1989/fix_dist_unittest 6 years ago
Yancey1989 fa1f77e20c enable ci test=develop 6 years ago
Wu Yi f95ee9c09f fix nccl dist test acc () 6 years ago
Wu Yi 554bcdbdfc add more log for dist test for ci test=develop () 6 years ago
Wu Yi aebc175cd4 add nccl2 dist tests () 6 years ago
Wu Yi e2011f1353 test dist ut fixes test=develop () 6 years ago
Xin Pan 44ecf9a481 fix 6 years ago
Xin Pan 9735e3016a fix test 6 years ago
Wu Yi 306236c2c0 feature/DC asgd () 6 years ago
Wu Yi d186e7434e Refine dist ut () 6 years ago
minqiyang 59420d5bd2 Polish code 7 years ago
minqiyang 2cc939bbfa Fix Mac Python3 CI job 7 years ago
Wu Yi 26200f2e42 [1.1] [project] train imagenet using large batch size () 7 years ago
tangwei12 b35239df2b fix dist ut with place, test=develop () 7 years ago