64 Commits (c2063217e7458cfa97a16874b6a6982448259304)

Author SHA1 Message Date
chengduo 17d62ab220
Enhance fuse optimization op pass (#19010)
6 years ago
gongweibao c0a82748cf
Polish backwards optimizer dependency codes and use more default values. (#18255)
6 years ago
guru4elephant 7d76e34ec2
add more print function for timeout issue, make timeout value larger (#18219)
6 years ago
guru4elephant 0941e3e013
add class name and timeline for test_dist_base.py (#18122)
6 years ago
guru4elephant b2cfdc3891
Refine unittest log (#18084)
6 years ago
gongweibao f5caf3443c
Fix reinitialized ncclid error! (#18025)
6 years ago
gongweibao fbbdc9ccad
Add backward and optimizer operator dependency pass. (#17746)
6 years ago
gongweibao 65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
Yan Xu 0217555530
polish parallel dygraph code (#17164)
6 years ago
Yan Xu 0b07eef118
ParallelDyGraph with GPU collective mode (#16827)
6 years ago
tangwei12 1a4a51db2b
Fleet unify distributed training (#16791)
6 years ago
Qiao Longfei 61912e879d
test_dist_base set runtime_split_send_recv to false test=develop
6 years ago
Qiao Longfei d8974e6da0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
gongweibao eb83abeac3
Add DGC(Deep Gradient Compression) interface. (#15841)
6 years ago
Qiao Longfei 30618409db
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
Wu Yi 8bebfe5640
add resnet nccl2 dist training, mp training unit test (#16167)
6 years ago
Wu Yi 6382b62f6b
Collective ops (#15572)
6 years ago
liuwei1031 caadd0581d
add IfElse test case for ir memory optimize (#15998)
6 years ago
Qiao Longfei 5cf0092825
add more log and fix test_dist_base in multi_batch_merge_pass
6 years ago
Qiao Longfei 4356f186b4
complete parameter_send
6 years ago
Zeng Jinle dec89bd7ed
Merge pull request #15460 from sneaxiy/try_to_turn_on_remove_unnecessary_lock
6 years ago
tangwei12 8b50ad80ff
checkpoint at distributed training (#14854)
6 years ago
sneaxiy ef788603d4
merge develop
6 years ago
WangZhen bac08c4a26
Fix some bugs caused by set functions of the Pass class. test=develop
6 years ago
sneaxiy d8568acd19
turn on remove_unnecessary_lock
6 years ago
Xin Pan 7526ac14e3
add comments
6 years ago
Xin Pan beaae61a16
polish
6 years ago
Xin Pan 5e928e579a
try unify Executor and ParallelExecutor
6 years ago
Yancey1989 8cad371a60
fix nccl unittest acc test=develop
6 years ago
Yan Xu 5384206aec
Merge pull request #14869 from Yancey1989/fix_dist_unittest
6 years ago
Yancey1989 fa1f77e20c
enable ci test=develop
6 years ago
Wu Yi f95ee9c09f
fix nccl dist test acc (#14867)
6 years ago
Wu Yi 554bcdbdfc
add more log for dist test for ci test=develop (#14813)
6 years ago
Wu Yi aebc175cd4
add nccl2 dist tests (#14755)
6 years ago
Wu Yi e2011f1353
test dist ut fixes test=develop (#14706)
6 years ago
Xin Pan 44ecf9a481
fix
6 years ago
Xin Pan 9735e3016a
fix test
6 years ago
Wu Yi 306236c2c0
feature/DC asgd (#12722)
6 years ago
Wu Yi d186e7434e
Refine dist ut (#14118)
6 years ago
minqiyang 59420d5bd2
Polish code
6 years ago
minqiyang 2cc939bbfa
Fix Mac Python3 CI job
6 years ago
Wu Yi 26200f2e42
[1.1] [project] train imagenet using large batch size (#13766)
6 years ago
tangwei12 b35239df2b
fix dist ut with place, test=develop (#13647)
7 years ago
Wu Yi 7a5f3f750b
Fix memory optimization with dist train (#13535)
7 years ago
tangwei12 97cf1eb6d7
Add distributed unit tests about text_classification/simnet-bow/ctr (#12812)
7 years ago
Wu Yi 437debf40e
Fix mac ci dist (#13393)
7 years ago
Yancey1989 a267155006
fix parallel run dist unit test
7 years ago
Wu Yi 0b8067c0dc
fix dist train reduce mode (#13068)
7 years ago
Wu Yi a615ad46e4
Add test for dist and memopt (#13049)
7 years ago
Wu Yi f63368db5e
Add async dist tests (#12798)
7 years ago