59 Commits (c308c88d71e183e3ae79075c94f4ee5f72982fa8)

Author SHA1 Message Date
gongweibao 57f0f0f2dc
Delete pserver complete file before executor running. (#19468)
6 years ago
tangwei12 65c7368400
Fix the correctness of async mode at distributed training (#18863)
6 years ago
tangwei12 19dac67e9f
fix distribute transpiler GRPC error code 4, RPC Deadline (#18984)
6 years ago
zhang wenhui 539c870753
add fl_listen_and_serv &fl_transpiler,test=develop (#19091)
6 years ago
gongweibao fd4b15a2f6
Unset unittests http_proxy env to avoid timeout. (#19269)
6 years ago
Leo Chen 80eab822c1 Remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR() (#19166)
6 years ago
HaoRen b7128bac5f supports collective communicated training (#18175)
6 years ago
gongweibao 0d561ef442
fix 2dconn test=develop (#17681)
6 years ago
gongweibao 65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
chengduo b5f4d5ed0e
Add broadcast operators (#17503)
6 years ago
Qiao Longfei 58f7695ab2
Async exe support communicator (#17386)
6 years ago
Yan Xu 0217555530 polish parallel dygraph code (#17164)
6 years ago
Yan Xu 0b07eef118
ParallelDyGraph with GPU collective mode (#16827)
6 years ago
Qiao Longfei 0e663d7f51 fix split_byref_op infer shape
6 years ago
Qiao Longfei 8b8a0487c7 fix compile test=develop
6 years ago
Qiao Longfei a541c25ab6 fix cpplint test=develop
6 years ago
Qiao Longfei 0608f8ca56 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
Qiao Longfei fb6cc3a1bd follow commnet, optimize code and add comment test=develop
6 years ago
Qiao Longfei b542639dc0 code clean test=develop
6 years ago
Qiao Longfei 30618409db Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
Qiao Longfei be0c482304 update trainer_id
6 years ago
Qiao Longfei de65398cb8 update transpiler and listen and serv op
6 years ago
Wu Yi 6382b62f6b
Collective ops (#15572)
6 years ago
minqiyang b40e41fbd1 Polish code style
6 years ago
Qiao Longfei ea0df4e8a2 add some check
6 years ago
minqiyang db0c970823 Polish code
6 years ago
minqiyang 438bca9c3d Implement Runtime Var Type Inference
6 years ago
minqiyang ca392c7e97 Implement infer var type context
6 years ago
Qiao Longfei 065b68b6ca clean code
6 years ago
Qiao Longfei 63cd70a8b8 fix blocking problem
6 years ago
Qiao Longfei c0e5941e31 add commnet for recv do_not_run
6 years ago
Qiao Longfei ff8054c5a7 can run
6 years ago
Qiao Longfei 255b36dad2 can run
6 years ago
Qiao Longfei c2cce6bafa simplify parameter send and recv
6 years ago
Qiao Longfei 3c6b733d14 remove exe context
6 years ago
Qiao Longfei 9573d610ef use rpc common in parameter send and recv
6 years ago
Qiao Longfei 02425b2f64 fix compile
6 years ago
Qiao Longfei fbd186bd5d complete recv op
6 years ago
Qiao Longfei a0585d08ed init parameter recv
6 years ago
Qiao Longfei 4356f186b4 complete parameter_send
7 years ago
Qiao Longfei 657a4f9430 code can compile
7 years ago
Qiao Longfei c7e3868007 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-communicator
7 years ago
乔龙飞 Qiao Longfei 5f89ce7fcd
Merge pull request #15536 from jacquesqiao/fix-prefetch-one-parameter
7 years ago
Qiao Longfei c750be6d9d add some log
7 years ago
tangwei12 981fc2bdba
fix bug in merge_ids (#15503)
7 years ago
Qiao Longfei 1edc0423d2 update send_op
7 years ago
Qiao Longfei 74040cb4aa code clean
7 years ago
Qiao Longfei ca5d96bb3d complete send lod tensor
7 years ago
tangwei12 8b50ad80ff
checkpoint at distributed training (#14854)
7 years ago
Wu Yi a8bc05b5ff
Refactor distributed RPC (#15075)
7 years ago