Commit Graph

79 Commits (8acd745c25ba2fedc1f6c5bb48f13a2de94a58f1)

Author SHA1 Message Date
Wilber de009152a7 Compile without nccl deps. [2/2] (#22484)
6 years ago
Wilber 7bc4b09500 add WITH_NCCL option for cmake. (#22384)
6 years ago
tangwei12 82bc814a57 integrated HALF_ASYNC to communicator (#21869)
6 years ago
zhangchunle 805328e13b fix typo in error message (#22312)
6 years ago
123malin 985bceac53 Bug fix for sparse recorder (#21969)
6 years ago
123malin 7fb817d447 add distributed_strategy (#21710)
6 years ago
zhouwei25 a01663ca1f remove patch command and file of cares to Improved quality of Paddle Repo (#21776)
6 years ago
Chen Weihang 1fd1f06f11 Rename paddle throw error macro (#21657)
6 years ago
tangwei12 9ad940fdfe memory leak for cpu (#21174)
6 years ago
Tao Luo 70eb397677 remove unused snappy/snappystream depends in distributed codes (#21484)
6 years ago
tangwei12 0bddb951c2 fix async mode, test=develop (#21367)
6 years ago
hong ac8546701d Add dygraph execution context (#20157)
6 years ago
123malin 20cdff0e02 Optimize decay (#20816)
6 years ago
hong 8c4573a3cb GradMaker for dygraph (#19706)
6 years ago
Chen Weihang 26cc1fe508 Replace risky GetInputType method with secure IndicateVarDataType interface (#20668)
6 years ago
Chengmo 940c6ff1c8 Fix communicator slow bug & fix communicator stop bug (#20366)
6 years ago
123malin b4a3b75002 bug fix: invalid learning rate decay in pserver async mode (#20325)
6 years ago
tangwei12 b5a410466c Trainer heartbeat for async mode (#19600)
6 years ago
Chengmo 728ec1b43d Add GEO-SGD distribute training algorithm (#20018)
6 years ago
tangwei12 8f0b3c0516 the integrated communicator (#19849)
6 years ago
gongweibao 57f0f0f2dc Delete pserver complete file before executor running. (#19468)
6 years ago
tangwei12 65c7368400 Fix the correctness of async mode at distributed training (#18863)
6 years ago
tangwei12 19dac67e9f fix distribute transpiler GRPC error code 4, RPC Deadline (#18984)
6 years ago
zhang wenhui 539c870753 add fl_listen_and_serv &fl_transpiler,test=develop (#19091)
6 years ago
gongweibao fd4b15a2f6 Unset unittests http_proxy env to avoid timeout. (#19269)
6 years ago
Leo Chen 80eab822c1 Remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR() (#19166)
6 years ago
HaoRen b7128bac5f supports collective communicated training (#18175)
6 years ago
gongweibao 0d561ef442 fix 2dconn test=develop (#17681)
6 years ago
gongweibao 65bbf950ee Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
chengduo b5f4d5ed0e Add broadcast operators (#17503)
6 years ago
Qiao Longfei 58f7695ab2 Async exe support communicator (#17386)
6 years ago
Yan Xu 0217555530 polish parallel dygraph code (#17164)
6 years ago
Yan Xu 0b07eef118 ParallelDyGraph with GPU collective mode (#16827)
7 years ago
Qiao Longfei 0e663d7f51 fix split_byref_op infer shape
7 years ago
Qiao Longfei 8b8a0487c7 fix compile test=develop
7 years ago
Qiao Longfei a541c25ab6 fix cpplint test=develop
7 years ago
Qiao Longfei 0608f8ca56 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
7 years ago
Qiao Longfei fb6cc3a1bd follow commnet, optimize code and add comment test=develop
7 years ago
Qiao Longfei b542639dc0 code clean test=develop
7 years ago
Qiao Longfei 30618409db Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
7 years ago
Qiao Longfei be0c482304 update trainer_id
7 years ago
Qiao Longfei de65398cb8 update transpiler and listen and serv op
7 years ago
Wu Yi 6382b62f6b Collective ops (#15572)
7 years ago
minqiyang b40e41fbd1 Polish code style
7 years ago
Qiao Longfei ea0df4e8a2 add some check
7 years ago
minqiyang db0c970823 Polish code
7 years ago
minqiyang 438bca9c3d Implement Runtime Var Type Inference
7 years ago
minqiyang ca392c7e97 Implement infer var type context
7 years ago
Qiao Longfei 065b68b6ca clean code
7 years ago
Qiao Longfei 63cd70a8b8 fix blocking problem
7 years ago