265 Commits (59c049995e036f80fc7e068a432037a9c8a4a014)

Author SHA1 Message Date
tangwei12 65c7368400 Fix the correctness of async mode at distributed training (#18863)
6 years ago
gongweibao fd4b15a2f6 Unset unittests http_proxy env to avoid timeout. (#19269)
6 years ago
Zeng Jinle 708bd9798d move_flags_to_unified_files_for_management, test=develop (#19224)
6 years ago
gongweibao 29d8781240 Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)
6 years ago
tangwei12 999d9a59a5 fix communicator with pyreader (#18350)
6 years ago
Qiao Longfei 0e08e91c18 optimize communicator merge sparse gradient test=develop (#18159)
6 years ago
tangwei12 101f74cb19 fix save/load in fleet (#17675)
6 years ago
Zeng Jinle 3ece61f71e Remove attribute in Allocator::Allocate (#17878)
6 years ago
Qiao Longfei 58f7695ab2 Async exe support communicator (#17386)
6 years ago
Tao Luo 3d19f44a89 remove unused SERIAL compiler option (#17500)
6 years ago
Qiao Longfei 287de41c04 Optimize communicator flags (#17494)
6 years ago
Qiao Longfei d831f1b0ba fix brpc code
6 years ago
Qiao Longfei 8b8a0487c7 fix compile test=develop
6 years ago
Qiao Longfei a541c25ab6 fix cpplint test=develop
6 years ago
Qiao Longfei 0608f8ca56 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
gongweibao bf606bce8a Fix grpc log message. (#16735)
6 years ago
Qiao Longfei 766666a957 add log for FLAGS_communicator_send_wait_times
6 years ago
Qiao Longfei 4031c1a7b1 fix ci build test=develop
6 years ago
Qiao Longfei 9861a92f6f change the return type of NewTempScope to unique ptr test=develop
6 years ago
Qiao Longfei fb6cc3a1bd follow commnet, optimize code and add comment test=develop
6 years ago
Qiao Longfei d8974e6da0 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
Qiao Longfei 392e97aae5 fix cpplint test=develop
6 years ago
Qiao Longfei b65adf7f65 add communicator_send_wait_times
6 years ago
Qiao Longfei 63acbe7a65 fix bug
6 years ago
Qiao Longfei 0ff1e64fab fix a bug
6 years ago
Qiao Longfei 0997cf8f65 add more check
6 years ago
sneaxiy f8ed2c229e try to fix ci error
6 years ago
Qiao Longfei 93464b25ac update async_sparse_param_update_recorder
6 years ago
Qiao Longfei 542b52fac3 fix trainer_id
6 years ago
Qiao Longfei be0c482304 update trainer_id
6 years ago
Qiao Longfei c60f312d1b add trick
6 years ago
Qiao Longfei 103c9bb376 update rpc_client
6 years ago
Qiao Longfei b7661d7e56 add some log
6 years ago
Qiao Longfei e8fe5186a1 complete parameter_recv
6 years ago
Qiao Longfei d5c7898201 complete pserver side update
6 years ago
Qiao Longfei de65398cb8 update transpiler and listen and serv op
6 years ago
Qiao Longfei 25e2b41729 add AsyncSparseParamUpdateRecorder test
6 years ago
Qiao Longfei c6e82785aa init async_sparse_param_update_recorder
6 years ago
Qiao Longfei 039d783db5 change communicator_recv_wait_ms to communicator_max_send_grad_num_before_recv
6 years ago
Qiao Longfei ea0df4e8a2 add some check
6 years ago
Qiao Longfei 065b68b6ca clean code
6 years ago
Qiao Longfei 347178bd97 fix pserver memory leak
6 years ago
Qiao Longfei c567debcd9 optimize log
6 years ago
Qiao Longfei 0fcdae8418 add communicator_test
6 years ago
Qiao Longfei 9b74707cbf fix compile problem
6 years ago
Qiao Longfei 23d3929a4b optimize merge vars
6 years ago
Qiao Longfei d3a14377d5 add fake rpc to send
6 years ago
Qiao Longfei 43378ad626 add flags to init
6 years ago
Qiao Longfei ad5a2b3edf add some debug flags for communicator
6 years ago
Qiao Longfei 0a828fef82 add some flags for communicator
6 years ago
Qiao Longfei 63cd70a8b8 fix blocking problem
6 years ago
Qiao Longfei 3225e19591 fix remove recv op
6 years ago
Qiao Longfei 446fdf9563 fix compile problem
6 years ago
Qiao Longfei a23f1ee85a optimize code
6 years ago
Qiao Longfei 7d5dc4ef06 fix cmake list
6 years ago
Qiao Longfei 255b36dad2 can run
6 years ago
Qiao Longfei 8c38aca954 tmp commit
6 years ago
Qiao Longfei 13e8b5bf89 clear gradient before merge
6 years ago
Qiao Longfei 50601501e5 improve communicator
6 years ago
Qiao Longfei c2cce6bafa simplify parameter send and recv
6 years ago
Qiao Longfei 3c6b733d14 remove exe context
6 years ago
Qiao Longfei 9573d610ef use rpc common in parameter send and recv
6 years ago
Qiao Longfei 3691a46fa3 improve communicator
6 years ago
Qiao Longfei b8491bfd4e Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-communicator
6 years ago
Dun a83e470405 Profiler refine and add CUDA runtime api tracer (#15301)
6 years ago
Qiao Longfei a804a2ae2a complete parameter recv
6 years ago
Qiao Longfei a0585d08ed init parameter recv
6 years ago
Qiao Longfei 5c36eb8b69 fix build
6 years ago
Qiao Longfei 4356f186b4 complete parameter_send
6 years ago
Qiao Longfei 741b7cfda9 fix compile test=develop
6 years ago
Qiao Longfei 381f383989 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-communicator
6 years ago
Qiao Longfei 657a4f9430 code can compile
6 years ago
Qiao Longfei c7e3868007 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-communicator
6 years ago
sneaxiy ba4f43fd62 fix compile error in distributed mode
6 years ago
乔龙飞 Qiao Longfei 5f89ce7fcd Merge pull request #15536 from jacquesqiao/fix-prefetch-one-parameter
6 years ago
Qiao Longfei 806658d72b add space after colon in commnet test=develop
6 years ago
Qiao Longfei 4d13434443 fix a little problem test=develop
6 years ago
Qiao Longfei 9c3910f390 IncreaseBatchBarrier should be in the right condition test=develop
6 years ago
Qiao Longfei 5a0c6593d5 revert RequestGetHandler
6 years ago
gongweibao d54494ba87 cleanup test=develop (#15347)
6 years ago
Qiao Longfei 84220765a7 refine code, add more log
6 years ago
Qiao Longfei c750be6d9d add some log
6 years ago
gongweibao fe8f28c957 Add GetVariableNoBarrier on brpc. (#15488)
6 years ago
Qiao Longfei 74040cb4aa code clean
6 years ago
Qiao Longfei 1866d2dbef parameter send support selected_rows
6 years ago
Qiao Longfei ca5d96bb3d complete send lod tensor
6 years ago
Qiao Longfei be72940b76 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-communicator
6 years ago
tangwei12 8b50ad80ff checkpoint at distributed training (#14854)
6 years ago
gongweibao f4dec5cdee Check collective server's data. (#15449)
6 years ago
gongweibao 7f8b40f68d Fix brpc complation error. (#15451)
6 years ago
gongweibao 7ab4af2716 Fix brpc compilation. (#15417)
6 years ago
Wu Yi 7e651a38dd fix mac cmake version 3.13 build (#15386)
6 years ago
Qiao Longfei b5aefc8b6d fix compile problem
6 years ago
Qiao Longfei 9958775b31 add NewTmpScope to scope
6 years ago
Yiqun Liu f413b6892b Revert the modification of while_op in #14764. (#15372)
6 years ago
Yiqun Liu 568cc2ffa8 Optimize while_op for test (#14764)
6 years ago
乔龙飞 Qiao Longfei e1679b8847 Merge pull request #14893 from JiabinYang/feature/add_prefech_hs
6 years ago
gongweibao ce70229ba6 Add max_body_size flags to brpc (#15084)
6 years ago
Wu Yi a8bc05b5ff Refactor distributed RPC (#15075)
6 years ago
minqiyang 8ec3d863b0 Fix throw_on_error direct call bug
6 years ago