9 Commits (17299b8d217c0872408cc9146a58f0769d8b05ba)

Author SHA1 Message Date
123malin 20cdff0e02
Optimize decay (#20816)
5 years ago
123malin b4a3b75002
bug fix: invalid learning rate decay in pserver async mode (#20325)
5 years ago
tangwei12 b5a410466c
Trainer heartbeat for async mode (#19600)
5 years ago
123malin 1bc285a53a
add retry function to try to solve grpc error code 14 (#19661)
6 years ago
gongweibao 29d8781240
Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)
6 years ago
Qiao Longfei 103c9bb376
update rpc_client
6 years ago
Dun a83e470405
Profiler refine and add CUDA runtime api tracer (#15301)
6 years ago
tangwei12 8b50ad80ff
checkpoint at distributed training (#14854)
6 years ago
Wu Yi a8bc05b5ff
Refactor distributed RPC (#15075)
6 years ago