190 Commits (14f98c88e87caa36589bb6276e4e74fd63ea6ccd)

Author SHA1 Message Date
Helin Wang f64539bef9 use random port for embed etcd to avoid port collision
8 years ago
Helin Wang b8461c79fc implement init parameters selection with etcd
8 years ago
helinwang 544c7db780 Merge pull request #3223 from helinwang/master_timeout
8 years ago
Helin Wang 01a62511b4 add curPass into log, remove JobTasks
8 years ago
Helin Wang 10794cf4de Master persist more states to etcd, schedule pending timeout after load pending state.
8 years ago
Helin Wang 33fb8d7abf fix according to comments
8 years ago
Helin Wang 5ce7703ce8 fix test by not triggering save checkpoint when not intended(change save duration to 1 hour)
8 years ago
Helin Wang 2ee418db78 fix pserver save / load checkpoint
8 years ago
Yancey ec9d4d527e Add start_record interface (#3128)
8 years ago
Yancey 53ea896996 Add master server unit test (#3086)
8 years ago
Helin Wang 6fab04f4e1 fix vet shadow report
8 years ago
Helin Wang 54eac40f64 fix according to comments
8 years ago
Helin Wang 42fe3e88c7 gracefully shutdown pserver, fix gometalinter errors
8 years ago
Helin Wang cb5c7526e5 shutdown master server gracefully
8 years ago
武毅 c10121e13c [Done] Sync master client between passes and fix recordio split (#2948)
8 years ago
武毅 39af255959 Fix new optimizer lr (#3074)
8 years ago
Helin Wang c67d8276b7 fix according to comments
8 years ago
Helin Wang 3ff0a9fbb1 Implement distributed training save model, improve master.NewClient interface
8 years ago
helinwang bea4056531 Merge pull request #2997 from helinwang/checkpoint
8 years ago
Helin Wang a46198e5b1 fix client discover pserver context cancelled
8 years ago
Helin Wang a7e69d949f do not do log.Errorln when checkpoint is not found (which is normal)
8 years ago
dongzhihong e1e7309789 boring copyright
8 years ago
Helin Wang 25e57949cc add more linters, fix errors found by them.
8 years ago
Helin Wang 5d7bccb2a3 fix golint errors
8 years ago
Helin Wang 2b1cac4113 Handle all unchecked errors
8 years ago
Yancey 83f263e6ec Fix fetch record from master failed (#2848)
8 years ago
武毅 23b8346072 Fault tolerant distributed training, just work version, with etcd (#2849)
8 years ago
Helin Wang 9eb9b2c29c fix race condition in test
8 years ago
Helin Wang 777a5cca91 Client test: concurrently init param. Concurrently send grad and get param
8 years ago
Helin Wang 11660eab0e Fix optimizer parameter buffer allocation size.
8 years ago
dongzhihong 7cfcda5f4f "fix checkpoint pointer"
8 years ago
dongzhihong 46c704ecf0 "fix init error"
8 years ago
gangliao e6e2bf45e5 Merge pull request #2832 from helinwang/go_cmake
8 years ago
Yancey 19bfb8a1f2 PServer recovery from checkpoint (#2741)
8 years ago
Helin Wang b04986da9f add pserver client test
8 years ago
Helin Wang 2231b92a89 go_binary: remove hardcoded library link path
8 years ago
helinwang f5f7d6bd4f Merge pull request #2811 from helinwang/go_test_1
8 years ago
Helin Wang 59287cd1ca add .gitignore
8 years ago
Helin Wang e4be077ffa Add go testing into cmake and fix libpaddle_go_optimizer.a link path
8 years ago
武毅 bcf9f421c3 Merge pull request #2774 from typhoonzero/fix_newupdater
8 years ago
gongweibao dd8685ff1c fix bug
8 years ago
gongweibao d05d19ba03 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into taskfail
8 years ago
gongweibao b64c7a635d fix by helin's comments
8 years ago
gongweibao a40a7a5cb1 fix by helin's comments
8 years ago
yi.wu 5a4f33df7e Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_newupdater
8 years ago
dzhwinter e8296ff291 restart teamcity JOB
8 years ago
dongzhihong 0ad7053e96 "make parameterCheckpoint exported"
8 years ago
dongzhihong 87e7924e4e "pserver flags type error"
8 years ago
dongzhihong 774604cdb8 "add more NewService argument"
8 years ago
dongzhihong 40295b9ed9 "fix pserver saving etcd"
8 years ago