Helin Wang
b8461c79fc
implement init parameters selection with etcd
8 years ago
helinwang
544c7db780
Merge pull request #3223 from helinwang/master_timeout
Master persist more states to etcd, schedule pending timeout after lo…
8 years ago
Helin Wang
01a62511b4
add curPass into log, remove JobTasks
8 years ago
Helin Wang
10794cf4de
Master persist more states to etcd, schedule pending timeout after load pending state.
8 years ago
Helin Wang
33fb8d7abf
fix according to comments
8 years ago
Helin Wang
5ce7703ce8
fix test by not triggering save checkpoint when not intended (change save duration to 1 hour)
8 years ago
Helin Wang
2ee418db78
fix pserver save / load checkpoint
8 years ago
Yancey
ec9d4d527e
Add start_record interface ( #3128 )
* add start_record interface
* call master client in reader
* update
* add demo code in comments
* update comments
* delete unittest for recordio reader
8 years ago
Yancey
53ea896996
Add master server unit test ( #3086 )
* add master server unit test
* fix comments
* use t.Log
* fix travis can not fetch git repo
* fix git repo
8 years ago
Helin Wang
6fab04f4e1
fix vet shadow report
8 years ago
Helin Wang
54eac40f64
fix according to comments
8 years ago
Helin Wang
42fe3e88c7
gracefully shutdown pserver, fix gometalinter errors
8 years ago
Helin Wang
cb5c7526e5
shutdown master server gracefully
8 years ago
武毅
c10121e13c
[Done] Sync master client between passes and fix recordio split ( #2948 )
* fix recordio split and task passes
* update for pre commit
* update
* update, still need to sync client wait for pass end.
* able to sync passes for task dispatching
* update to comment
* update
* fix yapf check
* why local pre-commit fails? version is the same
* fix race condition
* update
* fix race condition
* this still has duplicate problem in unit test
* update
* update
* update by comment
* update
8 years ago
武毅
39af255959
Fix new optimizer lr ( #3074 )
* default learning rate, temporary fix
* update
8 years ago
Helin Wang
c67d8276b7
fix according to comments
8 years ago
Helin Wang
3ff0a9fbb1
Implement distributed training save model, improve master.NewClient interface
8 years ago
helinwang
bea4056531
Merge pull request #2997 from helinwang/checkpoint
do not do log.Errorln when checkpoint is not found (which is normal)
8 years ago
Helin Wang
a46198e5b1
fix client discover pserver context cancelled
It's already fixed by Wuyi's PR, but his PR may take some time to
merge, and I want to get this change in ASAP.
8 years ago
Helin Wang
a7e69d949f
do not do log.Errorln when checkpoint is not found (which is normal)
8 years ago
dongzhihong
e1e7309789
boring copyright
8 years ago
Helin Wang
25e57949cc
add more linters, fix errors found by them.
8 years ago
Helin Wang
5d7bccb2a3
fix golint errors
8 years ago
Helin Wang
2b1cac4113
Handle all unchecked errors
Unchecked errors could be handled by: cd go; gometalinter --vendor --disable-all --enable errcheck $(glide nv)
8 years ago
Yancey
83f263e6ec
Fix fetch record from master failed ( #2848 )
Fix fetch record from master
8 years ago
武毅
23b8346072
Fault tolerant distributed training, just work version, with etcd ( #2849 )
* using etcd as fault tolerant training
* update
* workable version, ft not tested
* small fix
* update
* remove TODO
8 years ago
Helin Wang
9eb9b2c29c
fix race condition in test
8 years ago
Helin Wang
777a5cca91
Client test: concurrently init param. Concurrently send grad and get param
8 years ago
Helin Wang
11660eab0e
Fix optimizer parameter buffer allocation size.
The buffer allocation size should be number of bytes, not number of
floats.
8 years ago
dongzhihong
7cfcda5f4f
"fix checkpoint pointer"
8 years ago
dongzhihong
46c704ecf0
"fix init error"
8 years ago
gangliao
e6e2bf45e5
Merge pull request #2832 from helinwang/go_cmake
go_binary: remove hardcoded library link path, add pserver client test
8 years ago
Yancey
19bfb8a1f2
PServer recovery from checkpoint ( #2741 )
* Server recovery from checkpoint
8 years ago
Helin Wang
b04986da9f
add pserver client test
8 years ago
Helin Wang
2231b92a89
go_binary: remove hardcoded library link path
8 years ago
helinwang
f5f7d6bd4f
Merge pull request #2811 from helinwang/go_test_1
Add go testing into cmake
8 years ago
Helin Wang
59287cd1ca
add .gitignore
8 years ago
Helin Wang
e4be077ffa
Add go testing into cmake and fix libpaddle_go_optimizer.a link path
8 years ago
武毅
bcf9f421c3
Merge pull request #2774 from typhoonzero/fix_newupdater
Fix new remote updater for go pserver
8 years ago
gongweibao
dd8685ff1c
fix bug
8 years ago
gongweibao
d05d19ba03
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into taskfail
8 years ago
gongweibao
b64c7a635d
fix by helin's comments
8 years ago
gongweibao
a40a7a5cb1
fix by helin's comments
8 years ago
yi.wu
5a4f33df7e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_newupdater
8 years ago
dzhwinter
e8296ff291
restart teamcity JOB
8 years ago
dongzhihong
0ad7053e96
"make parameterCheckpoint exported"
8 years ago
dongzhihong
87e7924e4e
"pserver flags type error"
8 years ago
dongzhihong
774604cdb8
"add more NewService argument"
8 years ago
dongzhihong
40295b9ed9
"fix pserver saving etcd"
8 years ago
wuyi05
26d95a6bbf
fix new remote updater for go pserver
8 years ago