Xin Pan
a9086bf320
also move a few other dir to legacy/
7 years ago
yuyang18
a229734cbd
Remove cpplint in cmake
7 years ago
qingqing01
24509f4af9
Fix the grammar in copyright. ( #8403 )
7 years ago
dzhwinter
e983cc90fc
"fix decode bug" ( #7711 )
* "fix decode bug"
* "follow commnet"
* "fix error"
* "fix hook bug"
* fix based comment
* fix copyright
* fix based on comment
7 years ago
dzhwinter
b9b75377a2
Feature/hooks ( #7513 )
* add copyright hook
* add copyright hook
* refine copyright hook
* "test copyright hook"
* fix check style
* fix ci
7 years ago
Luo Tao
761b329793
unify the indentation of license
7 years ago
gongweibao
8d1ad97b3d
Add log to `InitParam` `GetParameter` `SendGrad` and etc. ( #5162 )
* add logs and fix a bug
* fix break buf
* modify path bugs
* fix by comments
* fix by comments
* add batch
* add float32tostring
* add pb support
* moidfy gotpaht
* compile ok
* add proto
* delete not need
* add proto
* add empty proto
* clean not need
* clean not need
* modify deps
* fix by comments and update depend
* fix compile error
* fix loop bugs
7 years ago
gongweibao
8c9119afcd
add logs and fix a bug ( #5074 )
add logs and fix a python path bug
7 years ago
Helin Wang
00e2dcf37a
Fix according to comments
7 years ago
Helin Wang
32c92640f0
Fix pserver checkpoint
The pserver checkpoint before failed because the MD5 checksum is
calculated incorrectly. Now changed to CRC32 checksum.
7 years ago
Helin Wang
fc57c09dc9
add detailed log for the pserver
8 years ago
Helin Wang
60238a1bfb
Go master, pserver, trainer: switch to log15, away from logrus
8 years ago
Helin Wang
f28b4d6805
Fix parameter server checkpoint serialization
8 years ago
武毅
40f3e0c194
fix_fault_tolerant_dist_lock ( #4888 )
8 years ago
武毅
0c72649afc
Fix gometalinter versioning ( #4832 )
* fix gometalinter versioning
* stop gometalinter
8 years ago
Helin Wang
05176bd1bb
master server will wait etcd forever
8 years ago
Helin Wang
5270585e10
fix according to comment
8 years ago
Helin Wang
da7a1f2f6c
master client: retry connecting to etcd
8 years ago
武毅
886e66a5ff
golang pserver use OptimizerConfig.proto ( #3358 )
* golang pserver optimizer config for user
* update
* update
* update
* update
* update by comments
* fix errors
* fix errors
8 years ago
Helin Wang
f64539bef9
use random port for embed etcd to avoid port collision
8 years ago
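The commit message only says that the embedded etcd now uses a random port. One common Go technique is to listen on port 0 so that the OS assigns a free port. The sketch below shows that technique, which is not necessarily what the commit does. `pickFreePort` and `etcdClientURL` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net"
)

// pickFreePort asks the OS for an unused TCP port by listening on port 0,
// then closes the listener and returns the port that was assigned.
func pickFreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	port := l.Addr().(*net.TCPAddr).Port
	if err := l.Close(); err != nil {
		return 0, err
	}
	return port, nil
}

// etcdClientURL builds a hypothetical client URL for an embedded etcd
// listening on the given local port.
func etcdClientURL(port int) string {
	return fmt.Sprintf("http://127.0.0.1:%d", port)
}
```

The port is released before etcd binds it, so a small race window remains. This is usually acceptable in tests, where the goal is to avoid collisions between concurrent runs.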
Helin Wang
b8461c79fc
implement init parameters selection with etcd
8 years ago
helinwang
544c7db780
Merge pull request #3223 from helinwang/master_timeout
Master persist more states to etcd, schedule pending timeout after lo…
8 years ago
Helin Wang
01a62511b4
add curPass into log, remove JobTasks
8 years ago
Helin Wang
10794cf4de
Master persist more states to etcd, schedule pending timeout after load pending state.
8 years ago
Helin Wang
33fb8d7abf
fix according to comments
8 years ago
Helin Wang
5ce7703ce8
fix test by not triggering save checkpoint when not intended(change save duration to 1 hour)
8 years ago
Helin Wang
2ee418db78
fix pserver save / load checkpoint
8 years ago
Yancey
ec9d4d527e
Add start_record interface ( #3128 )
* add start_record interface
* call master client in reader
* update
* add demo code in comments
* update comments
* delete unittest for recordio reader
8 years ago
Yancey
53ea896996
Add master server unit test ( #3086 )
* add master server unit test
* fix comments
* use t.Log
* fix travis can not fetch git repo
* fix git repo
8 years ago
Helin Wang
6fab04f4e1
fix vet shadow report
8 years ago
Helin Wang
54eac40f64
fix according to comments
8 years ago
Helin Wang
42fe3e88c7
gracefully shutdown pserver, fix gometalinter errors
8 years ago
Helin Wang
cb5c7526e5
shutdown master server gracefully
8 years ago
武毅
c10121e13c
[Done] Sync master client between passes and fix recordio split ( #2948 )
* fix recordio split and task passes
* update for pre commit
* update
* update, still need to sync client wait for pass end.
* able to sync passes for task dispatching
* update to comment
* update
* fix yapf check
* why local pre-commit fails? version is the same
* fix race condition
* update
* fix race condition
* this still have duplicate problem in unit test
* update
* update
* update by comment
* update
8 years ago
武毅
39af255959
Fix new optimizer lr ( #3074 )
* default learning rate, temperary fix
* update
8 years ago
Helin Wang
c67d8276b7
fix according to comments
8 years ago
Helin Wang
3ff0a9fbb1
Implement distributed training save model, improve master.NewClient interface
8 years ago
helinwang
bea4056531
Merge pull request #2997 from helinwang/checkpoint
...
do not do log.Errorln when checkpoint is not found (which is normal)
8 years ago
Helin Wang
a46198e5b1
fix client discover pserver context cancelled
It's already fixed by Wuyi's PR, but his PR may take some time to
merge, but I want to get this change in ASAP.
8 years ago
Helin Wang
a7e69d949f
do not do log.Errorln when checkpoint is not found (which is normal)
8 years ago
dongzhihong
e1e7309789
boring copyright
8 years ago
Helin Wang
25e57949cc
add more linters, fix errors found by them.
8 years ago
Helin Wang
5d7bccb2a3
fix golint errors
8 years ago
Helin Wang
2b1cac4113
Handle all unchecked errors
Unchecked errors could be handled by: cd go; gometalinter --vendor --disable-all --enable errcheck $(glide nv)
8 years ago
Yancey
83f263e6ec
Fix fetch record from master failed ( #2848 )
Fix fetch record from master
8 years ago
武毅
23b8346072
Fault tolerant distributed training, just work version, with etcd ( #2849 )
* using etcd as fault tolerant training
* update
* workable version, ft not tested
* small fix
* update
* remove TODO
8 years ago
Helin Wang
9eb9b2c29c
fix race condition in test
8 years ago
Helin Wang
777a5cca91
Client test: concurrently init param. Concurrently send grad and get param
8 years ago
Helin Wang
11660eab0e
Fix optimizer parameter buffer allocation size.
The buffer allocation size should be number of bytes, not number of
floats.
8 years ago
dongzhihong
7cfcda5f4f
"fix checkpoint pointer"
8 years ago