45 Commits (ee76ea72de46df2f9f79c1aa96030362a6000ee7)

Author SHA1 Message Date
Qi Li a60d93fb77
[ROCM] update fluid framework for rocm (part2), test=develop (#31010)
5 years ago
Thunderbrook 565354f676
support save multi sparse table in one path (#31108)
5 years ago
Thunderbrook 0073f9bdb0
support ps-gpu (#28752)
5 years ago
Thunderbrook 6f69a4cb05
add xpu in heter mode (#27000)
5 years ago
wanghuancoder df43905f12
use iwyu clean include (#27267)
5 years ago
yaoxuefeng a47d92d868
fleet add save with whitelist test=develop (#23376)
5 years ago
Thunderbrook 0cb60c700d
add heter ps mode (#25682)
5 years ago
xujiaqi01 1034ca316f
add timeout and http store in communication (#23436)
6 years ago
xujiaqi01 d98084e7ec
add save with prefix (#23449)
6 years ago
xujiaqi01 3a45767d49
add fleet pslib pull and push sparse op and push dense op (#23139)
6 years ago
xujiaqi01 68ea1ad55b
add clear one table (#23089)
6 years ago
yaoxuefeng 2235ee1a5e
multi-loss optimization by adding a DownpourOpt worker (#22025)
6 years ago
xujiaqi01 e3a457d34b
add collective communication library in fleet (#22211)
6 years ago
Thunderbrook 9a7832f8be
print table stat info for pslib (#21296)
6 years ago
Thunderbrook 0d17c1b816
solve pslib core in stop worker (#21263)
6 years ago
Thunderbrook 349e82d669
support general embedding params (#21217)
6 years ago
xujiaqi01 23876de55b
fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)
6 years ago
xujiaqi01 9e045170c0
add copy table (#21086)
6 years ago
xujiaqi01 48669aa8f0
fix several sparse table issues (#20686)
6 years ago
xujiaqi01 cedc04775c
support change shuffle and train thread num (#19841)
6 years ago
xujiaqi01 6bf298bf09
support preload thread, optimize hdfs log, fix master+patch bug (#19695)
6 years ago
jiaqi b104ea0684
add get_last_save_xbox_base/get_last_save_xbox (#19122)
6 years ago
yaoxuefeng 9150cf50fc
add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871)
6 years ago
Thunderbrook 52c1431eee
add clear_model interface in fleetwrapper (#18815)
6 years ago
fuyinno4 c167a4b4dd
Fix shrink-dense and add scale-datanorm (#18746)
6 years ago
Thunderbrook d8396281ef
add slot to sparse table (#18686)
6 years ago
jiaqi d18aabb472
support patch data, add load_one_table, fix bug (#18509)
6 years ago
jiaqi 66d51206b1
add save/load model, shrink table, cvm, config file & fix pull dense bug (#17118)
7 years ago
xjqbest 782ab2e2bd
add some doc
7 years ago
xjqbest a99c8d0c29
fix client to client communication bug
7 years ago
dongdaxiang 98dda08a85
fix pull sparse slow problem
7 years ago
dongdaxiang d4514949bf
remove local random engine in fleet with rand_r()
7 years ago
dongdaxiang a0b59773af
fix code style
7 years ago
xjqbest a34fe6248f
add some doc
7 years ago
xujiaqi01 f5c6a14b54
fix runtime error
7 years ago
xujiaqi01 a5b1a0e12b
support multi dataset && add init model && fix bug
7 years ago
dongdaxiang 317eb0aad3
add incubate for unified API
7 years ago
xujiaqi01 39449ba0b9
fix bug && add DestroyReaders in trainer
7 years ago
xujiaqi01 ecfc7df913
add dataset factory && fix style
7 years ago
xujiaqi01 3cea00bd52
store memory data in Dataset && fix bug
7 years ago
dongdaxiang be757096da
add pybind for fleet
7 years ago
dongdaxiang f2bde9c241
fix destructor problem
7 years ago
dongdaxiang 378037c535
make s_instance_ private to ensure singleton
7 years ago
dongdaxiang c165012031
refine device_worker and trainer code
7 years ago
dongdaxiang 855bf579d2
add dist_multi_trainer for distributed training, add trainer_factory and device_worker_factory so that we can easily extend new training mode, add pull dense worker which is a singleton for parameter fetching
7 years ago