Commit Graph

102 Commits (59c049995e036f80fc7e068a432037a9c8a4a014)

Author SHA1 Message Date
wanghuancoder df43905f12
use iwyu clean include (#27267)
5 years ago
tangwei12 bc5f0246a8
large scale kv speedup (#26510)
5 years ago
Chen Weihang 41b5955538
Polish no onwer ops error message (#27448)
5 years ago
Yi Liu e9a0fbfff2
Optimize OP error messages (#27301)
5 years ago
ShenLiang c296618c94
fix error message in broadcast/allreduce/gather (#27302)
5 years ago
Chengmo d0962abd20
supplement bug fix of parameter server (#26217)
5 years ago
Chengmo 7f2aa2db3c
【paddle.fleet】Support Heter Parameter Server (#25998)
5 years ago
Chengmo eeeef957c7
Fix ps gpu (#26218)
5 years ago
tangwei12 caa90a6510
Integrated Trainer of Parameter Server (API add `fluid.contrib.layers.sparse_embedding` only) (#22957)
5 years ago
tangwei12 4b3778a3ee
Revert/barrier for sync (#25417)
5 years ago
Chengmo e85fcaa712
Fix fluid.embedding in Distributed Training (#25174)
5 years ago
tangwei12 be6a315fbd
Fix/sync barrier (#25016)
5 years ago
Chen Weihang d1062d5278
Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759)
5 years ago
Yi Liu 12bffdc086
Enhance error message of checkpoint_notify_op, fake_init_op gen_nccl_id_op and listen_and_serv_op (#24554)
5 years ago
yaoxuefeng 16817c70fc
OP(datanorm lookupsparsetable lookuptable) error message enhancement (#24506)
5 years ago
ShenLiang 53e3c53423
fix error message, test=develop (#24425)
5 years ago
gongweibao f1c57d648c
Enhance error message of prefetch_op, proximal_adagrad_op, proximal_gd_op (#24436)
5 years ago
tangwei12 a97d5a6153
fix op error, test=develop (#24451)
5 years ago
Chen Weihang aa0f254fbe
Add macro BOOST_GET to enrich the error information of boost :: get (#24175)
5 years ago
liuwei1031 9a93f6aae0
improve efficiency of runtime InferVarType (#22778)
5 years ago
tianshuo78520a 433cef03e5
fix typo word (#22784)
5 years ago
guofei ae8b5f11a3
Change ShareDataWith() to TensorCopy() in ref_by_trainer_id (#22717)
5 years ago
tangwei12 66a3150135
SYNC with communicaotor (#22344)
5 years ago
Wilber de009152a7
Compile without nccl deps. [2/2] (#22484)
5 years ago
Wilber 7bc4b09500
add WITH_NCCL option for cmake. (#22384)
5 years ago
tangwei12 82bc814a57
integrated HALF_ASYNC to communicator (#21869)
6 years ago
zhangchunle 805328e13b
fix typo in error message (#22312)
6 years ago
123malin 985bceac53
Bug fix for sparse recorder (#21969)
6 years ago
123malin 7fb817d447
add distributed_strategy (#21710)
6 years ago
zhouwei25 a01663ca1f
remove patch command and file of cares to Improved quality of Paddle Repo (#21776)
6 years ago
Chen Weihang 1fd1f06f11
Rename paddle throw error macro (#21657)
6 years ago
tangwei12 9ad940fdfe
memory leak for cpu (#21174)
6 years ago
Tao Luo 70eb397677
remove unused snappy/snappystream depends in distributed codes (#21484)
6 years ago
tangwei12 0bddb951c2
fix async mode, test=develop (#21367)
6 years ago
hong ac8546701d
Add dygraph execution context (#20157)
6 years ago
123malin 20cdff0e02
Optimize decay (#20816)
6 years ago
hong 8c4573a3cb
GradMaker for dygraph (#19706)
6 years ago
Chen Weihang 26cc1fe508
Replace risky GetInputType method with secure IndicateVarDataType interface (#20668)
6 years ago
Chengmo 940c6ff1c8
Fix communicator slow bug & fix communicator stop bug (#20366)
6 years ago
123malin b4a3b75002
bug fix: invalid learning rate decay in pserver async mode (#20325)
6 years ago
tangwei12 b5a410466c
Trainer heartbeat for async mode (#19600)
6 years ago
Chengmo 728ec1b43d
Add GEO-SGD distribute training algorithm (#20018)
6 years ago
tangwei12 8f0b3c0516
the integrated communicator (#19849)
6 years ago
gongweibao 57f0f0f2dc
Delete pserver complete file before executor running. (#19468)
6 years ago
tangwei12 65c7368400
Fix the correctness of async mode at distributed training (#18863)
6 years ago
tangwei12 19dac67e9f
fix distribute transpiler GRPC error code 4, RPC Deadline (#18984)
6 years ago
zhang wenhui 539c870753
add fl_listen_and_serv &fl_transpiler,test=develop (#19091)
6 years ago
gongweibao fd4b15a2f6
Unset unittests http_proxy env to avoid timeout. (#19269)
6 years ago
Leo Chen 80eab822c1
Remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR() (#19166)
6 years ago
HaoRen b7128bac5f
supports collective communicated training (#18175)
6 years ago