Thunderbrook
09b6e71928
heter box ( #29734 )
...
* add heter box
* add trainer, worker, wrapper...
* format
* for ci
* format
* remove boost get
* boost & copyright
* rename
* rename
* format
* format
* format
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
4 years ago
yaoxuefeng
545df287fc
add user_define_dump ( #28596 )
4 years ago
lilong12
f77a78cdee
enable pipeline to run with Executor.run() ( #28373 )
...
* update, test=develop
4 years ago
Thunderbrook
0073f9bdb0
support ps-gpu ( #28752 )
...
* ps gpu transpile
* ps gpu
* remove op
* gps trainer
* local ps
* add macro
* HeterBox
* def cuda
* tab
* code style
* style
Co-authored-by: Thunderbrook <a754913769#163.com>
4 years ago
Thunderbrook
6f69a4cb05
add xpu in heter mode ( #27000 )
...
* add xpu in heter mode
test=develop
* BOOST_CONST_GET; PADDLE_THROW
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* refine
test=develop
* refine
test=develop
* refine
test=develop
* refine code
test=develop
5 years ago
wanghuancoder
df43905f12
use iwyu clean include ( #27267 )
...
* use iwyu clean include, test=develop, test=win
* compilation error, test=develop
* fix compilation error2, test=develop
* fix compilation error3, test=develop
* fix compilation error4, test=develop
* fix compilation error5, test=develop
* fix compilation error6, test=develop
* fix compilation error7, test=develop
* fix compilation error8, test=develop
* fix compilation error8, test=develop
* fix compilation error10, test=develop
* fix compilation error11, test=develop
5 years ago
Thunderbrook
0cb60c700d
add heter ps mode ( #25682 )
...
* add heter ps mode
* code style
test=develop
* add with_pslib
test=develop
* unit test
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* test monitor
test=develop
* prepare trainer
test=develop
* code style
test=develop
5 years ago
lilong12
e39aa70ec7
add the support for pipeline ( #24560 )
...
* add device_worker for pipeline, test=develop
5 years ago
hutuxian
0ec3a42e97
Random Dump ( #24477 )
...
* Refactor code for dump_field & dump_param: abstract the common functions into the base class.
* Support dumping randomly & randomly with lineid
* Support specifying the random interval, which avoids printing too many logs.
5 years ago
xujiaqi01
3a45767d49
add fleet pslib pull and push sparse op and push dense op ( #23139 )
...
* add fleet pslib pull and push sparse op and push dense op
* test=develop
5 years ago
hutuxian
175954d894
PaddleBox Framework Part2 ( #22466 )
...
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to denote whether the remaining instances are discarded when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse; it will be added back once it has been fully tested.
* Fix some known issues, such as copying persistable vars after one epoch of running.
5 years ago
123malin
00594c1c88
support dumping params/grads in transpiler mode ( #22490 )
5 years ago
Wilber
a90fa54092
Compile without nccl deps. [1/2] ( #22509 )
...
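Support compiling without the nccl dependency. [1/2]
With multiple cards, if the build is compiled without the WITH_NCCL switch turned on, the cards cannot communicate with each other, so only one card can be selected for use.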
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
hutuxian
c5aec2fe68
Paddlebox Related to Framework ( #21586 )
...
* Add a single_process_multi_thread transpiler.
* Add some UTs.
* Fix some API description.
5 years ago
Thunderbrook
59bcdc8a19
support dump param of model into afs ( #20302 )
...
* support dump param to afs
test=develop
* code style
test=develop
* code style
test=develop
* dump param
test=develop
* dump param
test=develop
* dump param
test=develop
* dump param
test=develop
5 years ago
Chengmo
940c6ff1c8
Fix communicator slow bug & fix communicator stop bug ( #20366 )
...
* test=develop,Fix communicator slow bug
* test=develop, delete if() in stop_worker()
* test=develop
* fix UT, test=develop
* fix bug in fetch handler, test=develop
* fix bug in fetch handler, test=develop
* test=develop, fix fetch barrier bug
* test=develop, bug fix
* test=develop, bug fix
* test=develop, fix bug
5 years ago
Thunderbrook
f76a32df4a
dump fix dov vec file num ( #20539 )
...
* support dump multi file
test=develop
* dump fix num file
test=develop
5 years ago
tangwei12
c9139c3db3
trainer from dataset fetch targets ( #19760 )
...
add executor.FetchHandler for train/infer from the dataset
6 years ago
yaoxuefeng
10ca3f9609
add thread scope stat accurate metrics test=develop ( #19480 )
...
* add thread scope stat accurate metrics test=develop
* fix style
* fix style
* fix style
* fix style test=develop
* fix style test=develop
* fix style test=develop
* fix style test=develop
* fix style test=develop
* fix style test=develop
* fix style test=develop
* fix conflict
* fix style
* fix style test=develop
* fix error test=develop
* fix error test=develop
6 years ago
Thunderbrook
1fe468d319
support debug each output of each ins ( #19004 )
...
* dump slot
* test
* proto
* dump slot
* test
* proto
* code style
* code style
* code style
* style
* add delete after unseen days
* add unseen days
* code style
* conflict solve
test=develop
* add clear model
* code style
test=develop
* code style
test=develop
* support debug tensor of each ins
test=develop
* support debug tensor of each ins
test=develop
* learning rate
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
test=develop
* code style
test=develop
* unit test
* style
* style
* multi phase
* add channel
* code style
* style
* style
* unit test
* style
* define
* define
test=develop
* style
test=develop
* rm define
test=develop
* linux
* linux
test=develop
* style
test=develop
* output format
test=develop
* windows ci
test=develop
6 years ago
jiaqi
3f8031e256
dataset ( #17973 )
...
(1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset with a single output channel or multiple output channels). One previous memory-out-of-limit problem was caused by not releasing memory after training.
(2) add Record because MultiSlotType costs too much memory (80B); this fixes the memory-out-of-limit problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. dataset holds the memory; datafeed only loads data and feeds it to the network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset
6 years ago
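Item (7) above mentions a thread-count bug when the file list is shorter than the number of threads. The commit message does not say how it was fixed. The sketch below only illustrates one plausible reading, capping the reader count at the number of files so that no reader is left without input. The helper names `effective_thread_num` and `assign_files` are hypothetical and are not part of Paddle.

```python
# Hypothetical illustration only; not the actual fix from the commit.

def effective_thread_num(thread_num, filelist):
    """Cap the number of reader threads at the number of files (at least 1)."""
    if thread_num <= 0:
        raise ValueError("thread_num must be positive")
    return max(1, min(thread_num, len(filelist)))


def assign_files(thread_num, filelist):
    """Distribute files round-robin over the effective number of readers."""
    n = effective_thread_num(thread_num, filelist)
    # Reader i gets files i, i + n, i + 2n, ...
    return [filelist[i::n] for i in range(n)]
```

With three files and eight requested threads, this assumed policy yields three readers with one file each, rather than eight readers of which five would be idle.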
hutuxian
969e6378b9
Pipeline Concurrency ( #17402 )
...
Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of the pipeline trainer; the test model is a gnn
- win32 is not supported yet
6 years ago
dongdaxiang
9e51ad4a65
fix io and fs compile on mac
...
test=develop
6 years ago
dongdaxiang
4ce35815fb
fix windows GLOG problem
...
test=develop
6 years ago
dongdaxiang
6af697adb0
add trainfileswithprofiler for downpour worker
6 years ago
xujiaqi01
39449ba0b9
fix bug && add DestroyReaders in trainer
6 years ago
dongdaxiang
ff87698a44
refactor downpour optimization
6 years ago
xjqbest
dd67ad08a2
modify c++ and python dataset related code & fix bug
6 years ago
dongdaxiang
2486389793
add RunFromDataset in executor
6 years ago
xjqbest
824b84d185
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
6 years ago
dongdaxiang
c165012031
refine device_worker and trainer code
...
test=develop
6 years ago
dongdaxiang
855bf579d2
add dist_multi_trainer for distributed training; add trainer_factory and device_worker_factory so that new training modes can be added easily; add pull dense worker, a singleton for parameter fetching
6 years ago
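The factory idea in the last entry, where new training modes are added by registering them under a name, can be sketched roughly as below. This is a minimal Python illustration of a name-to-class registry and is not Paddle's actual C++ trainer_factory. `register_trainer`, `create_trainer`, and the two example classes are hypothetical stand-ins chosen for the sketch.

```python
# Hypothetical sketch of a registry-based factory; not Paddle's real API.

_TRAINER_REGISTRY = {}


def register_trainer(name):
    """Class decorator that records a trainer class under a string name."""
    def decorator(cls):
        _TRAINER_REGISTRY[name] = cls
        return cls
    return decorator


class TrainerBase:
    def run(self):
        raise NotImplementedError


@register_trainer("MultiTrainer")
class MultiTrainer(TrainerBase):
    def run(self):
        return "multi-thread training"


@register_trainer("DistMultiTrainer")
class DistMultiTrainer(TrainerBase):
    def run(self):
        return "distributed multi-thread training"


def create_trainer(name):
    """Look up a registered trainer class by name and instantiate it."""
    try:
        return _TRAINER_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown trainer: {name}") from None
```

Adding a new mode in this sketch means defining one more decorated class; `create_trainer` itself does not change.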