jiaqi
66d51206b1
add save/load model, shrink table, cvm, config file & fix pull dense bug ( #17118 )
...
* add save/load model, shrink table, cvm, config file & fix pull dense bug
test=develop
* fix global shuffle bug, fix pull dense bug, fix release memory bug, fix shrink error
add client flush, add get data size
test=develop
* fix global shuffle bug
test=develop
* fix global shuffle bug
test=develop
* fix code style
test=develop
* fix code style & modify pslib cmake
test=develop
* fix error of _role_maker
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix windows compile error of fleet
test=develop
* fix global shuffle bug
* add comment
test=develop
* update pslib.cmake
test=develop
* fix fill sparse bug
test=develop
* fix push sparse bug
test=develop
6 years ago
Tao Luo
68ec0a6f74
make parallel_executor support FLAGS_use_mkldnn ( #17341 )
...
* make parallel_executor support FLAGS_use_mkldnn
test=develop
* add warning when set mkldnn_enabled_op_types_ in non-mkldnn env
test=develop
6 years ago
chengduo
bc833945a4
Add DropLocalExeScopes in ParallelExecutor ( #17297 )
...
* reset drop local scope counter
test=develop
6 years ago
qingqing01
e32c9888f5
Double backward of conv2d. ( #17211 )
...
* Add conv2d_grad_grad_op
* Extract the cuDNN conv algo searching code into conv_cudnn_helper.h.
- Now use it in conv2d_grad_grad.
- Will simplify the searching code in conv2d and conv2d_grad in the next PR.
* Enhance and fix bug in unit testing of gradient_checker.
* Support fetching empty variables; return None in Python.
6 years ago
Zeng Jinle
5e5e7b3305
fix data_type error message ( #17312 )
...
test=develop
6 years ago
guru4elephant
5d6a1fcf16
fix infer_from_dataset and train_from_dataset ( #17243 )
...
* fix train_from_dataset and infer_from_dataset example
* add an inferred dim for data_reader; for example, with shape=[-1, 1], the -1 is inferred at run time from the number of elements read
6 years ago
chengduo
516317cf91
use sync copy ( #17291 )
...
test=develop
6 years ago
Hongyu Liu
c3195de522
Fix concat shape check ( #17247 )
...
* fix shape_check; test=develop
* fix format; test=develop
* fix format; test=develop
* fix ddim bug; test=develop
* fix c++ format; test=develop
* change function name; test=develop
6 years ago
chengduo
04bd413acb
Code Clean: Move all pass to paddle::framework::ir ( #17228 )
...
* move pass to ir
* polish code
test=develop
* fix dependency
test=develop
6 years ago
Zeng Jinle
4f8594088d
Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace ( #17225 )
...
* add use_cuda to inplace pass,test=develop
* add test softmax_with_xe_inplace test,test=develop
* fix potential inplace bug
test=develop
* add more skip vars in mem opt pass,test=develop
* follow comment,test=develop
* follow comments,move duplicate out arg check to program->graph,test=develop
6 years ago
songhao
c2e20e2a29
fix build warning like 'comparison between signed and unsigned integer' ( #17240 )
...
test=develop
6 years ago
石晓伟
a72dbe9abf
Cherry-pick benchmark related changes from release/1.4 ( #17156 )
...
* cherry-pick commit from 8877054
* cherry-pick commit from 3f0b97d
* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn
(cherry picked from commit 8643dbc233)
* Cherry-Pick from 16662 : Anakin subgraph cpu support
(cherry picked from commit 7ad182e16c)
* Cherry-pick from 1662, 16797.. : add anakin int8 support
(cherry picked from commit e14ab180fe)
* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4
(cherry picked from commit 4b9fa42307)
* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2
Support ShuffleNet and MobileNet-v2, test=release/1.4
(cherry picked from commit a6fb066f90)
* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4
(cherry picked from commit 8121b3eccb)
* 1. add shuffle_channel_detect
(cherry picked from commit 6efdea8997)
* update shuffle_channel op convert, test=release/1.4
(cherry picked from commit e4726a066f)
* Modify symbol export rules
test=develop
6 years ago
Zeng Jinle
ee2028a110
Add use_cuda to inplace pass ( #17205 )
...
* add use_cuda to inplace pass,test=develop
* add test softmax_with_xe_inplace test,test=develop
6 years ago
chengduo
950aec55fd
It doesn't need sync when fetch_list is not empty ( #17201 )
...
test=develop
6 years ago
tensor-tang
79ed1c76cd
fix bn fuse vardesc and add model saver ( #17143 )
...
* fix bn fuse vardesc and add model saver
test=develop
* unify save model in test helper
test=develop
* fix mkdir on windows
test=develop
* remove magic number use bn bias var desc
test=develop
6 years ago
Zeng Jinle
4e1bc6e805
Rewrite inplace pass and fix gc bug ( #17126 )
...
* fix op graph view
test=develop
* rewrite inplace pass and fix reference count pass bug
test=develop
* fix unittest failed
test=develop
* follow comments, test=develop
6 years ago
chengduo
794a195881
fix fuse optimizer ops ( #17102 )
...
test=develop
6 years ago
Tao Luo
aca60e9a20
remove unnecessary prepare_data ( #17080 )
...
test=develop
6 years ago
Zeng Jinle
842ded14b0
fix reference_count_pass,test=develop ( #17060 )
...
test=develop
6 years ago
Tao Luo
d9cd989825
Merge pull request #17048 from luotao1/fix_runtime_cache_bug
...
fix runtime_context_cache bug when gpu model has an op runs only on cpu
6 years ago
chengduo
cc31681687
use fast executor as default ( #17044 )
...
test=develop
6 years ago
chengduo
a2be4b4d91
Add fuse momentum ops ( #16745 )
...
* Add fuse momentum ops
6 years ago
luotao1
490e746269
fix runtime_context_cache bug when gpu model has an op runs only on cpu
...
test=develop
6 years ago
wopeizl
51a0243a56
fix nccl wrapper on windows
...
test=develop
6 years ago
Zeng Jinle
1202d3fc74
Refine model gpu memory ( #16993 )
...
* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop
* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop
* follow comments
test=develop
6 years ago
Yibing Liu
3c375751f8
Support seq len equal to 0 in sequence ops ( #16935 )
...
* Support seq len equal to 0 in sequence ops
test=develop
* Add more test cases
* Fix some comments
test=develop
* Fix py3 error
test=develop
6 years ago
jiaqi
8bcba3db84
Merge pull request #16896 from xjqbest/develop
...
fix bug of num > INT_MAX
6 years ago
guru4elephant
bbc6c5714f
Merge pull request #16887 from guru4elephant/add_nccl_context_pybind
...
Add nccl context pybind
6 years ago
gongweibao
cbdb8a17b1
Polish DGC code ( #16818 )
6 years ago
dongdaxiang
2ab2869c2d
fix GPU compile error problem
6 years ago
dongdaxiang
466d177d09
add pybind dependency
...
test=develop
6 years ago
xjqbest
10991e00a9
fix bug of num > INT_MAX
6 years ago
xjqbest
241120d94d
fix bug of num > INT_MAX
6 years ago
xjqbest
dac70ad4c5
fix bug of num > INT_MAX
6 years ago
xjqbest
74471397cf
fix bug of num > INT_MAX
6 years ago
dongdaxiang
b091139049
add nccl wrapper for python API
6 years ago
dongdaxiang
fff795e5c8
add nccl_wrapper
6 years ago
乔龙飞 Qiao Longfei
82cff5ec42
Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder
...
Add async sparse param update recorder
6 years ago
Yibing Liu
4267a81afc
Correct the lod level of compiled time in lod_reset ( #16790 )
...
test=develop
6 years ago
chengduo
e9409665f7
Refine Fuse Optimize Ops ( #16810 )
...
* fix bug of fuse optimize ops
6 years ago
chengduo
d105c06b50
Replace ThreadedExecutor with FastThreadedExecutor ( #16650 )
...
* replace ThreadedExecutor with FastThreadedExecutor
test=develop
* Fix Travis CI
test=develop
* Test FastThreadedSSAGraphExecutor
test=develop
* refine parallel_ssa_graph_executor.cc
test=develop
6 years ago
Qiao Longfei
1526a3e4da
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
...
test=develop
6 years ago
Yihua Xu
93cedfdb9c
Fix the order while sorting the operators ( #16756 )
...
* Fix the order when sorting operators.
test=develop
* Enable transformer compare test item.
test=develop
* Use set to replace vector.
test=develop
6 years ago
Qiao Longfei
afc56949c1
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
liuwei1031
85363848a1
Security issue ( #16774 )
...
* disable memory_optimize and inplace strategy by default, test=develop
* fix security issue
http://newicafe.baidu.com:80/issue/PaddleSec-3/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-8/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-12/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-32/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-35/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-37/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-40/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-43/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-44/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-45/show?from=page
test=develop
* revert piece.cc, test=develop
* adjust api.cc,test=develop
6 years ago
guru4elephant
aa46caf3d9
Merge pull request #16765 from guru4elephant/gpu_dataset_train
...
add gpu training for Executor.train_from_dataset
6 years ago
dongdaxiang
3c2d236815
remove all warnings
...
test=develop
6 years ago
Yiqun Liu
112f16143b
Add an option to enable the cache of expected kernel in train phase. ( #16724 )
...
* Add an option to enable the cache of expected kernel in train phase.
test=develop
* Change the default value of cache_expected_kernel to true.
6 years ago
liuwei1031
2e07c19a9c
disable memory_optimize and inplace strategy by default, test=develop ( #16760 )
6 years ago
dongdaxiang
ea07eb8cd2
remove comment in data_feed.cc
...
develop=test
6 years ago
dongdaxiang
05464e7c5c
add gpu training for Executor.train_from_dataset
...
test=develop
6 years ago
Qiao Longfei
0608f8ca56
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
Zeng Jinle
9f7b027dce
fix activation grad op desc maker ( #16715 )
...
test=develop
6 years ago
liuwei1031
fdb719a1bf
avoid optimize variable used in subblock, test=develop ( #16739 )
6 years ago
liuwei1031
a18ef10c87
only use the latest version variable for inplace strategy ( #16736 )
...
* bug-fix, test=develop
* tweak code, test=develop
6 years ago
Tao Luo
5c364cda3c
Merge pull request #16711 from luotao1/has_attr
...
reduce hasAttr elapsed time in RunImpl
6 years ago
chengduo
55b15db5af
Add unit test for fuse all_reduce ops ( #16699 )
...
* test fuse all_reduce
6 years ago
luotao1
4098ba29ed
reduce hasAttr elapsed time in RunImpl
...
test=develop
6 years ago
luotao1
f89a9c5d95
Merge branch 'develop' into has_attr
6 years ago
Tao Luo
ad4a1bd13c
Merge pull request #16339 from luotao1/core_opt_choose_kernel
...
Cache the chosen kernel of operators
6 years ago
luotao1
6afc97ca6b
reduce hasAttr elapsed time in RunImpl
...
test=develop
6 years ago
gongweibao
8b793d0efd
Fix DGC bug. ( #16697 )
6 years ago
Yiqun Liu
3fe8cb0dd7
Enable the runtime_context_cache pass in train phase ( #16640 )
...
* Try to enable the runtime_context_cache pass in train phase.
* Put the append of runtime_context_cache pass ahead of multi_dev passes.
test=develop
6 years ago
xjqbest
6a57e8075a
remove trainer_id in datafeed and dataset
...
test=develop
6 years ago
luotao1
695f2db6a0
update expected_kernel_cache_pass
...
test=develop
6 years ago
luotao1
226596a296
Merge branch 'develop' into core_opt_choose_kernel
6 years ago
xjqbest
5e5139283b
fix runtime error
...
test=develop
6 years ago
xjqbest
271b7147cc
fix dataset bug
...
test=develop
6 years ago
Zeng Jinle
1c526e1d1a
Fix some grad op desc makers ( #16633 )
...
* fix some grad op desc maker
test=develop
* fix grad op desc makers
test=develop
6 years ago
chengduo
ea2a2f778a
Fix the bug of AllReduceDepPass ( #16393 )
6 years ago
chengduo
b75a69bad6
Add Stream for fetch op handle ( #16600 )
...
* expose fuse broadcast ops
6 years ago
chengduo
1342e2ea04
Fix the bug of the fast threaded executor ( #16514 )
...
* Fix the bug of the fast threaded executor. I
6 years ago
gongweibao
423bc515da
fix batch merge bug ( #16601 )
6 years ago
liuwei1031
bd193781df
fix the bug of reusing different types of variables in memory_optimiz… ( #16547 )
...
* fix the bug of reusing different types of variables in memory_optimize_pass, test=develop
* disable SELECTED_ROWS and LOD_TENSOR_ARRAY reuse, test=develop
6 years ago
乔龙飞 Qiao Longfei
21622ca30b
Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator
...
Add async ssa graph executor communicator
6 years ago
sneaxiy
10249c0b78
Merge develop
...
test=develop
6 years ago
Qiao Longfei
9861a92f6f
change the return type of NewTempScope to unique ptr test=develop
6 years ago
Qiao Longfei
fb6cc3a1bd
follow comment, optimize code and add comment test=develop
6 years ago
Qiao Longfei
adf272bcec
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
...
test=develop
6 years ago
guru4elephant
76b49f02ee
Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop
...
Train with pipe reader merge develop
6 years ago
Qiao Longfei
baf02328b2
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
...
test=develop
6 years ago
Qiao Longfei
9db1a9e128
change log level test=develop
6 years ago
gongweibao
a61ed9782e
fix log level test=develop ( #16554 )
6 years ago
Qiao Longfei
8342f12e31
fix set remote_prefetch test=develop
6 years ago
Qiao Longfei
df45c8c538
update nce and hierarchical_sigmoid remote_prefetch
...
test=develop
6 years ago
Qiao Longfei
a1821a0449
remove remote_prefetch in embedding layer test=develop
6 years ago
dongdaxiang
718ea6dbd5
fix fleet code style
...
test=develop
6 years ago
xjqbest
782ab2e2bd
add some doc
...
test=develop
6 years ago
xjqbest
a99c8d0c29
fix client to client communication bug
...
test=develop
6 years ago
gongweibao
fea91164b7
Fix windows compilation error! ( #16546 )
...
* fix compiled
test=develop
* follow comments test=develop
6 years ago
Zhaolong Xing
3e6aa498d6
Merge pull request #16526 from NHZlX/refine_trt_anakin
...
refine subgraph trt and anakin
6 years ago
sneaxiy
33473890f3
Merge develop
...
test=develop
6 years ago
dongdaxiang
ade9337486
fix API.spec
...
test=develop
6 years ago
liuwei1031
278debab71
fix comments of 16410, test=develop ( #16499 )
...
* fix comments of 16410, test=develop
* modify inplace_op_inference_test according to pass interface change, test=develop
6 years ago
dongdaxiang
720647e17f
rebase current develop and fix conflict
...
test=develop
6 years ago
dongdaxiang
98dda08a85
fix pull sparse slow problem
...
test=develop
6 years ago
dongdaxiang
d739bab844
fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem
...
test=develop
6 years ago
dongdaxiang
241d8808be
add timer to distributed executor
...
test=develop
6 years ago
dongdaxiang
3c73859eec
add trainer_desc.proto to distributed executor
...
test=develop
6 years ago
dongdaxiang
60b7bf6fa6
add infer_from_dataset for inference
6 years ago
xjqbest
030c7e7e9d
fix FillSparseValue error
...
test=develop
6 years ago
dongdaxiang
88880d9b69
fix import trainer_desc_pb2 error
...
test=develop
6 years ago
dongdaxiang
0030eb2a61
fix distributed building
...
test=develop
6 years ago
dongdaxiang
ed31874397
undefine rand_r()
...
test=develop
6 years ago
dongdaxiang
f7e4813804
add WIN32 for rand_r and usleep
...
test=develop
6 years ago
dongdaxiang
cedbc161da
add more _LINUX macros in data_feed.cc for mac and windows compile
...
test=develop
6 years ago
dongdaxiang
c5980c3566
add _LINUX macro
...
test=develop
6 years ago
dongdaxiang
433301fbc2
remove glog in shell.h
...
test=develop
6 years ago
dongdaxiang
9e51ad4a65
fix io and fs compile on mac
...
test=develop
6 years ago
dongdaxiang
6eca88ac76
fix io and fs compile on mac
...
test=develop
6 years ago
dongdaxiang
2708108a08
fix fleet_wrapper compile on windows
...
test=develop
6 years ago
dongdaxiang
4ce35815fb
fix windows GLOG problem
...
test=develop
6 years ago
dongdaxiang
e3107a6ae0
fix windows compile problem
...
test=develop
6 years ago
dongdaxiang
398004ece0
disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer
...
test=develop
6 years ago
dongdaxiang
d4514949bf
remove local random engine in fleet with rand_r()
...
test=develop
6 years ago
dongdaxiang
45eb6f0765
run pre-commit check files and fix code style problem
...
test=develop
6 years ago
dongdaxiang
d87ba58c14
refine document of python API, make device_worker and trainer's API private
...
test=develop
6 years ago
dongdaxiang
5687f234bf
fix trainer_desc.proto error
6 years ago
dongdaxiang
b95b80bc76
add doc string for executor and update API.spec
...
test=develop
6 years ago
dongdaxiang
6be9f719e2
make string_helper dependency work
...
test=develop
6 years ago
xjqbest
e95cafd9a7
fix code style & add dataset testcase
...
test=develop
6 years ago
dongdaxiang
ba15d6b164
move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids
...
test=develop
6 years ago
xjqbest
be74de2c61
fix code style & fix register bug & add release_memory
...
test=develop
6 years ago
dongdaxiang
a0b59773af
fix code style
6 years ago
dongdaxiang
f39b323ed7
remove trainer_library in CMakeLists
...
test=develop
6 years ago
dongdaxiang
365be5d559
support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem
...
test=develop
6 years ago
dongdaxiang
6bf796df14
refine print fetch list
6 years ago
xjqbest
589467f24c
fix bug
6 years ago
xjqbest
b7940c2918
fix bug of gen_worker_desc and set_filelist, add some doc
6 years ago
dongdaxiang
68d7bf3de5
add fetch var function
...
test=develop
6 years ago
xjqbest
a34fe6248f
add some doc
6 years ago
xujiaqi01
f5c6a14b54
fix runtime error
6 years ago
xujiaqi01
a5b1a0e12b
support multi dataset && add init model && fix bug
6 years ago
dongdaxiang
3c65cc1bbd
add document for role_maker and fleet parameter, data_generator
6 years ago
dongdaxiang
f6c9232a3d
fix dataset float32 type problem
6 years ago
dongdaxiang
73b1f396d7
add data_generator into paddle.fluid.incubate.data_generator, add op run log in hogwild_device_worker and downpour_device_worker
...
test=develop
6 years ago
dongdaxiang
73544e8b8d
add training speed log
6 years ago
dongdaxiang
9419de521f
add IO percent for multi_trainer
6 years ago
dongdaxiang
6af697adb0
add trainfileswithprofiler for downpour worker
6 years ago
dongdaxiang
2644b88685
add comment for MPI Symmetric role maker
...
test=develop
6 years ago
dongdaxiang
cf45c54340
add distributed optimizer factory
6 years ago
dongdaxiang
b7a202aa38
add distributed optimizer factory
6 years ago
xujiaqi01
70a5d4f797
fix error
6 years ago
xujiaqi01
d25389fefd
add some log && fix error
6 years ago
dongdaxiang
317eb0aad3
add incubate for unified API
6 years ago
xujiaqi01
39449ba0b9
fix bug && add DestroyReaders in trainer
6 years ago
dongdaxiang
e657c127a8
hide opt_info in distributed optimizer
6 years ago
xujiaqi01
ecfc7df913
add dataset factory && fix style
6 years ago
dongdaxiang
328f11b8b6
refactor downpour optimization
...
test=develop
6 years ago
xujiaqi01
3cea00bd52
store memory data in Dataset && fix bug
6 years ago
dongdaxiang
ff87698a44
refactor downpour optimization
6 years ago
dongdaxiang
b66f0074b6
fix data reading bugs in api, add VLOG(3) log for setup
6 years ago
dongdaxiang
b415ec27e8
make Dataset* as an argument
6 years ago
xjqbest
dd67ad08a2
modify c++ and python dataset related code & fix bug
6 years ago
dongdaxiang
cc4def6ba5
fix some conflict for compilation
6 years ago
heqiaozhi
9bca1926c1
refactor & fix bug
6 years ago
xjqbest
2e9a836c6f
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
6 years ago
dongdaxiang
2486389793
add RunFromDataset in executor
6 years ago
dongdaxiang
e36bbcc871
fix some typo and CMakefile.txt
6 years ago
xjqbest
824b84d185
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
6 years ago
dongdaxiang
08c25995a2
add run from dataset in executor.
6 years ago
dongdaxiang
c28bbdf8ba
add dataset_generator.py
...
dataset_generator.py is a framework for generating data with python
the generated data, in a fixed format, will be fed into the C++ reader
test=develop
6 years ago
dongdaxiang
be757096da
add pybind for fleet
6 years ago
dongdaxiang
687cb79dbb
add pipe command io interface
6 years ago
dongdaxiang
1fe54416c9
move fs.cc and shell.cc into paddle/fluid/framework/io
...
test=develop
6 years ago
dongdaxiang
53fbab5d33
add fs_local_open example
6 years ago
dongdaxiang
afaf937010
add fs_local_open example
6 years ago
dongdaxiang
cf1360643f
add printer for fetch variable
6 years ago
dongdaxiang
d65cb13ad5
add pslib flag on fleet_wrapper CMakefile
6 years ago
dongdaxiang
6de9ebc65c
refine VLOG in fleet_wrapper.h
...
test=develop
6 years ago
dongdaxiang
97d5cd30f0
make pull dense worker work
6 years ago
dongdaxiang
39014b9f9f
fix class register problem
6 years ago
dongdaxiang
f0dd1201cc
fix destructor problem
...
test=develop
6 years ago
dongdaxiang
f2bde9c241
fix destructor problem
6 years ago
dongdaxiang
54f047a126
fix ngraph compile option
6 years ago
dongdaxiang
dd1dc9bcf0
add common.h.in back
6 years ago
dongdaxiang
378037c535
make s_instance_ private to ensure singleton
6 years ago
dongdaxiang
a446d26e8a
add todo for async executor
6 years ago
dongdaxiang
c165012031
refine device_worker and trainer code
...
test=develop
6 years ago
dongdaxiang
8a335b50be
add downpour device_worker pb configuration
6 years ago
dongdaxiang
24a8001142
make -DWITH_PSLIB=ON compilable
6 years ago
dongdaxiang
67b1d6d721
add dist_multi_trainer for distributed training, add trainer_factory and device_worker_factory so that we can easily extend new training mode, add pull dense worker which is a singleton for parameter fetching
6 years ago
dongdaxiang
855bf579d2
add dist_multi_trainer for distributed training, add trainer_factory and device_worker_factory so that we can easily extend new training mode, add pull dense worker which is a singleton for parameter fetching
6 years ago
Qiao Longfei
d8974e6da0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
...
test=develop
6 years ago
chengduo
1096746cbf
Fuse Adam And SGD ops ( #15933 )
...
* fuse optimizer
6 years ago
Jacek Czaja
2632327429
[MKL-DNN] Tensor modifications revert ( #16462 )
...
* Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233 )"
This reverts commit 13816dd4ac.
Apart from enabling transformer for MKL-DNN
* Revert "- MKL-DNN pooling updated to set_prim_desc"
This reverts commit c63f6b2039.
Conflicts:
paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc
* Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429 )"
test=develop
This reverts commit dec9cf53c8.
* - concat compilation fix
- lint
test=develop
- Lint fixes
test=develop
- Lint fixes
test=develop
- Fix Transpose MKLDNN op
test=develop
6 years ago
chengduo
2265d091e6
Fix threaded executor bug ( #16508 )
...
* fix threaded executor bug
test=develop
* change the order of class member
test=develop
* Fix Travis CI
test=develop
6 years ago
sneaxiy
2c836ff914
check default grad maker
...
test=develop
6 years ago
nhzlx
d065b5bf2b
Anakin ssd support
...
refine trt first run
add quant dequant fuse pass
omit simplify_anakin_priorbox_detection template
omit transpose_flatten_concat_fuse template
test=develop
6 years ago
Zeng Jinle
69cb9792ea
Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug
...
Revert "Fix allocator bug"
6 years ago
chengduo
ed61d67c73
Fix the interface of Pass::Apply ( #16484 )
...
* modify the interface of Pass::Apply
test=develop
* Polish code
test=develop
* Fix Travis CI
test=develop
* fix Pass::Apply interface
test=develop
* Fix Travis CI
test=develop
6 years ago
Zeng Jinle
2aa18e2bda
Merge pull request #16496 from sneaxiy/fix_gc_bug
...
Fix gc bug
6 years ago
Zeng Jinle
174d0d0b90
Revert "Fix allocator bug"
...
add include headers to fix travis-ci
test=develop
6 years ago
gongweibao
eb83abeac3
Add DGC(Deep Gradient Compression) interface. ( #15841 )
6 years ago
Zeng Jinle
644e8af4cf
Merge pull request #16424 from sneaxiy/fix_allocator_bug
...
Fix allocator bug
6 years ago
sneaxiy
c4c6205268
fix gc bug
...
test=develop
6 years ago
Zeng Jinle
c7c6eeb44e
Merge pull request #16409 from sneaxiy/feature/advance_gc
...
Enhance gc to support deleting tensor buffer in advance
6 years ago
Qiao Longfei
33be014535
fix distribute compile problem test=develop
6 years ago
Qiao Longfei
b542639dc0
code clean test=develop
6 years ago
liuwei1031
8d22bc17a4
Memory optimize ( #16410 )
...
* fix cdn issue, test=develop
* fix memory optimize bugs, test=develop
* fix memory optimize bugs, test=develop
* remove add/sub_2 op, test=develop
* disable memory_optimize by default, test=develop
* disable inplace activation in python, test=develop
* fix unittests, test=develop
* fix unittests, test=develop
* bug-fix, test=develop
7 years ago
Zhaolong Xing
fa1796a30a
Merge pull request #16330 from NHZlX/merge_anakin_branch_to_dev
...
Cherry-pick from PaddlePaddle:feature/anakin-engine: Anakin subgraph support.
7 years ago
sneaxiy
a0f4fefb60
delete source file no_need_buffer_vars_inference.cc
...
test=develop
7 years ago
Qiao Longfei
392e97aae5
fix cpplint test=develop
7 years ago
Qiao Longfei
37f6b9ab7a
fix build test=develop
7 years ago
Qiao Longfei
30618409db
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
7 years ago
Wu Yi
9ffd5eecef
test fix fetch bar place for ce ( #16406 )
...
* test fix fetch bar place for ce
* fix ps mode dist train in develop test=develop
* fix style check test=develop
* update test=develop
7 years ago
nhzlx
953bdde058
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
...
test=develop
7 years ago
Tao Luo
e0a3a49096
Merge pull request #16438 from wojtuss/wojtuss/move-cpu-quantize-passes
...
Move cpu_quantize_* passes into mkldnn subfolder
7 years ago
gongweibao
ec6519e806
Fix allreducedep bug ( #16443 )
7 years ago
sneaxiy
78fb3a62e0
fix env variable setting bug
...
test=develop
7 years ago
sneaxiy
2d92b6be98
merge develop
...
test=develop
7 years ago
sneaxiy
7000ec85d9
fix some op grad maker
...
fix ctest eager deletion disable bug
test=develop
7 years ago
sneaxiy
f8ed2c229e
try to fix ci error
...
test=develop
7 years ago
Wojciech Uss
46677fb080
Move cpu_quantize_* passes into mkldnn subfolder
...
test=develop
7 years ago
sneaxiy
c20db6357b
split PR
...
test=develop
7 years ago
Qiao Longfei
be0c482304
update trainer_id
7 years ago
sneaxiy
072d95d8f6
Merge develop
...
test=develop
7 years ago
sneaxiy
a93a9eef8f
add op registry type
...
refine gc code
test=develop
7 years ago
chengduo
a6a3b2fbbc
[Speed]Refine ParallelExecutor ( #16190 )
...
* refine parallelExecutor
test=develop
* Polish op_handle
test=develop
* Remove unnecessary op_handle
test=develop
* Fix Travis CI
test=develop
* Fix fetch bug
test=develop
* Remove WaitInputVarGenerated
* Fix OpHandleBase::Run
test=develop
* debug
test=develop
* use origin fetch_op_handle
test=develop
* Revert op_handle_base.cc
test=develop
* Polish code
test=develop
* Fix OpHandleBase::Run
test=develop
* code refine
* test CI and CE
test=develop
* fix OpHandle::Run
test=develop
* refine AllReduceOpHandle
test=develop
* Polish code
test=develop
7 years ago
nhzlx
3df7b98a0f
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
7 years ago
chengduo
33965527fd
Add unit test for fuse all reduce ( #16354 )
...
* refine fused_all_reduce_op
* add unit test in test_parallel_executor_seresnext
test=develop
7 years ago
sneaxiy
953214ad97
add more unittest
...
modify allocator strategy
remove changes of legacy buddy_allocator
test=develop
7 years ago
luotao1
056599a738
add expected_kernel_cache_pass
...
test=develop
7 years ago
Wojciech Uss
cbe2dbf0db
Add enabling quantization ( #16326 )
...
* Add enabling quantization
test=develop
* remove unused (here) function
7 years ago
Tao Luo
9a05859179
Merge pull request #16322 from wojtuss/wojtuss/fix_cpu_quantize_pass
...
fix pattern matching conv2d with(out) ResidualData
7 years ago
nhzlx
c407dfa3cb
cherry-pick from feature/anakin-engine: refine paddle-anakin to new interface. #16276
7 years ago
nhzlx
a25331bc26
cherry-pick from feature/anakin-engine: deal the changing shape when using anakin #16189
7 years ago
nhzlx
69d37f81d7
cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157
...
support change input size
7 years ago
nhzlx
a1d200a5de
cherry-pick from feature/anakin-engine: Anakin support facebox #16111
7 years ago
luotao1
bfdab00e5b
Merge branch 'develop' into core_opt_choose_kernel
7 years ago
Tao Luo
a5124ee0bb
Merge pull request #16301 from luotao1/runtime_context_pass
...
add runtime_context_cache_pass
7 years ago
luotao1
6c6a39222b
Merge branch 'core_opt_choose_kernel' of https://github.com/Xreki/Paddle into core_opt_choose_kernel
7 years ago
chengduo
f26ba5bddd
Fuse AllReduce ( #15921 )
...
* fuse all_reduce
test=develop
* add fuse_parameter_groups_size
test=develop
* Polish code
test=develop
* Fix travis-ci
test=develop
* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize
test=develop
* Add SetGroupAccordingToMemorySize
test=develop
* fix multi_devices_graph
test=develop
* reset params_grads
test=develop
* Polish code
test=develop
7 years ago
Zeng Jinle
d0ef682552
Merge pull request #16274 from sneaxiy/fix_grad_maker
...
Remove unused variables in op grad maker
7 years ago
Wojciech Uss
104a9f1e27
fix pattern matching conv2d with(out) ResidualData
...
test=develop
7 years ago
Wu Yi
6382b62f6b
Collective ops ( #15572 )
...
* wip allreduce in op
* wip
* wip
* wip
* wip adding test
* wip for conflict with mp mode
* fix tests test=develop
* fix cpu build test=develop
* fix travis clang format test=develop
* fix cpu build test=develop
* update api.spec test=develop
* delete comment test=develop
* fix cpplint test=develop
* fix test=develop
* follow comment test=develop
* add file test=develop
* fix build test=develop
* update test=develop
* to be compatible with sync_bn, and fix mp mode in develop test=develop
7 years ago
sneaxiy
023a3a3d62
fix op grad maker
...
test=develop
7 years ago
luotao1
82af8031d9
add runtime_context_cache_pass
...
test=develop
7 years ago
Tao Luo
7d2740db83
Revert "cache runtime_context"
7 years ago
sneaxiy
fd23262e0c
merge develop, fix conflict
...
test=develop
7 years ago
Qiyang Min
c7f1f3ed0c
Merge pull request #16214 from velconia/imperative_infer_var_type
...
Implement imperative infer var type
7 years ago
Jacek Czaja
13816dd4ac
[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used ( #16233 )
...
* - Fix to crash of Transformer when mkldnn is to be used
Desc: TensorCopy was not setting MKLDNN primitive descriptor when layout was to be kMKLDNN
test=develop
* - Enable transformer for mkl-dnn
test=develo
* - Compilation fix
test=develop
* - Removed manual selection of MKL-DNN ops to be used in Transformer test
test=develop
7 years ago
Wojciech Uss
af03008890
Add cpu_quantize_placement_pass for C-API quantization ( #16265 )
...
* Add cpu_quantize_placement_pass for C-API quantization
test=develop
* added a comment on required pass attributes
test=develop
7 years ago
Tao Luo
dbb92ee4b1
Merge pull request #16002 from luotao1/runtime_context
...
cache runtime_context
7 years ago
minqiyang
b40e41fbd1
Polish code style
...
test=develop
7 years ago
Qiyang Min
8e4ad008fb
Merge pull request #16198 from velconia/imperative_train_speed
...
Improve imperative mode training speed
7 years ago
minqiyang
36dce65bb3
Take DataType and VarType apart
...
test=develop
7 years ago
luotao1
cc0ae1f1a1
refine with comments
...
test=develop
7 years ago
luotao1
a275fd6e0c
Merge branch 'develop' into runtime_context
7 years ago
Wojciech Uss
2579ade45f
Add cpu_quantize_pass for C-API quantization ( #16127 )
...
* Add cpu_quantize_pass for C-API quantization
test=develop
* add cpu_quantize_pass test
* fix lint: add includes for memory, unordered_map and unordered_set
test=develop
* fuse_relu 1
test=develop
* tuned 2 without squash
* fixes
test=develop
* remove unused vars
test=develop
* refactored
test=develop
* fix lint c-style cast -> C++ style cast
test=develop
* remove QuantMax and c style casts
test=develop
* last usage of QuantMax removed
test=develop
* Fix Analysis Predictor UT
Check if memory_optimize_pass has already been added
to the analysis config before adding a new one, so
that it is not added multiple times.
test=develop
* change map to unordered_map
fix the forgotten part of cpu_quantize_pass_tester.cc
test=develop
* removed quantized attribute
* fixed cpu_quantize_pass_tester and op attr comments
test=develop
* removed redundant line
test=debug
* removed gmock
test=develop
* fix after merge
7 years ago