Commit Graph

2568 Commits (0d6ea52958ffcb7ed7bda0917aecb872a96ea2ca)

Author SHA1 Message Date
chengduo bc833945a4 Add DropLocalExeScopes in ParallelExecutor (#17297) (6 years ago)
qingqing01 e32c9888f5 Double backward of conv2d. (#17211) (6 years ago)
Zeng Jinle 5e5e7b3305 fix data_type error message (#17312) (6 years ago)
guru4elephant 5d6a1fcf16 fix infer_from_dataset and train_from_dataset (#17243) (6 years ago)
chengduo 516317cf91 use sync copy (#17291) (6 years ago)
Hongyu Liu c3195de522 Fix concat shape check (#17247) (6 years ago)
chengduo 04bd413acb Code Clean: Move all pass to paddle::framework::ir (#17228) (6 years ago)
Zeng Jinle 4f8594088d Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225) (6 years ago)
songhao c2e20e2a29 fix build warning like 'comparison between signed and unsigned (#17240) (6 years ago)
石晓伟 a72dbe9abf Cherry-pick benchmark related changes from release/1.4 (#17156) (6 years ago)
Zeng Jinle ee2028a110 Add use_cuda to inplace pass (#17205) (6 years ago)
chengduo 950aec55fd It doesn't need sync when fetch_list is not empty (#17201) (6 years ago)
tensor-tang 79ed1c76cd fix bn fuse vardesc and add model saver (#17143) (6 years ago)
Zeng Jinle 4e1bc6e805 Rewrite inplace pass and fix gc bug (#17126) (6 years ago)
chengduo 794a195881 fix fuse optimizer ops (#17102) (6 years ago)
Tao Luo aca60e9a20 remove unnecessary prepare_data (#17080) (6 years ago)
Zeng Jinle 842ded14b0 fix reference_count_pass,test=develop (#17060) (6 years ago)
Tao Luo d9cd989825 Merge pull request #17048 from luotao1/fix_runtime_cache_bug (6 years ago)
chengduo cc31681687 use fast executor as default (#17044) (6 years ago)
chengduo a2be4b4d91 Add fuse momentum ops (#16745) (6 years ago)
luotao1 490e746269 fix runtime_context_cache bug when gpu model has an op runs only on cpu (6 years ago)
wopeizl 51a0243a56 fix nccl wrapper on windows (6 years ago)
Zeng Jinle 1202d3fc74 Refine model gpu memory (#16993) (6 years ago)
Yibing Liu 3c375751f8 Support seq len equal to 0 in sequence ops (#16935) (6 years ago)
jiaqi 8bcba3db84 Merge pull request #16896 from xjqbest/develop (6 years ago)
guru4elephant bbc6c5714f Merge pull request #16887 from guru4elephant/add_nccl_context_pybind (6 years ago)
gongweibao cbdb8a17b1 Polish DGC code (#16818) (6 years ago)
dongdaxiang 2ab2869c2d fix GPU compile error problem (6 years ago)
dongdaxiang 466d177d09 add pybind dependency (6 years ago)
xjqbest 10991e00a9 fix bug of num > INT_MAX (6 years ago)
xjqbest 241120d94d fix bug of num > INT_MAX (6 years ago)
xjqbest dac70ad4c5 fix bug of num > INT_MAX (6 years ago)
xjqbest 74471397cf fix bug of num > INT_MAX (6 years ago)
dongdaxiang b091139049 add nccl wrapper for python API (6 years ago)
dongdaxiang fff795e5c8 add nccl_wrapper (6 years ago)
乔龙飞 Qiao Longfei 82cff5ec42 Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder (6 years ago)
Yibing Liu 4267a81afc Correct the lod level of compiled time in lod_reset (#16790) (6 years ago)
chengduo e9409665f7 Refine Fuse Optimize Ops (#16810) (6 years ago)
chengduo d105c06b50 Replace ThreadedExecutor with FastThreadedExecutor (#16650) (6 years ago)
Qiao Longfei 1526a3e4da Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder (6 years ago)
Yihua Xu 93cedfdb9c Fix the order while sorting the operators (#16756) (6 years ago)
Qiao Longfei afc56949c1 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder (6 years ago)
liuwei1031 85363848a1 Security issue (#16774) (6 years ago)
guru4elephant aa46caf3d9 Merge pull request #16765 from guru4elephant/gpu_dataset_train (6 years ago)
dongdaxiang 3c2d236815 remove all warnings (6 years ago)
Yiqun Liu 112f16143b Add an option to enable the cache of expected kernel in train phase. (#16724) (6 years ago)
liuwei1031 2e07c19a9c disable memory_optimize and inplace strategy by default, test=develop (#16760) (6 years ago)
dongdaxiang ea07eb8cd2 remove comment in data_feed.cc (6 years ago)
dongdaxiang 05464e7c5c add gpu training for Executor.train_from_dataset (6 years ago)
Qiao Longfei 0608f8ca56 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder (6 years ago)
Zeng Jinle 9f7b027dce fix activation grad op desc maker (#16715) (6 years ago)
liuwei1031 fdb719a1bf avoid optimize variable used in subblock, test=develop (#16739) (6 years ago)
liuwei1031 a18ef10c87 only use the latest version variable for inplace strategy (#16736) (6 years ago)
Tao Luo 5c364cda3c Merge pull request #16711 from luotao1/has_attr (6 years ago)
chengduo 55b15db5af Add unit test for fuse all_reduce ops (#16699) (6 years ago)
luotao1 4098ba29ed reduce hasAttr elapsed time in RunImpl (6 years ago)
luotao1 f89a9c5d95 Merge branch 'develop' into has_attr (6 years ago)
Tao Luo ad4a1bd13c Merge pull request #16339 from luotao1/core_opt_choose_kernel (6 years ago)
luotao1 6afc97ca6b reduce hasAttr elapsed time in RunImpl (6 years ago)
gongweibao 8b793d0efd Fix DGC bug. (#16697) (6 years ago)
Yiqun Liu 3fe8cb0dd7 Enable the runtime_context_cache pass in train phase (#16640) (6 years ago)
xjqbest 6a57e8075a remove trainer_id in datafeed and dataset (6 years ago)
luotao1 695f2db6a0 update expected_kernel_cache_pass (6 years ago)
luotao1 226596a296 Merge branch 'develop' into core_opt_choose_kernel (6 years ago)
xjqbest 5e5139283b fix runtime error (6 years ago)
xjqbest 271b7147cc fix dataset bug (6 years ago)
Zeng Jinle 1c526e1d1a Fix some grad op desc makers (#16633) (6 years ago)
chengduo ea2a2f778a Fix the bug of AllReduceDepPass (#16393) (6 years ago)
chengduo b75a69bad6 Add Stream for fetch op handle (#16600) (6 years ago)
chengduo 1342e2ea04 Fix the bug of the fast threaded executor (#16514) (6 years ago)
gongweibao 423bc515da fix batch merge bug (#16601) (6 years ago)
liuwei1031 bd193781df fix the bug of reusing different types of variables in memory_optimiz… (#16547) (6 years ago)
乔龙飞 Qiao Longfei 21622ca30b Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator (6 years ago)
sneaxiy 10249c0b78 Merge develop (6 years ago)
Qiao Longfei 9861a92f6f change the return type of NewTempScope to unique ptr test=develop (6 years ago)
Qiao Longfei fb6cc3a1bd follow comment, optimize code and add comment test=develop (6 years ago)
Qiao Longfei adf272bcec Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator (6 years ago)
guru4elephant 76b49f02ee Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop (6 years ago)
Qiao Longfei baf02328b2 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator (6 years ago)
Qiao Longfei 9db1a9e128 change log level test=develop (6 years ago)
gongweibao a61ed9782e fix log level test=develop (#16554) (6 years ago)
Qiao Longfei 8342f12e31 fix set remote_prefetch test=develop (6 years ago)
Qiao Longfei df45c8c538 update nce and hierarchical_sigmoid remote_prefetch (6 years ago)
Qiao Longfei a1821a0449 remove remote_prefetch in embedding layer test=develop (6 years ago)
dongdaxiang 718ea6dbd5 fix fleet code style (6 years ago)
xjqbest 782ab2e2bd add some doc (6 years ago)
xjqbest a99c8d0c29 fix client to client communication bug (6 years ago)
gongweibao fea91164b7 Fix windows compilation error! (#16546) (6 years ago)
Zhaolong Xing 3e6aa498d6 Merge pull request #16526 from NHZlX/refine_trt_anakin (6 years ago)
sneaxiy 33473890f3 Merge develop (6 years ago)
dongdaxiang ade9337486 fix API.spec (6 years ago)
liuwei1031 278debab71 fix comments of 16410, test=develop (#16499) (6 years ago)
dongdaxiang 720647e17f rebase current develop and fix conflict (6 years ago)
dongdaxiang 98dda08a85 fix pull sparse slow problem (6 years ago)
dongdaxiang d739bab844 fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem (6 years ago)
dongdaxiang 241d8808be add timer to distributed executor (6 years ago)
dongdaxiang 3c73859eec add trainer_desc.proto to distributed executor (6 years ago)
dongdaxiang 60b7bf6fa6 add infer_from_dataset for inference (6 years ago)
xjqbest 030c7e7e9d fix FillSparseValue error (6 years ago)
dongdaxiang 88880d9b69 fix import trainer_desc_pb2 error (6 years ago)
dongdaxiang 0030eb2a61 fix distributed building (6 years ago)
dongdaxiang ed31874397 undefine rand_r() (6 years ago)
dongdaxiang f7e4813804 add WIN32 for rand_r and usleep (6 years ago)
dongdaxiang cedbc161da add more _LINUX macro on data_feed.cc for mac and windows compile (6 years ago)
dongdaxiang c5980c3566 add _LINUX macro (6 years ago)
dongdaxiang 433301fbc2 remove glog in shell.h (6 years ago)
dongdaxiang 9e51ad4a65 fix io and fs compile on mac (6 years ago)
dongdaxiang 6eca88ac76 fix io and fs compile on mac (6 years ago)
dongdaxiang 2708108a08 fix fleet_wrapper compile on windows (6 years ago)
dongdaxiang 4ce35815fb fix windows GLOG problem (6 years ago)
dongdaxiang e3107a6ae0 fix windows compile problem (6 years ago)
dongdaxiang 398004ece0 disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer (6 years ago)
dongdaxiang d4514949bf remove local random engine in fleet with rand_r() (6 years ago)
dongdaxiang 45eb6f0765 run pre-commit check files and fix code style problem (6 years ago)
dongdaxiang d87ba58c14 refine document of python API, make device_worker and trainer's API private (6 years ago)
dongdaxiang 5687f234bf fix trainer_desc.proto error (6 years ago)
dongdaxiang b95b80bc76 add doc string for executor and update API.spec (6 years ago)
dongdaxiang 6be9f719e2 make string_helper dependency work (6 years ago)
xjqbest e95cafd9a7 fix code style & add dataset testcase (6 years ago)
dongdaxiang ba15d6b164 move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids (6 years ago)
xjqbest be74de2c61 fix code style & fix register bug & add release_memory (6 years ago)
dongdaxiang a0b59773af fix code style (6 years ago)
dongdaxiang f39b323ed7 remove trainer_library in CMakeLists (6 years ago)
dongdaxiang 365be5d559 support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem (6 years ago)
dongdaxiang 6bf796df14 refine print fetch list (6 years ago)
xjqbest 589467f24c fix bug (6 years ago)
xjqbest b7940c2918 fix bug of gen_worker_desc and set_filelist, add some doc (6 years ago)
dongdaxiang 68d7bf3de5 add fetch var function (6 years ago)
xjqbest a34fe6248f add some doc (6 years ago)
xujiaqi01 f5c6a14b54 fix runtime error (6 years ago)
xujiaqi01 a5b1a0e12b support multi dataset && add init model && fix bug (6 years ago)
dongdaxiang 3c65cc1bbd add document for role_maker and fleet parameter, data_generator (6 years ago)
dongdaxiang f6c9232a3d fix dataset float32 type problem (6 years ago)
dongdaxiang 73b1f396d7 add data_generator into paddle.fluid.incubate.data_generator, add op run log in hogwild_device_worker and downpour_device_worker (6 years ago)
dongdaxiang 73544e8b8d add training speed log (6 years ago)
dongdaxiang 9419de521f add IO percent for multi_trainer (6 years ago)
dongdaxiang 6af697adb0 add trainfileswithprofiler for downpour worker (6 years ago)
dongdaxiang 2644b88685 add comment for MPI Symmetric role maker (6 years ago)
dongdaxiang cf45c54340 add distributed optimizer factory (6 years ago)
dongdaxiang b7a202aa38 add distributed optimizer factory (6 years ago)
xujiaqi01 70a5d4f797 fix error (6 years ago)
xujiaqi01 d25389fefd add some log && fix error (6 years ago)
dongdaxiang 317eb0aad3 add incubate for unified API (6 years ago)
xujiaqi01 39449ba0b9 fix bug && add DestroyReaders in trainer (6 years ago)
dongdaxiang e657c127a8 hide opt_info in distributed optimizer (6 years ago)
xujiaqi01 ecfc7df913 add dataset factory && fix style (6 years ago)
dongdaxiang 328f11b8b6 refactor downpour optimization (6 years ago)
xujiaqi01 3cea00bd52 store memory data in Dataset && fix bug (6 years ago)
dongdaxiang ff87698a44 refactor downpour optimization (6 years ago)
dongdaxiang b66f0074b6 fix data reading bugs in api, add VLOG(3) log for setup (6 years ago)
dongdaxiang b415ec27e8 make Dataset* as an argument (6 years ago)
xjqbest dd67ad08a2 modify c++ and python dataset related code & fix bug (6 years ago)
dongdaxiang cc4def6ba5 fix some conflict for compilation (6 years ago)
heqiaozhi 9bca1926c1 refactor & fix bug (6 years ago)
xjqbest 2e9a836c6f add DataSet and InMemoryDataFeed, support load data into memory and shuffle data (6 years ago)
dongdaxiang 2486389793 add RunFromDataset in executor (6 years ago)
dongdaxiang e36bbcc871 fix some typo and CMakefile.txt (6 years ago)
xjqbest 824b84d185 add DataSet and InMemoryDataFeed, support load data into memory and shuffle data (6 years ago)
dongdaxiang 08c25995a2 add run from dataset in executor. (6 years ago)
dongdaxiang c28bbdf8ba add dataset_generator.py (6 years ago)
dongdaxiang be757096da add pybind for fleet (6 years ago)
dongdaxiang 687cb79dbb add pipe command io interface (6 years ago)
dongdaxiang 1fe54416c9 move fs.cc and shell.cc into paddle/fluid/framework/io (6 years ago)
dongdaxiang 53fbab5d33 add fs_local_open example (6 years ago)
dongdaxiang afaf937010 add fs_local_open example (6 years ago)
dongdaxiang cf1360643f add printer for fetch variable (6 years ago)
dongdaxiang d65cb13ad5 add pslib flag on fleet_wrapper CMakefile (6 years ago)
dongdaxiang 6de9ebc65c refine VLOG in fleet_wrapper.h (6 years ago)
dongdaxiang 97d5cd30f0 make pull dense worker work (6 years ago)
dongdaxiang 39014b9f9f fix class register problem (6 years ago)
dongdaxiang f0dd1201cc fix destructor problem (6 years ago)
dongdaxiang f2bde9c241 fix destructor problem (6 years ago)
dongdaxiang 54f047a126 fix ngraph compile option (6 years ago)
dongdaxiang dd1dc9bcf0 add common.h.in back (6 years ago)
dongdaxiang 378037c535 make s_instance_ private to ensure singleton (6 years ago)
dongdaxiang a446d26e8a add todo for async executor (6 years ago)
dongdaxiang c165012031 refine device_worker and trainer code (6 years ago)
dongdaxiang 8a335b50be add downpour device_worker pb configuration (6 years ago)
dongdaxiang 24a8001142 make -DWITH_PSLIB=ON compilable (6 years ago)
dongdaxiang 67b1d6d721 add dist_multi_trainer for distributed training, add trainer_factory and device_worker_factory so that we can easily extend new training mode, add pull dense worker which is a singleton for parameter fetching (6 years ago)
dongdaxiang 855bf579d2 add dist_multi_trainer for distributed training, add trainer_factory and device_worker_factory so that we can easily extend new training mode, add pull dense worker which is a singleton for parameter fetching (6 years ago)
Qiao Longfei d8974e6da0 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator (6 years ago)
chengduo 1096746cbf Fuse Adam And SGD ops (#15933) (6 years ago)
Jacek Czaja 2632327429 [MKL-DNN] Tensor modifications revert (#16462) (6 years ago)
chengduo 2265d091e6 Fix threaded executor bug (#16508) (6 years ago)
sneaxiy 2c836ff914 check default grad maker (6 years ago)
nhzlx d065b5bf2b Anakin ssd support (6 years ago)
Zeng Jinle 69cb9792ea Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug (6 years ago)
chengduo ed61d67c73 Fix the interface of Pass::Apply (#16484) (6 years ago)
Zeng Jinle 2aa18e2bda Merge pull request #16496 from sneaxiy/fix_gc_bug (6 years ago)
Zeng Jinle 174d0d0b90 Revert "Fix allocator bug" (6 years ago)
gongweibao eb83abeac3 Add DGC(Deep Gradient Compression) interface. (#15841) (6 years ago)
Zeng Jinle 644e8af4cf Merge pull request #16424 from sneaxiy/fix_allocator_bug (6 years ago)
sneaxiy c4c6205268 fix gc bug (6 years ago)
Zeng Jinle c7c6eeb44e Merge pull request #16409 from sneaxiy/feature/advance_gc (6 years ago)
Qiao Longfei 33be014535 fix distribute compile problem test=develop (6 years ago)
Qiao Longfei b542639dc0 code clean test=develop (6 years ago)
liuwei1031 8d22bc17a4 Memory optimize (#16410) (6 years ago)
Zhaolong Xing fa1796a30a Merge pull request #16330 from NHZlX/merge_anakin_branch_to_dev (6 years ago)
sneaxiy a0f4fefb60 delete source file no_need_buffer_vars_inference.cc (6 years ago)