Commit Graph

720 Commits (bda92434db2d75c9c1f29de84005df28c7b1c9e5)

Author | SHA1 | Message | Date
Zeng Jinle | bb4f8dee83 | add logs to left var memory size, test=develop (#19722) | 6 years ago
wangguanzhong | 25dcd74d34 | merge empty lod tensor, test=develop (#19228) | 6 years ago
Tao Luo | bcddbc78d4 | remove -Wmaybe-uninitialized warning (#19653) | 6 years ago
baojun | a3a4b6e570 | Enable ngraph through build_strategy (#19266) | 6 years ago
Zeng Jinle | 19474019c2 | fix fast pe to run highest priority ops first, test=develop (#19575) | 6 years ago
chengduo | b6d1d8901f | Increase num_iteration_per_drop_scope (#19075) | 6 years ago
Zeng Jinle | 708bd9798d | move_flags_to_unified_files_for_management, test=develop (#19224) | 6 years ago
chengduo | e044e84264 | open fuse_all_optimizer_ops (#19087) | 6 years ago
gongweibao | 29d8781240 | Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) | 6 years ago
chengduo | e7da0940f9 | Disable fuse optimization option (#18924) | 6 years ago
Zeng Jinle | 8008ab4e6b | Remove legacy C++ memory optimization codes (#18834) | 6 years ago
chengduo | 4140fe11a4 | Open fuse optimization ops (#18741) | 6 years ago
Zeng Jinle | a802da650b | Feature/mem opt pass refactor (#18735) | 6 years ago
chengduo | fd3aad6cb3 | Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) | 6 years ago
chengduo | a6d468a265 | fix PE fetch bug (#18644) | 6 years ago
gongweibao | c0a82748cf | Polish backwards optimizer dependency codes and use more default values. (#18255) | 6 years ago
Zeng Jinle | d3003a1620 | Feature/buffer_shared_inplace (#17911) | 6 years ago
Zeng Jinle | be24e5b391 | Clean unused code of dim and place (#18565) | 6 years ago
chengduo | 7453857324 | Make fuse_all_reduce_op_pass support mix_precision (#17652) | 6 years ago
tangwei12 | 999d9a59a5 | fix communicator with pyreader (#18350) | 6 years ago
chengduo | 5489216eba | Clean build strategy (#18148) | 6 years ago
chengduo | 24e988a471 | Fix bug of scope_buffered_ssa_graph_executor (#18100) | 6 years ago
gongweibao | f5caf3443c | Fix reinitialized ncclid error! (#18025) | 6 years ago
gongweibao | fbbdc9ccad | Add backward and optimizer operator dependency pass. (#17746) | 6 years ago
chengduo | 437520474c | fix DropLocalExeScopes (#17829) | 6 years ago
chengduo | 67c8dade58 | Add Event in ScopeBuffer Executor (#17667) | 6 years ago
gongweibao | 65bbf950ee | Add multi-ncclcomm and 2D ncclallreduce support. (#17263) | 6 years ago
Qiao Longfei | 58f7695ab2 | Async exe support communicator (#17386) | 6 years ago
Tao Luo | 32da5e9c3d | remove unused expected_kernel_cache_pass (#17486) | 6 years ago
chengduo | 5a6ab38013 | Add record event And remove CSP (#17447) | 6 years ago
chengduo | e336dc86bb | [Speed] Refine the Executor when the num_thread=1 (#17405) | 6 years ago
Tao Luo | 68ec0a6f74 | make parallel_executor support FLAGS_use_mkldnn (#17341) | 6 years ago
chengduo | bc833945a4 | Add DropLocalExeScopes in ParallelExecutor (#17297) | 6 years ago
chengduo | 516317cf91 | use sync copy (#17291) | 6 years ago
chengduo | 04bd413acb | Code Clean: Move all pass to paddle::framework::ir (#17228) | 6 years ago
Zeng Jinle | 4f8594088d | Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225) | 6 years ago
Zeng Jinle | ee2028a110 | Add use_cuda to inplace pass (#17205) | 6 years ago
chengduo | 950aec55fd | It doesn't need sync when fetch_list is not empty (#17201) | 6 years ago
Zeng Jinle | 4e1bc6e805 | Rewrite inplace pass and fix gc bug (#17126) | 6 years ago
chengduo | 794a195881 | fix fuse optimizer ops (#17102) | 6 years ago
Zeng Jinle | 842ded14b0 | fix reference_count_pass, test=develop (#17060) | 6 years ago
chengduo | cc31681687 | use fast executor as default (#17044) | 6 years ago
chengduo | a2be4b4d91 | Add fuse momentum ops (#16745) | 6 years ago
Zeng Jinle | 1202d3fc74 | Refine model gpu memory (#16993) | 6 years ago
gongweibao | cbdb8a17b1 | Polish DGC code (#16818) | 6 years ago
Qiao Longfei | 82cff5ec42 | Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder | 6 years ago
chengduo | e9409665f7 | Refine Fuse Optimize Ops (#16810) | 6 years ago
chengduo | d105c06b50 | Replace ThreadedExecutor with FastThreadedExecutor (#16650) | 6 years ago
Qiao Longfei | afc56949c1 | Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder | 6 years ago
Yiqun Liu | 112f16143b | Add an option to enable the cache of expected kernel in train phase. (#16724) | 6 years ago