Commit Graph

796 Commits (89530384008b023dc1e8c51e5a8e7e710718efff)

Author SHA1 Message Date
zhouwei25 345b67b5e2
remove warning LNK4006 and warning LNK4221 (#21226)
6 years ago
Zeng Jinle cdb3d27985
Fix warn of gcc8 (#21205)
6 years ago
Chen Weihang 8414575b78
Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137)
6 years ago
Chen Weihang 7e5f74b825
Add examples for error message writing specification - InvalidArgument (#21132)
6 years ago
WangXi de5d3ff688
Fix dgc buffer illegal & reuse velocity (#21012)
6 years ago
Zeng Jinle 878a40f57d
Support NoNeedBufferVarsInference in dygraph backward (#20868)
6 years ago
Zeng Jinle b0c0ffb9ae
refine pe when exception raises, test=develop (#20894)
6 years ago
123malin 20cdff0e02
Optimize decay (#20816)
6 years ago
hong 8c4573a3cb
GradMaker for dygraph (#19706)
6 years ago
Zeng Jinle 98103d3003
remove some unnecessary logs in pe, test=develop (#20848)
6 years ago
wopeizl 9e5948230e
add support to gcc8, add docker env test=develop (#19807)
6 years ago
WangXi 507afa8a8a
Fix dgc nan by stripping nccl from sparseReduce. (#20630)
6 years ago
Zeng Jinle a9c8bdad7b
refine pe codes, test=develop (#20479)
6 years ago
Zeng Jinle 76b321872a
fix cuda dev_ctx by event, test=develop (#20553)
6 years ago
chengduo bfa55c9ddb
Add place deps for fused_all_reduce_op_handle (#20077)
6 years ago
tangwei12 8f0b3c0516
the integrated communicator (#19849)
6 years ago
chengduo 2450d15b78
disable fuse_all_optimizer_ops (#19966)
6 years ago
chengduo 55ce696986
clean tensor array (#19930)
6 years ago
chengduo d7251a8e1e
Delete local execution scopes (#19749)
6 years ago
Zeng Jinle b754700fb5
fix reduce and broadcast to avoid multi-stream, test=develop (#19889)
6 years ago
Zeng Jinle db26de8389
[Bug fix] Disable memory reuse on feeded variables (#19835)
6 years ago
chengduo 8281497030
Fix warning info of build_strategy (#19805)
6 years ago
chengduo 056fdedde3
Open fuse all reduce option (#19765)
6 years ago
Huihuang Zheng 12542320c5
Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
chengduo e506c99c20
Open fuse broadcast option (#18833)
6 years ago
chengduo 5866a7a5fe
Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418)
6 years ago
Zeng Jinle bb4f8dee83
add logs to left var memory size, test=develop (#19722)
6 years ago
wangguanzhong 25dcd74d34
merge empty lod tensor, test=develop (#19228)
6 years ago
Tao Luo bcddbc78d4
remove -Wmaybe-uninitialized warning (#19653)
6 years ago
baojun a3a4b6e570
Enable ngraph through build_strategy (#19266)
6 years ago
Zeng Jinle 19474019c2
fix fast pe to run highest priority ops first, test=develop (#19575)
6 years ago
chengduo b6d1d8901f
Increase num_iteration_per_drop_scope (#19075)
6 years ago
Zeng Jinle 708bd9798d
move_flags_to_unified_files_for_management, test=develop (#19224)
6 years ago
chengduo e044e84264
open fuse_all_optimizer_ops (#19087)
6 years ago
gongweibao 29d8781240
Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)
6 years ago
chengduo e7da0940f9
Disable fuse optimization option (#18924)
6 years ago
Zeng Jinle 8008ab4e6b
Remove legacy C++ memory optimization codes (#18834)
6 years ago
chengduo 4140fe11a4
Open fuse optimization ops (#18741)
6 years ago
Zeng Jinle a802da650b
Feature/mem opt pass refactor (#18735)
6 years ago
chengduo fd3aad6cb3
Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664)
6 years ago
chengduo a6d468a265
fix PE fetch bug (#18644)
6 years ago
gongweibao c0a82748cf
Polish backwards optimizer dependency codes and use more default values. (#18255)
6 years ago
Zeng Jinle d3003a1620
Feature/buffer_shared_inplace (#17911)
6 years ago
Zeng Jinle be24e5b391
Clean unused code of dim and place (#18565)
6 years ago
chengduo 7453857324
Make fuse_all_reduce_op_pass support mix_precision (#17652)
6 years ago
tangwei12 999d9a59a5
fix communicator with pyreader (#18350)
6 years ago
chengduo 5489216eba
Clean build strategy (#18148)
6 years ago
chengduo 24e988a471
Fix bug of scope_buffered_ssa_graph_executor (#18100)
6 years ago
gongweibao f5caf3443c
Fix reinitialized ncclid error! (#18025)
6 years ago
gongweibao fbbdc9ccad
Add backward and optimizer operator dependency pass. (#17746)
6 years ago
chengduo 437520474c
fix DropLocalExeScopes (#17829)
6 years ago
chengduo 67c8dade58
Add Event in ScopeBuffer Executor (#17667)
6 years ago
gongweibao 65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
Qiao Longfei 58f7695ab2
Async exe support communicator (#17386)
6 years ago
Tao Luo 32da5e9c3d
remove unused expected_kernel_cache_pass (#17486)
6 years ago
chengduo 5a6ab38013
Add record event And remove CSP (#17447)
6 years ago
chengduo e336dc86bb
[Speed] Refine the Executor when the num_thread=1 (#17405)
6 years ago
Tao Luo 68ec0a6f74
make parallel_executor support FLAGS_use_mkldnn (#17341)
6 years ago
chengduo bc833945a4
Add DropLocalExeScopes in ParallelExecutor (#17297)
6 years ago
chengduo 516317cf91
use sync copy (#17291)
6 years ago
chengduo 04bd413acb
Code Clean: Move all pass to paddle::framework::ir (#17228)
6 years ago
Zeng Jinle 4f8594088d
Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225)
6 years ago
Zeng Jinle ee2028a110
Add use_cuda to inplace pass (#17205)
6 years ago
chengduo 950aec55fd
It doesn't need sync when fetch_list nit not empty (#17201)
6 years ago
Zeng Jinle 4e1bc6e805
Rewrite inplace pass and fix gc bug (#17126)
6 years ago
chengduo 794a195881
fix fuse optimizer ops (#17102)
6 years ago
Zeng Jinle 842ded14b0
fix reference_count_pass,test=develop (#17060)
6 years ago
chengduo cc31681687
use fast executor as default (#17044)
6 years ago
chengduo a2be4b4d91
Add fuse momenutum ops (#16745)
6 years ago
Zeng Jinle 1202d3fc74
Refine model gpu memory (#16993)
6 years ago
gongweibao cbdb8a17b1
Polish DGC code (#16818)
6 years ago
乔龙飞 Qiao Longfei 82cff5ec42
Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder
6 years ago
chengduo e9409665f7
Refine Fuse Optimize Ops (#16810)
6 years ago
chengduo d105c06b50
Replace ThreadedExecutor with FastThreadedExecutor (#16650)
6 years ago
Qiao Longfei afc56949c1
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
Yiqun Liu 112f16143b
Add an option to enable the cache of expected kernel in train phase. (#16724)
6 years ago
liuwei1031 2e07c19a9c
disable memory_optimize and inpalce strategy by default, test=develop (#16760)
6 years ago
Qiao Longfei 0608f8ca56
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
Zeng Jinle 9f7b027dce
fix activation grad op desc maker (#16715)
6 years ago
liuwei1031 fdb719a1bf
avoid optimize variable used in subblock, test=develop (#16739)
6 years ago
liuwei1031 a18ef10c87
only use the latest version variable for inplace strategy (#16736)
6 years ago
chengduo 55b15db5af
Add unit test for fuse all_reduce ops (#16699)
6 years ago
gongweibao 8b793d0efd
Fix DGC bug. (#16697)
6 years ago
Yiqun Liu 3fe8cb0dd7
Enable the runtime_context_cache pass in train phase (#16640)
6 years ago
chengduo ea2a2f778a
Fix the bug of AllReduceDepPass (#16393)
6 years ago
chengduo b75a69bad6
Add Stream for fetch op handle (#16600)
7 years ago
chengduo 1342e2ea04
Fix the bug of the fast threaded executor (#16514)
7 years ago
liuwei1031 bd193781df
fix the bug of reusing different types of variables in memory_optimiz… (#16547)
7 years ago
乔龙飞 Qiao Longfei 21622ca30b
Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator
7 years ago
sneaxiy 10249c0b78
Merge develop
7 years ago
Qiao Longfei fb6cc3a1bd
follow commnet, optimize code and add comment test=develop
7 years ago
Qiao Longfei baf02328b2
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
7 years ago
Qiao Longfei 9db1a9e128
change log level test=develop
7 years ago
gongweibao a61ed9782e
fix log level test=develop (#16554)
7 years ago
Qiao Longfei 8342f12e31
fix set remote_prefetch test=develop
7 years ago
Qiao Longfei df45c8c538
update nce and hierarchical_sigmoid remote_prefetch
7 years ago
Qiao Longfei a1821a0449
remote remote_prefetch in embedding layer test=develop
7 years ago
gongweibao fea91164b7
Fix windows compilation error! (#16546)
7 years ago
sneaxiy 33473890f3
Merge develop
7 years ago
liuwei1031 278debab71
fix comments of 16410, test=develop (#16499)
7 years ago