264 Commits (8c81d9949eea828acb76079c685402f6c26c2059)

Author SHA1 Message Date
fengjiayi 964f515e9a fix mac compile
7 years ago
Yancey1989 7e6518e8ca fix compile warning
7 years ago
Yancey1989 7d1b146939 Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
Qiyang Min 046bb5c8cb Fix NCCLBcast hang up bug in Parallel Executor (#11377)
7 years ago
Yancey1989 6d752bafd8 use get_appropriate_dev to schedule rpc op
7 years ago
Yancey1989 4444e79e46 Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
chengduoZH 173d72b481 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enable_cpu_on_pe
7 years ago
chengduoZH aadaadf735 replace use_event with use_cuda, because use_event means the program running with CUDA, so use_cuda maybe more intuitive.
7 years ago
chengduoZH 1e731f5964 small fix
7 years ago
chengduoZH 5a3c8bf813 fix in c++ side
7 years ago
chengduoZH 0c851cab22 add SSA graph checker
7 years ago
Yancey1989 d5a88b9340 Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
chengduoZH 8291b916d6 replace graph_builder_factory with ssa_graph_builder_factory
7 years ago
Yancey1989 23433def4b Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
yuyang18 d9af153232 SSA Graph Builder Factory
7 years ago
Yancey1989 e533a4b4ab Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
Yancey1989 cb3861538d fix compile failed with CPU
7 years ago
Yancey1989 93401c98e1 overlap rpc op memcpy in distributed training
7 years ago
yuyang18 86a61c177f Add ScopeBufferedSSAGraphExecutor
8 years ago
yuyang18 7c777dd549 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/exec_strategy
8 years ago
yuyang18 08295f9877 Add build strategy
8 years ago
yuyang18 e5281b3c2d Clean code & add execution strategy
8 years ago
typhoonzero 928418a9ac Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op
8 years ago
typhoonzero f5840d8925 follow comments
8 years ago
chengduoZH 97cb5479ae change PE strategy
8 years ago
typhoonzero a529d790b6 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op
8 years ago
typhoonzero d9320dcd94 complete code
8 years ago
chengduoZH c891189568 update sparse gradient parameter with reduce and broadcast
8 years ago
chengduoZH 5ff1ef36ee update sparse parameter
8 years ago
yangyaming 82571deb89 Change `customize_loss_grad` to `use_default_grad_scale`.
8 years ago
Yu Yang 54ada9449e Add demo for recordio train/test and parallel executor
8 years ago
Yu Yang 7a395881d4 Add customize_loss_grad option to PE
8 years ago
Yu Yang 5305c5f845 Correctly implement destructor of ParallelExecutor
8 years ago
fengjiayi fbe562478d Merge pull request #9994 from reyoung/feature/debug
8 years ago
Yu Yang 06fb055a2f New group
8 years ago
Yu Yang 71a2e6b73c Reverse create var
8 years ago
Yu Yang 89728f8e66 update
8 years ago
Yu Yang eb2e4eeade Debug
8 years ago
Yu Yang b4aaa00a8a Polish logic of ParallelExecutor
8 years ago
Yu Yang ad73b331c7 Eagerly drop local scope in iteration (#9838)
8 years ago
fengjiayi 90084a25d2 Merge pull request #9743 from JiayiFeng/modify_readers_to_fit_parallel_executor
8 years ago
wanghaoshuang 19c1a68ee9 Fix lost of LoD while splitting tensor in parallel executor.
8 years ago
JiayiFeng ee178d5aeb fix bugs
8 years ago
chengduoZH 7e7611d067 when the number of samples of current batch is less than the count of devices, let it crash.
8 years ago
qingqing01 2b7e5bd366 Support testing during training by ParallelExecutor. (#9738)
8 years ago
Xin Pan 4bbfa9eccb Add feed to ParallelExecutor
8 years ago
Xin Pan b123ce88a1 Add enable/disable for delayed ops
8 years ago
Xin Pan d0ac92531d Improve ParallelExecutor performance
8 years ago
qiaolongfei 9a101cfc08 clean code
8 years ago
qiaolongfei 997e9a1fd2 fix mac compile
8 years ago
chengduoZH 60d0a0594e refine parallel
8 years ago
Yu Yang 3aa2a8ffcf Follow comments
8 years ago
Yu Yang 02aaecca35 Fix CPU compile
8 years ago
Yu Yang edfd741e3a Add simple python wrapper for ParallelExecutor
8 years ago
Yu Yang a7b0d5bd26 Clean code
8 years ago
Yu Yang e3144393e3 Extract Executors to indie modules
8 years ago
Yu Yang c70b60dd70 Make executor steal graph inside
8 years ago
Yu Yang 4c3361cda8 Extract GraphExecutor
8 years ago
Yu Yang b123e43bf9 extract multi devices graph builder
8 years ago
Yu Yang dd73d18bb7 Extract SSAGraph
8 years ago
Yu Yang 79989c9025 Add SSA builder
8 years ago
Yu Yang 64d7a30271 Extract SSAGraph
8 years ago
Yu Yang 8dec4ad7a1 Use int not Place for vars
8 years ago
Yu Yang 3181501013 Rerange code
8 years ago
Yu Yang f28ae6e4b1 Reorganize Code
8 years ago
Yu Yang 5c333e4143 Add dctor for dev_ctx
8 years ago
Yu Yang 15f5f10ed5 AddInput/AddOutput for OpHandle
8 years ago
Yu Yang 5368e50d84 Reorganize code
8 years ago
Yu Yang fe7ed285d1 Extract NCCLCtxMap
8 years ago
Yu Yang 6ebc6bf533 ReorganizeCode
8 years ago
Yu Yang a478a11e0b NCCL Guard for bcast
8 years ago
Yu Yang f2685bed81 Clean code
8 years ago
Yu Yang 41ad632341 Add NCCL Group Guard
8 years ago
Yu Yang 99fe83a020 Move nccl helper
8 years ago
Yu Yang 90f980167d Do not wait computation stream
8 years ago
Yu Yang 7ac969b88c Debug
8 years ago
Yu Yang 599f7a87ba Refine code
8 years ago
Yu Yang 43e54079a8 Debug code
8 years ago
Yu Yang e335f01826 Add more logs
8 years ago
Yu Yang 82693e7227 Wait nccl all reduce
8 years ago
Yu Yang eb0a580e78 Add enforce
8 years ago
Yu Yang 65bc7d17d5 Add mtx to ncclAllReduce
8 years ago
Yu Yang ba227df941 Expose num_threads
8 years ago
Yu Yang 1533bf12df Use event and single thread
8 years ago
Yu Yang 95a0d7c7c1 Illegal memory access
8 years ago
Yu Yang 798e6907b4 Change mem order
8 years ago
Yu Yang 1c2b6100b0 Add
8 years ago
Yu Yang 4e43b71377 Add wait log
8 years ago
Yu Yang dbed123382 Debug
8 years ago
Yu Yang e53b6aba63 Use no thread
8 years ago
Yu Yang a8bd7b9809 Add log
8 years ago
Yu Yang 3c9cea597e Add more log
8 years ago
Yu Yang f8f1a963d9 Add debug code
8 years ago
Yu Yang fbbcedda01 Fix bug
8 years ago
Yu Yang 7643c2cbab Add flag for use event
8 years ago
Yu Yang ca4b3d2532 Use 12 threads
8 years ago
Yu Yang f251a58e85 Use base class manage events
8 years ago
Yu Yang 1dd216dc3b Wait bcast param
8 years ago
Yu Yang 4185dd48e4 Disable multi-thread
8 years ago
Yu Yang 631aa3d10a Wait all inputs ready
8 years ago
Yu Yang 9b1f4d5d62 After nccl add event
8 years ago
Yu Yang feb569f8ea Add log
8 years ago
Yu Yang 260cfe3b86 Stop Wait NCCL Stream
8 years ago
Yu Yang e025e284c6 Exchange wait op
8 years ago
Yu Yang 3238ce0672 Add wait
8 years ago
Yu Yang 8a9de67e17 Remove wait
8 years ago
Yu Yang d2cb3790e9 Wait all evernts
8 years ago
Yu Yang 4137bb4eda Add wait
8 years ago
Yu Yang 3da4159f88 Add run iter
8 years ago
Yu Yang d3c82c356e Wait multiple stream
8 years ago
Yu Yang c18c2f6ab0 Sync all computation streams at the end of run
8 years ago
Yu Yang c372ce2885 Add event for computational op
8 years ago
Yu Yang b94ffacbd7 SetDev
8 years ago
Yu Yang 99f85a9fbc Set dev
8 years ago
Yu Yang d26f093f9d Log
8 years ago
Yu Yang d55a03d916 Scale loss on place
8 years ago
Yu Yang 932364a275 Sync dev
8 years ago
Yu Yang dad7bdabd4 Add setDev
8 years ago
Yu Yang 7fd0d24e0c Add lgo
8 years ago
Yu Yang bade579826 Wait code
8 years ago
Yu Yang 4a330094f9 Add log
8 years ago
Yu Yang 9824e8f311 Scale loss op use event
8 years ago
Yu Yang 071043c388 Add paddle enforce
8 years ago
Yu Yang 8af57706e2 Only wait same device
8 years ago
Yu Yang 29cc9f308d SetDev for nccl
8 years ago
Yu Yang d7badb3ed2 Use event to sync stream
8 years ago
Yu Yang 1f53193a63 Use atomic code
8 years ago
Yu Yang c7beac1426 Add dummy var
8 years ago
Yu Yang 5fa535b717 Wait all thread done
8 years ago
Yu Yang 7bff02b2ca Change to pending op
8 years ago
Yu Yang 866f6f1be0 Debug
8 years ago
Yu Yang a5ba704de0 Counter
8 years ago
Yu Yang a87ce91c4b Use mtx
8 years ago
Yu Yang ea11a0a853 Use volitie
8 years ago
Yu Yang 515e516e77 Add more log
8 years ago
Yu Yang 1f063d0900 Memorder
8 years ago
Yu Yang b1cb8bbd40 Debug
8 years ago
Yu Yang b57b880b05 Debug
8 years ago
Yu Yang f3e983e499 Memory order
8 years ago
Yu Yang 36e0415220 Single Thread
8 years ago
Yu Yang 5957f28b86 Debug
8 years ago
Yu Yang f52714d391 Debug
8 years ago
Yu Yang 0023c3bcf5 Use atomic bool
8 years ago
Yu Yang 09935ab936 Debug
8 years ago
Yu Yang f8141d90c8 Debug
8 years ago
Yu Yang e18a269705 Add debug code
8 years ago
Yu Yang 9cb8f50302 Complete fetch op
8 years ago
Yu Yang 254d7ff4f5 Refactor local_scopes
8 years ago
Yu Yang b2c7a9b828 Wait by stream
8 years ago
Yu Yang e8a7e5d1e6 Update
8 years ago