chengduozh
82d2903b63
Fix fast ParallelExe bug
...
test=develop
7 years ago
sneaxiy
d3ed070e10
test=develop
7 years ago
sneaxiy
fb6201e93e
test=develop
7 years ago
sneaxiy
9606b37ce4
test=develop
7 years ago
chengduo
d6747a9ac2
make check_graph choosable ( #13674 )
...
test=develop
7 years ago
chengduo
5175b3cb2b
Add GraphChecker ( #13580 )
...
* add GraphNum
test=develop
* add graph number check in parallelExecutor
test=develop
* fix transformer_model bug
test=develop
* fix graph num
7 years ago
Xin Pan
36c2a9af27
pass builder allows customizing passes in python.
7 years ago
chengduo
d402234ba8
Feature/op_fuse_pass ( #12440 )
...
* Add Preface
* Add demo code
* Save file
* Refine code
* seems can work
* use elementwise strategy
* Use ElementwiseComputeEx
* Add comments
* extract functions from operator
* Refine code
* Follow comment
* code refine
* add op_fuse pass
* add backward
* code refine
* use TopologySortOperations
* follow comments
* refine IsFusible
* code enhance
* fix op_fusion_pass
* refine code
* refine fuse_elemwise_act_op
* adjust the input and output
* refine logic
* add intermediate_edge
* disable inplace
* follow comments
* refine logic
* follow comments
* Remove the removable IntermediateOut
* change strategy
* code refine
* enable fuse backward
* code refine
* code refine
* rename unit test
* follow comments
7 years ago
Xin Pan
a83a4fab5c
Merge pull request #13441 from panyx0718/ir2
...
simplify and hide bcast_params
7 years ago
Xin Pan
ec6ee0a293
simplify and hide bcast_params
7 years ago
sneaxiy
612e1a3155
modification
7 years ago
sneaxiy
d0b2453ecd
merge develop
7 years ago
sneaxiy
24ea39c4c6
feature/eager_delete_tensor
7 years ago
minqiyang
dc863aac7e
Add kids exists detection in Scope
7 years ago
minqiyang
681514e15f
Make all scope pointer to shared
7 years ago
yuyang18
05cadf1b24
Add FastExecutor
7 years ago
Xin Pan
626abfc33a
code clean up and renaming
...
Reduce one level of inheritance.
7 years ago
Xin Pan
99c0c20468
add pass test
7 years ago
Xin Pan
ab72d28a5e
clean up and correctness check
7 years ago
Xin Pan
aa1085ddc5
all passes
...
add doc
7 years ago
Xin Pan
e4d7d7ae8f
pass refactoring
7 years ago
Xin Pan
142e832d21
pass registration
7 years ago
Xin Pan
5b183557f3
graph viz pass
7 years ago
Xin Pan
c3f6e0e8a2
add namespace to Graph
7 years ago
Xin Pan
64eaa4c829
clean
7 years ago
Xin Pan
2fa8df1caf
separate graph building pass and graph-based pe builder
7 years ago
Xin Pan
9605fcd124
all graphs
7 years ago
Xin Pan
af79b19207
add a simple program to graph
7 years ago
Xin Pan
68aa500451
polish attrs
7 years ago
Yancey
0042ba93c8
Merge pull request #12127 from Yancey1989/enforce_rpc_timeout
...
Enforce rpc timeout
7 years ago
chengduo
325fbc4f1b
Add learning rate decay test ( #12124 )
...
* Add learning rate decay test
* fix test name
* doesn't share @LR_DECAY_COUNTER@
7 years ago
chengduo
86b0a72576
Refine multi thread cpu parallel exe ( #11406 )
...
* refine multi-thread CPU Parallel exe
* refine multi thread CPU Parallel exe
* Refine CPU version for ParallelExecutor
* add share_parameter_between_cards_
* Fix ParallelExecutor bug
* Fix unit test
* Fix parameter opt balance
* Fix with opti (param->grad)
* Add grad to op var
* Remove shard_param_between_cards
7 years ago
Yancey1989
d14afcedeb
polish function name
7 years ago
Yancey1989
1effba3312
fix pe with cpu place
7 years ago
chengduo
8d76cf397d
Fix TensorCopy bug ( #11822 )
...
* Fix tensorcopy bug
* follow comment
* Refine TensorCopy
7 years ago
chengduo
6711b7b5f1
fix FeedAndSplitTensorIntoLocalScopes ( #11817 )
7 years ago
yi.wu
8d04d0e2a3
update
7 years ago
yi.wu
6f0107126a
fix broadcast bug
7 years ago
yi.wu
8e48c77b54
wip
7 years ago
yi.wu
3d69a82b83
fix dist train broadcasting bug
7 years ago
fengjiayi
964f515e9a
fix mac compile
7 years ago
Yancey1989
7e6518e8ca
fix compile warning
7 years ago
Yancey1989
7d1b146939
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
Qiyang Min
046bb5c8cb
Fix NCCLBcast hang up bug in Parallel Executor ( #11377 )
...
* 1. Create buddy allocator in each place before NcclBcast of the variables
2. Check the memory usage of ALL gpus rather than the first one
* 1. Make NCCLGroupGuard guard only the ncclBcast part, which avoids ncclGroupEnd blocking the exception throwing
2. NOTE the usage of NCCLGroupGuard
* Remove the memory usage check of gpus
* Fix code style
7 years ago
Yancey1989
6d752bafd8
use get_appropriate_dev to schedule rpc op
7 years ago
Yancey1989
4444e79e46
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into overlap_memcpy_with_dist
7 years ago
chengduoZH
173d72b481
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enable_cpu_on_pe
7 years ago
chengduoZH
aadaadf735
replace use_event with use_cuda, because use_event means the program is running with CUDA, so use_cuda may be more intuitive.
7 years ago
chengduoZH
1e731f5964
small fix
7 years ago
chengduoZH
5a3c8bf813
fix in c++ side
7 years ago