yangyaming
|
82571deb89
|
Change `customize_loss_grad` to `use_default_grad_scale`.
|
7 years ago |
Yu Yang
|
54ada9449e
|
Add demo for recordio train/test and parallel executor
|
7 years ago |
Yu Yang
|
7a395881d4
|
Add customize_loss_grad option to PE
|
7 years ago |
Yu Yang
|
5305c5f845
|
Correctly implement destructor of ParallelExecutor
|
7 years ago |
fengjiayi
|
fbe562478d
|
Merge pull request #9994 from reyoung/feature/debug
Fix bugs in local_scopes
|
7 years ago |
Yu Yang
|
06fb055a2f
|
New group
|
7 years ago |
Yu Yang
|
71a2e6b73c
|
Reverse create var
|
7 years ago |
Yu Yang
|
89728f8e66
|
update
|
7 years ago |
Yu Yang
|
eb2e4eeade
|
Debug
|
7 years ago |
Yu Yang
|
b4aaa00a8a
|
Polish logic of ParallelExecutor
|
7 years ago |
Yu Yang
|
ad73b331c7
|
Eagerly drop local scope in iteration (#9838)
* Eagerly drop local scope in iteration
* Correct create var
* Fix typo
* Debug
|
7 years ago |
fengjiayi
|
90084a25d2
|
Merge pull request #9743 from JiayiFeng/modify_readers_to_fit_parallel_executor
Modify readers to fit the parallel executor
|
7 years ago |
wanghaoshuang
|
19c1a68ee9
|
Fix loss of LoD while splitting tensor in parallel executor.
|
7 years ago |
JiayiFeng
|
ee178d5aeb
|
fix bugs
|
7 years ago |
chengduoZH
|
7e7611d067
|
When the number of samples in the current batch is less than the number of devices, let it crash.
|
7 years ago |
qingqing01
|
2b7e5bd366
|
Support testing during training by ParallelExecutor. (#9738)
* Support testing during training by ParallelExecutor.
* Add unit test.
* Improve the interface.
* Follow comments.
|
7 years ago |
Xin Pan
|
4bbfa9eccb
|
Add feed to ParallelExecutor
|
7 years ago |
Xin Pan
|
b123ce88a1
|
Add enable/disable for delayed ops
|
7 years ago |
Xin Pan
|
d0ac92531d
|
Improve ParallelExecutor performance
|
7 years ago |
qiaolongfei
|
9a101cfc08
|
clean code
|
7 years ago |
qiaolongfei
|
997e9a1fd2
|
fix mac compile
|
7 years ago |
chengduoZH
|
60d0a0594e
|
refine parallel
|
7 years ago |
Yu Yang
|
3aa2a8ffcf
|
Follow comments
|
7 years ago |
Yu Yang
|
02aaecca35
|
Fix CPU compile
|
7 years ago |
Yu Yang
|
edfd741e3a
|
Add simple python wrapper for ParallelExecutor
|
7 years ago |
Yu Yang
|
a7b0d5bd26
|
Clean code
|
7 years ago |
Yu Yang
|
e3144393e3
|
Extract Executors to indie modules
|
7 years ago |
Yu Yang
|
c70b60dd70
|
Make executor steal graph inside
|
7 years ago |
Yu Yang
|
4c3361cda8
|
Extract GraphExecutor
|
7 years ago |
Yu Yang
|
b123e43bf9
|
extract multi devices graph builder
|
7 years ago |
Yu Yang
|
dd73d18bb7
|
Extract SSAGraph
|
7 years ago |
Yu Yang
|
79989c9025
|
Add SSA builder
|
7 years ago |
Yu Yang
|
64d7a30271
|
Extract SSAGraph
|
7 years ago |
Yu Yang
|
8dec4ad7a1
|
Use int not Place for vars
|
7 years ago |
Yu Yang
|
3181501013
|
Rearrange code
|
7 years ago |
Yu Yang
|
f28ae6e4b1
|
Reorganize Code
|
7 years ago |
Yu Yang
|
5c333e4143
|
Add destructor for dev_ctx
|
7 years ago |
Yu Yang
|
15f5f10ed5
|
AddInput/AddOutput for OpHandle
|
7 years ago |
Yu Yang
|
5368e50d84
|
Reorganize code
|
7 years ago |
Yu Yang
|
fe7ed285d1
|
Extract NCCLCtxMap
|
7 years ago |
Yu Yang
|
6ebc6bf533
|
Reorganize code
|
7 years ago |
Yu Yang
|
a478a11e0b
|
NCCL Guard for bcast
|
7 years ago |
Yu Yang
|
f2685bed81
|
Clean code
|
7 years ago |
Yu Yang
|
41ad632341
|
Add NCCL Group Guard
|
7 years ago |
Yu Yang
|
99fe83a020
|
Move nccl helper
|
7 years ago |
Yu Yang
|
90f980167d
|
Do not wait computation stream
|
7 years ago |
Yu Yang
|
7ac969b88c
|
Debug
* add Check align
* Make FetchData not shared_ptr
* Remove FetchData
* Wait & Fetch Data
|
7 years ago |
Yu Yang
|
599f7a87ba
|
Refine code
|
7 years ago |
Yu Yang
|
43e54079a8
|
Debug code
|
7 years ago |
Yu Yang
|
e335f01826
|
Add more logs
|
7 years ago |