Execute the program with multi threads (#6223)
* multi cpu design
* update
* multi cpu executor to executor
* add graph converting
* use parallel operator to execute blocks with multi threads
* use auto-transpiler
* use auto-transpiler
* update
* update graphdel_some_in_makelist
parent e0c4c39812
commit 2d5ec16bc8
@@ -0,0 +1,43 @@
# Design Doc: Execute the Program with Multi CPU

## Abstract

This design doc proposes an approach to run the user-defined Op graph
on multiple CPUs. An auto transpiler converts the user-defined
Op graph to a multi-CPU Op graph, and the `ParallelDo` Op runs the converted graph.

## Transpiler

<img src="src/multi-threads/single-thread@3x.png" width="300">

After conversion:

<img src="src/multi-threads/multi-threads@3x.png" width="1000">

## Implementation

- The `Multi-CPU Transpiler` converts the graph to a multi-CPU graph,
  which is then executed with multiple threads.
- `BlockingCounter` will `Init/Decrement` an atomic counter, and its blocking `Wait`
  returns once the counter becomes `0`:
```cpp
// Each task decrements the counter once when it finishes.
BlockingCounter bc(thread_count);
for (int i = 0; i < thread_count; ++i) {
  thread_pool->Start([&bc] { bc.DecrementCount(); });
}
// Block until all thread_count tasks have decremented the counter.
bc.Wait();
```
- `ParallelDo` Operator
  - Initializes a thread pool, which is a singleton.
  - Takes a block id as the input, and creates and runs the specified Block in independent scopes
    with multiple threads.
  - Initializes a `BlockingCounter` instance and waits until all threads are done.
- The `Split` Operator splits the input Tensor into a TensorArray.
- `Merge` merges the gradients calculated in different threads
  with a `mean/sum/max/min...` method, and then runs the Optimizer Op to optimize `W`.
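
The pieces listed above can be sketched in plain C++ as follows. This is only an illustration under several assumptions, not the operators' actual implementation: the `BlockingCounter` here guards its counter with a mutex and condition variable instead of an atomic, `ParallelDo` starts one `std::thread` per chunk instead of using a singleton thread pool, the "block" is a hypothetical callback that maps a chunk to one number, and `Split`, `ParallelDo`, and `MergeMean` are stand-in free functions rather than Ops.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Assumed BlockingCounter: Wait() blocks until the count reaches 0.
class BlockingCounter {
 public:
  explicit BlockingCounter(int initial_count) : count_(initial_count) {}

  // Called once by each worker when it finishes.
  void DecrementCount() {
    std::lock_guard<std::mutex> lock(mu_);
    if (--count_ == 0) cv_.notify_all();
  }

  // Blocks the caller until every worker has called DecrementCount().
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return count_ <= 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_;
};

// Stand-in for the `Split` Op: cut `input` into `parts` contiguous chunks.
std::vector<std::vector<float>> Split(const std::vector<float>& input, int parts) {
  assert(parts > 0);
  std::vector<std::vector<float>> chunks(parts);
  const std::ptrdiff_t total = static_cast<std::ptrdiff_t>(input.size());
  const std::ptrdiff_t base = total / parts, extra = total % parts;
  auto it = input.begin();
  for (int i = 0; i < parts; ++i) {
    const std::ptrdiff_t len = base + (i < extra ? 1 : 0);
    chunks[i].assign(it, it + len);
    it += len;
  }
  return chunks;
}

// Stand-in for `ParallelDo`: run `block` once per chunk, each on its own
// thread, writing into its own result slot (the "independent scope").
std::vector<float> ParallelDo(
    const std::vector<std::vector<float>>& chunks,
    const std::function<float(const std::vector<float>&)>& block) {
  const int n = static_cast<int>(chunks.size());
  std::vector<float> results(n, 0.0f);
  BlockingCounter bc(n);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&, i] {
      results[i] = block(chunks[i]);
      bc.DecrementCount();
    });
  }
  bc.Wait();  // every block has finished once this returns
  for (auto& worker : workers) worker.join();
  return results;
}

// Stand-in for `Merge` using the "mean" method.
float MergeMean(const std::vector<float>& values) {
  assert(!values.empty());
  return std::accumulate(values.begin(), values.end(), 0.0f) /
         static_cast<float>(values.size());
}
```

For example, `MergeMean(ParallelDo(Split(batch, 4), block))` would split `batch` into four chunks, run `block` on each chunk in its own thread, and average the four results. The other merge methods (`sum/max/min`) would only change the final reduction.
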
## TODO

- Improve the optimizer stage with multi-threading: the parameters could be
  assigned to different threads, so that the optimizer also
  runs on multiple threads.
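
One possible shape for this TODO, as a rough sketch only: each parameter is assigned to a thread round-robin, and every thread applies a plain SGD update to the parameters it owns. The `Param` struct, the `ParallelSgdStep` function, the round-robin assignment, and the choice of SGD are all assumptions made for illustration; the design doc does not specify any of them.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parameter: its values and the already-merged gradient.
struct Param {
  std::vector<float> value;
  std::vector<float> grad;
};

// One SGD step (value -= lr * grad) with parameters assigned to threads
// round-robin, so no two threads ever touch the same parameter.
void ParallelSgdStep(std::vector<Param>& params, float lr, int num_threads) {
  assert(num_threads > 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&params, lr, t, num_threads] {
      // Thread t owns parameters t, t + num_threads, t + 2 * num_threads, ...
      for (std::size_t p = static_cast<std::size_t>(t); p < params.size();
           p += static_cast<std::size_t>(num_threads)) {
        Param& param = params[p];
        assert(param.value.size() == param.grad.size());
        for (std::size_t i = 0; i < param.value.size(); ++i) {
          param.value[i] -= lr * param.grad[i];
        }
      }
    });
  }
  // Wait for all shards of the optimizer step to finish.
  for (auto& worker : workers) worker.join();
}
```

Because each parameter belongs to exactly one thread, the update needs no locking. A real implementation would probably balance the assignment by parameter size rather than by index.
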