Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_new_backward

del_some_in_makelist
fengjiayi 7 years ago
commit febb725102

@ -16,8 +16,6 @@ cmake_minimum_required(VERSION 3.0)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake") set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
set(PADDLE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}) set(PADDLE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(PADDLE_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR}) set(PADDLE_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
SET(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
SET(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
include(system) include(system)
@ -201,6 +199,10 @@ if(WITH_GOLANG)
endif(WITH_GOLANG) endif(WITH_GOLANG)
set(PADDLE_PYTHON_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/python/build") set(PADDLE_PYTHON_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/python/build")
SET(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
SET(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
add_subdirectory(paddle) add_subdirectory(paddle)
if(WITH_PYTHON) if(WITH_PYTHON)
add_subdirectory(python) add_subdirectory(python)

@ -61,32 +61,32 @@ Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddl
## Installation ## Installation
It is recommended to check out the It is recommended to check out the
[Docker installation guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/docker_install_en.html) [Docker installation guide](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/build_and_install/docker_install_en.html)
before looking into the before looking into the
[build from source guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/build_from_source_en.html). [build from source guide](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/build_and_install/build_from_source_en.html).
## Documentation ## Documentation
We provide [English](http://doc.paddlepaddle.org/develop/doc/) and We provide [English](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html) and
[Chinese](http://doc.paddlepaddle.org/doc_cn/) documentation. [Chinese](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html) documentation.
- [Deep Learning 101](http://book.paddlepaddle.org/index.html) - [Deep Learning 101](http://www.paddlepaddle.org/docs/develop/book/01.fit_a_line/index.html)
You might want to start from this online interactive book that can run in a Jupyter Notebook. You might want to start from this online interactive book that can run in a Jupyter Notebook.
- [Distributed Training](http://doc.paddlepaddle.org/develop/doc/howto/usage/cluster/cluster_train_en.html) - [Distributed Training](http://www.paddlepaddle.org/docs/develop/documentation/en/howto/usage/cluster/cluster_train_en.html)
You can run distributed training jobs on MPI clusters. You can run distributed training jobs on MPI clusters.
- [Distributed Training on Kubernetes](http://doc.paddlepaddle.org/develop/doc/howto/usage/k8s/k8s_en.html) - [Distributed Training on Kubernetes](http://www.paddlepaddle.org/docs/develop/documentation/en/howto/usage/cluster/k8s_en.html)
You can also run distributed training jobs on Kubernetes clusters. You can also run distributed training jobs on Kubernetes clusters.
- [Python API](http://doc.paddlepaddle.org/develop/doc/api/index_en.html) - [Python API](http://www.paddlepaddle.org/docs/develop/documentation/en/api/index_en.html)
Our new API enables much shorter programs. Our new API enables much shorter programs.
- [How to Contribute](http://doc.paddlepaddle.org/develop/doc/howto/dev/contribute_to_paddle_en.html) - [How to Contribute](http://www.paddlepaddle.org/docs/develop/documentation/en/howto/dev/contribute_to_paddle_en.html)
We appreciate your contributions! We appreciate your contributions!

@ -22,6 +22,7 @@ On each machine, we will test and compare the performance of training on single
#### Training #### Training
Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Pay attetion that the speed below includes forward, backward and parameter update time. So we can not directly compare the data with the benchmark of caffe `time` [command](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/caffe/image/run.sh#L9), which only contain forward and backward. The updating time of parameter would become very heavy when the weight size are large, especially on alexnet.
Input image size - 3 * 224 * 224, Time: images/second Input image size - 3 * 224 * 224, Time: images/second
@ -55,6 +56,16 @@ Input image size - 3 * 224 * 224, Time: images/second
<img src="figs/googlenet-cpu-train.png" width="500"> <img src="figs/googlenet-cpu-train.png" width="500">
- Alexnet
| BatchSize | 64 | 128 | 256 |
|--------------|--------| ------ | -------|
| OpenBLAS | 2.13 | 2.45 | 2.68 |
| MKLML | 66.37 | 105.60 | 144.04 |
| MKL-DNN | 399.00 | 498.94 | 626.53 |
chart TBD
#### Inference #### Inference
Test on batch size 1, 2, 4, 8, 16 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz Test on batch size 1, 2, 4, 8, 16 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- VGG-19 - VGG-19

@ -6,8 +6,18 @@ height = 227
width = 227 width = 227
num_class = 1000 num_class = 1000
batch_size = get_config_arg('batch_size', int, 128) batch_size = get_config_arg('batch_size', int, 128)
gp = get_config_arg('layer_num', int, 1)
is_infer = get_config_arg("is_infer", bool, False)
num_samples = get_config_arg('num_samples', int, 2560)
args = {'height': height, 'width': width, 'color': True, 'num_class': num_class} args = {
'height': height,
'width': width,
'color': True,
'num_class': num_class,
'is_infer': is_infer,
'num_samples': num_samples
}
define_py_data_sources2( define_py_data_sources2(
"train.list", None, module="provider", obj="process", args=args) "train.list", None, module="provider", obj="process", args=args)
@ -31,7 +41,7 @@ net = img_pool_layer(input=net, pool_size=3, stride=2)
# conv2 # conv2
net = img_conv_layer( net = img_conv_layer(
input=net, filter_size=5, num_filters=256, stride=1, padding=2, groups=1) input=net, filter_size=5, num_filters=256, stride=1, padding=2, groups=gp)
net = img_cmrnorm_layer(input=net, size=5, scale=0.0001, power=0.75) net = img_cmrnorm_layer(input=net, size=5, scale=0.0001, power=0.75)
net = img_pool_layer(input=net, pool_size=3, stride=2) net = img_pool_layer(input=net, pool_size=3, stride=2)
@ -40,11 +50,11 @@ net = img_conv_layer(
input=net, filter_size=3, num_filters=384, stride=1, padding=1) input=net, filter_size=3, num_filters=384, stride=1, padding=1)
# conv4 # conv4
net = img_conv_layer( net = img_conv_layer(
input=net, filter_size=3, num_filters=384, stride=1, padding=1, groups=1) input=net, filter_size=3, num_filters=384, stride=1, padding=1, groups=gp)
# conv5 # conv5
net = img_conv_layer( net = img_conv_layer(
input=net, filter_size=3, num_filters=256, stride=1, padding=1, groups=1) input=net, filter_size=3, num_filters=256, stride=1, padding=1, groups=gp)
net = img_pool_layer(input=net, pool_size=3, stride=2) net = img_pool_layer(input=net, pool_size=3, stride=2)
net = fc_layer( net = fc_layer(
@ -59,6 +69,9 @@ net = fc_layer(
layer_attr=ExtraAttr(drop_rate=0.5)) layer_attr=ExtraAttr(drop_rate=0.5))
net = fc_layer(input=net, size=1000, act=SoftmaxActivation()) net = fc_layer(input=net, size=1000, act=SoftmaxActivation())
lab = data_layer('label', num_class) if is_infer:
loss = cross_entropy(input=net, label=lab) outputs(net)
outputs(loss) else:
lab = data_layer('label', num_class)
loss = cross_entropy(input=net, label=lab)
outputs(loss)

@ -7,13 +7,15 @@ num_class = 1000
batch_size = get_config_arg('batch_size', int, 128) batch_size = get_config_arg('batch_size', int, 128)
use_gpu = get_config_arg('use_gpu', bool, True) use_gpu = get_config_arg('use_gpu', bool, True)
is_infer = get_config_arg("is_infer", bool, False) is_infer = get_config_arg("is_infer", bool, False)
num_samples = get_config_arg('num_samples', int, 2560)
args = { args = {
'height': height, 'height': height,
'width': width, 'width': width,
'color': True, 'color': True,
'num_class': num_class, 'num_class': num_class,
'is_infer': is_infer 'is_infer': is_infer,
'num_samples': num_samples
} }
define_py_data_sources2( define_py_data_sources2(
"train.list" if not is_infer else None, "train.list" if not is_infer else None,

@ -14,6 +14,7 @@ def initHook(settings, height, width, color, num_class, **kwargs):
else: else:
settings.data_size = settings.height * settings.width settings.data_size = settings.height * settings.width
settings.is_infer = kwargs.get('is_infer', False) settings.is_infer = kwargs.get('is_infer', False)
settings.num_samples = kwargs.get('num_samples', 2560)
if settings.is_infer: if settings.is_infer:
settings.slots = [dense_vector(settings.data_size)] settings.slots = [dense_vector(settings.data_size)]
else: else:
@ -23,7 +24,7 @@ def initHook(settings, height, width, color, num_class, **kwargs):
@provider( @provider(
init_hook=initHook, min_pool_size=-1, cache=CacheType.CACHE_PASS_IN_MEM) init_hook=initHook, min_pool_size=-1, cache=CacheType.CACHE_PASS_IN_MEM)
def process(settings, file_list): def process(settings, file_list):
for i in xrange(2560 if settings.is_infer else 1024): for i in xrange(settings.num_samples):
img = np.random.rand(1, settings.data_size).reshape(-1, 1).flatten() img = np.random.rand(1, settings.data_size).reshape(-1, 1).flatten()
if settings.is_infer: if settings.is_infer:
yield img.astype('float32') yield img.astype('float32')

@ -7,13 +7,15 @@ num_class = 1000
batch_size = get_config_arg('batch_size', int, 64) batch_size = get_config_arg('batch_size', int, 64)
layer_num = get_config_arg("layer_num", int, 50) layer_num = get_config_arg("layer_num", int, 50)
is_infer = get_config_arg("is_infer", bool, False) is_infer = get_config_arg("is_infer", bool, False)
num_samples = get_config_arg('num_samples', int, 2560)
args = { args = {
'height': height, 'height': height,
'width': width, 'width': width,
'color': True, 'color': True,
'num_class': num_class, 'num_class': num_class,
'is_infer': is_infer 'is_infer': is_infer,
'num_samples': num_samples
} }
define_py_data_sources2( define_py_data_sources2(
"train.list" if not is_infer else None, "train.list" if not is_infer else None,

@ -37,7 +37,7 @@ function infer() {
--trainer_count=1 \ --trainer_count=1 \
--num_passes=1 \ --num_passes=1 \
--save_dir="models/${topology}-${layer_num}" \ --save_dir="models/${topology}-${layer_num}" \
--config_args="batch_size=128,layer_num=${layer_num}" \ --config_args="batch_size=128,layer_num=${layer_num},num_samples=256" \
> /dev/null 2>&1 > /dev/null 2>&1
echo "Done" echo "Done"
fi fi
@ -79,8 +79,9 @@ fi
# inference benchmark # inference benchmark
for use_mkldnn in True False; do for use_mkldnn in True False; do
for batchsize in 1 2 4 8 16; do for batchsize in 1 2 4 8 16; do
infer googlenet v1 $batchsize $use_mkldnn
infer resnet 50 $batchsize $use_mkldnn
infer vgg 19 $batchsize $use_mkldnn infer vgg 19 $batchsize $use_mkldnn
infer resnet 50 $batchsize $use_mkldnn
infer googlenet v1 $batchsize $use_mkldnn
infer alexnet 2 $batchsize $use_mkldnn
done done
done done

@ -28,6 +28,10 @@ function train() {
--test_period=100 \ --test_period=100 \
--config_args=$args \ --config_args=$args \
2>&1 | tee ${log} 2>&1 | tee ${log}
avg_time=`tail ${log} -n 1 | awk -F ' ' '{print $8}' | sed 's/avg=//'`
fps=`awk 'BEGIN{printf "%.2f",('$bs' / '$avg_time' * 1000)}'`
echo "FPS: $fps images/sec" 2>&1 | tee -a ${log}
} }
if [ ! -f "train.list" ]; then if [ ! -f "train.list" ]; then
@ -43,5 +47,6 @@ for use_mkldnn in True False; do
train vgg 19 $batchsize $use_mkldnn train vgg 19 $batchsize $use_mkldnn
train resnet 50 $batchsize $use_mkldnn train resnet 50 $batchsize $use_mkldnn
train googlenet v1 $batchsize $use_mkldnn train googlenet v1 $batchsize $use_mkldnn
train alexnet 2 $batchsize $use_mkldnn
done done
done done

@ -0,0 +1,64 @@
set -e
function clock_to_seconds() {
hours=`echo $1 | awk -F ':' '{print $1}'`
mins=`echo $1 | awk -F ':' '{print $2}'`
secs=`echo $1 | awk -F ':' '{print $3}'`
echo `awk 'BEGIN{printf "%.2f",('$secs' + '$mins' * 60 + '$hours' * 3600)}'`
}
function infer() {
unset OMP_NUM_THREADS MKL_NUM_THREADS OMP_DYNAMIC KMP_AFFINITY
topology=$1
layer_num=$2
bs=$3
thread=`nproc`
if [ $thread -gt $bs ]; then
thread=$bs
fi
log="logs/infer-${topology}-${layer_num}-${thread}openblas-${bs}.log"
models_in="models/${topology}-${layer_num}/pass-00000/"
if [ ! -d $models_in ]; then
echo "./run_mkl_infer.sh to save the model first"
exit 0
fi
log_period=$((32 / bs))
paddle train --job=test \
--config="${topology}.py" \
--use_mkldnn=False \
--use_gpu=False \
--trainer_count=$thread \
--log_period=$log_period \
--config_args="batch_size=${bs},layer_num=${layer_num},is_infer=True,num_samples=256" \
--init_model_path=$models_in \
2>&1 | tee ${log}
# calculate the last 5 logs period time of 160(=32*5) samples,
# the time before are burning time.
start=`tail ${log} -n 7 | head -n 1 | awk -F ' ' '{print $2}' | xargs`
end=`tail ${log} -n 2 | head -n 1 | awk -F ' ' '{print $2}' | xargs`
start_sec=`clock_to_seconds $start`
end_sec=`clock_to_seconds $end`
fps=`awk 'BEGIN{printf "%.2f",(160 / ('$end_sec' - '$start_sec'))}'`
echo "Last 160 samples start: ${start}(${start_sec} sec), end: ${end}(${end_sec} sec;" >> ${log}
echo "FPS: $fps images/sec" 2>&1 | tee -a ${log}
}
if [ ! -f "train.list" ]; then
echo " " > train.list
fi
if [ ! -f "test.list" ]; then
echo " " > test.list
fi
if [ ! -d "logs" ]; then
mkdir logs
fi
# inference benchmark
for batchsize in 1 2 4 8 16; do
infer vgg 19 $batchsize
infer resnet 50 $batchsize
infer googlenet v1 $batchsize
infer alexnet 2 $batchsize
done

@ -0,0 +1,41 @@
set -e
function train() {
unset OMP_NUM_THREADS MKL_NUM_THREADS OMP_DYNAMIC KMP_AFFINITY
topology=$1
layer_num=$2
bs=$3
thread=`nproc`
# each trainer_count use only 1 core to avoid conflict
log="logs/train-${topology}-${layer_num}-${thread}openblas-${bs}.log"
args="batch_size=${bs},layer_num=${layer_num}"
config="${topology}.py"
paddle train --job=time \
--config=$config \
--use_mkldnn=False \
--use_gpu=False \
--trainer_count=$thread \
--log_period=3 \
--test_period=30 \
--config_args=$args \
2>&1 | tee ${log}
avg_time=`tail ${log} -n 1 | awk -F ' ' '{print $8}' | sed 's/avg=//'`
fps=`awk 'BEGIN{printf "%.2f",('$bs' / '$avg_time' * 1000)}'`
echo "FPS: $fps images/sec" 2>&1 | tee -a ${log}
}
if [ ! -f "train.list" ]; then
echo " " > train.list
fi
if [ ! -d "logs" ]; then
mkdir logs
fi
# training benchmark
for batchsize in 64 128 256; do
train vgg 19 $batchsize
train resnet 50 $batchsize
train googlenet v1 $batchsize
train alexnet 2 $batchsize
done

@ -7,13 +7,15 @@ num_class = 1000
batch_size = get_config_arg('batch_size', int, 64) batch_size = get_config_arg('batch_size', int, 64)
layer_num = get_config_arg('layer_num', int, 19) layer_num = get_config_arg('layer_num', int, 19)
is_infer = get_config_arg("is_infer", bool, False) is_infer = get_config_arg("is_infer", bool, False)
num_samples = get_config_arg('num_samples', int, 2560)
args = { args = {
'height': height, 'height': height,
'width': width, 'width': width,
'color': True, 'color': True,
'num_class': num_class, 'num_class': num_class,
'is_infer': is_infer 'is_infer': is_infer,
'num_samples': num_samples
} }
define_py_data_sources2( define_py_data_sources2(
"train.list" if not is_infer else None, "train.list" if not is_infer else None,

@ -253,9 +253,9 @@ IF(NOT PROTOBUF_FOUND)
IF(WITH_C_API) IF(WITH_C_API)
INSTALL(DIRECTORY ${PROTOBUF_INCLUDE_DIR} DESTINATION third_party/protobuf) INSTALL(DIRECTORY ${PROTOBUF_INCLUDE_DIR} DESTINATION third_party/protobuf)
IF(ANDROID) IF(ANDROID)
INSTALL(FILES ${PROTOBUF_LIBRARY} DESTINATION third_party/protobuf/lib/${ANDROID_ABI}) INSTALL(FILES ${PROTOBUF_LITE_LIBRARY} DESTINATION third_party/protobuf/lib/${ANDROID_ABI})
ELSE() ELSE()
INSTALL(FILES ${PROTOBUF_LIBRARY} DESTINATION third_party/protobuf/lib) INSTALL(FILES ${PROTOBUF_LITE_LIBRARY} DESTINATION third_party/protobuf/lib)
ENDIF() ENDIF()
ENDIF() ENDIF()

@ -7,3 +7,4 @@ API
模型配置 <v2/model_configs.rst> 模型配置 <v2/model_configs.rst>
数据访问 <v2/data.rst> 数据访问 <v2/data.rst>
训练与应用 <v2/run_logic.rst> 训练与应用 <v2/run_logic.rst>
v2/fluid.rst

@ -467,7 +467,7 @@ lambda_cost
:noindex: :noindex:
square_error_cost square_error_cost
-------- -----------------
.. autoclass:: paddle.v2.layer.square_error_cost .. autoclass:: paddle.v2.layer.square_error_cost
:noindex: :noindex:
@ -533,7 +533,7 @@ Miscs
===== =====
dropout dropout
-------------- --------
.. autoclass:: paddle.v2.layer.dropout .. autoclass:: paddle.v2.layer.dropout
:noindex: :noindex:

File diff suppressed because it is too large Load Diff

@ -3,19 +3,19 @@ Nets
=========== ===========
simple_img_conv_pool simple_img_conv_pool
----------- --------------------
.. autofunction:: paddle.v2.fluid.nets.simple_img_conv_pool .. autofunction:: paddle.v2.fluid.nets.simple_img_conv_pool
:noindex: :noindex:
img_conv_group img_conv_group
----------- ---------------
.. autofunction:: paddle.v2.fluid.nets.img_conv_group .. autofunction:: paddle.v2.fluid.nets.img_conv_group
:noindex: :noindex:
sequence_conv_pool sequence_conv_pool
----------- ------------------
.. autofunction:: paddle.v2.fluid.nets.sequence_conv_pool .. autofunction:: paddle.v2.fluid.nets.sequence_conv_pool
:noindex: :noindex:

@ -18,7 +18,7 @@ SGDOptimizer
MomentumOptimizer MomentumOptimizer
----------- -----------------
.. automodule:: paddle.v2.fluid.optimizer .. automodule:: paddle.v2.fluid.optimizer
:members: MomentumOptimizer :members: MomentumOptimizer
:noindex: :noindex:
@ -26,14 +26,14 @@ MomentumOptimizer
AdagradOptimizer AdagradOptimizer
----------- ----------------
.. automodule:: paddle.v2.fluid.optimizer .. automodule:: paddle.v2.fluid.optimizer
:members: AdagradOptimizer :members: AdagradOptimizer
:noindex: :noindex:
AdamOptimizer AdamOptimizer
----------- -------------
.. automodule:: paddle.v2.fluid.optimizer .. automodule:: paddle.v2.fluid.optimizer
:members: AdamOptimizer :members: AdamOptimizer
:noindex: :noindex:
@ -47,7 +47,7 @@ AdamaxOptimizer
DecayedAdagradOptimizer DecayedAdagradOptimizer
----------- -----------------------
.. automodule:: paddle.v2.fluid.optimizer .. automodule:: paddle.v2.fluid.optimizer
:members: DecayedAdagradOptimizer :members: DecayedAdagradOptimizer
:noindex: :noindex:

@ -3,14 +3,14 @@ Regularizer
=========== ===========
WeightDecayRegularizer WeightDecayRegularizer
----------- ----------------------
.. automodule:: paddle.v2.fluid.regularizer .. automodule:: paddle.v2.fluid.regularizer
:members: WeightDecayRegularizer :members: WeightDecayRegularizer
:noindex: :noindex:
L2DecayRegularizer L2DecayRegularizer
----------- ------------------
.. automodule:: paddle.v2.fluid.regularizer .. automodule:: paddle.v2.fluid.regularizer
:members: L2DecayRegularizer :members: L2DecayRegularizer
:noindex: :noindex:
@ -18,7 +18,7 @@ L2DecayRegularizer
L1DecayRegularizer L1DecayRegularizer
----------- -------------------
.. automodule:: paddle.v2.fluid.regularizer .. automodule:: paddle.v2.fluid.regularizer
:members: L1DecayRegularizer :members: L1DecayRegularizer

@ -291,10 +291,10 @@ public:
} }
void Run(const framework::Scope& scope, void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override { const platform::Place& place) const override {
PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first."); PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first.");
for (auto& op : runtime_table_.ops()) { for (auto& op : runtime_table_.ops()) {
op->Run(scope, dev_ctx); op->Run(scope, place);
} }
} }

@ -1,23 +1,29 @@
# Executor Design Doc # Executor Design Doc
## Motivation ## Motivation
In [fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message
[`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
We use executor to do the runtime evaluation of a `ProgramDesc`. The executor runs the `ProgramDesc` like an interpreter. `ProgramDesc` contains the intrinsics (operators in this case) and variables which will be used, executor explicitly executes the stored precompiled code.
## Overview ## Overview
An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks and each block contains the protobuf definition of all the parameters and operators. The `block_id` specifies the entrance block. And the `Scope` is the container of all the variable instance, which is persistent throughout different runs. An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks and each block contains the protobuf definition of all the parameters and operators in the block. The `block_id` specifies the entrance block. And the `Scope` is the container of all the variable instances, which is persistent throughout different runs.
### What does executor do? ## Executor
It evaluates all the operators in the `block_id`th block of a `ProgramDesc`. The `Executor` explicitly executes all the intrinsics (operators here) in the `block_id`th block of a `ProgramDesc`. Essentially, it instantiates Variables and Operators, then runs all the operators in sequence one-by-one.
It is very similar to how a push stack frame works when entering a block, following which it cleans up all the temporary variables when a mini-batch is finished. It does not however, have the stack frame pop process.
### What does executor NOT do? ### The interface
```c++
Executor(places);
```
A executor does not own any computing resources, a user can only construct an executor using the specified places.
It does not do runtime optimization, meaning intelligently parse the dependency of each op a choose which one to be run and in which order they should be run. ### Running an Executor
It does not do graph partitioning, meaning dividing the `ProgramDesc` into several small pieces and executing them on different devices. ```
void Run(ProgramDesc, Scope, block_id, create_local_scope);
## Implementation ```
An `Executor` only provides a unified way to execute `ProgramDesc`. `ProgramDesc` is the target that will be executed, the `Scope` specifies the variable container, the `block_id` indicates the entrance block and `create_local_scope` is a boolean that states whether it will destroy the temporary variables after the execution is finished.
`Executor` evaluates a `ProgramDesc`. Essentially, it instantiates Variables and Operators, then run all the operators in sequence. [[code]](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.cc)

Binary file not shown.

After

Width:  |  Height:  |  Size: 108 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Some files were not shown because too many files have changed in this diff Show More

Loading…
Cancel
Save