This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use the Python packages
`cProfile` and `yep`, and Google's `perftools`.
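
As a minimal sketch of the Python-side profiling described in this tutorial, the standard-library `cProfile` can collect and print statistics on its own; `train_step` here is a hypothetical stand-in for the real PaddlePaddle training loop, not code from the tutorial:

```python
import cProfile
import pstats

def train_step():
    # Hypothetical stand-in workload for the training loop being profiled.
    return sum(i * i for i in range(100000))

# Collect a profile with the standard-library cProfile and print the ten
# most expensive entries, sorted by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
train_step()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

`cProfile` only sees Python frames; time spent inside `libpaddle.so` shows up as a single opaque entry, which is why the C++ side needs `yep` and `perftools`.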

Profiling is the process that reveals performance bottlenecks,
which could be very different from what is in the developers' minds.
Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternately.

PaddlePaddle users program AI applications by calling the Python API, which calls
into `libpaddle.so`, written in C++. In this tutorial, we focus on
the profiling and tuning of

We can sort the above profiling file by `tottime`:
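
The `tottime` sort can be reproduced with the standard-library `pstats`; the workload and the `profile.out` filename below are illustrative, not from the tutorial:

```python
import cProfile
import os
import pstats

def work():
    # Hypothetical workload standing in for the profiled training script.
    return sum(i * i for i in range(50000))

# Write a profiling file, then sort its entries by tottime -- the time
# spent inside each function itself, excluding its callees.
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats("profile.out")  # illustrative filename

pstats.Stats("profile.out").sort_stats("tottime").print_stats(5)
```

Sorting by `tottime` surfaces the functions that burn CPU themselves, whereas `cumtime` also charges a function for everything it calls.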

We can see that the most time-consuming function is the `built-in
method run`, which is a C++ function in `libpaddle.so`. We will
explain how to profile C++ code in the next section. At this
moment, let's look into the third function, `sync_with_cpp`, which is a
Python function. We can click it to understand more about it:

to generate the profiling file. The default filename is
`main.py.prof`.

Please be aware of the `-v` command line option, which prints the
analysis results after generating the profiling file. By examining
the print result, we'd know whether we stripped debug
information from `libpaddle.so` at build time. The following hints
help make sure that the analysis results are readable:

Set the environment variable `OMP_NUM_THREADS=1` to prevent OpenMP
from automatically starting multiple threads.
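
As a sketch, the same effect as running `OMP_NUM_THREADS=1 python main.py` from the shell can be had from Python itself, provided it happens before the OpenMP-backed library (here, `libpaddle.so`) is first imported:

```python
import os

# Pin OpenMP to a single thread so the profile is not dominated by
# thread bookkeeping. This must run before the OpenMP-backed library
# (e.g. libpaddle.so) is imported, or it has no effect.
os.environ["OMP_NUM_THREADS"] = "1"

print(os.environ["OMP_NUM_THREADS"])  # → 1
```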

### Examining the Profiling File

The tool we use to examine the profiling file generated by
`perftools` is [`pprof`](https://github.com/google/pprof), which
provides a Web-based GUI like `cprofilev`.

time, and `MomentumOp` takes about 17%. Obviously, we'd want to
optimize `MomentumOp`.

`pprof` would mark the performance-critical parts of the program in
red. It's a good idea to follow the hints.