Polishing the cpu profiling doc (#6116)

release/0.11.0
Abhinav Arora 8 years ago committed by Yi Wang
parent 0d40a4dbc6
commit 6dc5b34e5b

@@ -1,13 +1,13 @@
-This tutorial introduces techniques we used to profile and tune the
+This tutorial introduces techniques we use to profile and tune the
 CPU performance of PaddlePaddle. We will use Python packages
-`cProfile` and `yep`, and Google `perftools`.
+`cProfile` and `yep`, and Google's `perftools`.
-Profiling is the process that reveals the performance bottlenecks,
+Profiling is the process that reveals performance bottlenecks,
 which could be very different from what's in the developers' minds.
-Performance tuning is to fix the bottlenecks. Performance optimization
+Performance tuning is done to fix these bottlenecks. Performance optimization
 repeats the steps of profiling and tuning alternately.
-PaddlePaddle users program AI by calling the Python API, which calls
+PaddlePaddle users program AI applications by calling the Python API, which calls
 into `libpaddle.so`, written in C++. In this tutorial, we focus on
 the profiling and tuning of
@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
 We can see that the most time-consuming function is the `built-in
 method run`, which is a C++ function in `libpaddle.so`. We will
-explain how to profile C++ code in the next section. At the right
+explain how to profile C++ code in the next section. At this
 moment, let's look into the third function `sync_with_cpp`, which is a
 Python function. We can click it to understand more about it:
@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
 `main.py.prof`.
 Please be aware of the `-v` command line option, which prints the
-analysis results after generating the profiling file. By taking a
-glance at the print result, we'd know that if we stripped debug
+analysis results after generating the profiling file. By examining
+the printed result, we'd know whether we stripped debug
 information from `libpaddle.so` at build time. The following hints
 help make sure that the analysis results are readable:
@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
 variable `OMP_NUM_THREADS=1` to prevent OpenMP from automatically
 starting multiple threads.
-### Look into the Profiling File
+### Examining the Profiling File
-The tool we used to look into the profiling file generated by
+The tool we use to examine the profiling file generated by
 `perftools` is [`pprof`](https://github.com/google/pprof), which
 provides a Web-based GUI like `cprofilev`.
@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
 optimize `MomentumOp`.
 `pprof` marks performance-critical parts of the program in
-red. It's a good idea to follow the hint.
+red. It's a good idea to follow the hints.
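
The `cProfile` step the doc describes (profile a run, then sort the report by `tottime`) can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not code from the PaddlePaddle repository: `busy_loop` is a hypothetical stand-in for the training script, and `profile_and_report` is a hypothetical helper name. From the command line, `python -m cProfile -s tottime main.py` prints a similarly sorted report.

```python
import cProfile
import io
import pstats


def busy_loop(n):
    # Hypothetical workload standing in for a real training script.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_and_report(func, *args, sort_key="tottime", limit=5):
    """Run func(*args) under cProfile; return (result, report text).

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # tottime = time spent inside the function body itself,
    # excluding the time spent in functions it calls.
    stats.sort_stats(sort_key).print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_and_report(busy_loop, 100_000)
    print(report)
```

Sorting by `tottime` ranks functions by the cost of their own bodies, which is why a single native call into a C++ extension can top the list; sorting by `cumtime` instead ranks callers by the total time of everything they invoke.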
