Profiling with GPerfTools
Introduction
Gperftools is a set of tools for performance profiling and memory checking. One of the main advantages of the CPU profiler is a very nice graphical output, low overhead, and very simple use (the profiled application doesn't require any recompilation, and the profiling is enabled by simply preloading the profiler's library also an optional library linking is possible when compiling).
Installation
Install the library packages from the GitHub repository
git clone https://github.com/gperftools/gperftools cd gperftools ./autogen.sh ./configure make sudo make install
On debian-based systems the complementary tools are packaged under the google-perftools package. For graphical output you also need graphviz installed:
sudo apt-get install google-perftools graphviz
Note that all the tools have the “google-” prefix under debian - the prefix may be missing on other systems (and is also missing in the official documentation).
Profiling with gperftools
CPU
A) Profile the whole process runtime
1. To start profiling run:
LD_PRELOAD: Path to the libprofiler.so usually located at /usr/local/lib/ or /usr/lib/x86_64-linux-gnu for 64 bit systems.
CPUPROFILE: Name of the output log file
LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=test.prof ./path/to/bin
- Note: Application example for gst-launch-1.0:
LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=test.prof gst-launch-1.0 videotestsrc ! fakesink
2. Keep the application open or running until it finishes, in this mode if the execution is canceled for example with Ctrl+C the output file will not be generated.
3.1 Once the application ended the test.prof file contains the CPU profile information. To get a graphical output run:
google-pprof --gv ./path/to/bin test.prof
3.2 Also as an alternative viewer you can display it to a web browser by running:
google-pprof --web ./path/to/bin test.prof
4. To generate a PDF report with the previous graphic output run:
google-pprof --pdf ./path/to/bin test.prof > output.pdf
- Note: Application example using gst-launch-1.0 as path/to/bin
To get a graphical output run:
google-pprof -gv /usr/bin/gst-launch-1.0 test.prof
To display it in a web browser run:
google-pprof --web /usr/bin/gst-launch-1.0 test.prof
To generate a PDF report with the previous graphic output run:
google-pprof --pdf /usr/bin/gst-launch-1.0 test.prof > output.pdf
Example graphical output:
B) Profile part of process runtime
In addition to defining the environment variable CPUPROFILE you can also define CPUPROFILESIGNAL. This allows profiling to be controlled via the signal number that you specify. The signal number must be unused by the program under normal operation. Internally it acts as a switch, triggered by the signal, which is off by default.
1. To start profiling run:
LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=test.prof CPUPROFILESIGNAL=12 ./path/to/bin
2. Leave the program running until you want to start the profiling process. Then send the signal:
2.1 You can use htop program to send the signal to the desired process:
2.2 Also you can use killall command to send the -12 signal
killall -12 /path/to/bin
3. Leave the program until the point you want to profile it, then run again:
killall -12 /path/to/bin
You will notice the following output when the output file was correctly generated:
Using signal 12 as cpu profiling switch PROFILE: interrupts/evictions/bytes = 4773/1826/231936
4. Once the application ended the test.prof.0 file contains the CPU profile information. To get a graphical output run:
google-pprof -gv ./path/to/bin test.prof
C) Profile specific section of source code
1. Add the header file in your code:
#include <gperftools/profiler.h>
2. Add the following function calls around the code you want to profile:
ProfilerStart("output_inside.prof"); //Start profiling section and save to file /* * Code to be analyzed */ ProfilerStop(); //End profiling section
3. When compiling you will need to link against the profiling library.
You can check the needed library flags by running:
fsolano@ridgerun-laptop:build$ pkg-config --libs libprofiler -L/usr/local/lib -lprofiler
If you are compiling manually, just add the -lprofiler flag.
If you are using for example meson compilation system you can ask for the library by using:
profiler_dep = dependency('libprofiler')
Add it to the list of your dependencies for example:
lib_common_dep = [opencv_dep,boost_dep,profiler_dep]
And use the dependencies list to build the test app
executable('test', 'test.cpp', dependencies: [libbevlib_dep], c_args: c_args)
4. Run the test application, where you will see the output similar to the following:
fsolano@ridgerun-laptop:build$ ./test PROFILE: interrupts/evictions/bytes = 9/0/280 PROFILE: interrupts/evictions/bytes = 7/0/584 PROFILE: interrupts/evictions/bytes = 12/0/872 PROFILE: interrupts/evictions/bytes = 9/0/712 PROFILE: interrupts/evictions/bytes = 9/0/904 PROFILE: interrupts/evictions/bytes = 7/0/464 PROFILE: interrupts/evictions/bytes = 8/0/680
5. With this method you can wait until the application execution ends, or end it with Ctrl+C.
6. Once the application ended the output_inside.prof file contains the CPU profile information. To get a graphical output run:
google-pprof -gv ./test output_inside.prof
Modifying Runtime Behavior
You can more finely control the behavior of the CPU profiler via environment variables.
Variable | Default | Description |
---|---|---|
CPUPROFILE_FREQUENCY=x | 100 | How many interrupts/second the cpu-profiler samples. |
CPUPROFILE_REALTIME=1 | Not set | If set to any value (including 0 or the empty string), use ITIMER_REAL instead of ITIMER_PROF to gather profiles. In general, ITIMER_REAL is not as accurate as ITIMER_PROF, and also interacts badly with use of alarm(), so prefer ITIMER_PROF unless you have a reason prefer ITIMER_REAL. |
How to analyze the output
Different ways to display output data
% google-pprof /path/to/bin out.prof #Enters "interactive" mode % google-pprof --text /path/to/bin out.prof #Outputs one line per procedure % google-pprof --gv /path/to/bin out.prof #Displays annotated call-graph via 'gv' % google-pprof --gv --focus=Mutex /path/to/bin out.prof #Restricts to code paths including a .*Mutex.* entry % google-pprof --gv --focus=Mutex --ignore=string /path/to/bin out.prof #Code paths including Mutex but not string % google-pprof --list=getdir /path/to/bin out.prof #(Per-line) annotated source listing for getdir() % google-pprof --disasm=getdir /path/to/bin out.prof #(Per-PC) annotated disassembly for getdir() % google-pprof --callgrind /path/to/bin out.prof #Outputs the call information in callgrind format
Analyzing Text Output
Text mode has lines of output that look like this:
14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
Here is how to interpret the columns:
- Number of profiling samples in this function
- Percentage of profiling samples in this function
- Percentage of profiling samples in the functions printed so far
- Number of profiling samples in this function and its callees
- Percentage of profiling samples in this function and its callees
- Function name
Node Information
Each node represents a procedure. The directed edges indicate caller to callee relations. Each node is formatted as follows:
Class Name Method Name local (percentage) of cumulative (percentage)
The last one or two lines contain the timing information. (The profiling is done via a sampling method, where by default we take 100 samples a second. Therefore one unit of time in the output corresponds to about 10 milliseconds of execution time.) The "local" time is the time spent executing the instructions directly contained in the procedure (and in any other procedures that were inlined into the procedure). The "cumulative" time is the sum of the "local" time and the time spent in any callees. If the cumulative time is the same as the local time, it is not printed.
For instance, the timing information for test_main_thread() indicates that 118 units (about 1.18 seconds) were spent executing the code in test_main_thread() and 157 units were spent while executing test_main_thread() and its callees such as snprintf().
The size of the node is proportional to the local count. The percentage displayed in the node corresponds to the count divided by the total run time of the program (that is, the cumulative count for main()).
Edge Information
An edge from one node to another indicates a caller to callee relationship. Each edge is labeled with the time spent by the callee on behalf of the caller. E.g, the edge from test_main_thread() to __sqrt() indicates that of the 200 samples in test_main_thread(), 37 are because of calls to __sqrt().
Meta Information
The top of the display should contain some meta information like:
/tmp/profiler2_unittest Total samples: 202 Focusing on: 202 Dropped nodes with <= 1 abs(samples) Dropped edges with <= 0 samples
This section contains the name of the program, and the total samples collected during the profiling run. If the --focus option is on, the legend also contains the number of samples being shown in the focused display. Furthermore, some unimportant nodes and edges are dropped to reduce clutter. The characteristics of the dropped nodes and edges are also displayed in the legend.
Focus and Ignore
You can ask pprof to generate a display focused on a particular piece of the program. You specify a regular expression. Any portion of the call-graph that is on a path that contains at least one node matching the regular expression is preserved. The rest of the call-graph is dropped on the floor. For example, you can focus on the vsnprintf() libc call in profiler2_unittest as follows:
pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
For direct inquiries, please refer to the contact information available on our Contact page. Alternatively, you may complete and submit the form provided at the same link. We will respond to your request at our earliest opportunity.
Links to RidgeRun Resources and RidgeRun Artificial Intelligence Solutions can be found in the footer below.