Profiling GNU/Linux applications

From RidgeRun Developer Wiki

Overview

This document explains the steps needed to profile applications under GNU/Linux. Utilities such as valgrind report how a program uses memory and give detailed information about leaks, but they do not provide enough information about the CPU time used by the program's functions, which functions called them, and how long each function ran. This document shows how to profile with gprof, the GNU Profiler.


To produce the gmon.out file, the application must exit normally, that is, by returning from main or calling exit(). If it is killed by a signal or otherwise terminates abnormally, the file is not created.

Application setup

Install

To use gprof, install oprofile from the file system applications.


Compile Flags

gprof needs the application to generate profiling data to work with. To enable this, set the following flags in the application package. Note that -pg must be passed both when compiling and when linking.

Profiling and debugging flags:

CFLAGS = -pg

gprof produces more complete profiling data for statically linked libraries, so keep this in mind for your application:

LDFLAGS = -static-libgcc -Wl,-Bstatic -lc

With this, the profiling build setup is ready. Now run your application as you normally would:

 $ ./app

Running the application creates a "gmon.out" file with the profiling information in the current working directory.

 $ ls
gmon.out  app  app.c  app.o  Makefile
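
Since gmon.out is only written when the program exits, it can be useful to check for it right after each run. The following is a minimal sketch of such a check, not part of gprof itself; the function name run_profiled and its messages are hypothetical.

```shell
#!/bin/sh
# run_profiled: run a command and report whether it left a gmon.out behind.
# The function name and the messages are illustrative, not part of gprof.
# Example call: run_profiled ./app
run_profiled() {
    rm -f gmon.out                  # remove stale data from an earlier run
    "$@"                            # run the program with its arguments
    status=$?
    if [ -f gmon.out ]; then
        echo "gmon.out written (exit status $status)"
        return 0
    fi
    echo "no gmon.out (exit status $status): was the program linked with -pg, and did it exit normally?" >&2
    return 1
}
```

Removing the old file first avoids mistaking data from a previous run for the result of the current one.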

gprof Usage

When the application finishes, there should be a *.out file (usually gmon.out) for gprof to work with. Run:

gprof $EXECUTABLE_FILE gmon.out

You should now see the application's profile information in the terminal. For more details, visit the GNU Profiler website.

Alternatively, redirect the gprof output to a file so the information can be analyzed later:

gprof $EXECUTABLE_FILE gmon.out > app_gprof_data.txt
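
The whole sequence (build with -pg, run, generate the report) can be tried on a toy program before touching a real application. The script below is a sketch under stated assumptions: demo.c, busy() and demo_gprof.txt are made-up names, gcc and gprof are assumed to be installed on the host, and the -b option is used to leave out gprof's explanatory text.

```shell
#!/bin/sh
# End-to-end sketch: build a toy program with -pg, run it, save the report.
# demo.c, busy() and demo_gprof.txt are made-up names for this illustration.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > demo.c <<'EOF'
#include <stdio.h>

/* Burn some CPU time so the sampler has something to record. */
static double busy(long n)
{
    double acc = 0.0;
    for (long i = 1; i <= n; i++)
        acc += 1.0 / (double)i;
    return acc;
}

int main(void)
{
    printf("%f\n", busy(50000000L));
    return 0;   /* normal exit, so gmon.out gets written */
}
EOF

gcc -pg -O0 -o demo demo.c                 # -pg applies to compiling and linking
./demo                                     # writes gmon.out to the current directory
gprof -b demo gmon.out > demo_gprof.txt    # -b omits the explanatory text
head -n 12 demo_gprof.txt                  # show the start of the flat profile
```

Optimization is turned off here so that busy() is not inlined and stays visible as its own entry in the report.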

Call graph

The profile information can also be shown as a graph (and exported as a *.png file). This requires some extra packages plus a Python script.

sudo apt-get install python graphviz xdot

Next, you need gprof2dot to convert the profiling output to a dot graph. The script can be downloaded from:

https://code.google.com/p/jrfonseca/wiki/Gprof2Dot

or checked out from the git repo:

git clone https://code.google.com/p/jrfonseca.gprof2dot/ gprof2dot

From the script directory, convert the saved gprof output to a dot graph:

gprof2dot.py app_gprof_data.txt > app_call_graph.dot

To visualize the data:

xdot app_call_graph.dot

To convert the .dot file to an image:

dot -Tpng app_call_graph.dot -o app_call_graph.png

Example 1: video stabilization

A video stabilization algorithm was ported to the RidgeRun SDK and run on the ARM processor. gprof was used to identify the time-consuming routines so they could be moved to the DSP. The following results were obtained by running gprof before any optimizations were performed.

APP=yuv_tester
  • Normal command execution
$APP -f 1 coastguard_352x288.yuv
  • Capture profile data
gprof $APP gmon.out > $APP.gprof.txt

The first part of the output is:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.58      1.01     1.01 45467136     0.00     0.00  interpolateBiLin
 32.09      2.00     0.99      300     3.30     3.30  boxblur_vert_C
 11.67      2.36     0.36      300     1.20     1.20  boxblur_hori_C
  8.43      2.62     0.26      300     0.87     4.22  transformYUV
  8.43      2.88     0.26   600990     0.00     0.00  compareSubImg_thr
  1.46      2.92     0.05      300     0.15     0.15  lowPassTransforms
  1.30      2.96     0.04                             memcpy
  0.97      2.99     0.03                             __write_nocancel
  • Create call graph
gprof2dot.py $APP.gprof.txt > $APP.call_graph.dot

The first part of the call graph data in the gprof output is:

		     Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.32% of 3.09 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     95.0    0.00    2.93                 filter_video [1]
                0.00    1.62     300/300         motionDetection [2]
                0.26    1.01     300/300         transformYUV [4]
                0.05    0.00     300/300         lowPassTransforms [11]
                0.00    0.00     300/300         transformPrepare [35]
                0.00    0.00     300/300         transformFinish [34]
  • Visualize call graph
xdot $APP.call_graph.dot
  • Create PNG image file of call graph
dot -Tpng $APP.call_graph.dot -o $APP.call_graph.png

Yuv tester gprof graph.png
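
When the flat profile is long, a small filter helps pick out the routines worth optimizing. The sketch below lists the functions at or above a given "% time" threshold; the function name hot_functions and the default threshold of 5 are illustrative choices, and the sample rows are copied from the flat profile in this example. It assumes function names without spaces, so C++ signatures would need a more careful parser.

```shell
#!/bin/sh
# hot_functions FILE [MIN_PERCENT]
# Print the flat-profile rows of a saved gprof report whose "% time" column
# is at least MIN_PERCENT. The name and the default of 5 are illustrative.
hot_functions() {
    threshold=${2:-5}
    awk -v min="$threshold" '
        /^Flat profile:/     { in_flat = 1; next }
        /^[ \t]*Call graph/  { in_flat = 0 }
        # Data rows start with a numeric percentage; the name is the last field.
        in_flat && $1 ~ /^[0-9]+\.[0-9]+$/ && $1 + 0 >= min {
            printf "%6.2f%%  %s\n", $1, $NF
        }
    ' "$1"
}

# Sample input: rows taken from the yuv_tester flat profile.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 32.58      1.01     1.01 45467136     0.00     0.00  interpolateBiLin
 32.09      2.00     0.99      300     3.30     3.30  boxblur_vert_C
  8.43      2.62     0.26      300     0.87     4.22  transformYUV
  1.46      2.92     0.05      300     0.15     0.15  lowPassTransforms
  1.30      2.96     0.04                             memcpy
EOF

hot_functions "$sample" 8    # lists the three rows at or above 8%
```

On a real run the same function can be pointed at the saved report, for example hot_functions $APP.gprof.txt 8.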

References

You can find more information about profiling in the following links:

The GNU Profiler: https://www.math.utah.edu/docs/info/gprof_toc.html

Gprof call-graph visualization: http://redmine.epfl.ch/projects/python_cookbook/wiki/Gprof_call-graph_visualization

Gprof2Dot: https://code.google.com/p/jrfonseca/wiki/Gprof2Dot