Debug and Profiling Guide

Read First

This document is intended for users of the RidgeRun SDK on any hardware platform who want to learn about the features and usage of the debugging and profiling tools provided with the software, or how to integrate with hardware debugging tools.

This document provides information on configuring your SDK for debugging purposes, along with usage examples. Debugging covers user space Linux applications, and profiling covers complete system profiling (kernel, drivers, applications).

Introduction to debugging with RidgeRun SDK

Whenever there is a software development task, there might be the need to reproduce certain scenarios that are known to make the process fail or stop working.

In most open source operating systems, the most widely used development tools are those released by the Free Software Foundation, including the GNU Compiler Collection (gcc) and the GNU Debugger (gdb).

The RidgeRun SDK provides those utilities as part of the toolchain, and many get installed in the target hardware as specified by the SDK configuration.

The tools are available only in the executing environment, that is, once the Linux kernel has loaded and the system is up and running. Because of this, they won't be available in other stages of the system, for example in the bootloader (rrload) or during kernel initialization.

For debugging early stage system execution, some hardware tools should be used, like JTAG emulators, that can emulate and / or execute the embedded processor instructions and provide the required software and tools for the developer to determine what is causing the behavior being inspected.

How to debug different components of the SDK

Bootloader debugging

It isn't possible to debug the rrload bootloader with software tools. The only way to do this is with hardware assisted debugging, using a JTAG tool that allows stepping through the assembler instructions one by one until the error is reproduced.

Some open source bootloaders like u-boot may support gdb remote debugging. In such cases please contact RidgeRun for further details and instructions.

Kernel debugging

It is possible to debug the Linux kernel using software tools, particularly KGDB.

KGDB is a debugger for the Linux kernel. It requires two interconnected machines. The connection may be an RS-232 interface using a null modem cable or the UDP/IP networking protocol (KGDB over Ethernet, KGDBoE); on ARM processors it is also possible to use the Debug Communications Channel (DCC). KGDB is implemented as a patch to the Linux kernel. The target machine (the one being debugged) runs the patched kernel and the other (host) machine runs gdb. The GDB remote protocol is used between the two machines.

Applications debugging

Applications in the RidgeRun SDK use the standard ELF (Executable and Linking Format) file format, which is the most common format for executable files in Linux. ELF has several characteristics, many of which have a direct impact on how to proceed with debugging.

The ELF format is used in most, if not all, UNIX variants and in many specialized devices, such as consumer electronics platforms. Such flexibility is possible because of the design of the ELF file format and its architecture independence.

ELF allows binary files to use shared objects, which makes shared libraries possible. For example, many of the functions available in the C programming language are actually part of the C Standard Library, of which there are many implementations.

A desktop computer running Linux commonly uses the GNU C Library (glibc) as its implementation of the C Standard Library. Because glibc was designed to supply all the necessary functions and favors compatibility over size, it is large and sometimes slower than expected.

However, in embedded devices it is often a better choice to use the uClibc implementation, which was designed from the ground up to supply the C Standard Library to microcontrollers and other devices that lack a Memory Management Unit and that usually have very restricted storage and system memory. Other reduced implementations for embedded devices include dietlibc and newlib.

Most RidgeRun SDKs use the uClibc implementation, which provides a fast, small and efficient alternative to glibc. However, some custom SDKs may use glibc; contact RidgeRun if you have doubts about the specific C library used in your SDK.

Another library that might differ from the one used in desktop systems is the POSIX thread implementation, which is usually part of the C Standard Library and so may be different when using uClibc instead of glibc.

There are many factors involved in the debugging process. If an application fails it could be due to several different factors.

For example, if an application takes a lot of time accomplishing a task, there may be a need to use a profiler. The SDK includes Oprofile for this purpose. A profiler allows you to see how much of the execution time was used in which functions or shared objects and libraries.

Another situation might be the introduction of a new library to a project, for example a graphics library, which might be the cause of an error. In that case the best choice would be to use ltrace to check whether the calls into that library are the ones causing the error.

If there is strange behavior, like segmentation faults in certain operations, the best choice would be to use strace to see which system calls were made last before the error, and to check whether a condition external to the application could be the cause.

In the end, if there are many factors and the error occurs in a specific environment, the best alternative would be to use gdb to step through the source code statement by statement until the error is reproduced again.

Hardware assisted debugging with JTAG tools

It is possible to perform hardware-assisted symbolic debugging on ARM devices using JTAG tools. Many third party tools offer Linux-aware debugging solutions that may be adapted to the SDK.

This section details how to perform debugging with common JTAG solutions, and their scope and limitations under different debugging scenarios. Contact RidgeRun if you are interested in more information or support for hardware assisted debugging.

Debugging with Lauterbach T32

RidgeRun has extensive experience using T32 (Lauterbach) JTAG debuggers to instrument and debug bootloader, kernel, applications or modules.

T32 has strong support for RTOSs including Linux, and is capable of debugging, instrumenting and profiling the complete system. Contact RidgeRun to get training, application notes and sample scripts for using T32 with your target hardware (RidgeRun T32 scripts for the target hardware are a good complement to the T32 RTOS debugging documentation).

Embedded Trace Macrocell Support

ARM targets that support ETM technology may be used along with debugging tools like T32 for performance analysis or application tracing. Some SoCs include Embedded Trace Buffers (ETB) integrated with the ETM interface, giving a lower cost option for ETM debugging support.

When using T32 and a proper ETM license, the ETB functionality from T32 can be used on platforms that support it to debug and trace Linux applications.

For more information go to Getting Started Guide for Lauterbach.

Debugging with Texas Instruments Code Composer Studio (CCS)

CCS is a powerful IDE for DSP development, but it also includes support for the ARM cores of TI's SoCs. The DSP debugging and tracing functions are out of the scope of this document.

CCS supports the following debug features:

Advanced ARM control: MMU table listing, exception vector traps, SW/HW breakpoints
Symbolic assembly or source code debugging

How to perform symbolic debugging with CCS

Symbolic debugging with CCS requires the following steps:

Compile your code with debugging support enabled and code optimizations disabled (see section Requirements for debugging support. For kernel debugging see Kernel debugging techniques).
The SDK generates executable files in ELF format; CCS supports loading code and symbols from ELF files. The following table lists the typical image locations:

Component	Location
bootloader	bootloader/<bootloader version>/src/
Linux kernel	kernel/$(KERNEL)/vmlinux
Linux applications	The application executable itself

Connect CCS to the target hardware (consult with RidgeRun if you need specific instructions to set up your hardware with CCS). CCS can only connect to the target by resetting it; it doesn't support attaching to the board without producing a reset.
Load the symbols through the menus: File -> Load Symbols -> Load Symbols Only... CCS will only look for .out and .sym files by default, but you may change the filter settings to select the ELF image file that you are using.
If the ELF file was compiled properly with debug symbols enabled, CCS will report that it can't find the source files and ask for a location to find them. This happens because the debugging information includes path locations in Unix format (which are invalid on Windows); you will need to point CCS to the right location of the code being debugged.

Debugging with low-cost OpenOCD based JTAG solutions

The Open On-Chip Debugger (openocd) aims to provide debugging, in-system programming and boundary-scan testing for embedded target devices. The targets are interfaced using JTAG (IEEE 1149.1) compliant hardware, but this may be extended to other connection types in the future.

OpenOCD currently supports Wiggler (clones), FTDI FT2232 based JTAG interfaces, the Amontec JTAG Accelerator, and the Gateworks GW1602. It allows ARM7 (ARM7TDMI and ARM720t), ARM9 (ARM920t, ARM922t, ARM926ej-s, ARM966e-s), XScale (PXA25x, IXP42x) and Cortex-M3 (Luminary Stellaris LM3 and ST STM32) based cores to be debugged.

For more information visit: The OpenOCD User’s Guide

OpenOCD is available on Ubuntu distributions from 7.10 onward, but it can be compiled for other host systems.

OpenOCD based systems don't provide debugging solutions as powerful as other JTAG tools, but they provide an effective low-cost JTAG solution for large developer teams or manufacturing support. RidgeRun can provide application notes and scripts for using OpenOCD with your particular hardware; contact RidgeRun for specific information on your board.

Software debugging with GDB

GDB, the GNU Project debugger, allows you to see what is going on `inside' another program while it executes -- or what another program was doing at the moment it crashed.

GDB supports four main features to help you catch bugs:

Start your program, specifying anything that might affect its behavior.
Make your program stop on specified conditions.
Examine what has happened, when your program has stopped.
Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

GDB also permits remote debugging by running a program (gdbserver) on the remote target, making it suitable for debugging embedded systems.

For more information on GDB visit:

The RidgeRun SDK includes a version of GDB that runs on the host machine as well as a version of gdbserver for the target hardware.

Requirements for debugging support

To obtain the most precise results while debugging, it is necessary to set a few flags to the compiler to tell it to annotate the binary file with information about the corresponding source file, and also to turn off all the unnecessary optimization flags to the compiler. This allows the generated binary files to be free from changes that may be induced from the compiler.

Set the SDK toolchain tools in the PATH

Embedded developers often have several different toolchains installed on their machines. For this reason the SDK doesn't rely on finding the tools on the system PATH, where it might mistakenly pick up the wrong ones; instead it sets the PATH during the build process according to the information stored in the SDK configuration system (which includes the path to the toolchain tools).

When performing debugging operations it may be useful to have the toolchain tools on the system PATH, so the SDK provides a method to export the right tools quickly on your console session:

$ cd <RidgeRunSDK Devdir>
$ `make env`

The env target of the makefile prints the shell export commands that set the DEVDIR and PATH variables for working with the SDK, and the enclosing backquotes make the shell execute those commands. The DEVDIR variable makes it possible to run make commands inside subdirectories of the SDK.
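
The backquote mechanism can be seen in isolation with a stand-in for make env. In the sketch below, print_env is a hypothetical function that only mimics the target by printing an export command, and the path is a made-up example; the real target prints the values that match your SDK.

```shell
# Hypothetical stand-in for "make env": it only prints a shell command.
print_env() {
    echo "export DEVDIR=/tmp/example-devdir"
}

# The backquotes capture the printed text and the shell then runs it
# as a command, so the variable is set in the current session.
`print_env`

echo "DEVDIR is now: $DEVDIR"
```

Running print_env without the backquotes would only display the export command and leave the session unchanged.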

Produce debugging information (GCC flag -g)

The flag required to produce debugging information is “-g”. It includes references to the source code, such as line numbers and function names, which allow the developer to set breakpoints and step between them when trying to pinpoint errors.

To add this flag run “make config” and go to:

RidgeRun SDK Configuration
Toolchain configurations
 Toolchain optimize flags

Press ENTER to select the option, then a text field will appear, then add the string “-g”.

Size Optimization (GCC flag -Os)

One optimization flag that should be removed altogether is the “optimize for size” flag. It is enabled by default because it produces smaller binary files, which are desirable in embedded applications due to the limited storage space. When debugging, however, optimized binary code will not match the source code, making debugging more difficult.

To remove this flag run “make config” and go to:

RidgeRun SDK Configuration
Toolchain configurations
 Toolchain optimize flags

A text field will appear containing the flags that are enabled by default; move to the “-Os” string and delete it.

Keep the frame pointer (remove GCC flag -fomit-frame-pointer)

Keep the frame pointer in a register even for functions that don’t need one. This is required to make proper debugging possible.

To verify this, run “make config” and go to:

RidgeRun SDK Configuration
Toolchain configurations
 Toolchain optimize flags

Press ENTER to select the option; when the text field appears, make sure the string “-fomit-frame-pointer” is not present in the field.
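
The three flag changes described above can be sanity-checked before rebuilding. The following is a minimal sketch, assuming you paste in the string you entered in the Toolchain optimize flags field; check_debug_flags is a hypothetical helper written for this guide, not part of the SDK.

```shell
# Hypothetical helper (not part of the SDK): inspect a compiler flag string
# and report anything that gets in the way of source-level debugging.
check_debug_flags() {
    flags=$1
    has_debug=no
    problems=0
    for flag in $flags; do
        case $flag in
            -g|-g[1-3]|-ggdb*)
                has_debug=yes ;;
            -Os|-O[1-3])
                echo "optimization enabled: $flag"
                problems=$((problems + 1)) ;;
            -fomit-frame-pointer)
                echo "frame pointer omitted: $flag"
                problems=$((problems + 1)) ;;
        esac
    done
    if [ "$has_debug" = no ]; then
        echo "no debugging information requested: add -g"
        problems=$((problems + 1))
    fi
    # Succeed only when nothing was reported.
    [ "$problems" -eq 0 ]
}

check_debug_flags "-g -O0 -fno-omit-frame-pointer" && echo "flags look suitable for debugging"
check_debug_flags "-Os -fomit-frame-pointer" || echo "flags need attention"
```

The helper only looks at the flags named in this section; other optimization switches are not examined.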

Don't strip symbols from binaries (SDK Option)

This option allows the binaries (executables and shared object files) to be stripped of many symbols that aren't needed for execution, including debugging information. By default this option is disabled; it should be enabled mostly in the late stages of development and quality assurance to guarantee a small file system footprint.

To check the status of this option:

RidgeRun SDK Configuration
File System Configuration  --->
 [ ] Strip the target file system binaries (may render debug useless)

(Check that the option is not marked)

Using gdb and gdbserver for cross debugging

Gdbserver is a program that allows you to run GDB on a different machine than the one which is running the program being debugged.

In this particular situation, we will run gdbserver in the target hardware, then we will connect gdb running in the host to that gdbserver instance and start the debugging process.

First you need gdbserver to be installed on your target. On the configuration screen (make config) go to File System Configuration and select the option Install GDB server on target file system.

Gdbserver requires three arguments to be set, the host IP address, a listening port, and the program to be debugged with its own arguments. The command line format is:

gdbserver host_address:listening_port command [arguments]

We will open listening port 2345 on the target. The program to be executed is named “sleeptest”; it is in the current directory and requires no arguments. The example command line is:

gdbserver :2345 sleeptest
Process ./sleeptest created; pid = 251 
Listening on port 2345

If you are trying to debug a process that is already running, you can attach to it using its Process ID (PID):

ps
gdbserver :2345 --attach $PROCESS_ID_TO_DEBUG

where $PROCESS_ID_TO_DEBUG is the number you obtained by running ps.

Example output is:

gdbserver :2345 --attach 24601
 Attached; pid = 24601
Listening on port 2345

By itself, the gdbserver isn't very functional. The debugging process followed by gdbserver is guided by gdb running in the host computer.

Setting up the host gdb.

We need to execute a gdb program that understands how to debug ARM code; because of this, a stock gdb (as provided by the package system of a Linux distribution) cannot be used.

The SDK provides an appropriate gdb in the toolchain, called arm-linux-gnueabi-gdb in the case of ARM platforms. For documentation purposes we will refer to this command with just “gdb”.

After starting, gdb presents a console with the command prompt (gdb). This command line interface is where the debugging takes place.

First, we must tell gdb where to look for the shared objects and libraries. This is necessary because the target system will be using those libraries when executing the program being debugged.

(gdb) set solib-absolute-prefix <path to devdir>/fs/fs

After setting that path, we need to point the host gdb to where the binary file being debugged is located with its full path, so that gdb can load the symbols and other annotations appended at compile time, this is possible with the line:

(gdb) file <path to file>

As a tip, gdb allows auto completion of pathnames just as in the shell command line interface, start writing the path and press TAB to automatically complete it if there is a match available. After this, we need to connect to the target hardware with the following command:

(gdb) target remote <ip address:port number>

Example run:

$ cd <RidgeRunSDK>
$ `make env`
$ arm-linux-gdb
GNU gdb 6.8.50.20080318
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=arm-linux-uclibcgnueabi".
(gdb) set solib-absolute-prefix ./fs/fs
(gdb) file ./fs/fs/examples/simple_threads 
Reading symbols from ./fs/fs/examples/simple_threads...done.
(gdb) target remote 192.168.200.245:2345
Remote debugging using 192.168.200.245:2345
[New Thread 267]
0x40003c58 in _start () from ./lib/ld-uClibc.so.0

After connecting to the hardware you will be able to perform all the standard debugging commands of gdb, including multithreading debugging.
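
The three setup commands above can be kept in a gdb command file so they do not have to be retyped for every session. The sketch below is one way to generate such a file; write_gdb_setup is a hypothetical helper, and the devdir, binary path and target address are example values only. It relies on gdb's -x option, which reads commands from a file.

```shell
# Hypothetical helper: write the gdb setup commands shown above to a file.
#   $1 = output file, $2 = SDK devdir, $3 = binary path, $4 = target ip:port
write_gdb_setup() {
    cat > "$1" <<EOF
set solib-absolute-prefix $2/fs/fs
file $3
target remote $4
EOF
}

# Example values only; substitute your own devdir, binary and target address.
write_gdb_setup /tmp/remote.gdb /tmp/example-devdir \
    /tmp/example-devdir/fs/fs/examples/simple_threads 192.168.200.245:2345
cat /tmp/remote.gdb

# With gdbserver already listening on the target, start the session with:
#   arm-linux-gnueabi-gdb -x /tmp/remote.gdb
```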

gdb command line usage documentation

Integration of gdb with ddd graphical debugger

If you would like to use a graphical front end with gdb, there are several compatible front ends available. One popular simple front end is the ddd utility. Below is how you would invoke the gdb, using the ddd graphical front end:

$ ddd --debugger arm-linux-gdb

Generating and reading core dump files

When a process or application crashes, a corresponding signal is generated. Each signal has a current disposition, which determines how the process behaves when the signal is delivered. The default action of a signal may be to terminate the process, terminate it and generate a core dump, stop the process, or continue the process (if stopped).

As mentioned, the default action of certain Linux kernel signals is to cause a process to terminate and produce a core dump file, a disk file containing an image of the process's memory at the time of termination.

The generated dump file can be read using gdb on the host side. Among other debugging tasks, the user can check the state of the processor's registers at the moment of the crash; a backtrace of the functions executed prior to the crash is also available. This information is very handy when debugging applications or processes that do not provide debugging information at runtime or that crash with no further information.

The procedure to enable core dumping on Linux kernel is pretty simple, and should be limited to setting the maximum size of the dump file.

By default, the core file is called core and is created in the current directory. There are several reasons for a core file not to be generated or to be empty:

The process does not have permission to write the core file. Writing the core file will fail if the directory in which it is to be created is non-writable.
The directory in which the core dump file is to be created does not exist.
The RLIMIT_CORE or RLIMIT_FSIZE resource limits for the process are set to zero (this can be verified through the getrlimit() function).
The process is executing a set-user-ID (set-group-ID) program that is owned by a user (group) other than the real user (group) ID of the process (i.e. a process that belongs to root).
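
Some of the conditions in this list can be checked from the shell before running the application. The following is a minimal sketch, assuming a POSIX shell whose ulimit builtin accepts -c (as in the target session shown later); core_dump_blockers is a hypothetical helper, and it covers only the core size limit and the destination directory, not the set-user-ID case or RLIMIT_FSIZE.

```shell
# Hypothetical helper: report the easily checked conditions from the list
# above that would prevent a core file from being written to directory $1.
core_dump_blockers() {
    dir=$1
    found=0
    if [ "$(ulimit -c)" = "0" ]; then
        echo "core file size limit is zero (raise it with ulimit -c)"
        found=1
    fi
    if [ ! -d "$dir" ]; then
        echo "directory does not exist: $dir"
        found=1
    elif [ ! -w "$dir" ]; then
        echo "directory is not writable: $dir"
        found=1
    fi
    # Succeed only when no blocker was reported.
    [ "$found" -eq 0 ]
}

core_dump_blockers /tmp || echo "core dumps would not be written to /tmp"
```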

Generating the core dump file (TARGET side)

Once the target has booted successfully, set the dump file size limit to a value other than zero (it can also be set to unlimited; however, this is not recommended).

/ # ulimit -c 10000
/ # ulimit -a
time(seconds)			unlimited
file(blocks)			unlimited
data(kb)			unlimited
stack(kb)			8192
coredump(blocks)		10000
memory(kb)			unlimited
locked memory(kb)		unlimited
process 			944
nofiles				1024
vmemory(kb)			unlimited
locks				unlimited

Proceed to create a temporary directory to store the dump files with open privileges.

# mkdir -m 777 /tmp/dumps

Set the core file name. By default, a core dump file is named core, but the /proc/sys/kernel/core_pattern file can be set to define a template that is used to name core dump files. The basic specifiers for the dump name are:

%% A single % character.
%p PID of dumped process.
%u Real UID of dumped process.
%g Real GID of dumped process.
%s Number of signal causing dump
%t Time of dump (seconds since the Epoch)
%h Hostname
%e Executable filename

The command below has two objectives:

set /tmp/dumps as the destination path for the dump files.
set the dump file's name to <name_of_the_application>.core

/ # echo "/tmp/dumps/%e.core" > /proc/sys/kernel/core_pattern
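
To preview the file name a given template would produce, the expansion can be imitated in the shell. This is only an illustrative sketch: expand_core_pattern is a hypothetical helper that handles just the %%, %e and %p specifiers and copies everything else unchanged, whereas the real expansion is performed by the kernel. The executable name and PID are example values.

```shell
# Hypothetical helper: imitate core_pattern expansion for %%, %e and %p.
#   $1 = template, $2 = executable name, $3 = PID
expand_core_pattern() {
    pattern=$1
    exe=$2
    pid=$3
    out=
    while [ -n "$pattern" ]; do
        case $pattern in
            %%*) out="$out%";    pattern=${pattern#'%%'} ;;
            %e*) out="$out$exe"; pattern=${pattern#'%e'} ;;
            %p*) out="$out$pid"; pattern=${pattern#'%p'} ;;
            *)   # Copy one ordinary character and move on.
                 rest=${pattern#?}
                 first=${pattern%"$rest"}
                 out="$out$first"
                 pattern=$rest ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_core_pattern "/tmp/dumps/%e.core" hello 366
```

For the template used above and an executable named hello, this prints /tmp/dumps/hello.core.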

Run the application. The crashing application/process will generate output similar to the following (this example uses the “Hello World” application included in RidgeRun's user application examples, in a case where a segmentation fault is generated).

# ./hello
Segmentation fault (core dumped)

Before running gdb on the host side, both the application's binary file and the generated dump file should be placed in the same directory, which should be accessible from the host (a suitable directory would be the target's /opt directory if using an NFS file system).

Running gdb on host side

Once the dump file has been generated, gdb can be executed on the host side.

Compile the application with the -ggdb flag for debugging purposes.
Locate the directory where both the application's binary and the dump file are located and run gdb there (the example below assumes an NFS file system with both files in /opt):

<DEVDIR>/fs/fs/opt$ arm-linux-gnueabi-gdb
GNU gdb 6.8.50.20080318 
Copyright (C) 2008 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details. 
This GDB was configured as "--host=i686-linux --target=arm-linux-gnueabi". 
(gdb)

Load the application as the program to be debugged

(gdb) file hello

Reading symbols from <DEVDIR>/fs/fs/opt/hello...done.

Load dump's memory and processor's registers data

(gdb) core hello.core 
Core was generated by `./hello'. 
Program terminated with signal 11, Segmentation fault. 
[New process 366] 
#0  0x000083a8 in ?? ()

Display the backtrace of the stack

(gdb) info stack 
#0  main (argc=1, argv=0xbed38eb4, envp=0xbed38ebc) at hello.c:41

List the processor's registers and the content of these.

(gdb) info all-registers 
r0             0x1      1 
r1             0xbed38eb4       3201535668 
r2             0xbed38ebc       3201535676 
r3             0x1072c  67372 
r4             0x0      0 
r5             0x1      1 
r6             0x0      0 
r7             0x0      0 
r8             0x0      0 
r9             0x0      0 
r10            0x40024000       1073889280 
r11            0x0      0 
r12            0x0      0 
sp             0xbed38d50       0xbed38d50 
lr             0x400399cc       1073977804 
pc             0x83a8   0x83a8 <main+24> 
f0             0        (raw 0x000000000000000000000000) 
f1             0        (raw 0x000000000000000000000000) 
f2             0        (raw 0x000000000000000000000000) 
f3             0        (raw 0x000000000000000000000000) 
f4             0        (raw 0x000000000000000000000000) 
f5             0        (raw 0x000000000000000000000000) 
f6             0        (raw 0x000000000000000000000000) 
f7             0        (raw 0x000000000000000000000000) 
fps            0x0      0 
cpsr           0x60000010       1610612752

Eclipse Integrated Development Environment (IDE)

If you would like to use the eclipse integrated development environment for application development, it is easiest to start with an existing, mostly empty, SDK application directory that builds properly.

The debugging description below assumes you are using a root NFS mount, so when you rebuild the code it is available on the target. The steps are similar for other configurations, but you need to get the updated executable onto the target hardware each time you rebuild the application.

Create SDK application directory

For the description below, DEVDIR is set to your development directory path and the name of the application being created is ticker. Run the following commands to get an SDK application directory populated.

cd $DEVDIR/myapps
mkdir ticker
cp hello/Config hello/Makefile ticker
cp hello/hello.c ticker/ticker.c

Edit the files in the ticker directory to rename from hello to ticker. In the Makefile, add the following line after the include statement, if it is not already there:

CFLAGS += $(EXTRA_CFLAGS)

Verify the application builds correctly by issuing the make command.

cd $DEVDIR/myapps/ticker
make

Create eclipse project

Launch eclipse and specify the DEVDIR directory as the eclipse workspace.

Create the ticker project as follows (pressing Next and changing tabs as needed):

File -> New -> Project ->
 C -> Standard Make C Project
 <Next>
 Project name: ticker
 Location: <ticker directory>
 <Next>
 Make Builder
 Build (Incremental Build): build install EXTRA_CFLAGS+=-g \
   EXTRA_CFLAGS+=-O0 EXTRA_CFLAGS+=-fno-omit-frame-pointer
 <Finish>

Note: the second EXTRA_CFLAGS value above, -O0, is a capital letter 'O' followed by a zero '0'.

Verify you can build the application from within Eclipse by issuing

Project -> Build All

Create debug configuration

To get remote gdb to work correctly, you need to create a debug configuration using the following steps:

Create a gdbinit file in the base directory of your SDK with the following contents:

set solib-absolute-prefix <path to devdir>/fs/fs

This instructs gdb to search for libraries in the host's copy of the target's root file system.

Then configure Eclipse as follows:

Run -> Debug...
  Create,Manage, and run configurations
    C/C++ Local Application <double click>
      Name: ticker remote debug
      Project: ticker <Browse...>
      C/C++ Application: ticker <Search Project...>
    Debugger
      Debugger: gdbserver Debugger
      Debugger Options
        Main
          GDB debugger: /opt/RidgeRun/arm-eabi-uclibc/bin/arm-linux-gdb
          GDB Command set: Standard (Linux)
          GDB command file: <Select the previously generated gdbinit file>
          Verbose console mode: do not check unless you want to debug the gdb connection,
                  this option slows down the responsiveness of the debugger
        Connection
          TCP
          IP address: <target IP address>
          Port number: 2345

The host path to gdb depends on the toolchain you have installed. Typical locations are:

/opt/RidgeRun/arm-eabi-uclibc/bin/arm-linux-gdb
/opt/codesourcery/arm-2009q1/bin/arm-none-linux-gnueabi-gdb

Start gdbserver

As described previously, start gdbserver on the target hardware:

cd /examples # or wherever your application is installed
gdbserver host_address:2345 ./ticker

Connect to gdbserver

All that is left is to press the Debug button to start debugging.

If everything worked correctly, you will see the ticker.c file displayed with a blue arrow pointing to main(), indicating that the program counter hit a breakpoint at main(). You can now debug.

Troubleshooting

If you see the error “Target selection failed”, it means the networking parameters on the gdbserver command line or in the Eclipse debug configuration are incorrect. Try to establish a telnet session to make sure the desktop and target can exchange TCP packets.

If something isn't working, inspect the gdb console window (enabling the verbose console mode is useful to get more information on the problem). After you get remote debugging working correctly, you can disable “Verbose console mode” in the debug configuration to turn off gdb console output.

If the filesystem is built without stripping the debugging information (see previous sections), the debugger should be capable of loading the information of the functions in the system libraries (like libc or libpthread), but to debug symbolically inside such libraries the source code of the toolchain must be installed. The SDK installer provides an option to install the toolchain source code.

Also, if you cannot connect with telnet or the connection is refused, try running "inetd" on the target before using telnet.

inetd

If a root password is required when using telnet or ssh to log into the target, set one by running "passwd" on the target:

/ # passwd
Changing password for root
New password:

After this, the password you entered is the root password.

In any case, do not hesitate to contact RidgeRun for support.

Application tracing

It is often useful to trace what an application is doing, but on operating systems with dynamic library loaders and plenty of system calls, getting a good trace may be difficult beyond the code of the application itself.

RidgeRun SDK includes two open source tools that help to perform system and library tracing for applications: strace and ltrace.

System call tracing with strace

Strace is a debugging utility in Linux to monitor the system calls used by a program and all the signals it receives. A system call is the mechanism used by an application program to request service from the operating system.

This utility can be very helpful when debugging applications that deal with many open files (in Unix terminology, a device, a socket, a pipe and a FIFO queue file are all treated as open files by the kernel). As a tip, the lsof utility prints the list of open files.

For example, if an application stops working suddenly, it is possible to execute it again with strace to check if it is stalled on opening a file or expecting user input.

The most common system calls are open, read, write, close, wait, exec, fork, exit, and kill. Besides those, Linux has hundreds of system calls.

The SDK includes strace under the applications and can be enabled by running “make config”, under:

RidgeRun SDK Configuration
 File System Configuration
  SDK Applications  --->
   [*] strace-4.5.15

To use this, try the following command line example:

# strace <application> [<application parameters>]
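
When the trace is long, it helps to save it to a file with strace's -o option and then filter for the calls that failed; strace reports a failed call with a return value of -1 followed by the errno name. The sketch below shows the filtering step. failed_syscalls is a hypothetical helper, and the sample log is hand-written in strace's output format for illustration rather than captured from a real run.

```shell
# Hypothetical helper: print the calls in a saved strace log that failed.
# strace reports a failure as "= -1" followed by the errno name.
failed_syscalls() {
    grep -E '= -1 E[A-Z]+' "$1"
}

# Hand-written sample in strace's output format (not from a real run).
cat > /tmp/sample-trace.log <<'EOF'
open("/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/dev/null", O_RDONLY) = 3
read(3, "", 4096) = 0
close(3) = 0
EOF

failed_syscalls /tmp/sample-trace.log

# On the target, a real log would be produced with:
#   strace -o /tmp/trace.log <application> [<application parameters>]
```

Only the first line of the sample is printed, which points at the file the hypothetical application failed to open.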

Library tracing with ltrace

ltrace is a debugging utility in Linux to monitor the library calls used by a program and all the signals it receives.

Just like strace, ltrace is used to run an application to check calls, but instead of system calls, ltrace tracks all the library calls.

For example, the header file string.h declares many of the most commonly used library calls in Linux programming, among them memcpy(), memset(), strcmp(), and strcpy(). The stdio.h header provides many other common library calls, particularly printf(), fopen() and fclose(). Calls to these functions, the parameters being used and the function return values may be monitored with ltrace.

The SDK includes ltrace among its applications. It can be enabled by running “make config”, under:

RidgeRun SDK Configuration
 File System Configuration
  SDK Applications  --->
   [*] ltrace-0.5

To use this, try the following command line example:

# ltrace ./application

Linux debugging techniques

Debugging Linux applications or the kernel requires more than the right tools for the job; it also requires that the programmer have a basic understanding of the subsystem under test. Linux comprises several different subsystems and technologies that may induce complex behaviors during debugging.

The best way to approach debugging when observing odd behaviors is to review reference documentation on the related topics. For example, if an application relies heavily on threads, it is worth understanding how different function calls behave in multi-threaded applications (signal delivery, priority handling, etc.).

Several references explain Linux internals in depth. Some recommendations are:

Understanding the Linux Kernel, Bovet & Cesati, O'Reilly
Linux Kernel Development, Robert Love, Novell Press
Linux Device Drivers, Rubini & Corbet, O'Reilly
Embedded Linux System Design and Development, P. Raghavan & Co., Auerbach Publications

RidgeRun provides support services, backed by extensive knowledge of Linux internals, with experts who can help you quickly identify how best to approach the debugging of your product.

This section provides common procedures that may be useful when debugging problems.

Kernel debugging techniques

The Linux kernel provides extensive options to debug and instrument system functionality. To enable kernel debugging, use the configuration menu of the SDK:

$ make config 
  Kernel Configuration --> 
    Kernel Hacking --> 
      [ x ] Kernel Debugging

Two of the more relevant options here are:

Verbose kernel error messages: provides detailed information and a kernel stack trace on kernel errors.
Compile kernel with debug info: useful when performing symbolic debugging, as it allows debuggers to associate the assembly code with the source code.

Rebuilding and reloading the kernel is enough for the changes to take effect.

How to interpret a kernel crash log

When the kernel produces an error and the option “Verbose kernel error messages” is enabled (review previous section), it will produce a dump of the registers as well as the stack trace that led to the crash.

The best way to interpret this data is to translate the addresses in the crash log into function names. There are two possible ways to perform the translation:

The file $DEVDIR/kernel/linux-<version>/System.map provides a readable table to translate from kernel address ranges to kernel symbols.
If you want to trace, in the assembly code, the instructions that led to a certain problem, you may use the objdump tool to disassemble the kernel image and look for the problem. Follow these steps:

$ `make env` # This puts the toolchain on your PATH environment variable
$ arm-linux-objdump -d -S $DEVDIR/kernel/linux-2.6.22.2/vmlinux

The last command assumes that your toolchain prefix is arm-linux- (which may change depending on your platform), and that your kernel is 2.6.22.2. The “-d” option disassembles the binary image, and the “-S” option interleaves the source code if the kernel is built with debugging information (see previous section).
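The System.map lookup mentioned above can also be scripted on the host. The following is a minimal sketch under stated assumptions: the function name lookup_symbol is hypothetical, addresses are 32-bit (as on ARM) and given without the 0x prefix, the shell uses 64-bit arithmetic (typical for a host bash), and System.map is sorted by address.

```shell
#!/bin/sh
# lookup_symbol <System.map file> <hex address without 0x>
# Prints "<symbol>+0x<offset>" for the symbol that contains the address.
# lookup_symbol is a hypothetical helper name, not an SDK tool.
lookup_symbol() {
    map_file=$1
    target=$((0x$2))
    best=""
    # Each System.map line is "<address> <type> <symbol name>".
    while read -r addr kind name; do
        value=$((0x$addr))
        # The file is sorted, so the first symbol past the target ends the search.
        if [ "$value" -gt "$target" ]; then
            break
        fi
        best="$name+0x$(printf '%x' $((target - value)))"
    done < "$map_file"
    # Fails (prints nothing) when the address is below the first symbol.
    [ -n "$best" ] && echo "$best"
}
```

For example, lookup_symbol $DEVDIR/kernel/linux-<version>/System.map followed by the program counter value taken from the crash log prints the enclosing function and the offset into it, which can then be matched against the objdump listing.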

How to interpret memory addresses

Refer to the following table to better understand the memory addresses that appear in the logs and how to translate them:

Address	Used by	How to translate it
0x0 – 0xBEFF:FFFF	Current user space application	Identify the running process PID and review the contents of /proc/<PID>/maps while the process is running.
0xBF00:0000 – 0xBFFF:FFFF	Kernel modules	Review the contents of /proc/modules.
0xC000:0000 – 0xC000:0000 + <size of your RAM>	Kernel virtual logical addresses	Look at kernel/<kernel version>/System.map.
0xC000:0000 + <size of your RAM> – 0xFFFF:FFFF	Kernel virtual addresses	May require a complex procedure, as this is non-contiguous physical memory allocated with vmalloc.

For further information, review Chapter 8, “Process Address Space”, of Understanding the Linux Kernel, 2nd Edition, or contact RidgeRun support.
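The ranges in the table above can be turned into a quick classification helper. This is only a sketch of the table, assuming the 32-bit layout shown there and a shell with 64-bit arithmetic; the function name classify_address is hypothetical, and the RAM size is passed in megabytes.

```shell
#!/bin/sh
# classify_address <hex address without 0x> <RAM size in MB>
# Maps an address from a log to the region listed in the table above.
# classify_address is a hypothetical helper name, not an SDK tool.
classify_address() {
    addr=$((0x$1))
    ram_bytes=$(($2 * 1024 * 1024))
    lowmem_end=$((0xC0000000 + ram_bytes))
    if [ "$addr" -lt $((0xBF000000)) ]; then
        echo "user space: check /proc/<PID>/maps"
    elif [ "$addr" -lt $((0xC0000000)) ]; then
        echo "kernel module: check /proc/modules"
    elif [ "$addr" -lt "$lowmem_end" ]; then
        echo "kernel logical address: check System.map"
    else
        echo "kernel virtual address (vmalloc)"
    fi
}
```

For instance, on a board with 128 MB of RAM, classify_address c0008124 128 points to System.map, while an address at or above 0xC800:0000 falls in the vmalloc range.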

How to debug “Segmentation Fault” problems

A common problem developers encounter when developing Linux applications is that an application dies with a “Segmentation Fault” message. This error is produced when the application accesses an illegal memory area, that is, a memory area that doesn't belong to the process.

This kind of error is typically produced by faulty code, and can be difficult to locate in complex applications. This section explains techniques to locate the source of the problem.

Enabling verbose user crash messages

The first step to debug the problem is to enable verbose user crash messages on the kernel level. This is done by following this procedure:

$ make config
	Kernel Configuration -->
		Kernel Hacking -->
			[ x ] Verbose user fault messages

This will enable verbose error message generation from the kernel, but it requires one more step; the option “user_debug=N” should be passed on the command line to the kernel from the bootloader, where N is the sum of:

1 - undefined instruction events
2 - system calls
4 - invalid data aborts
8 - SIGSEGV faults
16 - SIGBUS faults

So, for example, to see detailed information about all of the previous errors, the parameter “user_debug=31” should be used.
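The value of N is a plain sum (bitwise OR) of the flags listed above, so it can be computed rather than worked out by hand. The snippet below is an illustrative sketch: the variable and function names are made up for this example, and only the numeric values come from the list above.

```shell
#!/bin/sh
# Flag values taken from the list above; the variable names are illustrative.
DBG_UNDEFINED=1    # undefined instruction events
DBG_SYSCALL=2      # system calls
DBG_DATA_ABORT=4   # invalid data aborts
DBG_SIGSEGV=8      # SIGSEGV faults
DBG_SIGBUS=16      # SIGBUS faults

# Print the kernel command line option for the given flags.
user_debug_option() {
    mask=0
    for flag in "$@"; do
        mask=$((mask | flag))
    done
    echo "user_debug=$mask"
}

# Example: report only SIGSEGV and SIGBUS faults.
user_debug_option "$DBG_SIGSEGV" "$DBG_SIGBUS"   # prints user_debug=24
```

The printed option is what needs to be appended to the kernel command line in the bootloader.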

How to interpret application crash log

Refer to the section “How to interpret memory addresses” above. User space problems typically lie at memory addresses between 0x0 and 0xBEFF:FFFF, but a user process has many different libraries and files mapped in that memory range.

To properly identify the location of a memory address from a crash log, the contents of the /proc/<PID>/maps file are useful. This file only exists while the application is running, and <PID> is its process ID. Usually, adding an early pause() call to the application and running it in the background (with the '&' suffix) gives the programmer time to dump the mapping contents for later use.
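Once the mapping has been saved (for example with cat /proc/<PID>/maps > /tmp/maps.txt), finding the entry that contains a faulting address can be scripted. The following is a minimal sketch: the function name find_mapping and the file path are hypothetical, and it assumes 32-bit addresses given without the 0x prefix.

```shell
#!/bin/sh
# find_mapping <saved maps file> <hex address without 0x>
# Prints the mapped file that contains the address and the offset of the
# address from the start of that mapping.
# find_mapping is a hypothetical helper name, not an SDK tool.
find_mapping() {
    maps_file=$1
    target=$((0x$2))
    # Each line is "<start>-<end> <perms> <offset> <dev> <inode> [<path>]".
    while read -r range perms offset dev inode path; do
        start=$((0x${range%-*}))
        end=$((0x${range#*-}))
        if [ "$target" -ge "$start" ] && [ "$target" -lt "$end" ]; then
            echo "${path:-[anonymous]} offset=0x$(printf '%x' $((target - start)))"
            return 0
        fi
    done < "$maps_file"
    return 1
}
```

If the address falls inside a shared library, the reported offset is the value to look for in that library's disassembly or symbol table.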

Profiling software on the SDK

The SDK provides the profiling tools OProfile and gprof. OProfile allows the analysis of your program's behavior as it runs. It lets you determine which parts of a program to optimize for speed.

This is done by measuring the frequency and duration of function calls. The output is a statistical summary of the events observed. 'OProfile' uses hardware interrupts to probe the target's program counter register at regular intervals. Sampling profiles may be less accurate and specific than a detailed trace, but allow the target program to run at near full speed. As the summation in a profile is done related to the source code positions where the events occur, the size of measurement data will be linear to the code size of your program.

There are different ways 'OProfile' can help you. Generally, things that could be done intentionally to make a program run longer can also occur unintentionally. One commonly accepted kind of slow-down is a "hot spot", which is a tight inner loop where the program counter spends much of its time. For example, if one often finds at the bottom of the call stack a linear search algorithm instead of binary search, this would be a true hot spot slow-down. However, if another function is called in the search loop, such as string compare, that function would be found at the bottom of the stack, and the call to it in the loop would be at the next level up. In this case, the loop would not be a hot spot, but it would still be a slow-down. In all but the smallest programs, hot spot slugs are rare, but slugs are quite common.

Another way to slow down software is to use data structures that are too general for the problem at hand. For example, if a collection of objects remains small, a simple array with linear search could be much faster than something like a "dictionary" class, complete with hash coding. With this kind of slow-down, the program counter is most often found in system memory allocation and freeing routines as the collections are being constructed and destructed. Another common motive is that a powerful function is written to collect a set of useful information (from a database, for example). Then that function is called multiple times, rather than taking the trouble to save the results from a prior call. A possible explanation for this could be that it is beyond a programmer's comprehension that a function call might take a million times as long to execute as an adjacent assignment statement. A contributing factor could be "information hiding", in which external users of a module can be ignorant of what goes on inside it.

There are certain misconceptions in performance analysis. One is that timing is important. Knowing how much time is spent in functions is good for reporting improvements, but it provides only vague help in finding problems. The information that matters is the fraction of time that individual statements reside on the call stack. Another misconception is that statistical precision matters. Typical slow-downs sit on the call stack between 5 and 95 percent of the time. The larger they are, the fewer samples are needed to find them.

Configuring OProfile

To enable OProfile select it under the following options.

RidgeRun SDK Configuration
 File System Configuration
  Select target's file system software  --->  
   [*] oprofile 0.9.6

There are two storage considerations regarding OProfile usage: the resulting file system may be larger than the available flash storage, and OProfile requires fast write speeds for its logs.

Storage considerations

OProfile automatically enables several options in the kernel configuration, BusyBox and other applications and libraries, which together add up to about 10MB of additional required space on the target file system. Because of this, it may be necessary to enable the NFS root file system on systems with little storage.

If your hardware has plenty of storage you may not need NFS root file system.

To enable the NFS root file system, select it under:

RidgeRun SDK Configuration
 File System Configuration
  File system image target (...) --->
   (X) NFS root file system

Sample collection considerations

It is also suggested that the log files (stored in /var/lib/oprofile) be kept on a storage medium that allows fast writes, so that the time-based sampling is not biased by such writes. NFS does not meet this requirement. As a solution, the board RAM can be used through the tmpfs file system; to do this, mount the /var/lib/oprofile directory as tmpfs.

The following commands show how to mount this directory as a tmpfs file system:

# opcontrol --shutdown # verify oprofile is not running
# mkdir -p /var/lib/oprofile
# mount -t tmpfs none /var/lib/oprofile

Linux kernel considerations

It may be necessary to place the Linux kernel image in the file system so that OProfile can load its symbols.

The RidgeRun SDK generates all the images needed by the bootloader (the kernel, the file system and the bootloader itself) and puts them in the $DEVDIR/images/ directory.

However, these images are not what OProfile expects. OProfile needs the vmlinux file, which is found in $DEVDIR/kernel/$KERNEL_VERSION after the kernel has been compiled successfully.

This file can be placed in any directory under the exported $DEVDIR/fs/fs/ directory. If an NFS file system is being used with the RidgeRun SDK on a development board, the file appears on the target hardware as soon as it has been copied. If a CRAMFS or JFFS2 file system is used, copy the kernel image file into the same directory and then run “make fs” again to generate a file system image that includes it.

Kernel module considerations

OProfile profiles kernel modules by default. However, there are a couple of problems you may have when trying to get results. First, you may have booted via an initrd; this means that the actual path for the module binaries cannot be determined automatically. To get around this, you can use the --image-path / -p [paths] (comma-separated list of additional paths to search for binaries) option to the profiling tools to specify where to look for the kernel modules.

In kernel versions 2.6 upwards, the information on where kernel module binaries are located has been removed. This means OProfile needs guiding with the -p option to find your modules. Normally, you can just use your standard module top-level directory for this. Note that due to this problem, OProfile cannot check that the modification times match; it is your responsibility to make sure you do not modify a binary after a profile has been created.

If you have run insmod or modprobe to insert a module in a particular directory, it is important that you specify this directory with the -p option first, so that it over-rides an older module binary that might exist in other directories you've specified with -p. It is up to you to make sure that these values are correct: 2.6 kernels simply do not provide enough information for OProfile.

Capture profiling data

You need to be the root user to use OProfile. When setting up OProfile, you can profile your application either with or without the Linux kernel. If you don't want to profile the Linux kernel, you'll need to do this:

# opcontrol --no-vmlinux

Otherwise, assuming you copied the kernel image to the target's /kernel directory, you'll need to do this:

# opcontrol --vmlinux=/kernel/vmlinux

You are now ready to start the OProfile daemon process:

# opcontrol --start-daemon

The above step is useful to keep the OProfile startup process out of your profiling data. Now, to start the profiling data collection, you need to do this:

# opcontrol --start

Starting the daemon separately is optional, since opcontrol --start also starts the daemon if it is not already running. It is now time to run your application. After running your application through its paces, you'll want to see the collected profile data. There are two ways to do this: you can either shut down profiling altogether, or you can tell OProfile to dump the collected data while it continues to collect more. The sample data is written under /var/lib/oprofile/samples/, where the daemon also keeps its log file, oprofiled.log. To just dump the collected data:

# opcontrol --dump

To clear the profile data at any time, you can just do a reset with:

# opcontrol --reset

To shut down OProfile:

# opcontrol --shutdown
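The individual commands above can be combined into a small wrapper that profiles a single command from start to finish. This is a sketch only: the function name profile_command is hypothetical, and it assumes the kernel is not being profiled (--no-vmlinux); replace that line with the --vmlinux option if kernel symbols are wanted.

```shell
#!/bin/sh
# profile_command <command> [<arguments>...]
# Runs one command under OProfile using the opcontrol steps described above.
# profile_command is a hypothetical helper name, not an SDK tool.
profile_command() {
    opcontrol --no-vmlinux || return 1   # profile without the Linux kernel
    opcontrol --reset                    # clear data from earlier sessions
    opcontrol --start || return 1        # start the daemon and the collection
    "$@"                                 # run the command to be profiled
    status=$?
    opcontrol --dump                     # flush the collected samples
    opcontrol --shutdown                 # stop profiling and kill the daemon
    return $status
}
```

For example, profile_command ./application runs the application under the profiler and returns its exit status; opreport can then be used to inspect the results.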

Interpret profiling data

The following is an example of OProfile running on a RidgeRun SDK target:

BusyBox v1.2.1 (2008.01.04-20:55+0000) Built-in shell (ash)
Enter 'help' for a list of built-in commands.
# opcontrol --no-vmlinux

Let's start profiling and immediately run a time consuming application in the background:

# opcontrol --start; yes > /dev/null &
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
# ps aux
  PID  Uid     VmSize Stat Command
    1 root        364 S   init       
    .    .          . .    . 
    .    .          . .    .
    .    .          . .    .
  193 root            SW< [IRQ 19]
  225 root        344 S  /sbin/inetd 
  230 root        412 S  /bin/sh /usr/bin/keep_smapp_alive 
  231 root        412 S   -sh 
  581 root        312 S  /usr/bin/oprofiled --session-dir=/var/lib/oprofile
  583 root        292 R   yes 
  609 root        344 R   ps aux 
# kill -9 583
# 
[1] + Killed                     yes 1>/dev/null
# opcontrol --shutdown
Stopping profiling.
Killing daemon.

'yes' is a BusyBox tool, so its samples appear under busybox in the report. This is the summary report:

# opreport
CPU: CPU with timer interrupt, speed MHz (estimated)
Profiling through timer interrupt
TIMER:0|
samples| %|
------------------
4753 no-vmlinux
4670 libuClibc-0.9.29.so
315 busybox
260 ld-uClibc-0.9.29.so
18 oprofiled

And this is the detailed report:

# opreport -l
warning: /no-vmlinux could not be found.
CPU: CPU with timer interrupt, speed  MHz (estimated)
Profiling through timer interrupt
samples  %        app name                 symbol name
4753     ..  no-vmlinux               (no symbols)
2499     ..  libuClibc-0.9.29.so      _ppfs_parsespec
717       ..  libuClibc-0.9.29.so      vfprintf
315       ..  busybox                  (no symbols)
212       ..  libuClibc-0.9.29.so      _ppfs_init
193       ..  libuClibc-0.9.29.so      _memcpy
190       ..  libuClibc-0.9.29.so      memset
151       ..  libuClibc-0.9.29.so      _ppfs_setargs
150       ..  libuClibc-0.9.29.so      _charpad
137       ..  libuClibc-0.9.29.so      strnlen
79        ..  libuClibc-0.9.29.so      __stdio_fwrite
76        ..  ld-uClibc-0.9.29.so      _dl_find_hash_mod
72        ..  libuClibc-0.9.29.so      fwrite_unlocked
59        ..  ld-uClibc-0.9.29.so      _dl_load_elf_shared_library
56        ..  libuClibc-0.9.29.so      memcpy
54        ..  libuClibc-0.9.29.so      fputs_unlocked
46        ..  ld-uClibc-0.9.29.so      _start
32        ..  libuClibc-0.9.29.so      strlen
25        ..  libuClibc-0.9.29.so      _ppfs_prepargs
20        ..  libuClibc-0.9.29.so      Laligned
17        ..  libuClibc-0.9.29.so      Llastword
15        ..  ld-uClibc-0.9.29.so      _dl_add_elf_hash_table
13        ..  ld-uClibc-0.9.29.so      _dl_load_shared_library
12        ..  libuClibc-0.9.29.so      strchr
9         ..  ld-uClibc-0.9.29.so      _dl_memalign
8         ..  ld-uClibc-0.9.29.so      _dl_allocate_tls_storage
8         ..  ld-uClibc-0.9.29.so      _dl_get_ready_to_run
6         ..  oprofiled                odb_update_node
5         ..  libuClibc-0.9.29.so      fork
4         ..  ld-uClibc-0.9.29.so      _dl_allocate_static_tls
4         ..  libuClibc-0.9.29.so      bsearch
4         ..  oprofiled                opd_process_samples
3         ..  ld-uClibc-0.9.29.so      _dl_next_tls_modid
3         ..  ld-uClibc-0.9.29.so      _dl_parse_dynamic_info
3         ..  libuClibc-0.9.29.so      __heap_free
3         ..  libuClibc-0.9.29.so      __psfs_parse_spec
3         ..  libuClibc-0.9.29.so      _uintmaxtostr
2         ..  ld-uClibc-0.9.29.so      _dl_calloc
2         ..  ld-uClibc-0.9.29.so      _dl_find_hash
2         ..  ld-uClibc-0.9.29.so      _dl_strdup
2         ..  ld-uClibc-0.9.29.so      _dl_unmap_cache
2         ..  ld-uClibc-0.9.29.so      init_tls
2         ..  libuClibc-0.9.29.so      .plt
2         ..  libuClibc-0.9.29.so      __psfs_do_numeric
2         ..  libuClibc-0.9.29.so      byte_regex_compile
2         ..  libuClibc-0.9.29.so      free
2         ..  libuClibc-0.9.29.so      memcmp
2         ..  libuClibc-0.9.29.so      re_search_2
2         ..  libuClibc-0.9.29.so      read
2         ..  libuClibc-0.9.29.so      strcoll
2         ..  libuClibc-0.9.29.so      strcspn
2         ..  libuClibc-0.9.29.so      strpbrk
2         ..  oprofiled                pop_buffer_value
1         ..  ld-uClibc-0.9.29.so      _dl_allocate_tls_init
1         ..  ld-uClibc-0.9.29.so      _dl_app_init_array
1         ..  ld-uClibc-0.9.29.so      _dl_initial_error_catch_tsd
1         ..  ld-uClibc-0.9.29.so      _dl_linux_resolve
1         ..  ld-uClibc-0.9.29.so      _dl_linux_resolver
1         ..  ld-uClibc-0.9.29.so      _dl_map_cache
1         ..  libuClibc-0.9.29.so      __pgsreader
1         ..  libuClibc-0.9.29.so      __sigsetjmp
1         ..  libuClibc-0.9.29.so      __stdio_trans2w_o
1         ..  libuClibc-0.9.29.so      __stdio_wcommit
1         ..  libuClibc-0.9.29.so      __uClibc_init
1         ..  libuClibc-0.9.29.so      _stdlib_strto_ll
1         ..  libuClibc-0.9.29.so      bsd_signal
1         ..  libuClibc-0.9.29.so      close
1         ..  libuClibc-0.9.29.so      exit
1         ..  libuClibc-0.9.29.so      fclose
1         ..  libuClibc-0.9.29.so      fflush
1         ..  libuClibc-0.9.29.so      fflush_unlocked
1         ..  libuClibc-0.9.29.so      getc_unlocked
1         ..  libuClibc-0.9.29.so      putc_unlocked
1         ..  libuClibc-0.9.29.so      strcpy
1         ..  libuClibc-0.9.29.so      vsscanf
1         ..  oprofiled                do_match
1         ..  oprofiled                get_file
1         ..  oprofiled                odb_do_hash
1         ..  oprofiled                odb_open_count
1         ..  oprofiled                op_get_mtime
1         ..  oprofiled                sfile_log_sample

For more information on 'opreport' capabilities, you can see the OProfile manual at: http://oprofile.sourceforge.net/doc/index.html

Debugging GStreamer

Using the dmaiperf GStreamer element

dmaiperf is a GStreamer element for observing DSP utilization. One of its properties is engine-name. If you set this property to codecServer, the element displays information about DSP performance, such as usage.

The following is an example pipeline with that property set:

gst-launch -e audiotestsrc num-buffers=600 ! audio/x-raw-int,endianness=1234,signed=true,width=16,depth=16,rate=44100,channels=1 ! dmaienc_aac copyOutput=true ! dmaiperf engine-name=codecServer ! qtmux ! filesink location=test.mp4 -v

The following is an example of the dmaiperf element output. The "DSP: 21" field gives the average utilization of the DSP; if there are two encoders, it is the total amount used by both of them.

The remaining fields describe the memory sections that the codec server is using, such as DDR2, DDRALGHEAP, etc.: their base addresses, sizes, and the maximum block length. This information is useful if you need to debug issues related to the memory map.

INFO:
Timestamp: 0:09:36.950987086; bps: 8377; fps: 15; DSP: 21; mem_seg:
DDR2; base: 0xc3c98589; size: 0x20000; maxblocklen: 0x13910; used:
0xc6f0; mem_seg: DDRALGHEAP; base: 0xc3100000; size: 0xa00000;
maxblocklen: 0x96ef10; used: 0x90f78; mem_seg: IRAM; base:
0x11800000; size: 0x20000; maxblocklen: 0x10000; used: 0x10000;
mem_seg: L3_CBA_RAM; base: 0x80000000; size: 0x10000; maxblocklen:
0x10000; used: 0x0;

INFO:
Timestamp: 0:09:37.153399169; bps: 11846; fps: 14;

Information such as the timestamp, bps and fps depends on where you place the dmaiperf element. In the pipeline below, one dmaiperf element is placed after the video encoder, so the timestamp, bps and fps it reports relate to the video encoder. The other dmaiperf element is placed after the audio encoder with the engine-name property set, so it also reports the overall DSP information (utilization, memory map distribution...).

gst-launch videotestsrc ! ffmpegcolorspace ! videorate ! video/x-raw-yuv,framerate=15/1 ! videoscale ! video/x-raw-yuv,width=320,height=240 ! queue ! dmaienc_h264 maxbitrate=200000 targetbitrate=200000 ! dmaiperf ! queue ! rtph264pay ! queue ! udpsink name=vsink host=192.168.0.199 port=10000 audiotestsrc ! audio/x-raw-int,endianness=1234,signed=true,width=16,depth=16,rate=16000,channels=1 ! queue ! dmaienc_aac copyOutput=true bitrate=32000 maxbitrate=64000 ! dmaiperf engine-name=codecServer ! queue ! rtpmp4gpay ! queue ! udpsink name=asink host=192.168.0.199 port=10002

To wrap up, timestamp, fps and bps are related to the element that is before the dmaiperf element. The DSP utilization and memory information are general.

Valgrind

Valgrind is a suite of free software tools for debugging and profiling programs. It helps you improve your programs or applications, making them faster and less buggy. One of the most important Valgrind tools is Memcheck, which identifies common memory issues.

Memcheck detects when the program is:

Accessing memory that it shouldn't: for instance, accessing memory after it has been freed
Using values that haven't been defined or initialized
Freeing heap memory incorrectly
Passing overlapping pointers to functions like memcpy
Leaking memory

The standard distribution of Valgrind includes other useful tools, among them:

Cachegrind: a cache profiler that can simulate the I1, D1 and L2 caches in your CPU and identify the source and number of cache misses during the execution of your program.
Callgrind: an extension of Cachegrind that adds call graph information, which Cachegrind doesn't provide. Optionally, cache simulation can be enabled to obtain information about the memory access behavior of your application.
Helgrind: a thread error detector that helps you write correct multi-threaded programs.
DRD: provides the same functionality as Helgrind, but uses different analysis techniques and as a consequence can find additional issues.
Massif: a heap profiler that can help you make your program use less memory.

If you want more information about Valgrind's tools, read The Valgrind User Manual. You can also learn how to debug a simple Hello world program with memory leak issues by looking at The Valgrind quick start guide.

Because of the advantages Valgrind offers when debugging an application, it is available as a developer tool in RidgeRun's SDKs. Currently Valgrind is only supported on ARMv7 architectures, but if you are interested in ARMv5 Valgrind support, don't hesitate to contact RidgeRun.

In order to enable Valgrind in your SDK, you must select it under:

RidgeRun SDK Configuration
 File System Configuration
  Select target's file system software--->
   [*] Valgrind

Note: Valgrind is only available in the professional SDK.