This article documents a developer tool.
A list of available devtools is provided, together with installation instructions.


Description

Oprofile is a low-overhead, system-wide profiler for Linux. It can be used to find CPU usage bottlenecks both across the whole system and within individual processes.

Packages

source: oprofile

binary: oprofile

Installing Oprofile

Configuring the device

To run oprofile on your device, an extra kernel module must be installed. The oprofile kernel module is in the kernel-modules-debug package, which can be installed from the tools repository.

Installing oprofile to the device

Provided that you have the Fremantle tools repository in your APT sources.list, the easiest way to install oprofile and the required kernel module is with apt-get:

Nokia-N900-40-12:~# apt-get install oprofile kernel-modules-debug

This will also install binutils.

Installing debug symbols

To view any useful profiling information at function level, you have to install debugging symbols. Debugging symbols normally come in debugging (-dbg) packages. The easiest way to install all the -dbg packages required for a given binary is to use the debug-dep-install script, which comes with the maemo-debug-scripts package:

Nokia-N900-40-12:~# apt-get install maemo-debug-scripts
Nokia-N900-40-12:~# debug-dep-install /usr/bin/osso-xterm.launch

Usage

1. On the device, type:

Nokia-N900-40-12:~# insmod /lib/modules/current/oprofile.ko
Nokia-N900-40-12:~# opcontrol --no-vmlinux
Nokia-N900-40-12:~# opcontrol --separate=kernel
Nokia-N900-40-12:~# opcontrol -c 8
Nokia-N900-40-12:~# opcontrol --init

Like the --separate=library option, the --separate=kernel option separates the collected statistics per process and per component. In most use cases, processes (implicitly) request other processes, such as the X server and hildon-desktop, to do work for them. To optimize CPU usage, you need to see which processes in the whole system use the most CPU, and in which of their components (binary or libraries). The --separate=kernel option additionally assigns CPU usage within the kernel to the processes that caused it. The vmlinux binary name is used for this part (it appears as no-vmlinux in the reports below, because --no-vmlinux is set). The -c 8 option makes oprofile collect call graph information up to a depth of 8 function calls.
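
The setup commands from step 1 can be wrapped in a small shell function so that the sequence is repeatable. This is only a sketch: the function name oprofile_setup and the OPCONTROL, INSMOD and OPROFILE_MODULE variables are hypothetical names introduced for this example and are not part of oprofile. The opcontrol options and the module path are the ones shown above.

```shell
#!/bin/sh
# Sketch only: oprofile_setup and the three variables below are hypothetical
# names introduced for this example; they are not part of oprofile.
OPCONTROL="${OPCONTROL:-opcontrol}"
INSMOD="${INSMOD:-insmod}"
OPROFILE_MODULE="${OPROFILE_MODULE:-/lib/modules/current/oprofile.ko}"

# Usage: oprofile_setup [call-graph-depth]   (default depth: 8, as in the text)
oprofile_setup() {
    depth="${1:-8}"
    # Load the kernel module unless /proc/modules already lists it.
    if ! grep -q '^oprofile ' /proc/modules 2>/dev/null; then
        $INSMOD "$OPROFILE_MODULE" || return 1
    fi
    # Same options, in the same order, as the commands in step 1.
    $OPCONTROL --no-vmlinux &&
    $OPCONTROL --separate=kernel &&
    $OPCONTROL -c "$depth" &&
    $OPCONTROL --init
}
```

Calling oprofile_setup 4 would configure a shallower call graph, which may reduce profiling overhead at the cost of less context per sample.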

2. Start the use case you are interested in and type:

Nokia-N900-40-12:~# opcontrol --reset
Nokia-N900-40-12:~# opcontrol --start

3. When you've finished, type:

Nokia-N900-40-12:~# opcontrol --stop

Now you've collected the data.
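
When the use case can be driven from the command line, steps 2 and 3 can be combined into one helper. The sketch below assumes nothing beyond the opcontrol options already shown; profile_run and the OPCONTROL variable are hypothetical names for this example.

```shell
#!/bin/sh
# Sketch only: profile_run and OPCONTROL are hypothetical names for this example.
OPCONTROL="${OPCONTROL:-opcontrol}"

# Usage: profile_run COMMAND [ARGS...]
# Resets old samples, starts profiling, runs the command, then stops profiling.
profile_run() {
    $OPCONTROL --reset || return 1
    $OPCONTROL --start || return 1
    "$@"
    status=$?
    $OPCONTROL --stop
    return "$status"   # exit status of the profiled command
}
```

Because oprofile is system-wide, profile_run sleep 30 would profile the whole system for 30 seconds while you exercise the use case by hand.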

Viewing profile reports

To see a basic per-process picture, run opreport:

Nokia-N900-40-12:~# opreport
CPU: OMAP GPTIMER, speed 0 MHz (estimated)
Counted GPTIMER_CYCLES events (32KiHz timer clock cycles between interrupts) with a unit mask of 0x00 (No unit mask) count 16
  samples|      %|
    43666 88.5972 no-vmlinux
     2636  5.3484 maemo-launcher
          samples|      %|
              491 18.6267 no-vmlinux
              450 17.0713
              410 15.5539
              342 12.9742
              275 10.4325
              138  5.2352
              134  5.0835
               55  2.0865
               50  1.8968 hildon-desktop.launch
               45  1.7071
               42  1.5933
               32  1.2140
               32  1.2140
               28  1.0622
               27  1.0243
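
To pick out only the per-process totals from a summary like the one above, the output can be filtered with awk. This is a sketch under a layout assumption taken from the sample: top-level rows have their sample count right-aligned within the first 9 columns, and the nested per-component rows are indented further. The helper name opreport_top is hypothetical.

```shell
#!/bin/sh
# Sketch only: opreport_top is a hypothetical helper, not an oprofile command.
# Assumption: top-level rows end their sample count within the first 9
# columns, as in the sample output; nested per-component rows are indented more.

# Usage: opreport | opreport_top [min-percent]   (default: 1)
opreport_top() {
    awk -v min="${1:-1}" '
        match($0, /^ *[0-9]+/) && RLENGTH <= 9 && NF >= 3 && $2 + 0 >= min + 0 {
            print $2, $3    # percentage and process name
        }'
}
```

If your opreport version aligns its columns differently, the column limit in the awk condition would need adjusting.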

Once you know which processes and components are taking most of the CPU, you need to find the bottleneck functions or functionality in them. For this, you need to install debug symbols for them.

Note: If, with the --separate=kernel option, a lot of kernel CPU activity is not assigned to any process, the system/kernel is idle. If your use case is (unexpectedly) slow even though the system idles a lot, the cause is usually locking or some other inter-process interaction issue, which cannot be analyzed by looking at CPU usage.

For a more detailed, per-symbol analysis, use opreport -l:

Nokia-N900-40-12:~# opreport -l /usr/bin/Xorg | more
warning: /no-vmlinux could not be found.
CPU: OMAP GPTIMER, speed 0 MHz (estimated)
Counted GPTIMER_CYCLES events (32KiHz timer clock cycles between interrupts) with a unit mask of 0x00 (No unit mask) count 16
samples  %        image name               symbol name
313      51.7355  no-vmlinux               /no-vmlinux
153      25.2893  Xorg                     /usr/bin/Xorg
36        5.9504    /usr/lib/
31        5.1240              /lib/
11        1.8182                /usr/lib/xorg/modules/
10        1.6529             /usr/lib/xorg/modules/drivers/
10        1.6529             /usr/lib/xorg/modules/extensions/
8         1.3223             /usr/lib/
7         1.1570       /usr/lib/
7         1.1570        /lib/
6         0.9917                 /usr/lib/xorg/modules/
5         0.8264               /usr/lib/xorg/modules/extensions/
5         0.8264             /lib/
2         0.3306              /usr/lib/
1         0.1653             /usr/lib/xorg/modules/input/
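
Since the rows are listed by descending sample count, as in the sample above, a short filter can stop once the rows shown so far account for most of the samples. The sketch below is one way to do this instead of paging with more; opreport_hot_symbols is a hypothetical name, and the 90% default cutoff is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: opreport_hot_symbols is a hypothetical helper.
# Assumption: data rows start with a sample count followed by a percentage,
# and are sorted by descending sample count, as in the sample output.

# Usage: opreport -l BINARY | opreport_hot_symbols [cutoff-percent]  (default: 90)
opreport_hot_symbols() {
    awk -v cutoff="${1:-90}" '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
            print               # keep the row as opreport printed it
            total += $2         # running sum of the percentage column
            if (total >= cutoff) exit
        }'
}
```

Header, warning and CPU description lines are skipped because their first field is not a plain number.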

Once you know which functionality is the bottleneck, you need to determine whether your process should be causing the use of that functionality (even indirectly) in the first place, whether it uses it too much or too often, or whether the bottleneck functionality itself should be optimized. This analysis falls to the developers of the process concerned, because only they know what their application is trying to achieve, why, and how. Until this kind of analysis has been done, it is too early to assign or report bugs against lower-level components.

Profiling with callgraphs

If you initialized opcontrol with the -c option as described above, you should now be able to get call graphs for your applications. The textual output of opreport is hard to read in this case, but it can be converted into graphs:

Nokia-N900-40-12:~# opreport -l /usr/bin/Xorg -c > oprofile.log
#...copy the oprofile.log to your PC...
myPC$ cat oprofile.log | python -f oprofile | dot -Tpng -o callgraph.png

You need the conversion script and the dot tool. dot is part of the Graphviz software (the graphviz package in Ubuntu).

Viewing reports from a PC

opreport -l, and especially opreport -c -l, can take quite a long time when run on the device. Therefore, it often makes sense to run opreport in scratchbox.

  1. Configure the scratchbox target so that its binaries and libraries match the device's exactly.
  2. Collect profiling data as usual.
  3. Copy the contents of /var/lib/oprofile from the device to the corresponding directory in the scratchbox target.
  4. In scratchbox, run apt-get install maemo-debug-scripts (this step must not be omitted).
  5. Install the debug packages, either with debug-dep-install or by hand.

Note: the binaries and libraries in the scratchbox target must match those on the device; otherwise you will get bogus results.
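
One way to check that the two sides match is to compare checksums of the profiled binaries and libraries. The sketch below assumes that md5sum is available both on the device and in scratchbox, which is an assumption about the environment; make_manifest and compare_manifests are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, and md5sum is assumed to be
# installed on the device and in the scratchbox target.

# Usage: make_manifest FILE... > manifest.txt
# Prints one "checksum  path" line per file, sorted by path.
make_manifest() {
    md5sum "$@" | sort -k 2
}

# Usage: compare_manifests DEVICE_MANIFEST SCRATCHBOX_MANIFEST
# Prints the differing lines; returns non-zero if the manifests differ.
compare_manifests() {
    diff "$1" "$2"
}
```

Run make_manifest with the same list of paths on the device and in scratchbox, copy one manifest to the other side, and compare the two. Any output from compare_manifests points at a file that differs.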

Oprofile with kcachegrind

kcachegrind is a useful GUI tool for viewing performance data interactively. It is included in many modern Linux distributions.

To use it:

  1. Collect the call graph oprofile data (see above) and install the same packages in scratchbox as well.
  2. Copy the profile data to the scratchbox session as described above.
  3. Install the kcachegrind-converters package on the host (Debian, Ubuntu).
  4. In scratchbox, run opreport -gdf | op2calltree (you may want to copy the op2calltree script somewhere on the target).
  5. Open the resulting files with kcachegrind on the host. Set it to display all files, because the file extensions are wrong.

Links

See Also