Documentation/devtools/maemo5/oprofile

Description
Oprofile is a low-overhead, system-wide profiler for Linux. It can be used to find CPU usage bottlenecks both in the whole system and within individual processes.

Packages
source: oprofile

binary: oprofile

Configuring the device
To run oprofile on your device, an extra kernel module needs to be installed. The oprofile kernel module is in the kernel-modules-debug package, which can be installed from the tools repository.

Installing oprofile to the device
Provided that you have the Fremantle tools repository in your APT sources.list, the easiest way to install oprofile and the required kernel module is with apt-get:

Nokia-N900-40-12:~# apt-get install oprofile kernel-modules-debug

This will also install binutils.

Installing debug symbols
To view any useful profiling information at the function level, you have to install debugging symbols. Debugging symbols normally come in debugging (-dbg) packages. The easiest way to install all the -dbg packages required for a given binary is the debug-dep-install script, which comes with the maemo-debug-scripts package:

Nokia-N900-40-12:~# apt-get install maemo-debug-scripts
Nokia-N900-40-12:~# debug-dep-install /usr/bin/osso-xterm.launch
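When several binaries and libraries show up in a profile, it is easy to miss one. The sketch below lists the paths that still have no debug information installed. It assumes the Debian convention that a -dbg package places the detached debug file for /usr/bin/foo at /usr/lib/debug/usr/bin/foo. The function name missing_debug_info is hypothetical and not part of maemo-debug-scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the paths (read one per line from stdin)
# that have no detached debug file yet.
# ASSUMPTION: Debian-style layout, where the -dbg package installs the
# debug file for /usr/bin/foo as /usr/lib/debug/usr/bin/foo.
missing_debug_info() {
    debug_root=${1:-/usr/lib/debug}
    while IFS= read -r binary; do
        # Skip empty input lines
        [ -n "$binary" ] || continue
        if [ ! -e "$debug_root$binary" ]; then
            echo "$binary"
        fi
    done
}
```

For example, printf '%s\n' /usr/bin/Xorg /lib/libc-2.5.so | missing_debug_info prints whichever of the two still needs its -dbg package, which you can then install with debug-dep-install or by hand.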

Usage
1. On the device, type:

Nokia-N900-40-12:~# insmod /lib/modules/current/oprofile.ko
Nokia-N900-40-12:~# opcontrol --no-vmlinux
Nokia-N900-40-12:~# opcontrol --separate=kernel
Nokia-N900-40-12:~# opcontrol --init

Like the --separate=lib option, the --separate=kernel option separates the collected statistics per process and per component. In most use cases, processes (implicitly) ask other processes, such as the X server and hildon-desktop, to do work for them. To optimize CPU usage, you need to see which processes in the whole system use the most CPU, and in which of their components (binary or libraries). The --separate=kernel option additionally assigns CPU usage within the kernel to the processes that caused it. In the reports, this kernel part appears under the vmlinux image name (no-vmlinux when profiling with --no-vmlinux, as above).

2. Start the use case you are interested in and type:

Nokia-N900-40-12:~# opcontrol --reset
Nokia-N900-40-12:~# opcontrol --start

3. When you've finished, type:

Nokia-N900-40-12:~# opcontrol --stop

Now you've collected the data.
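If you repeat these steps often, they can be wrapped in a few shell functions. The following is a minimal sketch, not something shipped with oprofile. The function names and the DRY_RUN switch are hypothetical, and the fixed-duration profiling assumes the use case can be exercised within a known time window.

```shell
#!/bin/sh
# Hypothetical wrapper around the opcontrol steps above.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# /proc/modules lists one loaded module per line, module name first.
oprofile_module_loaded() {
    grep -q '^oprofile ' "${1:-/proc/modules}"
}

# Step 1: load the module (insmod fails if it is already loaded) and
# configure per-process kernel accounting, as in the commands above.
oprofile_setup() {
    if ! oprofile_module_loaded; then
        run insmod /lib/modules/current/oprofile.ko
    fi
    run opcontrol --no-vmlinux
    run opcontrol --separate=kernel
    run opcontrol --init
}

# Steps 2 and 3: collect samples for a fixed number of seconds
# (default 10) while the use case runs, then stop collecting.
oprofile_profile_for() {
    run opcontrol --reset
    run opcontrol --start
    run sleep "${1:-10}"
    run opcontrol --stop
}
```

For example, source the script on the device, run oprofile_setup once, then start your use case and run oprofile_profile_for 30. Running DRY_RUN=1 oprofile_profile_for 30 first shows exactly which commands would be executed.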

Viewing profile reports
To see a basic per-process picture, run opreport:

Nokia-N900-40-12:~# opreport
CPU: OMAP GPTIMER, speed 0 MHz (estimated)
Counted GPTIMER_CYCLES events (32KiHz timer clock cycles between interrupts) with a unit mask of 0x00 (No unit mask) count 16
GPTIMER_CYCLES:16|
  samples|      %|
------------------
    43666 88.5972 no-vmlinux
     2636  5.3484 maemo-launcher
        GPTIMER_CYCLES:16|
          samples|      %|
        ------------------
              491 18.6267 no-vmlinux
              450 17.0713 libclutter-eglx-0.8.so.0.800.2
              410 15.5539 libgobject-2.0.so.0.2000.3
              342 12.9742 libGLESv2.so
              275 10.4325 libglib-2.0.so.0.2000.3
              138  5.2352 libpthread-2.5.so
              134  5.0835 libc-2.5.so
               55  2.0865 libdbus-1.so.3.4.0
               50  1.8968 hildon-desktop.launch
               45  1.7071 libpango-1.0.so.0.2400.2
               42  1.5933 libgdk-x11-2.0.so.0.1400.7
               32  1.2140 libX11.so.6.2.0
               32  1.2140 libpulsecommon-0.9.15.so
               28  1.0622 libgtk-x11-2.0.so.0.1400.7
               27  1.0243 libgio-2.0.so.0.2000.3
...
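On a busy system this report gets long. The following minimal awk filter keeps only the rows at or above a percentage threshold. It assumes the "samples, %, name" row layout shown above, and the function name opreport_top is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only opreport rows whose "%" column is at least the given
# threshold (default 5). Assumes rows of the form
# "<samples> <percent> <name>"; header lines do not match and are dropped.
opreport_top() {
    awk -v t="${1:-5}" '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ && $2 + 0 >= t + 0 { print }
    '
}
```

For example, opreport | opreport_top 5. Rows nested under a process keep their indentation. Their percentages are relative to that process, not to the whole system: in the output above, 491 of maemo-launcher's 2636 samples gives 18.6267%.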

Once you know which processes and components take most of the CPU time, you need to find the bottleneck functions in them. For this, you need to install debug symbols for them.

Note: If, with the --separate=kernel option, there is a lot of kernel CPU activity that is not assigned to any process, the system is idle. If your use case is unexpectedly slow even though the system is idle much of the time, the cause is usually locking or other inter-process interaction, which cannot be analyzed by looking at CPU usage.

For a more detailed, symbol-level analysis, use opreport -l:

Nokia-N900-40-12:~# opreport -l /usr/bin/Xorg | more
warning: /no-vmlinux could not be found.
CPU: OMAP GPTIMER, speed 0 MHz (estimated)
Counted GPTIMER_CYCLES events (32KiHz timer clock cycles between interrupts) with a unit mask of 0x00 (No unit mask) count 16
samples  %        image name               symbol name
313      51.7355  no-vmlinux               /no-vmlinux
153      25.2893  Xorg                     /usr/bin/Xorg
36        5.9504  libpixman-1.so.0.15.3    /usr/lib/libpixman-1.so.0.15.3
31        5.1240  libc-2.5.so              /lib/libc-2.5.so
11        1.8182  libexa.so                /usr/lib/xorg/modules/libexa.so
10        1.6529  fbdev_drv.so             /usr/lib/xorg/modules/drivers/fbdev_drv.so
10        1.6529  librecord.so             /usr/lib/xorg/modules/extensions/librecord.so
8         1.3223  libsrv_um.so             /usr/lib/libsrv_um.so
7         1.1570  libdbus-1.so.3.4.0       /usr/lib/libdbus-1.so.3.4.0
7         1.1570  libpthread-2.5.so        /lib/libpthread-2.5.so
6         0.9917  libfb.so                 /usr/lib/xorg/modules/libfb.so
5         0.8264  libdri2.so               /usr/lib/xorg/modules/extensions/libdri2.so
5         0.8264  librt-2.5.so             /lib/librt-2.5.so
2         0.3306  libpvr2d.so              /usr/lib/libpvr2d.so
1         0.1653  evdev_drv.so             /usr/lib/xorg/modules/input/evdev_drv.so
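Because opreport -l can be slow on the device, it can help to save its output to a file once and post-process that file. The sketch below totals the samples per image. Once debug symbols are installed, an image spans many symbol rows, so this recovers the per-image view without rerunning opreport. It assumes the four-column "samples, %, image name, symbol name" layout shown above (other opreport options change the columns), and the function name opreport_per_image is hypothetical.

```shell
#!/bin/sh
# Sketch: total the samples per image in saved "opreport -l" output.
# Assumes the layout: samples  %  image-name  symbol-name
# Symbol names may contain spaces, so only fields 1 and 3 are used.
opreport_per_image() {
    awk '$1 ~ /^[0-9]+$/ { samples[$3] += $1 }
         END { for (img in samples) print samples[img], img }' |
        sort -nr
}
```

For example, opreport -l /usr/bin/Xorg > /tmp/xorg-symbols.txt once, then opreport_per_image < /tmp/xorg-symbols.txt as often as needed.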

Once you know which functionality is the bottleneck, you need to find out whether your process should be (indirectly) using that functionality in the first place, whether it is using it too often, or whether the bottleneck functionality itself should be optimized. This analysis falls to the developers of the corresponding process, as only they know what their application is trying to achieve, why, and how. Until this kind of analysis has been done, it is too early to assign or report bugs against lower-level components.

Profiling with callgraphs
TODO

Viewing reports from a PC
opreport -l, and especially opreport -c -l, can take quite a long time when run on the device. It therefore often makes sense to run opreport in scratchbox instead.

1. Configure the scratchbox target so that its binaries and libraries exactly match those on the device.

2. Collect profiling data as usual.

3. Copy the contents of /var/lib/oprofile from the device to the corresponding directory in the scratchbox target.

4. In scratchbox, run apt-get install maemo-debug-scripts (this step must not be omitted).

5. Install the debug packages, either with debug-dep-install or by hand.

Note: The binaries and libraries in the scratchbox target must match those on the device; otherwise you will get bogus results.
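Step 3 can be done from the host with a tar stream over SSH. This is a minimal sketch under several assumptions: the device runs an SSH server reachable as, for example, root@192.168.2.15 (hypothetical address), its tar supports -C, and the second argument is the scratchbox target's root directory as seen from the host (on a typical installation something like /scratchbox/users/USER/targets/TARGET). The function name fetch_oprofile_data and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: copy /var/lib/oprofile from the device into a scratchbox target.
# ASSUMPTIONS: SSH access to the device as $1, and $2 is the target's
# root directory on the host. Set DRY_RUN=1 to only print the command.
fetch_oprofile_data() {
    device=$1
    target_root=$2
    if [ -z "$device" ] || [ -z "$target_root" ]; then
        echo "usage: fetch_oprofile_data USER@DEVICE TARGET_ROOT" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ssh $device tar cf - -C /var/lib oprofile | tar xf - -C $target_root/var/lib"
        return 0
    fi
    # Pack the sample directory on the device, unpack it into the target
    mkdir -p "$target_root/var/lib" &&
        ssh "$device" 'tar cf - -C /var/lib oprofile' |
        tar xf - -C "$target_root/var/lib"
}
```

Streaming through tar keeps the directory layout intact and needs no temporary archive on the device.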

Oprofile with kcachegrind
kcachegrind is a useful GUI tool for viewing performance data interactively. It is available in many modern Linux distributions.

To use it:

1. Collect callgraph oprofile data (see above) and install the same packages in scratchbox as well.

2. Copy the profile data to the scratchbox target as described above.

3. Install the kcachegrind-converters package on the host (Debian, Ubuntu).

4. In scratchbox, run opreport -gdf | op2calltree (you might want to copy the op2calltree script somewhere in the target first).

5. The resulting files can now be opened with kcachegrind on the host, provided you set its file dialog to show all files (the file extensions are not the ones kcachegrind expects).

Links
[oprofile man page](/development/documentation/man_pages/oprofile.html)

http://oprofile.sourceforge.net/about/

http://oprofile.sourceforge.net/doc/controlling.html

http://kcachegrind.sourceforge.net/cgi-bin/show.cgi