Profiling DRI drivers

Using oprofile

Oprofile is not difficult to set up: it requires no special compilation flags other than debugging info, and it shows where time is spent system-wide (including time spent inside the kernel and the X server).

Here is a simple script that automates the invocation of oprofile to profile glxgears:

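#!/bin/sh
# Profile glxgears system-wide with oprofile (opcontrol must be run as root).

# Use the uncompressed kernel image if available, so that samples taken
# inside the kernel can be resolved to symbols.
VMLINUX=/boot/vmlinux-$(uname -r)
if [ -f "$VMLINUX" ]
then
        opcontrol --vmlinux="$VMLINUX"
else
        opcontrol --no-vmlinux
fi
# Attribute kernel (and shared library) samples to the application that
# caused them, and record call graphs up to 16 frames deep.
opcontrol --separate=kernel
opcontrol --callgraph=16
# Load the oprofile kernel module and discard samples from earlier sessions.
opcontrol --init
opcontrol --reset
# Start glxgears and let it initialize before sampling begins.
glxgears &
PID=$!
sleep 5
opcontrol --start
# Sample for one minute, then stop both glxgears and the profiler.
sleep 60
kill $PID
opcontrol --stop
# Flush the collected samples to disk so that opreport can read them.
opcontrol --dump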

NOTE: Oprofile requires kernel support, and an uncompressed vmlinux image to profile inside the kernel. On Debian-based distributions the stock kernel already includes oprofile support, but uncompressed vmlinux images are not shipped. See /usr/share/doc/oprofile/README.Debian.gz for details on how to proceed. On Red Hat, enable the core-debuginfo repositories in /etc/yum.repos.d/ and run yum install kernel-debuginfo. See here for more details on oprofile and Red Hat.
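The vmlinux check in the script above only looks in /boot. The sketch below also tries /usr/lib/debug/lib/modules/<release>/vmlinux, which is where kernel-debuginfo packages on Red Hat-style systems normally install the uncompressed image. find_vmlinux is a hypothetical helper, not part of oprofile; its optional ROOT argument only exists so it can be tried against a scratch directory.

```sh
#!/bin/sh
# find_vmlinux RELEASE [ROOT]
# Hypothetical helper: print the path of an uncompressed vmlinux image for
# kernel RELEASE and return 0, or return 1 if none is found.
# ROOT defaults to "" (i.e. /) and only exists to allow testing.
find_vmlinux() {
    release=$1
    root=${2:-}
    for candidate in \
        "$root/boot/vmlinux-$release" \
        "$root/usr/lib/debug/lib/modules/$release/vmlinux"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage in the profiling script:
#   if VMLINUX=$(find_vmlinux "$(uname -r)"); then
#       opcontrol --vmlinux="$VMLINUX"
#   else
#       opcontrol --no-vmlinux
#   fi
```

The Debian-style path is tried first, so existing setups behave exactly as with the original script.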

WARNING: I used the above script on a Celeron processor, for which oprofile can only produce timer-based information. Since time is usually more interesting than other CPU events, on non-Celeron processors you will likely need to pass an --event=XXX option to request timer-based events specifically. The exact option depends on your kernel version and processor; see the oprofile manual for more information.
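A sketch of passing an explicit event, assuming a CPU that provides the CPU_CLK_UNHALTED event (available on many x86 processors). This event counts unhalted clock cycles rather than timer ticks, so it only approximates time. Run ophelp to list the event names valid on your machine and replace the default accordingly. OPROFILE_EVENT and set_oprofile_event are hypothetical names, and the event has to be configured before opcontrol --start.

```sh
#!/bin/sh
# Hypothetical settings: choose the oprofile event explicitly.
# Event specs have the form NAME:COUNT[:UNITMASK:KERNEL:USER]; the default
# below samples once every 100000 unhalted clock cycles and assumes the CPU
# provides CPU_CLK_UNHALTED. Use ophelp to list the valid names.
OPROFILE_EVENT=${OPROFILE_EVENT:-CPU_CLK_UNHALTED:100000}

set_oprofile_event() {
    opcontrol --event="$OPROFILE_EVENT"
}

# Example, in the script above before "opcontrol --start":
#   set_oprofile_event
```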

You can then view the results with:

opreport -l | less
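To focus on the driver rather than the whole system, opreport can be given the path of a binary, and it then reports only the samples taken in that image. driver_report below is a hypothetical wrapper around that; the driver path in the example is only an illustration, since DRI drivers live in different directories on different distributions.

```sh
#!/bin/sh
# driver_report DRIVER_SO [COUNT]
# Hypothetical helper: print the first COUNT lines (default 30) of the
# per-symbol report for one binary, e.g. a DRI driver. opreport sorts
# symbols by sample count, so these are the header plus the hottest symbols.
driver_report() {
    opreport -l "$1" | head -n "${2:-30}"
}

# Example:
#   driver_report /usr/lib/dri/i915_dri.so 40
```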

You can also generate a colored call graph using gprof2dot.py:

opreport -cgf > glxgears.oprofile
/path/to/gprof2dot.py -f oprofile -o glxgears.dot glxgears.oprofile
dot -Tpng -o glxgears.png glxgears.dot

You'll hopefully get an image similar to this one.
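The three commands above can also be run as a single pipeline, because gprof2dot.py reads the profile from standard input and writes the graph to standard output when no file names are given. The sketch wraps that in a hypothetical callgraph_png function; GPROF2DOT is assumed to name your copy of gprof2dot.py, and the -n option (node threshold, in percent of total time) trims small nodes to keep large graphs readable.

```sh
#!/bin/sh
# callgraph_png OUTPUT.png [NODE_THRES]
# Hypothetical helper: oprofile call graph -> gprof2dot.py -> PNG image,
# without the intermediate .oprofile and .dot files.
# GPROF2DOT defaults to "gprof2dot.py" on $PATH; set it to your copy.
GPROF2DOT=${GPROF2DOT:-gprof2dot.py}

callgraph_png() {
    # -n drops nodes below NODE_THRES percent of total time (default 0.5).
    opreport -cgf |
        "$GPROF2DOT" -f oprofile -n "${2:-0.5}" |
        dot -Tpng -o "$1"
}

# Example:
#   callgraph_png glxgears.png 1
```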

See the xdot.py script for interactive viewing of *.dot files.

NOTE: gprof2dot.py and xdot.py are works in progress. Please contact me if you have any problems with them. -- JoseFonseca

You can also use KCacheGrind together with the op2calltree script included with it to visualize the oprofile data. However, op2calltree doesn't support call-graph information, so you'll only get a flat profile.

Valgrind + Callgrind

Using valgrind is very simple:

valgrind --tool=callgrind glxgears

You can then use KCacheGrind to visualize the output file (callgrind.out.<pid>) it generates.

However, callgrind counts executed instructions, not elapsed time, so the time the driver spends waiting for buffers from the DRM, or for the graphics card, is not accounted for. Enhancing callgrind to measure time would be an interesting project.
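The valgrind command above instruments glxgears from its very first instruction, start-up included. A sketch of the same warm-up trick the oprofile script uses: start the program with instrumentation disabled (--instr-atstart=no), switch it on with callgrind_control after a few seconds, request a dump, and then terminate the program. callgrind_profile is a hypothetical name and the timings in the example are arbitrary.

```sh
#!/bin/sh
# callgrind_profile WARMUP SECONDS PROGRAM [ARGS...]
# Hypothetical helper: profile PROGRAM with callgrind, skipping its first
# WARMUP seconds and then collecting data for SECONDS seconds.
callgrind_profile() {
    warmup=$1
    duration=$2
    shift 2
    # Start with instrumentation off so start-up code is not counted.
    valgrind --tool=callgrind --instr-atstart=no "$@" &
    pid=$!
    sleep "$warmup"
    # Switch instrumentation on, collect, then ask callgrind to dump.
    callgrind_control -i on "$pid"
    sleep "$duration"
    callgrind_control -d "$pid"
    # The program may already have exited, so ignore kill errors.
    kill "$pid" 2>/dev/null || true
    wait "$pid"
}

# Example:
#   callgrind_profile 5 60 glxgears
```

The resulting callgrind.out.* files can be opened in KCacheGrind as usual.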

Using gprof

Gprof requires the code to be recompiled with profiling instrumentation and the libraries to be linked statically, which is inconvenient for DRI drivers, since they are shared libraries loaded at run time.

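However, Gallium 3D can optionally be built statically with profiling enabled. Build it, run glxgears and quit it normally (the profile is written to gmon.out in the current directory only when the program exits), then run gprof: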

make linux-debug
cd progs/xdemos
make
./glxgears
gprof glxgears

You can also generate a graph from gprof output using Gprof2Dot (http://code.google.com/p/jrfonseca/wiki/Gprof2Dot).
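A sketch of that pipeline for gprof. gprof's own output format is gprof2dot.py's default, selected explicitly here with -f prof. gprof_callgraph_png and GPROF2DOT are hypothetical names, and gprof reads gmon.out from the current directory.

```sh
#!/bin/sh
# gprof_callgraph_png PROGRAM OUTPUT.png
# Hypothetical helper: render a gprof call graph as a PNG image.
# Run it in the directory containing the gmon.out written by PROGRAM.
# GPROF2DOT defaults to "gprof2dot.py" on $PATH; set it to your copy.
GPROF2DOT=${GPROF2DOT:-gprof2dot.py}

gprof_callgraph_png() {
    gprof "$1" | "$GPROF2DOT" -f prof | dot -Tpng -o "$2"
}

# Example:
#   gprof_callgraph_png ./glxgears glxgears.png
```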

Others

Here are some other profilers which might also be useful:

  • Sysprof -- similar to oprofile, but with a GUI
  • qprof -- process-wide profiling without need for kernel support
  • simple benchmarking of screensaver hacks