Profiling DRI drivers

Using oprofile

Oprofile is not difficult to set up. It requires no special compilation flags other than debugging information, and it shows where time is spent system-wide (including time spent inside the kernel and the X server).

Here is a simple script that automates the invocation of oprofile to profile glxgears:

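#!/bin/sh
# Use the uncompressed kernel image if present, so kernel samples can be resolved.
VMLINUX=/boot/vmlinux-`uname -r`
if [ -f "$VMLINUX" ]
then
        opcontrol --vmlinux="$VMLINUX"
else
        opcontrol --no-vmlinux
fi
opcontrol --separate=kernel
opcontrol --callgraph=16
opcontrol --init
opcontrol --reset
# Start the program and remember its PID so it can be killed later.
glxgears &
PID=$!
# Let glxgears finish starting up before sampling begins.
sleep 5s
opcontrol --start
sleep 60s
kill $PID
opcontrol --stop
opcontrol --dump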

NOTE: Oprofile requires kernel support, and an uncompressed vmlinux to profile inside the kernel. For Debian-based distro users, oprofile support already comes with the stock kernel, but uncompressed vmlinux images do not. Check the instructions in /usr/share/doc/oprofile/README.Debian.gz for more details on how to proceed. For Red Hat, enable the core-debuginfo repos in /etc/yum.repos.d/, and yum install kernel-debuginfo. See here for more detail on oprofile and Red Hat.

WARNING: I used the above script on a Celeron processor, for which oprofile can only produce timer-based information. Time is usually more interesting than other CPU events, so on non-Celeron processors you will likely need to pass an --event=XXX option to specifically request timer-based events. The actual option depends on your kernel version and processor. See the oprofile manual for more information.
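
As a rough sketch of what requesting an event might look like: opcontrol --list-events prints the events that the current CPU supports, and --event takes a NAME:COUNT pair. The event name CPU_CLK_UNHALTED and the count 100000 below are only assumptions for illustration and will not be valid on every processor. The event_option helper and the RUN_OPCONTROL guard variable are hypothetical names, not part of oprofile.

```shell
#!/bin/sh
# Hypothetical helper: format an --event option as NAME:COUNT.
# NAME should be one of the events printed by `opcontrol --list-events`.
event_option() {
    printf '%s=%s:%s\n' --event "$1" "$2"
}

# Only touch oprofile when explicitly asked to (RUN_OPCONTROL is a made-up guard).
if [ "${RUN_OPCONTROL:-0}" = 1 ]; then
    # Show which events this processor offers.
    opcontrol --list-events | less
    # Assumed event name and count; substitute values valid for your CPU.
    opcontrol "$(event_option CPU_CLK_UNHALTED 100000)"
fi
```

Run it with RUN_OPCONTROL=1 before the profiling script above, so that the event is configured by the time opcontrol --start is called.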

You can then view the output with:

opreport -l | less

You can also generate a colored call graph using:

opreport -cgf > glxgears.oprofile
/path/to/ -f oprofile -o glxgears.oprofile
dot -Tpng -o glxgears.png

You'll hopefully get an image similar to this one.

See the script for interactive viewing of *.dot files.

NOTE: These scripts are a work in progress. Please contact me if you have any problems with them. -- JoseFonseca

You can also use KCacheGrind together with the included op2calltree script to visualize the profile. However, op2calltree doesn't support call graph information, so you'll only get a linear profile.

Valgrind + Callgrind

Using valgrind is very simple:

valgrind --tool=callgrind glxgears

And you can use KCacheGrind to visualize the output it generates.

However, callgrind only measures CPU instructions, not time. So the time the driver spends waiting for buffers from the DRM, or waiting for the card, is not accounted for. Enhancing callgrind to measure time would be an interesting project.
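
A minimal sketch of a complete run, assuming valgrind, callgrind_annotate and kcachegrind are all installed. By default callgrind writes its data to callgrind.out.<pid>; the --callgrind-out-file option used here fixes the name so that the later commands can find it. The callgrind_outfile helper and the RUN_CALLGRIND guard variable are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: build a predictable output file name for a given program.
callgrind_outfile() {
    printf 'callgrind.out.%s\n' "$1"
}

# Only run the profiler when explicitly asked to (RUN_CALLGRIND is a made-up guard).
if [ "${RUN_CALLGRIND:-0}" = 1 ]; then
    OUT=$(callgrind_outfile glxgears)
    # Profile glxgears; close its window to end the run.
    valgrind --tool=callgrind --callgrind-out-file="$OUT" glxgears
    # Plain-text summary of the instruction counts per function.
    callgrind_annotate "$OUT" | less
    # Graphical view of the same data.
    kcachegrind "$OUT"
fi
```

Expect glxgears to run far slower than usual under valgrind, so a short run is usually enough.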

Using gprof

Gprof requires the code to be recompiled and libraries statically linked, which is not convenient for profiling DRI drivers.

However, Gallium 3D can be built statically with profiling enabled:

make linux-debug
cd progs/xdemos
./glxgears          # run it and let it exit normally so that gmon.out is written
gprof glxgears

You can also generate a graph from gprof output using .


Here are some other profilers which might also be useful:

  • Sysprof -- similar to oprofile, comes with a gui
  • qprof -- process-wide profiling without need for kernel support
  • simple benchmarking of screensaver hacks