Profiling comes in three different flavors. The first is emulation, where the behavior of a processor is emulated; the second is sampling, where the profiler samples the state of the program at regular intervals; and the third is instrumentation, where the profiler is notified whenever a subroutine is called and whenever it returns. As with Heisenberg's uncertainty principle, observing a program changes its exact behavior. This is something you have to remember when analyzing a profile.
Valgrind is an open source emulation-based profiler, freely available on standard Linux platforms. Because it emulates the processor, the profiled program runs far slower than it does natively. This means that I/O costs are underestimated relative to computation. The advantage is that you get every detail of the memory behavior (cache misses, for instance). Valgrind does not emulate every processor, but you can tune its parameters to approximate your own.
This is more or less a translation of my French tutorial on Valgrind profiling.
Calling the profiler is really easy:
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes program arguments
Here, I ask valgrind to use the callgrind profiler plugin, to dump the executed instructions (which helps find which part of a function is really costly, not only which function), to simulate the cache (which helps improve processor usage) and to collect jumps (to get a dynamic view of the program's behavior). Of course, the program must have been compiled with the appropriate options (at least -g, so that debug information is available).
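If you profile often, the invocation can be wrapped in a small script. The following is only a sketch: the helper names callgrind_command and run_under_callgrind are mine, and the options are the ones from the command line above, unchanged.

```python
import shutil
import subprocess

# Options copied from the command line above.
CALLGRIND_OPTIONS = [
    "--tool=callgrind",
    "--dump-instr=yes",
    "--simulate-cache=yes",
    "--collect-jumps=yes",
]


def callgrind_command(program, *arguments):
    """Return the argument list for running `program` under callgrind."""
    return ["valgrind", *CALLGRIND_OPTIONS, program, *arguments]


def run_under_callgrind(program, *arguments):
    """Run the program under callgrind and return its exit code."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind was not found on the PATH")
    return subprocess.run(callgrind_command(program, *arguments)).returncode
```

For instance, run_under_callgrind("./program", "input.dat") would profile a hypothetical ./program with a hypothetical input file. By default callgrind writes its results to a file named callgrind.out.<pid> in the current directory.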
KCacheGrind is probably the best tool to visualize and analyze valgrind results (it can also display results from other profilers).
When opening a profile, KCacheGrind may not find the associated source files. In that case, add their folder to the annotation folders.
I think the most important graph KCacheGrind provides is the Callee Map. It can be colored in different ways (by file, by class, …), the main point being that the Callee Map draws an image where the area of a function represents its weight in the program execution (the weight being the number of instructions, cache misses, …). Unfortunately, in some cases KCacheGrind is not able to create any of the callee-related views. I don’t know why, but I ran into this on a RedHat 4 system with its packaged KCacheGrind and the latest valgrind.
Call graphs also show how much each function consumes. Double-clicking on a function (in the call graph or in the Callee Map) “activates” it: the original source code is shown (with jumps, if they were collected) along with the cost of each instruction, the functions that called it and the functions it calls. Another important thing is the difference between the self cost (sometimes called exclusive cost) and the inclusive cost. The former is the cost of the function alone, the latter is the cost of the function plus the cost of the functions it calls.
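The difference between self and inclusive cost is easy to see on a toy example. The sketch below uses made-up numbers and function names, and it assumes a simple call tree without recursion where each function has a single caller. Real profiles attribute callee costs per call, so this is only an illustration of the definition.

```python
# Hypothetical self costs (for example, instruction counts) per function.
self_cost = {"main": 10, "parse": 40, "compute": 120, "dot": 300}

# Hypothetical call tree: which functions each function calls.
calls = {
    "main": ["parse", "compute"],
    "parse": [],
    "compute": ["dot"],
    "dot": [],
}


def inclusive_cost(function):
    """Self cost of the function plus the inclusive cost of everything it calls."""
    return self_cost[function] + sum(inclusive_cost(callee) for callee in calls[function])


for name in self_cost:
    print(name, "self:", self_cost[name], "inclusive:", inclusive_cost(name))
```

Here main has a self cost of only 10 but an inclusive cost of 470, while dot, which calls nothing, has the same self and inclusive cost. Sorting by self cost points to dot as the place where the work is really done.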
Valgrind combined with KCacheGrind is a free toolchain for profiling an application. It is far from perfect, but it provides valuable information. Instrumentation- and sampling-based profilers need a patched kernel (on Linux) or administrator rights (on Windows and Linux), and at the moment they cannot provide every cost, contrary to emulation.