Convert HPCToolkit files to callgrind format

After my post on HPCToolkit, I found that I preferred QCacheGrind as a GUI for exploring profiling results. So here is a gist with a Python script that converts XML HPCToolkit experiments to the callgrind format: https://gist.github.com/mbrucher/6cad31e38beca770523b

For instance, this is a display of an Audio Toolkit test of L2 cache misses:

ATK L2 cache misses profile
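
To give an idea of what the target format looks like, here is a minimal sketch of writing callgrind-style output from Python. It is not the script from the gist: the `Sample` record, the `to_callgrind` function and the `L2Misses` event name are all hypothetical names chosen for this illustration, and only the most basic parts of the format are used (the `events:` header, then `fl=` and `fn=` lines followed by `line cost` pairs).

```python
from collections import namedtuple

# Hypothetical record for this sketch: one cost value attributed to a source line.
Sample = namedtuple("Sample", ["file", "function", "line", "cost"])


def to_callgrind(samples, event="L2Misses"):
    """Return the text of a minimal callgrind profile for the given samples."""
    # Accumulate the cost per (file, function) and per source line.
    grouped = {}
    for s in samples:
        per_line = grouped.setdefault((s.file, s.function), {})
        per_line[s.line] = per_line.get(s.line, 0) + s.cost

    # Header: format marker, version, creator and the name of the single event.
    out = ["# callgrind format", "version: 1", "creator: sketch", f"events: {event}", ""]

    # Body: one fl=/fn= pair per function, then one "line cost" entry per line.
    for (file, function), per_line in sorted(grouped.items()):
        out.append(f"fl={file}")
        out.append(f"fn={function}")
        for line, cost in sorted(per_line.items()):
            out.append(f"{line} {cost}")
        out.append("")
    return "\n".join(out)
```

Writing the returned text to a file named something like `callgrind.out.sketch` should be enough for QCacheGrind to open it. A real converter also has to emit call relationships (`cfn=` and `calls=` lines), which this sketch leaves out.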
Profiling with HPC Toolkit

I’ve started working with the HPC Toolkit, especially the part where it can use PAPI to read hardware counters and display the result in a nice GUI.

Profiling with Visual Studio Performance Tool

After presenting Valgrind as an emulation profiler, I will present Microsoft's solution, the Visual Studio Performance Tool. It is available in the Team Suite editions and offers both a sampling-based and an instrumentation-based profiler. Of course, it is embedded in the Visual Studio IDE and accessible from a solution.
Profiling with Valgrind/Callgrind

Profiling comes in three different flavors. The first is emulation, where the processor's behavior is emulated. The second is sampling, where the profiler samples the state of the program at regular intervals. The third is instrumentation, where the profiler collects information each time a subroutine is called and each time it returns. As with the Heisenberg uncertainty principle, profiling changes the exact behavior of your program. This is something you have to remember when analyzing a profile.
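
As a rough illustration of the instrumentation flavor, here is a small Python sketch that records data when a function is entered and when it returns. The `STATS` registry, the `instrumented` decorator and the `square_sum` function are hypothetical names made up for this example; real instrumenting profilers hook into the compiled code instead.

```python
import functools
import time

# Hypothetical registry for this sketch: function name -> [call count, total seconds].
STATS = {}


def instrumented(func):
    """Wrap func so that every call and return updates STATS."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # taken when the subroutine is called
        try:
            return func(*args, **kwargs)
        finally:
            # Updated when the subroutine returns, even if it raised.
            entry = STATS.setdefault(func.__name__, [0, 0.0])
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


@instrumented
def square_sum(n):
    """Toy workload: sum of the squares of 0..n-1."""
    return sum(i * i for i in range(n))
```

The time spent in `wrapper` itself is added to every call, which is a small-scale version of the point above: the measurement changes what is being measured.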

Valgrind is an open source emulation profiler, freely available on standard Linux platforms. Because it emulates the processor, the profiled program runs far slower than it does natively, which means that the cost of I/O is underestimated relative to computation. The advantage is that you get every detail of the memory behavior (cache misses, for instance). Valgrind does not emulate every processor, but you can tweak its parameters to approximate your own.
