After my post on HPCToolkit, I felt that I preferred QCacheGrind as a GUI to explore profiling results. So here is a gist with a Python script to convert XML HPCToolkit experiments to callgrind format:

For instance, this is a display of an Audio Toolkit test of L2 cache misses:

ATK L2 cache misses profile
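
To give an idea of what such a conversion has to produce, here is a minimal sketch of writing callgrind-format text in Python. It is not the script from the gist: the `Function` and `Call` classes, the `to_callgrind` helper, the `hypothetical-converter` creator string and the `L2_MISSES` event name are all assumptions made for illustration, and parsing the HPCToolkit XML experiment is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One call edge (hypothetical structure, not HPCToolkit's)."""
    callee: str          # name of the called function
    callee_file: str     # source file of the called function
    callee_line: int     # line where the callee is defined
    call_line: int       # line of the call site in the caller
    count: int           # number of calls
    inclusive_cost: int  # cost spent inside the callee for this edge


@dataclass
class Function:
    """One profiled function (hypothetical structure)."""
    name: str
    file: str
    self_costs: dict = field(default_factory=dict)  # line -> exclusive cost
    calls: list = field(default_factory=list)       # list of Call


def to_callgrind(functions, event="L2_MISSES"):
    """Return the profile as callgrind-format text with a single event."""
    lines = [
        "# callgrind format",
        "version: 1",
        "creator: hypothetical-converter",
        f"events: {event}",
        "",
    ]
    for func in functions:
        lines.append(f"fl={func.file}")
        lines.append(f"fn={func.name}")
        # Exclusive cost, one "<line> <cost>" entry per source line.
        for line_no, cost in sorted(func.self_costs.items()):
            lines.append(f"{line_no} {cost}")
        # Each call edge: callee position, call count, then inclusive cost.
        for call in func.calls:
            lines.append(f"cfl={call.callee_file}")
            lines.append(f"cfn={call.callee}")
            lines.append(f"calls={call.count} {call.callee_line}")
            lines.append(f"{call.call_line} {call.inclusive_cost}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    main = Function(
        name="main",
        file="main.cpp",
        self_costs={16: 20},
        calls=[Call("process", "filter.cpp", 50, 16, 1, 400)],
    )
    process = Function(name="process", file="filter.cpp", self_costs={51: 400})
    print(to_callgrind([main, process]))
```

Saving the printed text to a file named along the lines of `callgrind.out.example` should be enough for QCacheGrind to open it, although a real converter would also have to map HPCToolkit's metrics and calling context tree onto these records.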


In the new C++ standard, multithreading finally appears, while the old standard gets support through TR2. This addition has numerous implications for how programs are written, and there are of course almost no books on the matter. This one is an exception.

Note: this review is not based on the final version that is now available (June the 28th), but on the MEAP one. There may be some differences between the final version and the draft I based my review on, although I don’t expect many, and certainly no huge changes.


We now know that we won’t see the same increase in serial computing performance that we had in the last decades. We have to cope with optimizing serial codes and programming parallel and concurrent ones, which means that all coders have to deal with this paradigm shift. While computer scientists are aware of the tools to use, the same is not true for the “average” scientist or engineer. And this is the purpose of this book: to educate the average coder.


Due to the end of the free lunch, manufacturers started to provide different processing units and developers started to go parallel. It’s kind of back to the future, as accelerators existed before today (the x87 FPU started as a coprocessor, for instance). Those accelerators ended up integrated into the CPU, and so did their instruction sets.

Today’s accelerators are not there yet. The tools (code translators) are not ready, and usual programming practices may not be adequate. The whole ecosystem will evolve and accelerators will change (GPUs are the main trend, but they will be different in a few years), so what you do today needs to be shaped with these changes in mind. How is it possible to do so? Is it even possible?


I’ve played a little bit with Intel Parallel Studio. Let’s say it has been a pleasant trip out in the wilderness of multithreaded applications.

Intel Parallel Studio is a set of tools geared toward multithreaded applications. It consists of three Visual Studio plugins (so you need a fully-fledged Visual Studio, not an Express edition):

  • Parallel Inspector for memory analysis
  • Parallel Amplifier for thread behavior and concurrency
  • Parallel Composer for parallel debugging

This is an update of the review I did for the beta version. Since that first review, I’ve tried the first official release.
