Parallel Studio: Using Advisor Lite

After reviewing Parallel Studio, I’ve decided to take a look at Advisor Lite. Intel offers it for free until the full Advisor is released with a future Parallel Studio version. It aims at guiding multithreaded development with Parallel Studio.

I’ve started with the Starting Guide, and it is indeed the best way to learn how to use this plugin. Advisor offers four steps: two of them are shortcuts to the online help, and the other two link to Parallel Studio actions (namely the hotspot analysis in Amplifier and the threaded memory check in Inspector).
The online help is interesting, but once you know how to parallelize an application and what to look for, the two Parallel Studio actions, together with a few macros presented in the Starting Guide, are all you need.

Test on parallelizing a custom library

I’ve decided to test Advisor Lite on my Interactive Raytracer, to check whether it finds the appropriate parallelization and the memory sharing issues. It is a simple raytracer, so each pixel of the image can be rendered in parallel. The only memory sharing issue that I know of is in the kd-tree ray traversal.

Profiling the library

First, I will profile the library. For the complete Advisor Lite workflow, I have to use the Intel Compiler, and as it is faster than Microsoft’s compiler, I will use the timeit_image.py script instead of the measure_image.py script I used when profiling with Valgrind or Visual Studio.

Amplifier can show the results bottom-up or top-down. Unfortunately, only exclusive timings are displayed. In my case, the bottom-up view shows that the method getEntryExitDistances() is the most costly one. The top-down view, unfortunately, does not give me a simple tree, as can be seen in the following view:

IRT: Amplifier profile (call-tree view)

In Visual Studio, I get more or less the same results, but with a correct top-down call tree:

Profile returned by Visual Studio Performance Tool (call-tree)

The method getEntryExitDistances() cannot be parallelized: it is called recursively, several times per pixel, which would lead to a lot of memory contention. The simplest task is thus to parallelize the pixel rendering, a perfectly data-parallel problem.

Annotation of the code

OK, now I can annotate my code. I had to dig into the help for this, as the Starting Guide does not mention that Intel provides a header, annotate.h, whose macros mimic the issues you may encounter in a multithreaded application.

So you need to read the online help at least once to learn which annotation macros are available, how to get them and how they will report what you need. Once the code is annotated, it must be recompiled, and then the sharing issues can be detected.
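To give an idea of what an annotated loop looks like, here is a minimal sketch. The macro names are an assumption: they are modelled on the site/task annotations of later Intel Advisor releases and are defined here as no-ops so that the sketch compiles without the Intel header. The names and arguments in Advisor Lite's annotate.h may differ, so check the header and the online help. The functions shade_pixel() and render_annotated() are hypothetical stand-ins for the raytracer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in no-op macros so this sketch compiles without the Intel header.
// ASSUMPTION: names modelled on later Intel Advisor annotations; the real
// macros in annotate.h may have different names or arguments.
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END()
#endif

// Hypothetical per-pixel function standing in for the real ray casting.
inline float shade_pixel(int x, int y)
{
    return static_cast<float>(x + y);
}

// Serial render loop: the whole loop is marked as a candidate parallel
// region (site) and each row as a unit of work (task).
std::vector<float> render_annotated(int width, int height)
{
    assert(width >= 0 && height >= 0);
    std::vector<float> image(static_cast<std::size_t>(width) * height);

    ANNOTATE_SITE_BEGIN(render_image);
    for (int y = 0; y < height; ++y) {
        ANNOTATE_TASK_BEGIN(render_row);
        for (int x = 0; x < width; ++x) {
            image[static_cast<std::size_t>(y) * width + x] = shade_pixel(x, y);
        }
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();

    return image;
}
```

The code still runs serially; the annotations only describe where the parallelism would go, so that the tools can analyse it.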

Detection of sharing issues

As expected, Inspector detected errors in the kd-tree traversal:

Memory sharing issues detected by Inspector
The solution in this case is to have one ray-traversal stack per thread, which will have to be implemented with whichever parallel library is chosen, or simply to move the stack into the traversal algorithm itself instead of storing it in the instance.
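The second option can be sketched as follows. This is not the actual raytracer code: Node, KDTree and traverse_sum() are hypothetical and heavily simplified, and the traversal only sums a payload instead of intersecting a ray. The point is only where the stack lives.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified kd-tree node: children are indices into a flat
// array, and -1 means "no child".
struct Node {
    int left;
    int right;
    int payload;
};

class KDTree {
public:
    explicit KDTree(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}

    // The traversal stack is a local variable, so every call (and thus every
    // thread) gets its own. The method only reads the shared tree.
    int traverse_sum() const
    {
        if (nodes_.empty()) {
            return 0;
        }
        std::vector<int> stack;  // per-call stack instead of a member
        stack.push_back(0);      // start at the root
        int sum = 0;
        while (!stack.empty()) {
            const int index = stack.back();
            stack.pop_back();
            const Node& node = nodes_[index];
            sum += node.payload;
            if (node.left >= 0) {
                stack.push_back(node.left);
            }
            if (node.right >= 0) {
                stack.push_back(node.right);
            }
        }
        return sum;
    }

private:
    std::vector<Node> nodes_;
    // A member such as "std::vector<int> stack_;" shared by all calls is the
    // kind of state that causes the sharing issue described above.
};
```

The price is one stack allocation per call, which is why a per-thread stack may still be preferable in a hot traversal loop.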

Using TBB

I’ve decided to go for Threading Building Blocks (TBB), as it is already used in game development. It seemed a good idea to me, as it is an open source solution. So now, I will split the screen into 2D tiles and add thread-specific storage to the kd-tree class. Of course, I will have to add a flag to disable this parallelization if TBB is not available.
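As a rough idea of the shape this could take, here is a minimal sketch, not the final implementation. It assumes a hypothetical build flag IRT_USE_TBB and a hypothetical trace_pixel() function; tbb::parallel_for and tbb::blocked_range2d are the TBB pieces used to split the image into 2D tiles, and the serial loop is the fallback when the flag is not defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef IRT_USE_TBB  // hypothetical flag, set by the build system when TBB is found
#include <tbb/blocked_range2d.h>
#include <tbb/parallel_for.h>
#endif

// Hypothetical per-pixel function standing in for the real ray casting.
inline float trace_pixel(int x, int y)
{
    return static_cast<float>(x) * 0.5f + static_cast<float>(y);
}

std::vector<float> render(int width, int height)
{
    assert(width >= 0 && height >= 0);
    std::vector<float> image(static_cast<std::size_t>(width) * height);

#ifdef IRT_USE_TBB
    // TBB splits the (rows x cols) range into 2D tiles and runs them on its
    // worker threads. Each pixel is written by exactly one tile.
    tbb::parallel_for(
        tbb::blocked_range2d<int>(0, height, 0, width),
        [&](const tbb::blocked_range2d<int>& tile) {
            for (int y = tile.rows().begin(); y != tile.rows().end(); ++y) {
                for (int x = tile.cols().begin(); x != tile.cols().end(); ++x) {
                    image[static_cast<std::size_t>(y) * width + x] = trace_pixel(x, y);
                }
            }
        });
#else
    // Serial fallback when TBB is not available.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            image[static_cast<std::size_t>(y) * width + x] = trace_pixel(x, y);
        }
    }
#endif

    return image;
}
```

For the thread-specific storage in the kd-tree, tbb::enumerable_thread_specific is one candidate, but which mechanism fits best depends on how the traversal stack is used.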

The actual parallelization will be covered in a future post in the Interactive Raytracer category. It is pretty straightforward with the different elements Parallel Studio gave me.

Conclusion

In fact, Advisor is mainly the annotate.h header, and you have to know your program well enough to put the macros at the correct locations. The parallelization must be done by hand, as must the correction of the memory sharing issues.

The only problem I had is that annotate.h includes windows.h. This header is not C++ compliant and defines some macros such as max() (in fact I got the same issue with the TBB headers!). As I use the max() function declared in std::numeric_limits, I had to explicitly undefine this macro.
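For reference, here is a minimal sketch of the usual ways around this macro. The function largest_distance() is hypothetical. Defining NOMINMAX before including windows.h asks it not to define min() and max(), the #undef is the fallback I describe above, and the extra parentheses around the function name prevent a function-like macro from expanding at all.

```cpp
#include <cassert>
#include <limits>

#ifdef _WIN32
#ifndef NOMINMAX
#define NOMINMAX  // ask windows.h not to define the min()/max() macros
#endif
#include <windows.h>
#endif

// Fallback if some other header already defined the macro.
#ifdef max
#undef max
#endif

// Hypothetical helper: the initial "infinite" distance for a ray.
inline float largest_distance()
{
    // The parentheses around the name keep a function-like max() macro from
    // being expanded here, even if one is still defined.
    return (std::numeric_limits<float>::max)();
}
```

NOMINMAX only helps if it is defined before the first inclusion of windows.h, so when a third-party header pulls windows.h in for you, the define has to come before that header or from the compiler command line.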

Apart from this, Advisor Lite is a good plugin, and I’m looking forward to seeing Advisor in a future Parallel Studio release.
