I’ve played a little bit with Intel Parallel Studio. Let’s say it has been a pleasant trip out in the wilderness of multithreaded applications.
Intel Parallel Studio is a set of tools geared toward multithreaded applications. It consists of three Visual Studio plugins (so you need a fully-fledged Visual Studio, not an Express edition):
- Parallel Inspector for memory analysis
- Parallel Amplifier for thread behavior and concurrency
- Parallel Composer for parallel debugging
This is an update of the review I wrote for the beta version. Since that first review, I’ve tried the first official release.
Since the beta phase, Intel has added a lot of documentation and online help, as well as additional samples. The lack of these was my main complaint at the time, and I can now say that Intel provides a complete tool with appropriate help. There is still room for improvement, but not much. For instance, here are videos presenting Parallel Studio.
There is a simple sample that shows how all the plugins can be used together: the NQueens solution, which is also the main Composer example. For Composer, though, several different parallelization solutions are proposed. According to the Starting Guide and other documents, Intel’s workflow consists of using Advisor (I’ll try to cover it in another post), then Composer to debug the parallelization, Inspector to check for contentions, … and then Amplifier to profile the application. One of the videos is dedicated to showing how to use the plugins with the NQueens sample.
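To give an idea of what such a sample computes, here is a minimal sketch of an N-Queens solution counter. It is not Intel’s sample code: the names `is_safe`, `place` and `count_nqueens` are my own, and the OpenMP loop over the first row is just one of several possible parallelization strategies.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// True if a queen at (row, col) is not attacked by the queens already
// placed in rows [0, row). cols[r] holds the column of the queen in row r.
static bool is_safe(const std::vector<int>& cols, int row, int col) {
    for (int r = 0; r < row; ++r) {
        int c = cols[r];
        if (c == col || std::abs(c - col) == row - r) return false;
    }
    return true;
}

// Counts the ways to complete a board whose rows [0, row) are already filled.
static long place(std::vector<int>& cols, int row, int n) {
    if (row == n) return 1;  // every row has a queen: one solution found
    long count = 0;
    for (int col = 0; col < n; ++col) {
        if (is_safe(cols, row, col)) {
            cols[row] = col;
            count += place(cols, row + 1, n);
        }
    }
    return count;
}

// Counts all solutions for an n x n board. Each first-row position is an
// independent subproblem with its own private board, so the iterations
// share nothing except the reduced total.
long count_nqueens(int n) {
    assert(n > 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int first = 0; first < n; ++first) {
        std::vector<int> cols(n, 0);  // private board for this iteration
        cols[0] = first;
        total += place(cols, 1, n);
    }
    return total;
}
```

Calling `count_nqueens(8)` counts the solutions of the classic 8x8 problem. If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially.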
As a final point, each plugin has its own configurable toolbar, with a distinct icon.
Parallel Composer is mainly a parallel extension to Visual Studio’s debugger. It is based on an Intel runtime, which means you have to use the Intel C++ Compiler. The compiler is provided, along with IPP (a primitives library) and TBB (a parallel library), but not MKL, the scientific library. Version 11.1 of the compiler supports OpenMP 3.0 (the Visual Studio compiler only supports 2.0) and thus task parallelism. Intel’s goal is to provide this to C developers (C++ programmers can use TBB, for instance).
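As an illustration of the task parallelism that OpenMP 3.0 brings, here is a minimal sketch using the `task` and `taskwait` constructs on a recursive Fibonacci. The function names `fib` and `parallel_fib` are mine, and the example is deliberately naive (a real program would stop creating tasks below some problem size).

```cpp
#include <cassert>

// Recursive Fibonacci in which each recursive call becomes an OpenMP task.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    long a = 0, b = 0;
    #pragma omp task shared(a) firstprivate(n)
    a = fib(n - 1);
    #pragma omp task shared(b) firstprivate(n)
    b = fib(n - 2);
    #pragma omp taskwait  // wait for both child tasks before using a and b
    return a + b;
}

// Opens a parallel region; a single thread creates the root of the task
// tree and the other threads of the team pick up the generated tasks.
long parallel_fib(int n) {
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

This needs a compiler with OpenMP 3.0 support and OpenMP enabled. Without it, the pragmas are ignored and the code runs serially with the same result.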
The goal of the extension is to detect shared data and its implications for reentrancy (can this function be called simultaneously by different threads?), and to display the task and thread tree with OpenMP.
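To make the reentrancy question concrete, here is a small sketch with two hypothetical functions of my own (they do not come from Intel’s samples). The first one hides shared data in a static variable, so it cannot safely be called by several threads at once. The second one receives its state from the caller.

```cpp
#include <cassert>

// NOT reentrant: `counter` is hidden shared data. Two threads calling this
// function at the same time would both read and write the same variable.
int next_id_shared() {
    static int counter = 0;
    return ++counter;
}

// Reentrant: all state is supplied by the caller, so threads that use
// distinct counters never touch the same memory.
int next_id(int& counter) {
    assert(counter >= 0);
    return ++counter;
}
```

With the second version, each thread can keep its own `int` and call `next_id` freely. The first version is exactly the kind of shared data a tool has to point out.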
The OpenMP panels are not only for OpenMP. They work for every extension that needs /Qopenmp (for instance the parallel extensions like __par), in which case useful information is displayed about the state of existing threads. It is also possible to disable multithreading and run the program on a single thread.
It seems possible to debug several processes simultaneously, as TotalView does, but there are no examples or tutorials explaining how to do this.
Parallel Composer is a powerful debugger extension that gives you access to a lot of information. On the one hand, Intel did a good job of providing tutorials and online help. On the other hand, the documentation for the most important plugin is perhaps the shortest of the three.
Parallel Inspector is in charge of detecting general memory issues as well as threading-related memory issues. Depending on the inspection level, the execution time can be several times longer. Each time a problem is detected, it is assigned a severity level and recorded in a list from which you can access its location and the source code.
The first analysis is the general memory one. It detects, for instance, memory leaks. Here is a result that it can give:
Usually, this kind of detection requires modifying your code, or, on Linux, preloading a library that detects memory leaks (or using Valgrind). Here, the really great point is that no modification of the code is needed and you can use the compiler of your choice.
The real addition of Inspector is of course not checking for memory leaks. Parallel Inspector is not titled “Parallel” for nothing. It can check concurrent memory accesses, and thus warn developers that some threads can read or write the same memory concurrently. Of course, once you’ve checked that an access is not dangerous, you can tell Inspector to skip it (so the inspection is faster next time).
Inspector is, in my opinion, the easiest plugin of Parallel Studio to use. Memory checks are something developers always care about, so we know what to expect from it.
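Here is a sketch of the kind of concurrent access such a check is meant to catch. The two functions are hypothetical examples of mine: `racy_sum` is intentionally wrong when OpenMP is enabled, because every thread updates `sum` without synchronization, while `safe_sum` gives each thread a private copy through an OpenMP reduction.

```cpp
#include <cassert>
#include <vector>

// Intentionally racy (do not use): with OpenMP enabled, all threads perform
// an unsynchronized read-modify-write on the shared variable `sum`.
long racy_sum(const std::vector<int>& v) {
    long sum = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        sum += v[i];  // data race on `sum`
    }
    return sum;
}

// Fixed version: the reduction clause gives each thread its own partial
// sum and combines the partial sums at the end of the loop.
long safe_sum(const std::vector<int>& v) {
    long sum = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}
```

The racy version can return different results from one run to the next, which is what makes this class of bug so hard to find by testing alone.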
Parallel Amplifier is a profiler (I don’t know if it is instrumentation- or sampling-based) like the one you can find in Visual Studio Team edition, or like VTune, the fully-fledged profiler Intel sells as a stand-alone product. Here, you can only get the execution time, but it is still valuable information (if you need more, go and get VTune or Visual Studio Team). Then, for the parallel profiles, you can get the concurrency quality as well as the waiting time.
Hotspot is the first profile you can get. The goal is to find where the application spends most of its time, which is what is called the “hotspot”. In the next example, it is algorithm2, and double-clicking on it displays the annotated source code.
How scalable is my program? This is the question the second profile tries to answer. In this case, the scalability is given in the panel at the lower right of the screen (here, for two processors, I get 1.57, which means 78.4% of use, or efficiency). Source code can then be displayed with annotations; here, the lack of concurrency comes from the display routines. On the other hand, algorithm2 scales well. To optimize your concurrency, you need to reduce the red/“poor” part of the bar and maximize the other ones.
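The relation between the two figures is simply efficiency = speedup / number of processors. A tiny sketch, with helper names of my own and assuming the usual definition of speedup as serial time over parallel time:

```cpp
#include <cassert>

// Speedup: how many times faster the parallel run is than the serial one.
double speedup(double serial_time, double parallel_time) {
    assert(serial_time > 0.0 && parallel_time > 0.0);
    return serial_time / parallel_time;
}

// Efficiency: fraction of the ideal (linear) speedup actually achieved.
double efficiency(double achieved_speedup, int processors) {
    assert(processors > 0);
    return achieved_speedup / processors;
}
```

With the rounded value of 1.57 on two processors, `efficiency(1.57, 2)` is about 0.785, so the 78.4% displayed by the tool was presumably computed from the unrounded speedup.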
Finally, waiting and locks are a crucial issue. Here again, Amplifier has a specific profile. In this example, the main thread only waits for the subthreads to return.
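The pattern described here, where the main thread does nothing but wait for its workers, can be sketched as follows. This is not the profiled application: `sum_squares` is a hypothetical example of mine, and it uses C++11 `std::thread` rather than whatever threading API the original program relied on.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Sums i*i for i in [0, n) using `workers` threads. The main thread only
// starts the workers and then blocks in join() until each one returns.
long long sum_squares(int n, int workers) {
    assert(n >= 0 && workers > 0);
    std::vector<long long> partial(workers);  // one slot per worker, zeroed
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, w, n, workers] {
            long long local = 0;
            for (int i = w; i < n; i += workers) {
                local += 1LL * i * i;
            }
            partial[w] = local;  // each worker writes only its own slot
        });
    }
    for (std::thread& t : pool) {
        t.join();  // this is the waiting time a wait profile would show
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Waiting in `join()` like this is harmless. What a waits-and-locks profile helps spot is the other case, where workers spend their time blocked on each other instead of computing.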
Profiling should be done regularly, and it is interesting to see whether a given optimization actually improves the program. Amplifier can help you do this.
Amplifier comes with several examples and good online help. It is not meant to be a full guide to optimization (there are complete books dedicated to this topic), but it gives you access to the tools you need and some pointers on using them correctly.
While Amplifier and Inspector are intuitive and simple to use, the same cannot quite be said of Composer. Intel provides several videos as tutorials to help you use all the plugins, as well as complete guides and samples. Parallel Composer is perhaps less documented, but above all it is more complicated to use, at least from my point of view.
This product is, in my opinion, very helpful, non-intrusive (I’m thinking of Amplifier and Inspector, which detect issues without additional libraries) and efficient. The issues it tackles are not easy ones to solve, and it handles them brilliantly. Since the beta phase, Intel has done a tremendous job of providing better documentation for its tool, and it is now, to my mind, the best tool for multithreaded development.
Dr. Dobb’s published a small post a few days ago on what is needed for multithreaded application development, and it said Parallel Studio is the perfect tool to help with this.