Book review: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism

After some general books on grid computation, I needed to change the subject of my reading a little. As Intel Threading Building Blocks has always intrigued me, I chose the associated book.

Content and opinions

After a short introductory chapter (installation, the TBB philosophy of tasks instead of threads), the second chapter explains what parallelism is. Every C++ book that I know of teaches the programmer to think one instruction after another. The trouble with parallel thinking is that several things execute at the same time and can interact, and this is not something the human mind handles well. So every useful concept is presented: locks, scaling, …

The next chapter is perhaps the most useful one, as it presents the basic parallel loops that you will use almost everywhere if you have data parallelism. The concept of a range is introduced, and it is the basic tool of TBB. The problem is split thanks to these ranges, and when the ranges are small enough, a task is created for each of them. The drawbacks of this chapter are the code quality (I hate it when code mixes C and C++ while the library is 100% C++: don’t use the climits header and a macro when the limits header is available!) and the lack of explanation of the expected speedup of the algorithms. For instance, parallel_reduce is different from the parallel reduction proposed as a sample by CUDA. Then, parallel_scan (it computes y[n] = f[n](y[n-1])) is not clear either. The speedup is achieved by computing the same values several times, but in one of the passes the result is not saved to memory. So what you gain is that, for the last part of the computation, only the cost of computing the results of the first parts matters. So if your function is expensive, the redundant computation can cancel out the speedup. A short explanation of this would have been great.
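To make the parallel_scan remark more concrete, here is a minimal sketch of a two-pass, chunked prefix sum in plain standard C++. It is not TBB code, and it is not necessarily how TBB implements parallel_scan: the name chunked_prefix_sum and the chunking scheme are assumptions made for illustration. The chunk loops run sequentially here to keep the example short; the ones marked "independent" are those a parallel runtime could hand to different cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: a two-pass chunked prefix sum (hypothetical helper, not TBB API).
// Computes y[i] = x[0] + ... + x[i].
std::vector<long> chunked_prefix_sum(const std::vector<long>& x, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = x.size();
    std::vector<long> y(n);

    // Chunk c covers the half-open index range [first(c), first(c + 1)).
    const auto first = [&](std::size_t c) { return c * n / chunks; };

    // Pass 1 (independent per chunk): sum each chunk, nothing is stored in y.
    std::vector<long> chunk_total(chunks, 0);
    for (std::size_t c = 0; c < chunks; ++c)
        for (std::size_t i = first(c); i < first(c + 1); ++i)
            chunk_total[c] += x[i];

    // Short serial step: the starting value of each chunk is the sum of
    // all the chunks before it.
    std::vector<long> start(chunks, 0);
    for (std::size_t c = 1; c < chunks; ++c)
        start[c] = start[c - 1] + chunk_total[c - 1];

    // Pass 2 (independent per chunk): redo the same additions, this time
    // storing the running value in y.
    for (std::size_t c = 0; c < chunks; ++c) {
        long running = start[c];
        for (std::size_t i = first(c); i < first(c + 1); ++i) {
            running += x[i];
            y[i] = running;
        }
    }
    return y;
}
```

Under this scheme every element is combined twice, so the total work is roughly double that of a serial loop. With P cores the best you can hope for is therefore about P/2, which means two cores gain very little, and an expensive operation makes the duplicated pass hurt even more.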

Fortunately, this chapter was the only one with such drawbacks. Unfortunately, they occur in the most important part of the book. After these basic blocks, task parallelism is introduced. Contrary to data parallelism, for which you split your dataset into chunks that can be computed on a lot of cores, task parallelism only splits the work into a fixed number of predefined tasks. This means that the work cannot scale as much as with data parallelism, but the tasks themselves can be data parallel. This is done with a pipeline. Algorithms for other kinds of loops are also presented and complete the toolbox for basic, everyday processing.

Some algorithms need to update variables in containers (a sort algorithm, for example), and as the STL containers are not thread-safe, specific containers must be used. TBB offers a queue, a vector and a hash map for this purpose. The operations on these containers are not the same as the STL operations: new thread-safe methods are provided, and the book illustrates them with some examples.
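One way to see why the operations have to differ from the STL ones: with std::queue you test empty() and then call front() and pop(), and another thread can slip in between those calls. A thread-safe queue has to merge them into a single operation. The sketch below shows the idea with a hypothetical LockedQueue class guarded by one mutex. It is an assumption made for illustration, written with the standard library only, and TBB's own containers are more sophisticated than this.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <utility>

// Illustrative only: a queue whose "is there an item?" test and "take it"
// step happen under the same lock. Hypothetical class, not a TBB container.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }

    // Returns false if the queue was empty; otherwise moves the front item
    // into `out` and removes it, all in one atomic step.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};

// Small single-threaded usage check.
inline void locked_queue_example() {
    LockedQueue<int> queue;
    int value = 0;
    assert(!queue.try_pop(value));  // nothing to take yet
    queue.push(42);
    assert(queue.try_pop(value) && value == 42);
}
```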

The following chapter is about memory allocation. TBB provides its own allocation functions so that two threads do not end up fighting over the same cache line. Basic examples show how to use them in overloaded new and delete operators.
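The cache conflict in question can be illustrated without TBB at all. The sketch below uses the standard alignas specifier to give each counter its own cache line, and a small helper to check whether two addresses fall on the same line. The 64-byte line size and the names PackedPair, PaddedCounter and same_cache_line are assumptions made for this example; TBB's allocators do the equivalent for dynamically allocated memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, which is common but not guaranteed.
constexpr std::size_t kAssumedCacheLine = 64;

// Two counters side by side: they share a cache line, so two threads
// updating one each would keep invalidating each other's cache.
struct alignas(kAssumedCacheLine) PackedPair {
    long first = 0;
    long second = 0;
};

// One counter per cache line: alignment forces padding up to a full line.
struct alignas(kAssumedCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == kAssumedCacheLine,
              "each padded counter should occupy exactly one line");

// True if both addresses fall within the same (assumed) cache line.
bool same_cache_line(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    const auto line_of = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) / kAssumedCacheLine;
    };
    return line_of(a) == line_of(b);
}
```

With these definitions, the two members of a PackedPair report the same line, while two neighbouring elements of a PaddedCounter array do not.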

Then the book goes one step further inside TBB with mutual exclusion. The underlying OS mutex is wrapped, but additional specific mutexes are available. Their specifications are well explained. A short chapter is dedicated to timing in a safe way. It is really useful for timing tasks (before using Intel tools for thread profiling).
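As a rough illustration of both topics, here is a sketch that protects a shared counter with a scoped lock and times the whole run with a monotonic clock. It uses the standard library (std::mutex, std::lock_guard, std::chrono::steady_clock) as stand-ins rather than the TBB classes, and the names RunResult and count_with_lock are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

struct RunResult {
    long total;      // final value of the shared counter
    double seconds;  // wall-clock time for the whole run
};

// Illustrative only: several threads increment one counter under a mutex.
RunResult count_with_lock(unsigned num_threads, long increments_per_thread) {
    assert(num_threads > 0);
    long total = 0;
    std::mutex guard;

    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (long i = 0; i < increments_per_thread; ++i) {
                // Scoped lock: acquired here, released at the end of the
                // loop body, even if an exception is thrown.
                std::lock_guard<std::mutex> lock(guard);
                ++total;
            }
        });
    }
    for (auto& worker : workers) worker.join();

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {total, elapsed.count()};
}
```

Calling count_with_lock(4, 10000) should always give a total of 40000, whereas the same loop without the lock would be a data race. Locking on every single increment is deliberately naive; the timing makes the cost of that contention visible.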

The last important chapter is about the task scheduler. It is the basis of the whole library. Using it directly can generally be avoided, but if you have no choice, TBB lets you do so. The design choices Intel made are clearly stated, although some paragraphs are not that easy to understand, because the task scheduler is very modular and a lot of things can be done with it. This chapter covers every possibility, but don’t forget to pin down precisely what you need so that your application stays as simple as possible.

The next-to-last chapter echoes the second one. It sums up what you have to remember when writing parallel algorithms with TBB. Although some points are pragmatic and very basic, they are all sound.

Finally, the last chapter presents several examples of increasing difficulty (following the order of the book, actually). Additional examples are available with the library, but the ones in the book are explained and dissected.

Conclusion

I had trouble at first with this book, because of the mix of C and C++ in the third chapter and then because I struggled to understand how parallel_scan could even be sped up. But as I read on, I enjoyed it. Sometimes I would have liked additional examples inside each chapter, instead of the interface of a class followed by an explanation of each function. In fact, this book is a reference manual for TBB, and a good one. My troubles with the code were only at the beginning. By the end, everything is OK.

So if you want a more usable C++ threading library, go get TBB (open source) and this book.
