Book review: Getting Started with LLVM Core Libraries

This entry is part 5 of 5 in the series Travelling in LLVM land

LLVM has always intrigued me. Actually, I always thought about writing a compiler one day. But it was more of a challenge than a requirement for any of my work, private or professional, so I never dived into it. The design of LLVM is also very well thought out, and probably close to something I would have liked to create.

So now the easiest path is just to use LLVM for the different goals I want to achieve. I recently had to write clang-tidy rules, and I would also like to create a JIT for Audio Toolkit and the modeling libraries. So there are lots of reasons to look at LLVM.

Discussion

The book more or less goes from C/C++ parsing to code generation.

Of course, the first chapters are about setting everything up. The book mainly uses Makefiles, which are not an option anymore in current LLVM versions. But it does provide the equivalent CMake version, so it is fine. Also, the structure of the projects has not changed, so everything still works. Lots of projects have also matured over time (lld, libcxx…), so when you read that something is not yet production ready, check online (if you can find the information; I have to say that LLVM communication is very bad, just look at the release notes to get an idea!).

The third chapter tackles LLVM design. That’s what I liked about LLVM, the modular design, but it can also be scary because you can build more or less anything, and the APIs do evolve with time. But the chapter did reassure me, and it helps in understanding the philosophy.

Then, in the fourth chapter, we start working through the clang pipeline, beginning with all the steps between the C/C++ code and the LLVM Intermediate Representation. The AST and the interaction with it are very well presented, along with the different stages required to generate the IR. The missing bit may be an explanation of why the AST is so important to have, and why the LLVM people had to create a new intermediate representation for this front-end.

The fifth chapter is about everything we can do on the IR. I left the chapter still hungry for more. OK, the IR passes can transform the graph, but it feels like not enough here. How does the matching actually work? This is where you can see that the book is for beginners and not for intermediate or advanced users. It also made me realize that there is no way I can generate IR directly for my projects; I would go from a C++ AST to IR to the JIT…
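To give an idea of what "matching" means in this context, here is a minimal sketch of a pattern-based rewrite on a toy expression tree. It is not LLVM code and not taken from the book: `Expr`, `makeConst`, `makeVar`, `makeBinary` and `simplify` are hypothetical names I made up for illustration, and a real IR pass works on a much richer data structure. The principle, as far as I understand it, is the same: walk the graph, recognize a shape, and replace it with something cheaper.

```cpp
#include <memory>
#include <utility>

// Toy expression node. This is NOT an LLVM class, only an illustration.
struct Expr {
    enum Kind { Const, Var, Add, Mul };
    Kind kind = Const;
    int value = 0;                   // only meaningful when kind == Const
    std::unique_ptr<Expr> lhs, rhs;  // only used by Add and Mul
};

std::unique_ptr<Expr> makeConst(int v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Const;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> makeVar() {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Var;
    return e;
}

std::unique_ptr<Expr> makeBinary(Expr::Kind k,
                                 std::unique_ptr<Expr> l,
                                 std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Simplify the operands first, then try each pattern on the current node.
std::unique_ptr<Expr> simplify(std::unique_ptr<Expr> e) {
    if (e->kind != Expr::Add && e->kind != Expr::Mul)
        return e;  // leaves have nothing to rewrite

    e->lhs = simplify(std::move(e->lhs));
    e->rhs = simplify(std::move(e->rhs));

    const bool rhsIsConst = e->rhs->kind == Expr::Const;

    // Pattern: op(Const, Const) -> Const (constant folding)
    if (e->lhs->kind == Expr::Const && rhsIsConst) {
        const int l = e->lhs->value;
        const int r = e->rhs->value;
        return makeConst(e->kind == Expr::Add ? l + r : l * r);
    }
    // Pattern: x + 0 -> x
    if (e->kind == Expr::Add && rhsIsConst && e->rhs->value == 0)
        return std::move(e->lhs);
    // Pattern: x * 1 -> x
    if (e->kind == Expr::Mul && rhsIsConst && e->rhs->value == 1)
        return std::move(e->lhs);

    return e;  // no pattern matched
}
```

With this, `simplify` applied to `(2 * 3) + 0` yields a single constant node, and applied to `x * 1` it yields the variable node itself. Only the right-hand operand is checked for the identity patterns, to keep the sketch short.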

After working on the IR, of course, we get to code generation and the different tools in LLVM to generate either byte code or machine code and everything in between. Lots of time is devoted to explaining that this phase is very costly, as we go from something quite generic to something definitely not generic, and this part was very instructive.

The seventh chapter was strange. It spent lots of time talking about a part of LLVM that was about to be removed, the “old” JIT framework. I suppose that at the time the new one was too new and some people still had to understand the old one. I still felt it was a bit of a waste of space.

Cross-compilation is tackled after that, and more precisely the fact that you may not need to do anything special. This is also where one can see the limits of LLVM. To get the proper backends, you need to get the gcc toolchain. I think this is still something people do today. Even for clang 6, I actually compiled it against a gcc 7 toolchain so that I don’t have to rebuild all the C++ third-party libraries. Also, the ARM backend seemed to be broken for a long time, which is not great for trust either!

The last two chapters tackle tools made with clang. The first one is about the static analyzer, and I have to say that I didn’t even know it existed. There are tools with it that can generate HTML reports, and I liked that. But when I tried to use them with CMake, they just broke (scan-build). Then there is a chapter about libclang and clang-tidy, which is probably my reference now. Something that wasn’t the case in 2014 is that the static analyzer rules are now integrated inside clang-tidy; it’s just that it can’t build HTML reports out of them. Is the HTML report really mandatory? The static analyzer does give a better view of static code issues (whereas the other rules are geared towards sugar-coating).

The book ends very abruptly, with a small paragraph at the end of the LibTooling chapter. Very disturbing.

Conclusion

Despite the age of the book and the changes that went into LLVM (clang-modernize is now part of clang-tidy, DragonEgg is… I don’t know where it went), the book seems to stay very much current (clang is still the main front-end). I would have liked more examples on clang AST matchers, but I suppose that would require a full cookbook, and the audience may not be that big. Still, I’m looking forward to using the different bits to write a JIT and C++ output for electronic modeling/SPICE.
