Compiling C++ code in memory with clang

This entry is part 1 of 5 in the series Travelling in LLVM land

I have tried to find the proper recipe to compile C++ code on the fly with clang and LLVM. It’s actually not that easy to achieve if you are not targeting the LLVM Intermediate Representation, and unfortunately the code here, which works for LLVM 7, may not work for LLVM 8. Or 6.

The pipeline

There are different ways of creating an AST and then a module with clang, and they are not all equal (the main tools for creating an AST seem to be geared towards libtooling, and are not meant to produce machine code afterwards). Once we have a module, we can use the LLVM JIT to go from the IR to dynamically loaded code, and from there to the function call we want.

Let’s just start with all the headers we need for our job. There are just so many!

#include <clang/AST/ASTContext.h>
#include <clang/AST/ASTConsumer.h>
#include <clang/Basic/DiagnosticOptions.h>
#include <clang/Basic/Diagnostic.h>
#include <clang/Basic/FileManager.h>
#include <clang/Basic/FileSystemOptions.h>
#include <clang/Basic/LangOptions.h>
#include <clang/Basic/MemoryBufferCache.h>
#include <clang/Basic/SourceManager.h>
#include <clang/Basic/TargetInfo.h>
#include <clang/CodeGen/CodeGenAction.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Frontend/CompilerInvocation.h>
#include <clang/Frontend/TextDiagnosticPrinter.h>
#include <clang/Lex/HeaderSearch.h>
#include <clang/Lex/HeaderSearchOptions.h>
#include <clang/Lex/Preprocessor.h>
#include <clang/Lex/PreprocessorOptions.h>
#include <clang/Parse/ParseAST.h>
#include <clang/Sema/Sema.h>
 
#include <llvm/InitializePasses.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/ExecutionEngine/SectionMemoryManager.h>
#include <llvm/IR/DataLayout.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/PassManager.h>
#include <llvm/Passes/PassBuilder.h>
#include <llvm/Support/MemoryBuffer.h>
#include <llvm/Support/TargetSelect.h>
#include <llvm/Support/raw_ostream.h>

We will also add an init function for LLVM:

  bool LLVMinit = false;
 
  void InitializeLLVM()
  {
    if (LLVMinit)
    {
      return;
    }
    // We have not initialized any pass managers for any device yet.
    // Run the global LLVM pass initialization functions.
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    llvm::InitializeNativeTargetAsmParser();
 
    auto& Registry = *llvm::PassRegistry::getPassRegistry();
 
    llvm::initializeCore(Registry);
    llvm::initializeScalarOpts(Registry);
    llvm::initializeVectorization(Registry);
    llvm::initializeIPO(Registry);
    llvm::initializeAnalysis(Registry);
    llvm::initializeTransformUtils(Registry);
    llvm::initializeInstCombine(Registry);
    llvm::initializeInstrumentation(Registry);
    llvm::initializeTarget(Registry);
 
    LLVMinit = true;
  }

We can now start building our two pipeline steps, based on clang and then on LLVM.

Caveats in clang

The first caveat with using clang is that it can’t parse a file from memory, even though there are classes for doing so. The reason is that the compiler instance checks that the input is a file; if it isn’t, it bails out.

The second caveat is that this is a full compilation. As such, we need to set up the compiler as we would on the command line, with all the relevant flags and include paths. This can be done by letting clang configure everything automatically from a list of command-line-style arguments.

Solution for clang

Let’s start by setting up the compiler diagnostics (where every warning and error will be written) and then the compiler instance and its invocation. The invocation is what we will actually use to compile our code.

    InitializeLLVM();
 
    clang::DiagnosticOptions diagnosticOptions;
    std::unique_ptr<clang::TextDiagnosticPrinter> textDiagnosticPrinter = std::make_unique<clang::TextDiagnosticPrinter>(llvm::outs(), &diagnosticOptions);
    llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> diagIDs(new clang::DiagnosticIDs());
 
    std::unique_ptr<clang::DiagnosticsEngine> diagnosticsEngine =
      std::make_unique<clang::DiagnosticsEngine>(diagIDs, &diagnosticOptions, textDiagnosticPrinter.get(), /*ShouldOwnClient=*/false);
 
    clang::CompilerInstance compilerInstance;
    auto& compilerInvocation = compilerInstance.getInvocation();

Let’s now create our arguments. Here, I’m only setting the “triple”, which is the target platform. But this is where I would add all the include paths later.

    std::stringstream ss;
    ss << "-triple=" << llvm::sys::getDefaultTargetTriple();
 
    std::istream_iterator<std::string> begin(ss);
    std::istream_iterator<std::string> end;
    std::istream_iterator<std::string> i = begin;
    std::vector<const char*> itemcstrs;
    std::vector<std::string> itemstrs;
    while (i != end) {
      itemstrs.push_back(*i);
      ++i;
    }
 
    for (unsigned idx = 0; idx < itemstrs.size(); idx++) {
      // note: if itemstrs is modified after this, itemcstrs will be full
      // of invalid pointers! Could make copies, but would have to clean up then...
      itemcstrs.push_back(itemstrs[idx].c_str());
    }
 
    clang::CompilerInvocation::CreateFromArgs(compilerInvocation, itemcstrs.data(), itemcstrs.data() + itemcstrs.size(),
      *diagnosticsEngine);

Now that the compiler invocation is done, we can tweak the options.

    auto* languageOptions = compilerInvocation.getLangOpts();
    auto& preprocessorOptions = compilerInvocation.getPreprocessorOpts();
    auto& targetOptions = compilerInvocation.getTargetOpts();
    auto& frontEndOptions = compilerInvocation.getFrontendOpts();
#ifdef DEBUG
    frontEndOptions.ShowStats = true;
#endif
    auto& headerSearchOptions = compilerInvocation.getHeaderSearchOpts();
#ifdef DEBUG
    headerSearchOptions.Verbose = true;
#endif
    auto& codeGenOptions = compilerInvocation.getCodeGenOpts();

And now we are set up to compile our file:

    frontEndOptions.Inputs.clear();
    frontEndOptions.Inputs.push_back(clang::FrontendInputFile(filename, clang::InputKind::CXX));
 
    targetOptions.Triple = llvm::sys::getDefaultTargetTriple();
    compilerInstance.createDiagnostics(textDiagnosticPrinter.get(), false);
 
    // NB: this context must outlive any use of the JITed code (see the LLVM
    // caveats below); keeping it local here is for brevity only.
    llvm::LLVMContext context;
    std::unique_ptr<clang::CodeGenAction> action = std::make_unique<clang::EmitLLVMOnlyAction>(&context);
 
    if (!compilerInstance.ExecuteAction(*action))
    {
      // The compilation failed: bail out (proper error handling is
      // left out for brevity).
      return nullptr;
    }

Caveats in LLVM

With LLVM, we also have some things to be careful about. The first is that the LLVM context we created before needs to stay alive as long as we use anything from this compilation unit. This is important because everything the JIT generates registers itself in the context, and has to stay alive after this function returns, until it is explicitly deleted.

The second issue is that, by default, no optimization pass is run on the IR. We will have to add them manually (I will test the speed-up in a later blog post).

Solution for LLVM

The first step is to get the IR module from the previous action.

    std::unique_ptr<llvm::Module> module = action->takeModule();
    if (!module)
    {
      // No module was generated: bail out.
      return nullptr;
    }

We will now create the different optimization passes. The code to create them is rather convoluted and creates more objects than are strictly required, but this is LLVM… and one of the reasons the API keeps mutating.

    llvm::PassBuilder passBuilder;
    llvm::LoopAnalysisManager loopAnalysisManager(codeGenOptions.DebugPassManager);
    llvm::FunctionAnalysisManager functionAnalysisManager(codeGenOptions.DebugPassManager);
    llvm::CGSCCAnalysisManager cGSCCAnalysisManager(codeGenOptions.DebugPassManager);
    llvm::ModuleAnalysisManager moduleAnalysisManager(codeGenOptions.DebugPassManager);
 
    passBuilder.registerModuleAnalyses(moduleAnalysisManager);
    passBuilder.registerCGSCCAnalyses(cGSCCAnalysisManager);
    passBuilder.registerFunctionAnalyses(functionAnalysisManager);
    passBuilder.registerLoopAnalyses(loopAnalysisManager);
    passBuilder.crossRegisterProxies(loopAnalysisManager, functionAnalysisManager, cGSCCAnalysisManager, moduleAnalysisManager);
 
    llvm::ModulePassManager modulePassManager = passBuilder.buildPerModuleDefaultPipeline(llvm::PassBuilder::OptimizationLevel::O3);
    modulePassManager.run(*module, moduleAnalysisManager);

We can now use the JIT compiler and extract the function we need. Be aware that the engine needs to stay alive as long as you use the function:

    llvm::EngineBuilder builder(std::move(module));
    builder.setMCJITMemoryManager(std::make_unique<llvm::SectionMemoryManager>());
    builder.setOptLevel(llvm::CodeGenOpt::Level::Aggressive);
    auto executionEngine = builder.create();
 
    if (!executionEngine)
    {
      // The engine could not be created: bail out.
      return nullptr;
    }
 
    return reinterpret_cast<Function>(executionEngine->getFunctionAddress(function));

Conclusion

The code is long, and this is just one quick pass at making an optimized JIT. If you need additional things like headers, that means more code. But at least you now have all the bits needed to compile a C++ file on the fly with clang/LLVM. At least for version 7…


7 thoughts on “Compiling C++ code in memory with clang”

  1. You have not mentioned the libraries that need to be linked for this to work, could you let me know what to include? My scenario is a little bit different: I intend to write a simple C++ program that can compile C++ code and generate an executable. You might as well give your opinion on whether it is possible this way, thanks in advance.

    1. Indeed, I didn’t list the libraries. One of the reasons is that they differ between Linux/OS X and Windows, and by LLVM version. The easiest is to resolve the dependencies by hand.
      Now, based on what you described: if you generate an executable, don’t use this approach, just use the compiler itself. That’s what it is meant to do, and it will also be more consistent across LLVM versions.

      1. Thanks for your suggestion. Actually I am trying to create a simple 2D game dev program; I divided the program into modules for flexibility and ease of use. I’ve already got my hands dirty with the Vulkan API and different asset managing tools and have been successful so far. Now comes the part where my program parses and compiles the generated C++ source files, with possible external links, using clang. So far I couldn’t find a better way to achieve in-app compilation for C++. I think by avoiding

  2. Libtcc is an in-memory C compiler (whose main flaw is that it does no optimization; its output is slightly worse than -O0).

    I made a version of it where include files and library files can live in memory instead of on disk. I did that by replacing all of the POSIX file handle variables with a struct, causing compiler errors everywhere they’re used, then created equivalents for all of the POSIX file calls that can read either from memory or from a file, depending on the contents of the struct.

    I also wrote code-generating code that embeds directories of files into arrays of data (originally I made them embedded zip files, but I’m changing that to uncompressed).

    I wonder if a similar approach could be done with clang…

    But the problem I see is that libtcc is designed to sit in memory and be reused – it’s not just garbage after a compile. Is clang written that way?

    1. Well, for clang it’s a little bit different. If you had an IR representation, then you could use a memory-only approach. For a C++ file, you need to know the path to the C++ and system headers, which is a pain (the reason I stopped using this approach), or hack into LLVM/clang, whose internals keep changing almost daily…
      So doable, but not something I would try to do.
