Compiling C++ code in memory with clang

I have tried to find a proper recipe to compile C++ code on the fly with clang and LLVM. It’s actually not that easy to achieve if your target is not just the LLVM Intermediate Representation, and unfortunately, the code here, which works for LLVM 7, may not work for LLVM 8. Or 6.

The pipeline

There are different ways of creating an AST and then a module with clang, and they are not all equal: the main tools for creating an AST seem to be geared towards libtooling and are not meant to generate machine code afterwards. Once we have a module, we can use the LLVM JIT to go from the IR to dynamically loaded code and then to the function call we want.

Let’s just start with all the headers we need for our job. There are just so many!

#include <clang/AST/ASTContext.h>
#include <clang/AST/ASTConsumer.h>
#include <clang/Basic/DiagnosticOptions.h>
#include <clang/Basic/Diagnostic.h>
#include <clang/Basic/FileManager.h>
#include <clang/Basic/FileSystemOptions.h>
#include <clang/Basic/LangOptions.h>
#include <clang/Basic/MemoryBufferCache.h>
#include <clang/Basic/SourceManager.h>
#include <clang/Basic/TargetInfo.h>
#include <clang/CodeGen/CodeGenAction.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Frontend/CompilerInvocation.h>
#include <clang/Frontend/TextDiagnosticPrinter.h>
#include <clang/Lex/HeaderSearch.h>
#include <clang/Lex/HeaderSearchOptions.h>
#include <clang/Lex/Preprocessor.h>
#include <clang/Lex/PreprocessorOptions.h>
#include <clang/Parse/ParseAST.h>
#include <clang/Sema/Sema.h>
 
#include <llvm/InitializePasses.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/ExecutionEngine/SectionMemoryManager.h>
#include <llvm/IR/DataLayout.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/PassManager.h>
#include <llvm/Passes/PassBuilder.h>
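#include <llvm/Support/Host.h>
#include <llvm/Support/MemoryBuffer.h>
#include <llvm/Support/TargetSelect.h>
#include <llvm/Support/raw_ostream.h>

#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <vector>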

We will also add an init function for LLVM:

  bool LLVMinit = false;
 
  void InitializeLLVM()
  {
    if (LLVMinit)
    {
      return;
    }
    // We have not initialized any pass managers for any device yet.
    // Run the global LLVM pass initialization functions.
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    llvm::InitializeNativeTargetAsmParser();
 
    auto& Registry = *llvm::PassRegistry::getPassRegistry();
 
    llvm::initializeCore(Registry);
    llvm::initializeScalarOpts(Registry);
    llvm::initializeVectorization(Registry);
    llvm::initializeIPO(Registry);
    llvm::initializeAnalysis(Registry);
    llvm::initializeTransformUtils(Registry);
    llvm::initializeInstCombine(Registry);
    llvm::initializeInstrumentation(Registry);
    llvm::initializeTarget(Registry);
 
    LLVMinit = true;
  }
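The plain bool flag above is fine for single-threaded use. If the init function could be reached from several threads, one possible variant is to rely on a function-local static, whose initialization is guaranteed to run exactly once since C++11. This is only a sketch: DoInitializeLLVM, InitializeLLVMOnce and llvmInitCount are hypothetical names, and the counter stands in for the real LLVM calls.

```cpp
// Counts how many times the real initialization ran (demonstration only).
static int llvmInitCount = 0;

// Hypothetical stand-in for the body of InitializeLLVM() shown above:
// the llvm::Initialize* calls would go here.
void DoInitializeLLVM()
{
  ++llvmInitCount;
}

// Runs DoInitializeLLVM() exactly once, even with concurrent callers,
// because a function-local static is initialized only once (C++11).
void InitializeLLVMOnce()
{
  static const bool initialized = (DoInitializeLLVM(), true);
  (void)initialized;
}
```

Calling InitializeLLVMOnce() at the top of every compile function is then safe, whatever the number of calls.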

We can now start building our two pipeline steps, based on clang and then on LLVM.

Caveats in clang

The first caveat with clang is that it can’t parse a file from memory, even though there are classes that look like they should allow it. The reason is that the compiler instance checks that the input is a file. If it’s not a file, it breaks.
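Given this constraint, code that only exists as a string has to be written to disk before it is handed to clang. A minimal sketch of such a helper, using only the standard library (WriteSourceFile is a hypothetical name, and picking a unique temporary path is left to the caller):

```cpp
#include <fstream>
#include <string>

// Writes `source` to `path`, replacing any existing file.
// Returns true only if the file could be opened and fully written.
bool WriteSourceFile(const std::string& path, const std::string& source)
{
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out)
  {
    return false;
  }
  out << source;
  out.flush();
  return out.good();
}
```

The path passed here is then the filename given to the compiler input further down.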

The second caveat is that this is a full compilation. As such, we need to set up the compiler as we would on the command line, with all the relevant flags and include paths. The easiest way is to let clang configure itself from a list of arguments, just like on the command line.

Solution for clang

Let’s start by setting up the compiler diagnostics (where every warning and error will be written) and then the compiler instance and its invocation. The invocation is what we will actually use to compile our code.

    InitializeLLVM();
 
    clang::DiagnosticOptions diagnosticOptions;
    std::unique_ptr<clang::TextDiagnosticPrinter> textDiagnosticPrinter = std::make_unique<clang::TextDiagnosticPrinter>(llvm::outs(), &diagnosticOptions);
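    // DiagnosticsEngine expects an intrusive ref-counted pointer here, not a std::unique_ptr.
    llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> diagIDs(new clang::DiagnosticIDs());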
 
    std::unique_ptr<clang::DiagnosticsEngine> diagnosticsEngine =
      std::make_unique<clang::DiagnosticsEngine>(diagIDs, &diagnosticOptions, textDiagnosticPrinter.get());
 
    clang::CompilerInstance compilerInstance;
    auto& compilerInvocation = compilerInstance.getInvocation();

Let’s now create our arguments. Here, I’m only setting up the “triple”, which is the target platform. But this is where I will add all the include paths later.

    std::stringstream ss;
    ss << "-triple=" << llvm::sys::getDefaultTargetTriple();
 
    std::istream_iterator<std::string> begin(ss);
    std::istream_iterator<std::string> end;
    std::istream_iterator<std::string> i = begin;
    std::vector<const char*> itemcstrs;
    std::vector<std::string> itemstrs;
    while (i != end) {
      itemstrs.push_back(*i);
      ++i;
    }
 
    for (unsigned idx = 0; idx < itemstrs.size(); idx++) {
      // note: if itemstrs is modified after this, itemcstrs will be full
      // of invalid pointers! Could make copies, but would have to clean up then...
      itemcstrs.push_back(itemstrs[idx].c_str());
    }
 
    clang::CompilerInvocation::CreateFromArgs(compilerInvocation, itemcstrs.data(), itemcstrs.data() + itemcstrs.size(),
     *diagnosticsEngine.release());
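The stream-based splitting above breaks arguments on whitespace, so an include path containing a space would be cut in two. One possible alternative is to store each argument as its own string and build the pointer array on demand. This is a sketch under assumptions: ArgList is a hypothetical helper, not part of clang, and it assumes the include directory is passed as "-I" followed by the path.

```cpp
#include <string>
#include <utility>
#include <vector>

// Owns the argument strings and hands out C-string views of them.
class ArgList
{
public:
  // Stores one argument verbatim; embedded spaces are preserved.
  void Add(std::string arg) { args_.push_back(std::move(arg)); }

  // Adds an include directory as two arguments: "-I" and the path.
  void AddIncludePath(const std::string& dir)
  {
    Add("-I");
    Add(dir);
  }

  // Builds the const char* array. The pointers stay valid only while
  // this ArgList is alive and not modified.
  std::vector<const char*> Pointers() const
  {
    std::vector<const char*> result;
    result.reserve(args_.size());
    for (const std::string& arg : args_)
    {
      result.push_back(arg.c_str());
    }
    return result;
  }

private:
  std::vector<std::string> args_;
};
```

The vector returned by Pointers() can be passed to CreateFromArgs in the same way as itemcstrs, using data() and data() + size().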

Now that the compiler invocation is done, we can tweak the options.

    auto* languageOptions = compilerInvocation.getLangOpts();
    auto& preprocessorOptions = compilerInvocation.getPreprocessorOpts();
    auto& targetOptions = compilerInvocation.getTargetOpts();
    auto& frontEndOptions = compilerInvocation.getFrontendOpts();
#ifdef DEBUG
    frontEndOptions.ShowStats = true;
#endif
    auto& headerSearchOptions = compilerInvocation.getHeaderSearchOpts();
#ifdef DEBUG
    headerSearchOptions.Verbose = true;
#endif
    auto& codeGenOptions = compilerInvocation.getCodeGenOpts();

And now we are set up to compile our file:

    frontEndOptions.Inputs.clear();
    frontEndOptions.Inputs.push_back(clang::FrontendInputFile(filename, clang::InputKind::CXX));
 
    targetOptions.Triple = llvm::sys::getDefaultTargetTriple();
    compilerInstance.createDiagnostics(textDiagnosticPrinter.get(), false);
 
    llvm::LLVMContext context;
    std::unique_ptr<clang::CodeGenAction> action = std::make_unique<clang::EmitLLVMOnlyAction>(&context);
 
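    if (!compilerInstance.ExecuteAction(*action))
    {
      // Compilation failed. The diagnostics have already been printed by
      // textDiagnosticPrinter; report the error to the caller here.
    }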

Caveats in LLVM

With LLVM, we also have some things to be careful about. The first is that the LLVM context we created before needs to stay alive as long as we use anything from this compilation unit. This matters because everything the JIT generates registers itself in the context and has to outlive this function, until it is explicitly deleted.

The second issue is that, by default, no optimization pass is run on the IR. We have to add this manually (I will test the speed-up in a later blog post).
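The lifetime constraint from the first caveat can be made explicit by bundling the context with whatever is built from it. The sketch below is deliberately generic and hypothetical (JitUnit is not an LLVM class): in real code, Context would be llvm::LLVMContext and Engine would be llvm::ExecutionEngine, with the context allocated on the heap instead of on the stack. It relies on the fact that C++ destroys members in reverse declaration order.

```cpp
#include <memory>
#include <utility>

// Keeps a context and the engine built from it alive together.
template <typename Context, typename Engine>
class JitUnit
{
public:
  JitUnit(std::unique_ptr<Context> context, std::unique_ptr<Engine> engine)
    : context_(std::move(context)), engine_(std::move(engine))
  {
  }

  Context& context() { return *context_; }
  Engine& engine() { return *engine_; }

private:
  // Members are destroyed in reverse declaration order, so engine_
  // goes away first and context_ outlives everything that uses it.
  std::unique_ptr<Context> context_;
  std::unique_ptr<Engine> engine_;
};
```

As long as the JitUnit object is alive, function pointers obtained from the engine remain usable.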

Solution for LLVM

The first step is to get the IR module from the previous action.

    std::unique_ptr<llvm::Module> module = action->takeModule();
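    if (!module)
    {
      // No module was produced (for instance because the compilation
      // failed); report the error to the caller here.
    }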

We will now run the different optimization passes. The code to create the passes is rather complicated and creates more stuff than is required, but this is LLVM… and one of the reasons the API keeps on mutating.

    llvm::PassBuilder passBuilder;
    llvm::LoopAnalysisManager loopAnalysisManager(codeGenOptions.DebugPassManager);
    llvm::FunctionAnalysisManager functionAnalysisManager(codeGenOptions.DebugPassManager);
    llvm::CGSCCAnalysisManager cGSCCAnalysisManager(codeGenOptions.DebugPassManager);
    llvm::ModuleAnalysisManager moduleAnalysisManager(codeGenOptions.DebugPassManager);
 
    passBuilder.registerModuleAnalyses(moduleAnalysisManager);
    passBuilder.registerCGSCCAnalyses(cGSCCAnalysisManager);
    passBuilder.registerFunctionAnalyses(functionAnalysisManager);
    passBuilder.registerLoopAnalyses(loopAnalysisManager);
    passBuilder.crossRegisterProxies(loopAnalysisManager, functionAnalysisManager, cGSCCAnalysisManager, moduleAnalysisManager);
 
    llvm::ModulePassManager modulePassManager = passBuilder.buildPerModuleDefaultPipeline(llvm::PassBuilder::OptimizationLevel::O3);
    modulePassManager.run(*module, moduleAnalysisManager);

We can now use the JIT compiler and extract the function we need. Be aware that the engine needs to stay alive as long as you use the function:

    llvm::EngineBuilder builder(std::move(module));
    builder.setMCJITMemoryManager(std::make_unique<llvm::SectionMemoryManager>());
    builder.setOptLevel(llvm::CodeGenOpt::Level::Aggressive);
    auto executionEngine = builder.create();
 
    if (!executionEngine)
    {
      // The engine could not be created; report the error to the caller here.
    }
 
    return reinterpret_cast<Function>(executionEngine->getFunctionAddress(function));
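Two details in this last line are easy to miss. The function is looked up by symbol name, and C++ symbol names are mangled, so the simplest option is to declare the function to retrieve as extern "C" in the compiled file. The address comes back as an integer, so it has to be cast to the right function pointer type. The sketch below illustrates both points without LLVM. Function is not defined in this post, so the alias here is an assumption, as are the add_one and ToFunction names.

```cpp
#include <cstdint>

// What the compiled source file might contain (hypothetical example).
// extern "C" keeps the symbol name as plain "add_one", unmangled.
extern "C" int add_one(int value)
{
  return value + 1;
}

// Assumed signature of the function we want to call.
using Function = int (*)(int);

// Converts a raw address, such as the integer returned by the
// execution engine, into a callable function pointer.
Function ToFunction(std::uint64_t address)
{
  return reinterpret_cast<Function>(address);
}
```

The cast is only valid if the alias matches the real signature of the compiled function. Nothing checks this at runtime.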

Conclusion

The code is long and this is just one quick pass at making an optimized JIT. If you need additional stuff like headers, this means more code. But at least you have all the bits that make it possible to compile a C++ file on the fly with clang/LLVM. At least for version 7…
