What is CLANG?

From the Clang project:

The Clang project provides a language front-end and tooling infrastructure for languages in the C language family (C, C++, Objective C/C++, OpenCL, and CUDA) for the LLVM project. Both a GCC-compatible compiler driver (clang) and an MSVC-compatible compiler driver (clang-cl.exe) are provided. You can get and build the source today.

We’ll check it later! Now a bit of context.

Goal

My goal is to

  • Cross-compile, from Linux to Windows, applications that are Windows-specific and use Windows internals/APIs.
  • The produced PEs must match native ones as closely as possible (e.g., MSVC-compiled PEs).
  • Package manager support (i.e., being able to install deps with vcpkg).
  • Support for C, C++ and asm.

I don’t want to bloat my Linux machine with Wine and cl.exe; that’s one of the reasons I chose Clang.

An evergreen solution: MinGW

MinGW is a good alternative for cross-compilation. It comes with a Windows-style infrastructure (SDK, VC tools, libs, headers, ..) that matches the native one. It uses the GCC compiler.

Pros

  • Much easier to set up than Clang
  • GCC is a standard
  • Easy to make code portable

Cons

  • You have to use GCC’s flags, and sometimes there is no flag that reproduces a specific MSVC behaviour
  • PEs are not the same as (or even very similar to) the MSVC-native ones

TLS Callbacks

But here’s one big con (for me). Produced PEs are not the same as the MSVC-produced ones. Even though many users won’t notice the difference for standard Windows programs, there are some implications if you build malicious applications with MinGW.

The CRT (C Runtime) infrastructure code is different: GCC normally produces a TLS data directory filled with TLS callbacks that you can’t disable unless you disable stdlib. If you get stdlib, you get TLS callbacks. From here:

TLS (Thread Local Storage) callbacks are a mechanism in Windows that allows a program to define a function that will be called when a thread is created. These callbacks can be used to perform various tasks, such as initializing thread-specific data or modifying the behavior of the thread.

Such tasks can happen when a new thread starts or ends (tls_init, tls_destruct).
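To make the mechanism concrete, here is a minimal sketch of how a program can register its own TLS callback when built with MSVC or clang-cl. The names on_tls_event, tls_event_count and tls_callback_entry are made up for this illustration, and the non-Windows typedefs are only stand-ins so the callback body compiles elsewhere. The registration part follows the commonly used .CRT$XLB pattern; I'm assuming a C (not C++) translation unit here.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the callback body also compiles off Windows (illustration only). */
#define NTAPI
typedef void *PVOID;
typedef unsigned long DWORD;
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH  2
#define DLL_THREAD_DETACH  3
#endif

/* Hypothetical bookkeeping: how many times each notification was seen. */
volatile long tls_event_count[4];

/* The loader calls this with the same reason codes it passes to DllMain. */
void NTAPI on_tls_event(PVOID module, DWORD reason, PVOID reserved)
{
    (void)module;
    (void)reserved;
    if (reason <= DLL_THREAD_DETACH)
        tls_event_count[reason]++;
}

#if defined(_WIN32) && defined(_MSC_VER) /* MSVC and clang-cl only */
#ifdef _WIN64
/* Force the linker to keep the TLS directory and our callback pointer. */
#pragma comment(linker, "/INCLUDE:_tls_used")
#pragma comment(linker, "/INCLUDE:tls_callback_entry")
/* Pointers placed in .CRT$XL* end up in the TLS callback array. */
#pragma const_seg(".CRT$XLB")
const PIMAGE_TLS_CALLBACK tls_callback_entry = on_tls_event;
#pragma const_seg()
#else
/* 32-bit x86 prefixes C symbols with an extra underscore. */
#pragma comment(linker, "/INCLUDE:__tls_used")
#pragma comment(linker, "/INCLUDE:_tls_callback_entry")
#pragma data_seg(".CRT$XLB")
PIMAGE_TLS_CALLBACK tls_callback_entry = on_tls_event;
#pragma data_seg()
#endif
#endif
```

Linked into a Windows program, the callback should fire for process attach, for every thread start and exit, and for process detach, which is why the reason codes mirror the DllMain ones.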

Here I highlighted the TLS data inside a PE built with GCC/MinGW:

The compilation flow is this:

  • Program is compiled: gcc foo.c -o foo.exe --someflag
  • Without -nostdlib, initialization code and standard libraries are automatically inserted by the compiler to allow the creation of a proper PE. For example, in tlssup.c from the CRT there’s the allocation of the TLS data section (other parts handle the remaining tasks):
#pragma data_seg(".tls")
 
#if defined (_M_IA64) || defined (_M_AMD64)
_CRTALLOC(".tls")
#endif
char _tls_start = 0;
 
#pragma data_seg(".tls$ZZZ")
 
#if defined (_M_IA64) || defined (_M_AMD64)
_CRTALLOC(".tls$ZZZ")
#endif
char _tls_end = 0;
  • The CRT is responsible for creating sections, including the TLS one, and for registering the TLS callbacks, whose code lives in the .text section, so that when a new thread starts it can reach the program’s code and execute them at start/destruct time.
  • gcc eventually invokes the underlying linker ld. If not overridden, the linker uses a default script to build the PE structure of foo.exe
  • ld defines a global __xl_c symbol that points to the __dyn_tls_init (DWARF name) TLS init callback
  • ld defines a global __xl_d symbol that points to the __dyn_tls_dtor (DWARF name) TLS destructor callback

Important: these symbols are defined inside the .rdata section, so they cannot be modified in a normal configuration.

The linker then fills in the symbols pointing to the defined callbacks. Here’s an extract of the standard linker script used by ld:

.CRT BLOCK(__section_alignment__) :
  {
    ___crt_xc_start__ = . ;
    KEEP (*(SORT(.CRT$XC*)))  /* C initialization */
    ___crt_xc_end__ = . ;
    ___crt_xi_start__ = . ;
    KEEP (*(SORT(.CRT$XI*)))  /* C++ initialization */
    ___crt_xi_end__ = . ;
    ___crt_xl_start__ = . ;
    KEEP (*(SORT(.CRT$XL*)))  /* TLS callbacks */
    /* ___crt_xl_end__ is defined in the TLS Directory support code */
    ___crt_xp_start__ = . ;
    KEEP (*(SORT(.CRT$XP*)))  /* Pre-termination */
    ___crt_xp_end__ = . ;
    ___crt_xt_start__ = . ;
    KEEP (*(SORT(.CRT$XT*)))  /* Termination */
    ___crt_xt_end__ = . ;
  }
  /* Windows TLS expects .tls$AAA to be at the start and .tls$ZZZ to be
     at the end of the .tls section.  This is important because _tls_start MUST
     be at the beginning of the section to enable SECREL32 relocations with TLS
     data.  */
  .tls BLOCK(__section_alignment__) :
  {
    ___tls_start__ = . ;
    KEEP (*(.tls$AAA))
    KEEP (*(.tls))
    KEEP (*(.tls$))
    KEEP (*(SORT(.tls$*)))
    KEEP (*(.tls$ZZZ))
    ___tls_end__ = . ;
  }
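To check whether a given binary really ended up with a TLS directory, without opening a disassembler, one option is to read the data directory entry at index 9 (IMAGE_DIRECTORY_ENTRY_TLS) of the optional header. The following is a small sketch in portable C that works on a buffer holding the start of the file; pe_tls_directory, rd16 and rd32 are names I made up, and the parser only covers the header fields it needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers without assuming host alignment or endianness. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define PE_DIRECTORY_ENTRY_TLS 9 /* same index as IMAGE_DIRECTORY_ENTRY_TLS */

/* Returns 1 and fills rva/size if the image declares a TLS directory,
   0 if it does not, -1 if the buffer is not a usable PE header. */
int pe_tls_directory(const unsigned char *img, size_t len,
                     uint32_t *rva, uint32_t *size)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;

    size_t pe = rd32(img + 0x3c);              /* e_lfanew */
    if (pe > len || len - pe < 26)
        return -1;
    if (rd32(img + pe) != 0x00004550)          /* "PE\0\0" */
        return -1;

    size_t opt = pe + 24;                      /* signature + COFF file header */
    uint16_t magic = rd16(img + opt);
    size_t count_off;                          /* NumberOfRvaAndSizes offset */
    if (magic == 0x10b)
        count_off = 92;                        /* PE32 */
    else if (magic == 0x20b)
        count_off = 108;                       /* PE32+ */
    else
        return -1;

    if (len - opt < count_off + 4)
        return -1;
    if (rd32(img + opt + count_off) <= PE_DIRECTORY_ENTRY_TLS)
        return 0;                              /* directory table too short */

    /* Each data directory entry is 8 bytes: RVA then size. */
    size_t entry = opt + count_off + 4 + 8 * PE_DIRECTORY_ENTRY_TLS;
    if (entry > len || len - entry < 8)
        return -1;

    *rva = rd32(img + entry);
    *size = rd32(img + entry + 4);
    return (*rva != 0 && *size != 0) ? 1 : 0;
}
```

Reading the first few KB of foo.exe into a buffer and passing it to pe_tls_directory should be enough, since the headers sit at the beginning of the file. A MinGW build with the stdlib is expected to return 1.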

At runtime, this happens when a new thread spawns:

In normal conditions, this is good and works properly. The new thread sets up its thread-local storage and continues execution. But there’s a problem. What if the new thread cannot access my program’s code? You’re asking… why would that happen?

There are some scenarios, like malware development, in which a PE may be tweaked at runtime through some Windows APIs to evade defense mechanisms. That’s the case for memory obfuscation!

Some implants implement memory obfuscation techniques in which they hide from memory scanners at run-time by encrypting their stack (and optionally, their heap), keeping their malicious data safe from host defenses. Usually this is implemented as a sleep cycle, in which some objects (Timers) perform these actions in sequence:

  1. Call VirtualProtect to set RW permission on the image base
  2. Encrypt the image base
  3. Sleep n seconds
  4. Decrypt image base
  5. Call VirtualProtect to reset RX permissions

In this way, we never have an RWX section at any time, which avoids raising suspicious flags. Memory scanners will access our code trying to identify some patterns, but will find nothing since it’s encrypted. Curiosity: how does this code run after VirtualProtect sets RW? These functions are executed by the timers, and so in the context of the timers, which live in some other DLL module and thread.

What happens if a new system thread is created when my main thread is sleeping encrypted?

Yeah, a memory access violation crashes the program, because at that moment the memory is not executable. Screen from IDA (highlighted in blue: RW regions):

My brain was like:

Now I have a broken executable! And the hilarious thing is that it doesn’t happen immediately after process start, but after ~6/7 minutes, so it complicates everything (and it was also hard to identify, since at the start I thought “hey it’s working, no problem will show up after 7 minutes, for sure!”).

And also I’m frustrated because I thought MinGW was enough for cross-compilation.

Possible solutions:

  1. Sleeping for 10 minutes before doing memory obfuscation. No. It sucks. I don’t want this.
  2. Tweaking the PE header to somehow disable TLS callback registration. I didn’t manage to do it, since some symbols are not writeable and the code responsible for initializing and registering the callbacks is located inside the CRT. There are some other CRT projects I could use on GitHub (miniCRT etc.), but I don’t want to have unstable code. The CRT has been thoroughly tested and is well-maintained. The code has to run for an indefinite period of time, so stability is my first requirement.
  3. Disable stdlib using the -nostdlib option. Even though it works (no CRT and no TLS callbacks, yeah), I would have to reimplement every utility function and also some C++ stdlib mechanisms (like operator new and operator delete). I’m lazy. Maybe in the future, but not now.
  4. Move malicious code into a specific section and obfuscate only this section, allowing the remaining code (and so the TLS callbacks) to be executed. This can be accomplished by defining new sections and a custom linker script, but it’s not the best solution. The PE will show custom sections, and that’s not the best opsec maneuver. Again, I’m lazy. Also, I never did it, so it requires some trial and error playing with the sections.
  5. Use. A. Different. Compiler. Here comes CLANG!

The solution: Clang

We can leverage the LLVM infrastructure and clang-cl, the Clang driver for MSVC targets, to write almost-native Windows programs. And even though I’m gonna miss gcc flags, this is better. We’ll use the clang-cl driver to target Windows applications from non-Windows hosts.

Clang also allows us to build for the Windows ARM architecture, which is currently not supported by MinGW unless you use a dedicated LLVM MinGW toolchain available here. In addition, we can take advantage of the LLVM engine to employ some instrumentation mechanisms such as performance tests, address sanitizer, undefined behaviour sanitizer etc.

Installation

On Ubuntu, just:

  • sudo apt install llvm clang-{VERSION}
  • I installed clang-18

This is not enough. We need

  • Windows SDK (check here)
  • CRT (check here)
  • Pass the compiler and lld-link the flags that point to the headers and the .lib files
  • Tweak the vcpkg toolchain, i.e. build a custom triplet (check here)

Here’s a good toolchain setup for anyone who wants to start:

I quickly realized this was a burden. A few people online had similar problems too, with no single solution so far. With xwin I managed to install and link the libraries correctly, but the main problem for me was installing a dependency using vcpkg. For some reason that dependency failed to build, and it was hard to propagate the whole environment setup across the vcpkg toolchain and the dependency’s own. Eventually, I came across this repo:


Init credits

Shoutout to this user for having set up an incredible working environment for anyone who wants to cross-compile from Linux to Windows and keep a package management system!

End credits


I found his docker images on DockerHub (you can choose many tags here). There’s also an OSX cross-compiling docker here — very interesting!

So, I docker pull’ed one of the tagged images and started a shell inside it: docker run -it --rm --entrypoint /bin/bash <img-name>

The whole LLVM suite is inside, along with the Windows SDK and symbolic links. The instructions to compile with vcpkg as well are in his readme on GitHub.

Just a note for vcpkg: CMake may fail to find the .cmake config file of your dependency on find_package, so an error finding the package can appear even though you installed the dependency before running CMake. This is probably due to the custom triplet definition (it only looks in the standard triplet folders, not custom ones). I solved it by specifying -DRAPIDJSON_DIR=, or the equivalent variable for whatever your dependency is (it’s a hint that appears with the error).

Adding MASM support

The last goal was to set up MASM support to build .asm files. A MASM-compatible assembler is already installed in the image; in fact, LLVM provides llvm-ml as a replacement for ml, the Windows assembler.

Important: CMake will search for ml inside the system when compiling, so be sure to add the symbolic link to llvm-ml:

ln -s /usr/bin/llvm-ml /usr/bin/ml

Here’s an example CMakeLists.txt file to build assembly with MASM:

cmake_minimum_required(VERSION 3.5)
 
project(example VERSION 1.0.0 LANGUAGES ASM_MASM)
 
set(CMAKE_COLOR_MAKEFILE ON)
set(CMAKE_VERBOSE_MAKEFILE OFF)
 
 
add_library(asm-test OBJECT test.asm)

It seems enough to build an example asm object file. What if you want to build for x64 or x86 specifically? llvm-ml comes with a -m option that takes one of these 2 values:

  • 32 (x86)
  • 64 (x64)

So, it was enough to specify the option for x86

target_compile_options(asm-test PRIVATE -m32)

or for x64

target_compile_options(asm-test PRIVATE -m64)

Curiosity: to specify compiler flags gcc-style, we can pass them to clang-cl with the option /clang:<option>. For me it was important to enable the rdrnd instructions to avoid linker errors, by specifying /clang:-mrdrnd as I would have done with gcc. I like this a lot.

If you want to build a simple executable with an asm file, you have to instruct the linker:

target_link_options(asm-test PRIVATE /subsystem:console)

You can also set the RC compiler:

set(CMAKE_RC_COMPILE_OBJECT        "$ENV{RC} <DEFINES> <INCLUDES> <FLAGS> /r /Fo<OBJECT> <SOURCE>")

Complete example:

cmake_minimum_required(VERSION 3.5)
 
project(example VERSION 1.0.0 LANGUAGES C ASM_MASM)
 
set(CMAKE_COLOR_MAKEFILE ON)
set(CMAKE_VERBOSE_MAKEFILE OFF)
 
 
set(CMAKE_ASM_MASM_LINK_EXECUTABLE "<CMAKE_LINKER> <FLAGS> <CMAKE_ASM_MASM_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> /OUT:<TARGET> <LINK_LIBRARIES>")
set(CMAKE_RC_COMPILE_OBJECT        "$ENV{RC} <DEFINES> <INCLUDES> <FLAGS> /r /Fo<OBJECT> <SOURCE>")
 
message(CMAKE_C_FLAGS=${CMAKE_C_FLAGS})
message(CMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS})
message(CMAKE_ASM_MASM_FLAGS=${CMAKE_ASM_MASM_FLAGS})
 
 
add_executable(asm-test test.asm)
target_compile_options(asm-test PRIVATE -m64)
target_link_options(asm-test PRIVATE /subsystem:console /entry:myentrypoint)
 

Hope that helped, and see you in the next research!