If you’ve ever been in trouble implementing your new, fresh reflective loader, raise your hand!

Well, after a thousand crashes, I want to write down some simple suggestions that could literally save you days of debugging.

The reflective loader

Generally speaking, a reflective DLL cannot be debugged comfortably with common tools. A reflective DLL embeds its own loader, which takes care of the underlying mechanisms needed to load the DLL in memory. Basically, it loads itself.

Note: a reflective loader cannot directly use Windows APIs or external functions, because the DLL is not loaded yet and its imports have not been resolved. Therefore, the loader must contain only PIC (Position-Independent Code), which is achieved through particular language constructs and compiler tricks. In other words: when something breaks, the problem may well reside in our compiler/linker flags :).

Since there’s no LoadLibrary call or load event that can be intercepted, common debugging won’t work. You can’t just set a breakpoint in Visual Studio, press Run and debug. You have to use an assembly-level debugger like IDA or x64dbg, which lets you dig deep enough into the cause of a crash, walk back through the instructions that led to it, and play with memory.

You have to learn assembly.

How to debug

We’ll take IDA as an example.

Using an injector - the “standard” way

Basically, you have to do two things to break into a reflective DLL right after it is injected:

  • Run the injector under the debugger and stop at the CreateThread (or CreateRemoteThread, or whatever equivalent mechanism) call that starts the injected DLL
  • Open a second IDA window, attach to the process the DLL is injected into, and search
    • search for what? The export that loads the DLL! Now, put a breakpoint at its start

Back in the first window, continue execution: you will eventually hit the breakpoint on the DLL export.
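With injectors in the style of the original ReflectiveDLLInjection project, the raw DLL file (not a mapped image) is written into the target, and the remote thread starts at the file offset of the loader export, not at its RVA. That same offset is what you need to find the export from the second window: injected buffer base + offset. Below is a minimal sketch of the RVA-to-file-offset conversion such injectors perform. SECTION is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER and rva_to_offset is a hypothetical name, so the example builds anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of IMAGE_SECTION_HEADER:
   only the fields needed for the conversion. */
typedef struct {
	uint32_t VirtualAddress;   /* section RVA once mapped */
	uint32_t SizeOfRawData;    /* section size in the file */
	uint32_t PointerToRawData; /* section offset in the file */
} SECTION;

/* Converts an RVA (e.g. of the loader export) into an offset inside the
   raw DLL file. Returns (uint32_t)-1 if no section contains the RVA. */
uint32_t rva_to_offset(uint32_t rva, const SECTION* sections, size_t count,
                       uint32_t size_of_headers) {
	/* RVAs inside the headers map 1:1 to file offsets */
	if (rva < size_of_headers)
		return rva;

	for (size_t i = 0; i < count; i++) {
		const SECTION* s = &sections[i];
		if (rva >= s->VirtualAddress &&
		    rva < s->VirtualAddress + s->SizeOfRawData) {
			/* same distance from the section start, different base */
			return rva - s->VirtualAddress + s->PointerToRawData;
		}
	}
	return (uint32_t)-1;
}
```

For example, with .text at RVA 0x1000 stored at file offset 0x400, an export at RVA 0x1234 sits at file offset 0x634, so the breakpoint goes at the injected buffer base + 0x634.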

Using rundll32

Ok, in this case you only have to open the DLL in IDA and set the process parameters accordingly:

  • application: rundll32.exe
  • params: <path/to/file.dll>,"ExportName"

In this case you do not lose access to the debug symbols, resulting in a better debug experience.

But there’s one thing you have to remember. rundll32 takes a DLL and an export as comma-separated arguments, and it loads the DLL through the normal Windows loader, so it will always run DllMain before your export! That’s not what you want, since the export is the very code that is supposed to load the DLL in memory.

You can bypass this by:

  • putting a breakpoint on DllMain
  • when DllMain is hit, changing EIP (RIP on x64) to the reflective loader export
  • continuing from there

Ok, and now?

Basically, a reflective loader does these operations:

  • Retrieves pointers to Windows APIs from DLLs already loaded in the target process, found through the PEB:
    • kernel32, for example, exports VirtualAlloc, LoadLibraryA, GetProcAddress and other useful functions (ntdll exports lower-level equivalents such as NtAllocateVirtualMemory)
  • Retrieves the base address it has been loaded at
    • by scanning memory backwards for the MZ signature
  • Maps the PE in memory
    • allocates a region of SizeOfImage bytes
    • copies the headers and each section into it, at their virtual addresses
  • Fixes the PE
    • imports: loads every DLL listed in the import directory and fills the import address table
    • relocations: applies base relocations if the image is not at its preferred ImageBase
    • sets the memory permissions of each section
  • Other
    • registers exception handlers (x64 unwind data) through RtlAddFunctionTable
    • executes TLS callbacks
  • Finally
    • flushes the instruction cache
    • the DLL is now correctly mapped and fixed: calls DllMain
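Of these steps, relocations are a frequent source of silent corruption, so it helps to know exactly what the loop should do while you step through it. The sketch below assumes the image has already been copied into image (a plain byte buffer standing in for the VirtualAlloc'd region) and that reloc_rva and reloc_size come from the base relocation data directory. It only handles IMAGE_REL_BASED_DIR64 (x64), IMAGE_REL_BASED_HIGHLOW (x86) and the ABSOLUTE padding entries; the function and helper names are hypothetical. It reads and writes bytes by hand instead of calling memcpy, for the reasons discussed later in this article.

```c
#include <stddef.h>
#include <stdint.h>

#define REL_ABSOLUTE 0  /* IMAGE_REL_BASED_ABSOLUTE: padding, skip */
#define REL_HIGHLOW  3  /* IMAGE_REL_BASED_HIGHLOW: 32-bit field (x86) */
#define REL_DIR64   10  /* IMAGE_REL_BASED_DIR64: 64-bit field (x64) */

/* Little-endian read/write of n bytes, without memcpy (no CRT in a loader). */
static uint64_t rd_le(const uint8_t* p, int n) {
	uint64_t v = 0;
	for (int i = n - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void wr_le(uint8_t* p, uint64_t v, int n) {
	for (int i = 0; i < n; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Applies base relocations to an image already mapped at `image`.
   delta = actual base - preferred ImageBase (0 means nothing changes).
   Returns 0 on success, -1 on an unsupported entry type. */
int apply_relocations(uint8_t* image, uint32_t reloc_rva, uint32_t reloc_size,
                      uint64_t delta) {
	uint32_t done = 0;
	while (done + 8 <= reloc_size) {
		/* IMAGE_BASE_RELOCATION header: page RVA + total block size */
		const uint8_t* block = image + reloc_rva + done;
		uint32_t page_rva = (uint32_t)rd_le(block, 4);
		uint32_t block_size = (uint32_t)rd_le(block + 4, 4);
		if (block_size < 8)
			break; /* malformed block: stop instead of looping forever */

		uint32_t count = (block_size - 8) / 2; /* 16-bit entries */
		for (uint32_t i = 0; i < count; i++) {
			uint16_t e = (uint16_t)rd_le(block + 8 + 2 * i, 2);
			unsigned type = e >> 12;                            /* high 4 bits */
			uint8_t* target = image + page_rva + (e & 0x0FFF); /* low 12 bits */
			if (type == REL_DIR64)
				wr_le(target, rd_le(target, 8) + delta, 8);
			else if (type == REL_HIGHLOW)
				wr_le(target, (uint32_t)(rd_le(target, 4) + delta), 4);
			else if (type != REL_ABSOLUTE)
				return -1;
		}
		done += block_size;
	}
	return 0;
}
```

When debugging, a quick sanity check is to look at one DIR64 target before and after the loop: its value should move by exactly delta.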

From these operations, you can derive some useful APIs and conditions on which to set breakpoints:

  • VirtualAlloc (mapping the DLL)
  • VirtualProtect (memory permissions)
  • a hardware breakpoint on access to the PEB pointer in the TEB: gs:[0x60] on x64, fs:[0x30] on x86
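When the VirtualProtect breakpoint hits, it is worth checking that the protection requested for each section matches its Characteristics: a .text section left as PAGE_READWRITE crashes on the first instruction fetch, the same "memory cannot be executed" symptom covered in the next section. A minimal sketch of the usual mapping follows; the constant values are copied from winnt.h so it builds anywhere, while the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Section flags, values as in winnt.h (IMAGE_SCN_MEM_*). */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Page protections, values as in winnt.h (PAGE_*). */
#define WIN_PAGE_NOACCESS          0x01u
#define WIN_PAGE_READONLY          0x02u
#define WIN_PAGE_READWRITE         0x04u
#define WIN_PAGE_EXECUTE           0x10u
#define WIN_PAGE_EXECUTE_READ      0x20u
#define WIN_PAGE_EXECUTE_READWRITE 0x40u

/* Picks the VirtualProtect protection for a section from its
   Characteristics. Windows has no write-only protection, so any
   writable section becomes read-write. */
uint32_t section_protection(uint32_t characteristics) {
	int x = (characteristics & SCN_MEM_EXECUTE) != 0;
	int r = (characteristics & SCN_MEM_READ) != 0;
	int w = (characteristics & SCN_MEM_WRITE) != 0;

	if (x) {
		if (w) return WIN_PAGE_EXECUTE_READWRITE;
		if (r) return WIN_PAGE_EXECUTE_READ;
		return WIN_PAGE_EXECUTE;
	}
	if (w) return WIN_PAGE_READWRITE;
	if (r) return WIN_PAGE_READONLY;
	return WIN_PAGE_NOACCESS;
}
```

For instance, a typical .text section has Characteristics 0x60000020 (code, execute, read), which maps to PAGE_EXECUTE_READ (0x20); a typical .data section has 0xC0000040, which maps to PAGE_READWRITE (0x04).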

Addressing errors

Memory cannot be executed

Imports

This is often due to functions called directly inside the reflective loader code: memset and memcpy, for example, but also Win32 APIs. memset and memcpy come from the C runtime DLL (for example vcruntime140.dll or msvcrt.dll, depending on which CRT you link). Normally, a call to memset is compiled into something like call qword ptr [__imp_memset], where __imp_memset is the IAT slot that eventually holds the address of memset inside the C runtime DLL. In a normal load, the Windows loader fills that slot before your code runs ;) Inside the reflective loader, the IAT has not been resolved yet, so the call jumps to whatever value sits there, and you crash.

In this case, it is enough to reimplement those functions yourself inside your code, like:

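#pragma function(memset, memcpy) // MSVC only: lets us define these intrinsics ourselves

// Minimal CRT-free memset. Note: GCC may turn this loop back into a call
// to memset (see -fno-tree-loop-distribute-patterns).
void* __cdecl memset(void* pTarget, int value, size_t cbTarget) {
	unsigned char* p = (unsigned char*)pTarget;
	while (cbTarget-- > 0) {
		*p++ = (unsigned char)value;
	}
	return pTarget;
}

// Minimal CRT-free memcpy, copying byte by byte.
void* __cdecl memcpy(void* pDestination, const void* pSource, size_t sLength) {
	unsigned char* D = (unsigned char*)pDestination;
	const unsigned char* S = (const unsigned char*)pSource;

	while (sLength--)
		*D++ = *S++;

	return pDestination;
}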

Also, some language constructs make the compiler emit a memset call without you explicitly writing one! Let’s look at this code:

struct MY_STRUCT var = { 0 };

For a large enough struct, the compiler may translate it to something like this (x64):

mov rcx, <addr of var>
mov rdx, 0
mov r8, <size_of_struct>
call memset

You can read the issue here.

So be careful about what you’re writing and which language you’re using!

This is normal compiler behavior in any program. To limit memset generation, some compilers like GCC provide an option (read here), but you’ll still have to provide your own implementation anyway. We could also avoid linking against the C runtime altogether, but that removes a lot of other functions and utilities and is out of the scope of this article.
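One defensive pattern, until you have a compiler setup you trust, is to zero structs explicitly through a volatile pointer instead of relying on = { 0 }. Every volatile store is an observable side effect, so the optimizer cannot merge the loop back into a memset call. This is a sketch: zero_mem and init_struct are hypothetical names, and the MY_STRUCT fields are invented for illustration. It trades speed for predictability, which is usually fine for the few structs a loader touches; still confirm the result in the disassembly.

```c
#include <stddef.h>

/* Zeroes `size` bytes at `dst` one byte at a time. The volatile pointer
   forces each store to be emitted, so the loop cannot become a memset call. */
static void zero_mem(void* dst, size_t size) {
	volatile unsigned char* p = (volatile unsigned char*)dst;
	while (size--)
		*p++ = 0;
}

/* Example struct standing in for whatever the loader needs to clear. */
struct MY_STRUCT {
	void* base;
	unsigned long size;
	unsigned char scratch[256];
};

void init_struct(struct MY_STRUCT* var) {
	/* instead of: struct MY_STRUCT var = { 0 }; */
	zero_mem(var, sizeof(*var));
}
```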

Switch cases

But hey, let’s complicate this! When you have a switch statement in your code, the compiler may generate a jump table for it. A jump table is a table of addresses (absolute addresses on x86, image-relative offsets on x64 with MSVC) pointing at the code of each case, and it is generated for performance reasons. Yeah, you heard that right: static. Those entries assume the image is already mapped and relocated, which is not true while the loader runs. Avoid switch statements as much as possible when you’re writing reflective loader code.

Note: GCC lets you avoid generating jump tables with the option -fno-jump-tables. Oh, you’re asking about MSVC? Haha, no option. Don’t you dare ask anything else, you m0r##n.
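Without -fno-jump-tables, a common workaround is to write the dispatch as an if/else chain. This is not a guarantee: optimizers are allowed to turn an if/else chain back into a table, so check the generated code in IDA after each build. The function below is a hypothetical example (the field size patched by each base relocation type, values from winnt.h), chosen only because a loader typically dispatches on that value.

```c
/* Size in bytes of the field patched by a base relocation of `type`.
   Written as an if/else chain instead of a switch, to make a jump table
   less likely (verify the disassembly: it is not guaranteed). */
int reloc_field_size(unsigned type) {
	if (type == 10)      /* IMAGE_REL_BASED_DIR64 */
		return 8;
	else if (type == 3)  /* IMAGE_REL_BASED_HIGHLOW */
		return 4;
	else if (type == 2)  /* IMAGE_REL_BASED_LOW */
		return 2;
	else if (type == 1)  /* IMAGE_REL_BASED_HIGH */
		return 2;
	return 0;            /* IMAGE_REL_BASED_ABSOLUTE and unsupported types */
}
```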

Memory cannot be read

Ok, this one is interesting as well, since it doesn’t always happen. In the best case, you didn’t set the DLL’s memory permissions properly, or you forgot the code that does it.

In the worst case, there are many situations in which, for some reason (e.g., a wrong pointer dereference, a wrong struct size, …), the code accesses a part of memory that is, in some way, invalid. I’m thinking of instructions like these:

; ... some operations (bug: rcx = 0)
mov rax, rcx
mov rdx, [rax] ; bam! EXCEPTION_ACCESS_VIOLATION

Static strings

String literals are placed in their own data section (usually .rdata, check here). But hey, that section has not been mapped yet, so you’ll probably get an ACCESS_VIOLATION error when accessing a string! Make sure you don’t have any inside the loader. If you really need them, look into stack strings.
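A stack string is built character by character, so its bytes usually end up as immediate stores into the stack frame instead of as a literal in a data section. A minimal sketch follows; is_kernel32 and lower are hypothetical names, and the comparison uses narrow chars for brevity, whereas module names in the PEB are UTF-16 and a real loader compares WCHARs. Be aware that, with optimizations enabled, a compiler may still place a long enough array initializer in .rdata and copy it at runtime, which brings back the very problem discussed here, so verify the disassembly.

```c
#include <stddef.h>

/* ASCII-only lowercase, no CRT (tolower would be an import). */
static char lower(char c) {
	return (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : c;
}

/* Returns 1 if `name` is "kernel32.dll" (case-insensitive), else 0.
   The target is initialized char by char: no string literal involved. */
int is_kernel32(const char* name) {
	char target[] = { 'k','e','r','n','e','l','3','2','.','d','l','l', 0 };
	size_t i = 0;
	/* byte-by-byte compare: no strcmp, no CRT */
	while (target[i] != 0 && lower(name[i]) == target[i])
		i++;
	return target[i] == 0 && name[i] == 0;
}
```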

Global variables

Nothing to say here: the same logic as for strings applies. Make sure you do not use them inside the loader!
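A common way to do without globals is to keep all the loader state in a struct that lives on the loader’s own stack and pass a pointer to it to every helper. The sketch below is illustrative only: LOADER_CTX, its fields and the step functions are hypothetical. The last_step field is simply a convenient value to inspect in the debugger to see how far the loader got.

```c
#include <stddef.h>
#include <stdint.h>

/* Loader state: everything that would otherwise be a global variable. */
typedef struct {
	uint8_t* image_base; /* where the image is being mapped */
	uint32_t image_size; /* SizeOfImage */
	int      last_step;  /* last completed step, easy to inspect in IDA */
} LOADER_CTX;

static int step_map(LOADER_CTX* ctx, uint8_t* region, uint32_t size) {
	ctx->image_base = region;
	ctx->image_size = size;
	ctx->last_step = 1;
	return 0;
}

static int step_relocate(LOADER_CTX* ctx) {
	if (ctx->image_base == NULL)
		return -1;
	/* ... relocation work would go here ... */
	ctx->last_step = 2;
	return 0;
}

/* The context is a local variable, so no .data/.bss is ever touched. */
int run_loader(uint8_t* region, uint32_t size) {
	LOADER_CTX ctx;
	/* field-by-field init: avoids the implicit memset of "= { 0 }" */
	ctx.image_base = NULL;
	ctx.image_size = 0;
	ctx.last_step = 0;

	if (step_map(&ctx, region, size) == 0)
		step_relocate(&ctx);
	return ctx.last_step; /* 2 = all steps completed */
}
```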

MZ false positives

Here I share my experience and an opinion: always debug from the original injector! We’ll get to why later.

In my case, the error was in the code that finds the injected base address. It looked more or less like this:

address = <address of the function>
while (TRUE) {
	img_dos_hdr = (PIMAGE_DOS_HEADER)address;
	if (img_dos_hdr->e_magic == 0x5A4D) { // "MZ"
		img_nt_header = (PIMAGE_NT_HEADERS)(address + img_dos_hdr->e_lfanew);
		if (img_nt_header->Signature == 0x00004550) { // "PE\0\0"
			// ok! found it!
			break;
		}
	}
	address--;
}

Line that crashed:

img_nt_header = (PIMAGE_NT_HEADERS)(address + img_dos_hdr->e_lfanew);

It didn’t crash when the DLL was built without the standard library linked.

Why? A PE can contain false positives: bytes inside its memory that look like an MZ signature.

Let’s look for the MZ signature in HxD. First hit: OK, this is the real signature.

Second hit:

Mmm... no, this doesn’t look like the signature.

Third hit: neither does this one.

So we have to filter out the many false positives we may hit while scanning memory backwards for our base address!

One way to solve this is to look for the famous "This program cannot be run in DOS mode." string. This (weak) check shows how to do it:

DWORD* msdos_stub = (DWORD*)(address + 0x4E); // addr of 'This program ...'
if (*msdos_stub != 0x73696854) { // 'This' read as a little-endian DWORD
	address--;
	continue;
}
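Putting the checks together, here is a sketch of a more defensive backward scan. It works on an in-memory buffer so it can be tested anywhere; mem, start, find_image_base and looks_like_pe are hypothetical stand-ins for the loader’s own address range, the address of the loader function and its helpers. Besides the optional DOS stub check above, it bounds e_lfanew before following it (the original ReflectiveDLLInjection loader accepts only values from sizeof(IMAGE_DOS_HEADER) up to 1024), which keeps the NT-header read close to the candidate MZ instead of jumping to a garbage address. The stub check is optional because not every linker emits the standard stub.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t* p) {
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 if mem[off] looks like the start of a PE image. */
static int looks_like_pe(const uint8_t* mem, size_t mem_size, size_t off,
                         int check_stub) {
	if (off + 0x40 > mem_size)                   /* room for the DOS header */
		return 0;
	if (mem[off] != 'M' || mem[off + 1] != 'Z')  /* e_magic == 0x5A4D */
		return 0;

	uint32_t lfanew = rd32(mem + off + 0x3C);    /* e_lfanew */
	if (lfanew < 0x40 || lfanew >= 1024)         /* bound it before following it */
		return 0;
	if (off + lfanew + 4 > mem_size)
		return 0;
	if (rd32(mem + off + lfanew) != 0x00004550)  /* Signature == "PE\0\0" */
		return 0;

	/* optional: the standard DOS stub text starts at 0x4E */
	if (check_stub && (off + 0x52 > mem_size ||
	                   rd32(mem + off + 0x4E) != 0x73696854)) /* "This" */
		return 0;
	return 1;
}

/* Scans backwards from mem[start]; returns the image base offset or -1. */
long find_image_base(const uint8_t* mem, size_t mem_size, size_t start,
                     int check_stub) {
	size_t off = start;
	for (;;) {
		if (looks_like_pe(mem, mem_size, off, check_stub))
			return (long)off;
		if (off == 0)
			return -1; /* reached the start of the buffer */
		off--;
	}
}
```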

Without the standard library linked, my PE contained no extra runtime code to bring in all those MZ false positives… The error also didn’t show up when loading the DLL with rundll32.

I don’t know why! So be careful to isolate the problem and write down its context (i.e., memory, target process), because the context can change the behavior of your loader! And finally, because of this, always debug the DLL from its injector!

I hope some of these recommendations are useful to you. Have a nice day!


See you in the next research!