hype-free: Playing tricks with the Windows PE Loader

For every piece of software there is a specification and there is an implementation. Specifications are rarely exhaustive, so there remain corner cases which the developer handles based on her/his beliefs. As long as the handling doesn't introduce vulnerabilities, one can say that this doesn't make any difference.

The situation becomes much more interesting when the specification becomes public and others start to program based on it and (willingly or unwillingly) get themselves into these corner cases. In this post I will present such a case related to the PE file format and the Windows loaders. Credit goes to my colleagues who always helped with their advice when I got stuck.

The PE file format is documented by Microsoft (sidenote: OpenOffice.org 2.4+ can read .docx files, so I don't need to install Office/WordViewer). What I want to show here is how to create an executable which acts in ways that can be unexpected if one has only done a cursory static analysis.

More specifically: I want to create a TLS callback which is "invisible" to static analysis (unless the analyst notices the clues). Why TLS? Because these callbacks are executed before a user-mode debugger gets attached / has a chance to run. I accomplish this by writing an invalid value into the TLS_DIRECTORY and fixing it up during loading with relocations.

First, let's create an executable with a TLS callback. I'm using Visual C++ to do this, since the structures involved are fairly complicated and I didn't want to write them by hand. The code is adapted from the boost library. It should compile with Visual C++ 2005 and later (I've tried with VC 6.0, but I didn't manage to make it work):

#define WIN32_LEAN_AND_MEAN  // Exclude rarely-used stuff from Windows headers
#include <stdio.h>
#include <tchar.h>
#include <windows.h>

int select = 0;

void NTAPI on_tls_callback(void* h, DWORD dwReason, PVOID pv)
{
 select = 1;
}

int _tmain(int argc, _TCHAR* argv[])
{ 
 if (select)
  printf("Foo!\n");
 else
  printf("Bar!\n");
 return 0;
}

extern "C" {
 ULONG __tls_index__ = 0;
#pragma section(".tls$zzz", read, write, execute)
 __declspec(allocate(".tls$zzz"))
 char __tls_end__ = 0;
#pragma section(".tls", read, write, execute)
 __declspec(allocate(".tls"))
 char __tls_start__ = 0;

#pragma section(".CRT$XLA")
 __declspec(allocate(".CRT$XLA"))
 PIMAGE_TLS_CALLBACK __crt_xl_start__ = 0;
#pragma section(".CRT$XLB")
 __declspec(allocate(".CRT$XLB"))
 PIMAGE_TLS_CALLBACK __crt_xl_tls_callback__ = on_tls_callback;
#pragma section(".CRT$XLZ")
 __declspec(allocate(".CRT$XLZ"))
 PIMAGE_TLS_CALLBACK __crt_xl_end__ = 0;
}

#pragma section(".rdata$T", read, write, execute)
__declspec(allocate(".rdata$T"))
extern "C" const IMAGE_TLS_DIRECTORY32 _tls_used =
{
 (DWORD) &__tls_start__,
 (DWORD) &__tls_end__,
 (DWORD) &__tls_index__,
 (DWORD) (&__crt_xl_start__+1),
 (DWORD) 0,
 (DWORD) 0
};

The code needs some explanation. What we are doing here is the following: we initialize the "select" variable to 0. Assuming we don't know anything about TLS and so on, the code should print "Bar!". But we define a TLS callback which is executed before the main procedure (in fact it is executed so early that the CRT is not initialized; this is why I'm only setting a variable, not printing to the console / showing message boxes, etc.), so the value is changed and the program prints "Foo!".

So far so good. Some more details: the section pragma declares a section with a given name and attributes (here read, write and execute). The __declspec(allocate(...)) token makes the next variable get allocated in the given section. Characters after and including the "$" character are not included in the final section name, so things assigned to section ".foo$a" and ".foo$x" will end up in the same section. The only effect of the suffix is that things get ordered alphabetically based on it (so variables assigned to section ".foo$a" and ".foo$b" will both end up in the ".foo" section, but variables assigned to ".foo$a" will precede variables assigned to ".foo$b" - ordering among variables with the same suffix is not guaranteed). This is used in the code to ensure that __tls_start__ precedes __tls_end__.
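The grouping rule can be sketched as a toy model in portable C++. This is not what the linker actually runs, only an illustration of the "strip the suffix, sort by suffix" behaviour described above; the names split_section and link_order are made up for this example, and the order between different final sections (".CRT" versus ".tls") is simply alphabetical here and says nothing about the real image layout.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Splits "name$suffix" into {"name", "suffix"}; a name without '$' gets an empty suffix.
std::pair<std::string, std::string> split_section(const std::string& full) {
    assert(!full.empty());
    const std::string::size_type pos = full.find('$');
    if (pos == std::string::npos) return {full, std::string()};
    return {full.substr(0, pos), full.substr(pos + 1)};
}

// Toy model: group contributions by final section name, then order them by suffix.
// stable_sort keeps the input order for equal keys; the real linker gives no such promise.
std::vector<std::string> link_order(std::vector<std::string> contributions) {
    std::stable_sort(contributions.begin(), contributions.end(),
                     [](const std::string& a, const std::string& b) {
                         return split_section(a) < split_section(b);
                     });
    return contributions;
}
```

Fed with the section names from the listing, the model puts ".CRT$XLA" before ".CRT$XLB" before ".CRT$XLZ", and ".tls" (empty suffix) before ".tls$zzz", which is exactly the property the code relies on.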

Another thing needed before compilation is to enable relocation information. This is called "/FIXED:NO" (no fixed base address) in Visual C++ speak. After compilation we need to ensure that the executable actually does get relocated on loading. I tried several methods before coming up with the right solution (thanks to a colleague):

Assigning it the base-address of kernel32.dll. This produced a warning that kernel32.dll got relocated (apparently kernel32.dll is loaded after the image).
Assigning it the base-address of kernel32.dll - the executable refused to start.
Finally a colleague suggested giving it a base address in the kernel address space (above 0x80000000). This worked like a charm, except that you can't specify such a base address directly in the Visual C++ project and you have to use rebase.exe to adjust the executable after compilation.

So after compilation and rebase.exe -b 0x87c80000 tlstest2.exe I had an executable with TLS which gets relocated before execution. You can test it - it should load fine and still display "Foo!". Now to modify it. The tools recommended are: IDA 4.9 free, LordPE and a hex editor.

We will be patching the "__crt_xl_start__" in the "_tls_used" structure. It would have been much nicer (read: confusing :-)) to patch the entry in the data directory pointing to this structure, but I didn't manage to get the loader to map it with the right permissions (RW) so that it can be patched. The first step is to find the location of this value in the file. To do this, load up the file in IDA (make sure that you are working with the Debug version of it), let it load the debug symbols and then simply search for "_tls_used" in the name window. Take note of the address (after doing the VA <-> Physical address translation). Overwrite the DWORD with zeroes.
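The VA to file offset translation mentioned above is the usual section arithmetic. A minimal sketch, assuming you have already read the three relevant fields from the section header of the section that contains the address (LordPE shows them); SectionInfo and va_to_file_offset are hypothetical names, not part of any Windows header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the three section-header fields the translation needs.
struct SectionInfo {
    std::uint32_t virtual_address;      // RVA at which the section starts in memory
    std::uint32_t size_of_raw_data;     // number of bytes of the section stored in the file
    std::uint32_t pointer_to_raw_data;  // file offset at which those bytes start
};

// Converts a virtual address (as shown by the disassembler) to an offset in the file.
// The caller must pass the section that actually contains the address.
std::uint32_t va_to_file_offset(std::uint32_t va, std::uint32_t image_base,
                                const SectionInfo& section) {
    const std::uint32_t rva = va - image_base;
    assert(rva >= section.virtual_address);
    assert(rva - section.virtual_address < section.size_of_raw_data);
    return rva - section.virtual_address + section.pointer_to_raw_data;
}
```

With made-up numbers: for a section starting at RVA 0x15000 whose raw data sits at file offset 0x4000, the VA 0x87c95638 (image base 0x87c80000) maps to file offset 0x4638. Your own section values will differ.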

Now we must find out at which address the executable actually gets mapped. This may or may not depend on the system. On my Windows XP SP3 box it gets mapped at 0x10000. You can find it out by running it with a debugger.

Next is to create a relocation entry. Go to the data directory and increase the size of the relocation table by 10 (0xA). This is needed because a relocation block consists of a two-DWORD header (8 bytes) followed by WORD-sized entries, and we have only one entry (2 bytes). Now go to the end of the relocation table (you can find it with the formula "relocation start" + "original relocation size"). Write the following structure there: [RVA of the "__crt_xl_start__" as a DWORD][0xA as a DWORD][0x3000 as a WORD]. In my case this looks like:

38 56 01 00 0A 00 00 00 00 30

The first value is 0x15638 (encoded as a little-endian DWORD), followed by 0xA (the size of the block, also a little-endian DWORD), followed by 0x3000 (encoded as a little-endian WORD). Now run the executable in a debugger and look at the memory location representing "__crt_xl_start__". Note the value there. 0x3000 means a "type 3" relocation with offset 0: the top four bits of the WORD are the type and the low twelve bits are an offset added to the RVA from the header. Type 3 means that at the specified address the loader will do the calculation: DWORD += [Actual Image Base] - [Specified Image Base] (you can find a detailed description in the MS document linked in the beginning).
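The ten bytes can also be generated instead of typed by hand. The following is a small sketch; put_le and make_reloc_block are names invented for this example, and it only handles the single-entry block used in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends the lowest `bytes` bytes of `value` in little-endian order.
void put_le(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Builds one relocation block holding a single entry:
// [base RVA : DWORD][block size : DWORD][type << 12 | offset : WORD]
std::vector<std::uint8_t> make_reloc_block(std::uint32_t base_rva,
                                           unsigned type, unsigned offset) {
    assert(type <= 0xF && offset <= 0xFFF);  // 4-bit type, 12-bit offset
    std::vector<std::uint8_t> out;
    const std::uint32_t block_size = 8 + 2;  // header plus one WORD entry = 0xA
    put_le(out, base_rva, 4);
    put_le(out, block_size, 4);
    put_le(out, (type << 12) | offset, 2);
    return out;
}
```

Calling make_reloc_block(0x15638, 3, 0) yields 38 56 01 00 0A 00 00 00 00 30, the same bytes as in my case; substitute the RVA from your own build.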

Now perform the following calculation: V = [the address the field has to hold at run time, i.e. where the callback list it pointed to before you zeroed it ends up in memory - you can find this in a debugger] - [the value you have seen in "__crt_xl_start__"]. Make sure that you perform the calculation on 32 bits with wraparound (modulo 2^32). This is necessary because the value gets relocated twice (once by the relocation generated by the linker and once by your relocation), so the value you saw for the zeroed field is exactly the amount the loader will add, and subtracting it is a simple way to calculate the value which needs to be in the file. Patch "__crt_xl_start__" with this value (remember, little-endian DWORD!) and you should once again see "Foo!".
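The arithmetic is easy to get wrong by hand, so here is a sketch of it using unsigned 32-bit integers, which wrap around in C++ exactly as needed. It assumes the loader adds the difference between the actual and the preferred image base once per relocation entry; the function names and the "wanted" address 0x00025000 are made up for illustration, while the two base addresses are the ones from this post.

```cpp
#include <cassert>
#include <cstdint>

// One type-3 fix-up: add the load delta (actual - preferred), modulo 2^32.
std::uint32_t relocate(std::uint32_t value, std::uint32_t preferred_base,
                       std::uint32_t actual_base) {
    return value + (actual_base - preferred_base);
}

// V = wanted - observed, where `observed` is what the zeroed field showed in the debugger.
std::uint32_t patch_value(std::uint32_t wanted, std::uint32_t observed_after_zeroing) {
    return wanted - observed_after_zeroing;
}

// Worked example: bases from the text, hypothetical wanted address.
void worked_example() {
    const std::uint32_t preferred = 0x87c80000u, actual = 0x00010000u;
    // A zeroed field that is relocated twice ends up holding twice the delta.
    const std::uint32_t observed =
        relocate(relocate(0u, preferred, actual), preferred, actual);
    const std::uint32_t wanted = 0x00025000u;  // made-up runtime address
    const std::uint32_t v = patch_value(wanted, observed);
    // After both fix-ups the patched value turns into the wanted address.
    assert(relocate(relocate(v, preferred, actual), preferred, actual) == wanted);
}
```

With these numbers the zeroed field would show 0xF0720000 in the debugger and the value to write into the file would be 0x0F905000, which does not look like a valid pointer into the image at all.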

PS. A further trick (which worked with older versions of IDA) is to specify a size of 0 for the TLS_DIRECTORY in the DATA_DIRECTORY (instead of the correct 0x18). The TLS callback still got executed by Windows, but IDA didn't see it.
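For reference, the 0x18 is simply the size of the 32-bit TLS directory: six DWORD fields. The sketch below uses a stand-in struct (TlsDirectory32 is a made-up name mirroring IMAGE_TLS_DIRECTORY32, so that it compiles without windows.h) to check the size and the offset of the callbacks field, which is the fourth initializer of "_tls_used" in the listing.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for IMAGE_TLS_DIRECTORY32: six 32-bit fields, in this order.
struct TlsDirectory32 {
    std::uint32_t StartAddressOfRawData;
    std::uint32_t EndAddressOfRawData;
    std::uint32_t AddressOfIndex;
    std::uint32_t AddressOfCallBacks;  // the field patched in this post
    std::uint32_t SizeOfZeroFill;
    std::uint32_t Characteristics;
};

static_assert(sizeof(TlsDirectory32) == 0x18, "six DWORDs make 0x18 bytes");

// Offset of the callbacks field from the start of the directory.
constexpr std::size_t callbacks_field_offset() {
    return offsetof(TlsDirectory32, AddressOfCallBacks);
}
```

So the DWORD of interest sits 0xC bytes after the start of "_tls_used".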

PS. PS. This trick is actually easily defeated by using the "relocate.exe" program and relocating the executable to the actual address it gets executed at. IDA has a similar option (Edit -> Segments -> Rebase program...), but it doesn't seem to work correctly (maybe the double relocation confuses it?).

Update: Here is a set of posts about some TLS quirks:

Update 2: A discussion about changing the TLS table at runtime (another post about the same topic can be found on OpenRCE).