Before Windows NT 4.0, the graphical part of the Windows subsystem was implemented entirely in userland. Starting with NT 4.0, Microsoft moved a large part of the Window Manager and of the Graphics Device Interface into kernel mode, in the Win32k.sys component. However, part of the implementation still lives in userland, so the kernel component needs to call back into user-mode code. To do so, Microsoft implemented a ‘reverse’ system call that lets the kernel call userland code. The whole process has already been discussed in previous articles, so we will not detail it again; please refer to Tarjei Mandt's white paper for a comprehensive description of the mechanism.

In this post, we detail how Windows (from Windows XP to Windows 8) uses this mechanism to load modules into running processes. Understanding the mechanism may allow you to use it for your own purposes, in particular to inject custom DLLs into processes from code running in the Windows kernel. An article published by @zer0mem used the ‘reverse’ system call mechanism to execute user-mode code from the kernel; this post offers an alternative approach, provided you are free to drop a Windows binary file on the file system.

 

User-mode callbacks walkthrough

General mechanism

The function that lets kernel code call user-mode code is KeUserModeCallback, which is exported by the Windows kernel. Its prototype is:

NTSTATUS KeUserModeCallback (
IN ULONG ApiNumber,
IN PVOID InputBuffer,
IN ULONG InputLength,
OUT PVOID *OutputBuffer,
OUT PULONG OutputLength
);

Since this function is not present in any WDK header, you have to retrieve it dynamically with the help of an MmGetSystemRoutineAddress call.

Essentially, this function copies these parameters onto the userland stack and returns to userland code, in the ntdll!KiUserCallbackDispatcher function.

Callable functions are identified by the ApiNumber parameter, a zero-based index into an array referenced by the KernelCallbackTable field of the Process Environment Block (PEB).

This field is initialized when the user32 module is loaded into the process (before initialization, the field is NULL). The initialization takes place in user32!UserClientDllInitialize (the entry point of the user32 DLL), which makes the KernelCallbackTable field point to the non-exported user32!apfnDispatch symbol.
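On the userland side, the dispatch described above boils down to an indexed call through this table. The sketch below is a portable model of that step, not the code of ntdll!KiUserCallbackDispatcher: the callback_fn signature, dispatch_callback and the two sample entries are hypothetical, and the real dispatcher hands the callback's result back to the kernel rather than to a C caller.

```c
#include <stdint.h>

/* Hypothetical signature of a user-mode callback: it receives the input
   buffer (and its length) that KeUserModeCallback copied to the user stack. */
typedef int32_t (*callback_fn)(void *input, uint32_t input_length);

/* Model of the user-mode dispatch step: ApiNumber is used as a zero-based
   index into the table referenced by PEB.KernelCallbackTable. No NULL or
   bounds check is done here; the caller must supply a valid table and index. */
static int32_t dispatch_callback(callback_fn const *kernel_callback_table,
                                 uint32_t api_number,
                                 void *input, uint32_t input_length)
{
    return kernel_callback_table[api_number](input, input_length);
}

/* Two sample entries standing in for user32 functions. */
static int32_t sample_entry_0(void *input, uint32_t input_length)
{
    (void)input;
    return (int32_t)input_length;
}

static int32_t sample_entry_1(void *input, uint32_t input_length)
{
    (void)input;
    return (int32_t)input_length * 2;
}

static callback_fn const sample_table[] = { sample_entry_0, sample_entry_1 };
```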

kd> dt nt!_PEB @$peb
+0x000 InheritedAddressSpace : 0 ''
+0x001 ReadImageFileExecOptions : 0 ''
+0x002 BeingDebugged : 0 ''
+0x003 SpareBool : 0 ''
+0x004 Mutant : 0xffffffff Void
+0x008 ImageBaseAddress : 0x00400000 Void
+0x00c Ldr : 0x00251e90 _PEB_LDR_DATA
+0x010 ProcessParameters : 0x00020000 _RTL_USER_PROCESS_PARAMETERS
+0x014 SubSystemData : (null)
+0x018 ProcessHeap : 0x00150000 Void
+0x01c FastPebLock : 0x7c990620 _RTL_CRITICAL_SECTION
+0x020 FastPebLockRoutine : 0x7c911000 Void
+0x024 FastPebUnlockRoutine : 0x7c9110e0 Void
+0x028 EnvironmentUpdateCount : 1
+0x02c KernelCallbackTable : 0x7e392970 Void
+0x030 SystemReserved : [1] 0
+0x034 AtlThunkSListPtr32 : 0
...


This table contains pointers to various userland callable functions, all of them located in the user32 module. The contents of the table (and thus the index of each function) and its length depend on the operating system version.

Here is a (truncated) example showing the function table of a 32-bit process on Windows XP SP3:

kd> dps 0x7e392970 L0n98
7e392970 7e3a7f3c USER32!__fnCOPYDATA
7e392974 7e3d87b3 USER32!__fnCOPYGLOBALDATA
...
7e392a38 7e3d8eb9 USER32!__ClientCopyDDEIn1
7e392a3c 7e3d8efb USER32!__ClientCopyDDEIn2
7e392a40 7e3d8f5e USER32!__ClientCopyDDEOut1
7e392a44 7e3d8f2d USER32!__ClientCopyDDEOut2
7e392a48 7e3aeb09 USER32!__ClientCopyImage
7e392a4c 7e3d8f92 USER32!__ClientEventCallback
7e392a50 7e3b19f6 USER32!__ClientFindMnemChar
7e392a54 7e3a28f3 USER32!__ClientFontSweep
7e392a58 7e3d8e4c USER32!__ClientFreeDDEHandle
7e392a5c 7e3a82ff USER32!__ClientFreeLibrary
7e392a60 7e39f4b2 USER32!__ClientGetCharsetInfo
7e392a64 7e3d8e83 USER32!__ClientGetDDEFlags
7e392a68 7e3d8fdc USER32!__ClientGetDDEHookData
7e392a6c 7e3cf9f5 USER32!__ClientGetListboxString
7e392a70 7e39ec46 USER32!__ClientGetMessageMPH
7e392a74 7e3a16eb USER32!__ClientLoadImage
7e392a78 7e3a8023 USER32!__ClientLoadLibrary
7e392a7c 7e3aec03 USER32!__ClientLoadMenu
7e392a80 7e39ee0d USER32!__ClientLoadLocalT1Fonts
7e392a84 7e3a09e4 USER32!__ClientLoadRemoteT1Fonts
7e392a88 7e3d907b USER32!__ClientPSMTextOut
7e392a8c 7e3d90d1 USER32!__ClientLpkDrawTextEx
7e392a90 7e3d9135 USER32!__ClientExtTextOutW
7e392a94 7e3d919a USER32!__ClientGetTextExtentPointW
7e392a98 7e3d9019 USER32!__ClientCharToWchar
7e392a9c 7e39ed14 USER32!__ClientAddFontResourceW
7e392aa0 7e39a13e USER32!__ClientThreadSetup
7e392aa4 7e3d9253 USER32!__ClientDeliverUserApc
7e392aa8 7e3d91f1 USER32!__ClientNoMemoryPopup
7e392aac 7e3aa740 USER32!__ClientMonitorEnumProc
7e392ab0 7e3d944a USER32!__ClientCallWinEventProc
7e392ab4 7e3d8e15 USER32!__ClientWaitMessageExMPH
7e392ab8 7e3acf8e USER32!__ClientWOWGetProcModule
7e392abc 7e3d948d USER32!__ClientWOWTask16SchedNotify
7e392ac0 7e3d9266 USER32!__ClientImmLoadLayout
7e392ac4 7e3d92c2 USER32!__ClientImmProcessKey
...
7e392af4 7e3d950c USER32!__fnOUTLPSCROLLBARINFO
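Since each slot of the table is one pointer wide, the ApiNumber of an entry can be computed directly from such a dump: subtract the table base from the slot address and divide by the pointer size. For __ClientLoadLibrary in this Windows XP SP3 table, (0x7e392a78 - 0x7e392970) / 4 = 0x108 / 4 = 66 (0x42). The helper below (a hypothetical name) only performs this arithmetic.

```c
#include <stdint.h>

/* Zero-based ApiNumber of a KernelCallbackTable entry, computed from the
   table base and the address of the slot printed by "dps".
   pointer_size is 4 for a 32-bit process and 8 for a 64-bit one. */
static uint32_t api_number_from_slot(uint64_t table_base, uint64_t slot_address,
                                     uint32_t pointer_size)
{
    return (uint32_t)((slot_address - table_base) / pointer_size);
}
```

With a pointer size of 8, the same computation applies to tables of 64-bit processes, including the Wow64 table shown later in this post.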

Conditions for calling KeUserModeCallback

Before calling KeUserModeCallback, you must first check that the KernelCallbackTable field of the Process Environment Block is not NULL (KeUserModeCallback will not do it for you). This field is at offset 0x2c on a 32-bit system and 0x58 on a 64-bit system (from Windows XP to Windows 8). Skipping this check will eventually lead to a BSOD.
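A minimal sketch of this check, assuming the caller already holds a copy of the current process's PEB (how a driver locates and safely reads the PEB is out of scope here). The function and sample names are hypothetical; the offsets 0x2c and 0x58 come from the text, and the sample value 0x7e392970 from the dt output above.

```c
#include <stdint.h>
#include <string.h>

/* Offsets of PEB.KernelCallbackTable, Windows XP to Windows 8. */
#define PEB32_KERNEL_CALLBACK_TABLE 0x2c
#define PEB64_KERNEL_CALLBACK_TABLE 0x58

/* Returns nonzero if the KernelCallbackTable pointer stored in a copy of the
   PEB is non-NULL, i.e. user32 has been initialized in the process.
   peb_copy must hold at least offset + pointer-size bytes. */
static int kernel_callback_table_present(const uint8_t *peb_copy, int is_64bit)
{
    if (is_64bit) {
        uint64_t table;
        memcpy(&table, peb_copy + PEB64_KERNEL_CALLBACK_TABLE, sizeof table);
        return table != 0;
    } else {
        uint32_t table;
        memcpy(&table, peb_copy + PEB32_KERNEL_CALLBACK_TABLE, sizeof table);
        return table != 0;
    }
}

/* Sample 32-bit PEB prefix holding KernelCallbackTable = 0x7e392970
   (little-endian), and an all-zero PEB where user32 is not initialized. */
static const uint8_t xp_peb_sample[0x30] = {
    [0x2c] = 0x70, [0x2d] = 0x29, [0x2e] = 0x39, [0x2f] = 0x7e
};
static const uint8_t empty_peb_sample[0x60];
```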

On Windows XP, the operating system places no condition on the state of the current thread when KeUserModeCallback is called, so the function can safely be called at any time.

Starting with Windows Vista, things are different: KeUserModeCallback checks whether the CallOutActive flag is set in the Flags field of the current _KTHREAD structure (this flag is set at least by the nt!KeExpandKernelStackAndCalloutEx function). If it is, the operating system issues a bugcheck with the undocumented code 0x107.

On Windows 8, Microsoft developers added even more conditions that must be met for the call to succeed.

The first check performed by Windows 8 is ensuring the current thread runs at PASSIVE_LEVEL. If not, the operating system issues a bugcheck with code 0x4A (IRQL_GT_ZERO_AT_SYSTEM_SERVICE).

Then, the operating system checks if APCs are enabled. If not, the operating system issues a bugcheck with code 1 (APC_INDEX_MISMATCH).

Finally, the operating system checks the CallbackNestingLevel field of the current thread. If this value reaches 32, the function fails with status 0xC00000FD (STATUS_STACK_OVERFLOW). KeUserModeCallback updates this field to record the number of nested calls to user-mode callbacks.
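The conditions above can be summarized as a small decision function. This is a model for reasoning about which thread state leads to which outcome, not kernel code: thread_state and the enum names are hypothetical, Windows 7 is assumed to behave like Vista, and the position of the CallOutActive check relative to the Windows 8 checks is an assumption, since the text does not specify it.

```c
#include <stdint.h>

enum windows_version { WINDOWS_XP, WINDOWS_VISTA_OR_7, WINDOWS_8 };

enum callback_outcome {
    CALLBACK_ALLOWED,
    BUGCHECK_0x107,               /* CallOutActive set (Vista and later) */
    BUGCHECK_IRQL_GT_ZERO,        /* 0x4A, Windows 8 */
    BUGCHECK_APC_INDEX_MISMATCH,  /* 0x1, Windows 8 */
    FAIL_STATUS_STACK_OVERFLOW    /* 0xC00000FD returned, Windows 8 */
};

/* Hypothetical snapshot of the current thread's state. */
struct thread_state {
    int callout_active;               /* CallOutActive bit of _KTHREAD.Flags */
    unsigned irql;                    /* 0 == PASSIVE_LEVEL */
    int apcs_enabled;
    unsigned callback_nesting_level;  /* CallbackNestingLevel */
};

#define MAX_CALLBACK_NESTING 32

static enum callback_outcome check_callback_preconditions(
    enum windows_version version, const struct thread_state *t)
{
    if (version == WINDOWS_XP)
        return CALLBACK_ALLOWED;      /* no thread-state checks on XP */

    if (version == WINDOWS_8) {
        if (t->irql != 0)
            return BUGCHECK_IRQL_GT_ZERO;
        if (!t->apcs_enabled)
            return BUGCHECK_APC_INDEX_MISMATCH;
    }

    /* Assumption: placed after the Windows 8 IRQL/APC checks for illustration. */
    if (t->callout_active)
        return BUGCHECK_0x107;

    if (version == WINDOWS_8 && t->callback_nesting_level >= MAX_CALLBACK_NESTING)
        return FAIL_STATUS_STACK_OVERFLOW;

    return CALLBACK_ALLOWED;
}
```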

 

User-mode callback for loading a library

Among the interesting functions, note the user32!__ClientLoadLibrary function pointer.

This functionality is natively used by win32k.sys to inject uxtheme.dll into running processes, allowing the operating system to apply visual styles to applications.

This operation is twofold. First, it effectively loads the module in the process memory, as if it were loaded by userland code. Then, a function called ThemeInitApiHook is invoked giving uxtheme.dll a chance to provide alternate implementations for various functions used by user32. We will not dive into the details of how this initialization function is called and what the patched functions are used for. We will just try to describe the parameters needed to load a module without calling any specific initialization function.

Function index

The first parameter requested is the ApiNumber. The value for the ‘load library’ feature depends on the operating system version.

[Image: table giving the ApiNumber of the ‘load library’ callback for each Windows version]
Note that, for a given version, the index does not depend on the operating system flavor (32-bit or 64-bit).

Input buffer

The second and third parameters of the function are the input buffer and its length, in bytes.

The input buffer for the ‘load library’ feature is described by the following structure:

typedef struct _USERHOOK
{
    DWORD          dwBufferSize;
    DWORD          dwAdditionalData;
    DWORD          dwFixupsCount;
    LPVOID         pbFree;
    DWORD          offCbkPtrs;
    DWORD          bFixed;
    UNICODE_STRING lpDLLPath;
    union
    {
        DWORD          lpfnNotify;          /* Windows XP */
        UNICODE_STRING lpInitFunctionName;  /* Windows Vista and later */
    };
    DWORD          offCbk[2];
} _USERHOOK_s;


This structure is a specialization of a more general mechanism that exists in the win32k.sys driver for user-mode callbacks: it is composed of a fixed header (from dwBufferSize to bFixed) followed by variable-length data (starting at lpDLLPath).

dwBufferSize contains the length of the whole buffer, including the variable-length data.
dwAdditionalData contains the length of the variable-length data.
pbFree is a pointer to the end of the variable-length data.

We will not go into the implementation details of how this dynamic buffer is allocated and how the previous fields are used by the Windows routines. We just have to mimic the way the buffer is filled in order to call KeUserModeCallback.

Note: you can have a look at the win32k!AllocCallbackMessage and win32k!CaptureCallbackData functions called by win32k!ClientLoadLibrary if you want to understand how this structure is allocated and updated.

Relocatable buffer

The buffer supplied by the caller of KeUserModeCallback resides in kernel memory. It must eventually reside in the user memory of the process (it is copied onto the userland stack) in order to be handled by the userland functions.

To make the buffer location-independent, the Windows developers implemented a simple ‘fix-up’ mechanism. If bFixed is FALSE, pointer fields do not contain addresses but offsets relative to the beginning of the structure.

Take, for example, the buffer passed to user32!__ClientLoadLibrary on 32-bit Windows XP:

kd> db ef5a78e0 L68
ef5a78e0 68 00 00 00 40 00 00 00-01 00 00 00 48 79 5a ef h...@.......HyZ.
ef5a78f0 24 00 00 00 00 00 00 00-3e 00 40 00 28 00 00 00 $.......>.@.(...
ef5a7900 40 9e 00 00 1c 00 00 00-43 00 3a 00 5c 00 57 00 @.......C.:.\.W.
ef5a7910 49 00 4e 00 44 00 4f 00-57 00 53 00 5c 00 73 00 I.N.D.O.W.S.\.s.
ef5a7920 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
ef5a7930 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
ef5a7940 64 00 6c 00 6c 00 00 00 d.l.l...
dwBufferSize: 0x68
dwAdditionalData: 0x40
dwFixupsCount: 1
pbFree: 0xef5a7948
offCbkPtrs: 0x24 -> 0xef5a7904
bFixed: FALSE
lpDLLPath: (Length: 0x3e, MaximumLength: 0x40, Buffer: 0x28)
lpfnNotify: 0x9e40
offCbk[0]: 0x1c

The buffer contains one fix-up (dwFixupsCount = 1). The array containing the fix-ups is at offset 0x24 from the beginning of the structure (address 0xef5a7904). Its first and only element, 0x1c, is the offset of the value to fix: the Buffer field of the lpDLLPath UNICODE_STRING, which holds 0x28. Once fixed, this field contains the address of the structure plus 0x28; with the address used in this dump, 0xef5a78e0 + 0x28 = 0xef5a7908.
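To mimic the way win32k fills the buffer, the sketch below rebuilds the Windows XP buffer byte for byte. The layout (one fix-up entry at offset 0x24, string data at 0x28) is inferred from the dump above rather than taken from any documentation, the function name is hypothetical, and the path is restricted to ASCII for brevity. Values are written in little-endian order explicitly, so the sketch runs on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XP_HOOK_HEADER_SIZE 0x28  /* fixed part + one fix-up entry */

static void put_u16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put_u32(uint8_t *p, uint32_t v) { put_u16(p, (uint16_t)v); put_u16(p + 2, (uint16_t)(v >> 16)); }

/* Builds a Windows XP style 'load library' input buffer in out[].
   kernel_address is where the buffer lives (only used for pbFree).
   Returns dwBufferSize, or 0 if out_size is too small. */
static uint32_t build_xp_load_library_buffer(uint8_t *out, size_t out_size,
                                             uint32_t kernel_address,
                                             const char *path,
                                             uint32_t notify_offset)
{
    size_t chars = strlen(path);
    uint16_t length = (uint16_t)(chars * 2);           /* bytes, no NUL */
    uint16_t maximum_length = (uint16_t)(length + 2);  /* bytes, with NUL */
    uint32_t total = XP_HOOK_HEADER_SIZE + maximum_length;
    size_t i;

    if (out_size < total)
        return 0;
    memset(out, 0, total);

    put_u32(out + 0x00, total);                   /* dwBufferSize */
    put_u32(out + 0x04, maximum_length);          /* dwAdditionalData */
    put_u32(out + 0x08, 1);                       /* dwFixupsCount */
    put_u32(out + 0x0c, kernel_address + total);  /* pbFree: end of the data */
    put_u32(out + 0x10, 0x24);                    /* offCbkPtrs */
    put_u32(out + 0x14, 0);                       /* bFixed = FALSE */
    put_u16(out + 0x18, length);                  /* lpDLLPath.Length */
    put_u16(out + 0x1a, maximum_length);          /* lpDLLPath.MaximumLength */
    put_u32(out + 0x1c, XP_HOOK_HEADER_SIZE);     /* lpDLLPath.Buffer (relative) */
    put_u32(out + 0x20, notify_offset);           /* lpfnNotify */
    put_u32(out + 0x24, 0x1c);                    /* offCbk[0] -> lpDLLPath.Buffer */

    for (i = 0; i < chars; i++)                   /* ASCII -> UTF-16LE */
        put_u16(out + XP_HOOK_HEADER_SIZE + 2 * i, (uint8_t)path[i]);
    return total;                                 /* terminating NUL already zeroed */
}
```

With the kernel address 0xef5a78e0, the path C:\WINDOWS\system32\uxtheme.dll and a lpfnNotify value of 0x9e40, it produces the same 0x68 bytes as the dump.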

This resolution is performed by the user32 FixupCallbackPointers function, after the buffer has been copied onto the userland stack.

The code of this function looks roughly like this:

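void FixupCallbackPointers(_USERHOOK_s *pData)
{
    /* The fix-up array (one DWORD offset per entry) lives offCbkPtrs bytes into the buffer */
    LPDWORD offsetPointers = (LPDWORD)((LPBYTE)pData + pData->offCbkPtrs);
    DWORD fixup;

    for (fixup = 0; fixup < pData->dwFixupsCount; fixup++)
    {
        /* Each entry is the offset of a pseudo-pointer holding a relative offset;
           adding the buffer's actual address turns it into a real pointer */
        ULONG_PTR *pField = (ULONG_PTR *)((LPBYTE)pData + offsetPointers[fixup]);
        *pField += (ULONG_PTR)pData;
    }
}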

Load library-specific parameters

The first parameter in the dynamic part of the input buffer given to KeUserModeCallback is the name of the module to load; it is specified in the lpDLLPath field of the structure. The module is eventually loaded by the call to the kernel32!LoadLibraryExW function.

The second parameter passed in the buffer describes a function to call once the library is loaded and depends on the operating system version.

On Windows XP, the field (lpfnNotify) is the offset of the function to call, relative to the base address of the loaded module. Starting with Windows Vista, the field (lpInitFunctionName) is the name of the function to call; this function must be exported, because it is resolved with GetProcAddress.

To skip the initialization function call, set lpfnNotify to 0 on Windows XP; starting with Windows Vista, declare no fix-up for the function name (dwFixupsCount = 1 and offCbk[1] = 0).

Output buffer

On output, KeUserModeCallback fills the OutputBuffer and OutputLength parameters with the results of the call, if it succeeds.

For the load library case, the contents of the whole output buffer have not been investigated. However, the beginning of the output buffer matches the following structure:

typedef struct _LOAD_OUTPUT
{
LPVOID lpBaseAddress;

} _LOAD_OUTPUT_s;

The lpBaseAddress field contains the base address of the loaded module.

 

What about Wow64?

What we described so far is relevant for 32-bit processes on 32-bit operating systems and 64-bit processes on 64-bit systems. But what about 32-bit processes on 64-bit systems?

The good news is that, from the kernel's point of view, everything works the same way, so what we explained so far still applies.

In a Wow64 process, the first change is that the KernelCallbackTable field of the Process Environment Block now points to wow64win module functions:

kd> dps 0x00000000`73e51510 L0n105
00000000`73e51510 00000000`73e82894 wow64win!whcbfnCOPYDATA
00000000`73e51518 00000000`73e82a28 wow64win!whcbfnCOPYGLOBALDATA
...
00000000`73e516a0 00000000`73e87dc8 wow64win!whcbClientCopyDDEIn1
00000000`73e516a8 00000000`73e87f78 wow64win!whcbClientCopyDDEIn2
00000000`73e516b0 00000000`73e880b8 wow64win!whcbClientCopyDDEOut1
00000000`73e516b8 00000000`73e88280 wow64win!whcbClientCopyDDEOut2
00000000`73e516c0 00000000`73e883c0 wow64win!whcbClientCopyImage
00000000`73e516c8 00000000`73e884e8 wow64win!whcbClientEventCallback
00000000`73e516d0 00000000`73e8862c wow64win!whcbClientFindMnemChar
00000000`73e516d8 00000000`73e8878c wow64win!whcbClientFreeDDEHandle
00000000`73e516e0 00000000`73e888a4 wow64win!whcbClientFreeLibrary
00000000`73e516e8 00000000`73e889b4 wow64win!whcbClientGetCharsetInfo
00000000`73e516f0 00000000`73e88aec wow64win!whcbClientGetDDEFlags
00000000`73e516f8 00000000`73e88c04 wow64win!whcbClientGetDDEHookData
00000000`73e51700 00000000`73e88d6c wow64win!whcbClientGetListboxString
00000000`73e51708 00000000`73e88f14 wow64win!whcbClientGetMessageMPH
00000000`73e51710 00000000`73e89088 wow64win!whcbClientLoadImage
00000000`73e51718 00000000`73e8920c wow64win!whcbClientLoadLibrary
00000000`73e51720 00000000`73e89370 wow64win!whcbClientLoadMenu
00000000`73e51728 00000000`73e894c4 wow64win!whcbClientLoadLocalT1Fonts
00000000`73e51730 00000000`73e895ac wow64win!whcbClientPSMTextOut
00000000`73e51738 00000000`73e89718 wow64win!whcbClientLpkDrawTextEx
...
00000000`73e51850 00000000`73e8c6a8 wow64win!whcbfnINPGESTURENOTIFYSTRUCT


These functions perform an additional marshalling step between the 64-bit and 32-bit structures.

Regarding the load library functionality, the wow64win!whcbClientLoadLibrary function first calls wow64win!FixupCaptureBuf64 which resolves the relative offsets.

The original buffer contains the raw data as received by the kernel:

0:000> db 00000000006fdde8 L00000000000000c8
00000000`006fdde8 c8 00 00 00 70 00 00 00-02 00 00 00 00 00 00 00 ....p...........
00000000`006fddf8 f0 d0 e1 02 80 f8 ff ff-48 00 00 00 00 00 00 00 ........H.......
00000000`006fde08 00 00 00 00 00 00 00 00-3e 00 40 00 00 00 00 00 ........>.@.....
00000000`006fde18 58 00 00 00 00 00 00 00-20 00 22 00 00 00 00 00 X....... .".....
00000000`006fde28 98 00 00 00 00 00 00 00-30 00 00 00 40 00 00 00 ........0...@...
00000000`006fde38 00 00 00 00 00 00 00 00-43 00 3a 00 5c 00 57 00 ........C.:.\.W.
00000000`006fde48 69 00 6e 00 64 00 6f 00-77 00 73 00 5c 00 73 00 i.n.d.o.w.s.\.s.
00000000`006fde58 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
00000000`006fde68 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
00000000`006fde78 64 00 6c 00 6c 00 00 00-54 00 68 00 65 00 6d 00 d.l.l...T.h.e.m.
00000000`006fde88 65 00 49 00 6e 00 69 00-74 00 41 00 70 00 69 00 e.I.n.i.t.A.p.i.
00000000`006fde98 48 00 6f 00 6f 00 6b 00-00 00 00 00 00 00 00 00 H.o.o.k.........
00000000`006fdea8 00 00 00 00 00 00 00 00 ........


In the original buffer, 2 fix-ups are declared. The pseudo-pointers are 64 bits long: 0x58 at offset 0x30 (the module path) and 0x98 at offset 0x40 (the ThemeInitApiHook function name).

wow64win!FixupCaptureBuf64 replaces the relative offsets with absolute addresses. Once all relocations have been applied, it sets the number of fix-ups to 0.
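The effect of this step can be reproduced on a byte buffer. The sketch below is not the code of wow64win!FixupCaptureBuf64; it only mirrors the behavior visible in the two dumps. The offsets of dwFixupsCount (0x08) and offCbkPtrs (0x18) in the 64-bit layout are inferred from the dump, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Offsets in the 64-bit capture buffer, inferred from the dump. */
#define CB64_FIXUPS_COUNT 0x08
#define CB64_OFF_CBK_PTRS 0x18

static uint32_t get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_u64(const uint8_t *p)
{
    return get_u32(p) | (uint64_t)get_u32(p + 4) << 32;
}

static void put_u64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Rebases every 64-bit pseudo-pointer listed in the fix-up array on the
   buffer's user-mode address, then clears the fix-up count. */
static void fixup_capture_buf64(uint8_t *buf, uint64_t user_address)
{
    uint32_t count = get_u32(buf + CB64_FIXUPS_COUNT);
    const uint8_t *fixups = buf + get_u32(buf + CB64_OFF_CBK_PTRS);

    for (uint32_t i = 0; i < count; i++) {
        uint8_t *field = buf + get_u32(fixups + 4 * i);  /* DWORD offsets */
        put_u64(field, get_u64(field) + user_address);
    }
    memset(buf + CB64_FIXUPS_COUNT, 0, 4);               /* no fix-up left */
}
```

Applied to the original buffer at 0x6fdde8, it turns the pseudo-pointers 0x58 and 0x98 into 0x6fde40 and 0x6fde80 and clears the fix-up count, as in the second dump.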

0:000> db 00000000006fdde8 Lc8
00000000`006fdde8 c8 00 00 00 70 00 00 00-00 00 00 00 00 00 00 00 ....p...........
00000000`006fddf8 f0 d0 e1 02 80 f8 ff ff-48 00 00 00 00 00 00 00 ........H.......
00000000`006fde08 00 00 00 00 00 00 00 00-3e 00 40 00 00 00 00 00 ........>.@.....
00000000`006fde18 40 de 6f 00 00 00 00 00-20 00 22 00 00 00 00 00 @.o..... .".....
00000000`006fde28 80 de 6f 00 00 00 00 00-30 00 00 00 40 00 00 00 ..o.....0...@...
00000000`006fde38 00 00 00 00 00 00 00 00-43 00 3a 00 5c 00 57 00 ........C.:.\.W.
00000000`006fde48 69 00 6e 00 64 00 6f 00-77 00 73 00 5c 00 73 00 i.n.d.o.w.s.\.s.
00000000`006fde58 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
00000000`006fde68 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
00000000`006fde78 64 00 6c 00 6c 00 00 00-54 00 68 00 65 00 6d 00 d.l.l...T.h.e.m.
00000000`006fde88 65 00 49 00 6e 00 69 00-74 00 41 00 70 00 69 00 e.I.n.i.t.A.p.i.
00000000`006fde98 48 00 6f 00 6f 00 6b 00-00 00 00 00 00 00 00 00 H.o.o.k.........
00000000`006fdea8 00 00 00 00 00 00 00 00 ........


The second step builds another buffer containing only the static part of the structure, with a layout matching what the 32-bit code expects.
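The mapping between the two layouts can be read off the dumps: the 32-bit header keeps dwBufferSize (0xc8) and the cleared fix-up count, truncates pbFree to its low 32 bits, and narrows each UNICODE_STRING Buffer pointer to 32 bits. The sketch below encodes that mapping; all offsets are inferred from the dumps, the 8 bytes at offset 0x20 of the 64-bit buffer (zero here) are simply not copied, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Builds the 0x28-byte 32-bit static header from an already fixed-up
   64-bit buffer (little-endian byte images on both sides). */
static void thunk_hook_header_64_to_32(const uint8_t *in64, uint8_t *out32)
{
    wr32(out32 + 0x00, rd32(in64 + 0x00));  /* dwBufferSize */
    wr32(out32 + 0x04, rd32(in64 + 0x04));  /* dwAdditionalData */
    wr32(out32 + 0x08, rd32(in64 + 0x08));  /* dwFixupsCount */
    wr32(out32 + 0x0c, rd32(in64 + 0x10));  /* pbFree, low 32 bits only */
    wr32(out32 + 0x10, rd32(in64 + 0x18));  /* offCbkPtrs */
    wr32(out32 + 0x14, rd32(in64 + 0x1c));  /* bFixed */
    /* UNICODE_STRINGs: Length/MaximumLength copied, Buffer narrowed to 32 bits */
    memcpy(out32 + 0x18, in64 + 0x28, 4);   /* lpDLLPath lengths */
    wr32(out32 + 0x1c, rd32(in64 + 0x30));  /* lpDLLPath.Buffer */
    memcpy(out32 + 0x20, in64 + 0x38, 4);   /* lpInitFunctionName lengths */
    wr32(out32 + 0x24, rd32(in64 + 0x40));  /* lpInitFunctionName.Buffer */
}
```

Fed with the first 0x48 bytes of the fixed-up buffer, it produces the 0x28 bytes shown in the last dump.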

0:000> db 6fdd20 L28
00000000`006fdd20 c8 00 00 00 70 00 00 00-00 00 00 00 f0 d0 e1 02 ....p...........
00000000`006fdd30 48 00 00 00 00 00 00 00-3e 00 40 00 40 de 6f 00 H.......>.@.@.o.
00000000`006fdd40 20 00 22 00 80 de 6f 00 ."...o.


Control is then passed to the user32!__ClientLoadLibrary function, which performs the same operations as on a 32-bit operating system.

Since the module is loaded as if the userland process itself had requested it, the standard restrictions and behaviors apply. In particular, file system redirection is in effect: loading a DLL from c:\Windows\System32 is automatically redirected to c:\Windows\SysWOW64.
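As a simplified illustration, the path rewrite can be modeled as a case-insensitive prefix substitution. This is an assumption-laden sketch: the real redirector works below the Win32 path layer, resolves the Windows directory instead of hard-coding C:\Windows, and exempts some subdirectories of System32, none of which is modeled here. The function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified model of Wow64 file system redirection: a path under
   C:\Windows\System32\ is rewritten to C:\Windows\SysWOW64\.
   Returns 1 if redirected, 0 if copied unchanged, -1 if out is too small. */
static int wow64_redirect_path(const char *path, char *out, size_t out_size)
{
    static const char from[] = "C:\\Windows\\System32\\";
    static const char to[] = "C:\\Windows\\SysWOW64\\";
    size_t n = sizeof from - 1;
    size_t i;
    int match = strlen(path) >= n;

    /* Case-insensitive comparison of the prefix, as Windows paths are */
    for (i = 0; match && i < n; i++)
        if (tolower((unsigned char)path[i]) != tolower((unsigned char)from[i]))
            match = 0;

    int written = match ? snprintf(out, out_size, "%s%s", to, path + n)
                        : snprintf(out, out_size, "%s", path);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return match;
}
```

In practice, this means that a 32-bit DLL loaded this way into a Wow64 process through a System32 path has to be present in SysWOW64.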

 

Conclusions

It is possible to use the KeUserModeCallback function to load a custom library into processes, provided they have loaded the user32 module. In practice, nearly all end-user applications use this module, so this should not be a strong constraint. Since this function and its parameters are undocumented, this functionality is subject to change in future versions of Windows (even though it has not changed much over the past 15 years).

If you are interested in other methods of executing user-mode code from the kernel, you can also have a look at the six-part article series published by Nynaeve.
