Hooking All System Calls In Windows 10 20H1

In the previous post, MySyscall: Hijacking Windows System Calls For Personal Use, I described a method for hijacking a particular group of system calls with a simple pointer swap in the .rdata section of win32kfull.sys. There I mentioned another publicly disclosed method for hooking all system calls, InfinityHook, which has been patched in Windows 10 20H1. In this post, I will quickly go over how InfinityHook originally worked, how I upgraded the concept so that it works on new versions of Windows, and analyze a new system call hooking feature that is coming to Windows.

Note: This project was developed and tested on Windows 10 x64 version 2004.

An Official Way to Hook System Calls?

While analyzing KiSystemCall64, the Windows system call handler, I noticed that it has changed a bit. Most notably, a check was added to KiSystemServiceUser.



This check is a bitwise AND between a member of the DISPATCHER_HEADER structure and 0x24. DISPATCHER_HEADER is used in multiple kernel structures, but in this scenario it is located at the start of a KTHREAD structure. This means that the check is thread based, so it can be toggled per thread in a process. The byte being tested is the one at offset 3 of the structure (DebugActive), and since 0x24 has bits 2 and 5 set, the check tests whether either the Minimal or the AltSyscall bit is 1.

struct _DISPATCHER_HEADER {
    union {
        LONG Lock;
        LONG LockNV;
        struct {
            UCHAR Type;
            UCHAR Signalling;
            UCHAR Size;
            UCHAR Reserved1;
        };
        struct {
            UCHAR TimerType;
            union {
                UCHAR TimerControlFlags;
                struct {
                    UCHAR Absolute : 1;
                    UCHAR Wake : 1;
                    UCHAR EncodedTolerableDelay : 6;
                };
            };
            UCHAR Hand;
            union {
                UCHAR TimerMiscFlags;
                struct {
                    UCHAR Index : 6;
                    UCHAR Inserted : 1;
                    UCHAR Expired : 1;
                };
            };
        };
        struct {
            UCHAR Timer2Type;
            union {
                UCHAR Timer2Flags;
                struct {
                    UCHAR Timer2Inserted : 1;
                    UCHAR Timer2Expiring : 1;
                    UCHAR Timer2CancelPending : 1;
                    UCHAR Timer2SetPending : 1;
                    UCHAR Timer2Running : 1;
                    UCHAR Timer2Disabled : 1;
                    UCHAR Timer2ReservedFlags : 2;
                };
            };
            UCHAR Timer2ComponentId;
            UCHAR Timer2RelativeId;
        };
        struct {
            UCHAR QueueType;
            union {
                UCHAR QueueControlFlags;
                struct {
                    UCHAR Abandoned : 1;
                    UCHAR DisableIncrement : 1;
                    UCHAR QueueReservedControlFlags : 6;
                };
            };
            UCHAR QueueSize;
            UCHAR QueueReserved;
        };
        struct {
            UCHAR ThreadType;
            UCHAR ThreadReserved;
            union {
                UCHAR ThreadControlFlags;
                struct {
                    UCHAR CycleProfiling : 1;
                    UCHAR CounterProfiling : 1;
                    UCHAR GroupScheduling : 1;
                    UCHAR AffinitySet : 1;
                    UCHAR Tagged : 1;
                    UCHAR EnergyProfiling : 1;
                    UCHAR SchedulerAssist : 1;
                    UCHAR ThreadReservedControlFlags : 1;
                };
            };
            union {
                UCHAR DebugActive;
                struct {
                    UCHAR ActiveDR7 : 1;
                    UCHAR Instrumented : 1;
                    UCHAR Minimal : 1;
                    UCHAR Reserved4 : 2;
                    UCHAR AltSyscall : 1;
                    UCHAR UmsScheduled : 1;
                    UCHAR UmsPrimary : 1;
                };
            };
        };
        struct {
            UCHAR MutantType;
            UCHAR MutantSize;
            UCHAR DpcActive;
            UCHAR MutantReserved;
        };
    };
    LONG SignalState;
    LIST_ENTRY WaitListHead;
};

If at least one of them is 1, some registers are copied into a KTRAP_FRAME structure on the stack, and the current stack pointer is passed as the first argument to PsAltSystemCallDispatch. If the ActiveDR7 or Instrumented bit is also set in the header, the debug registers are saved into the KTRAP_FRAME structure as well.
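To make the bit arithmetic concrete, here is a minimal user-mode sketch of the two tests on the DebugActive byte. The bit positions come from the structure definition above; the constant and function names are my own and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions inside DISPATCHER_HEADER.DebugActive, taken from the
   structure definition (bit 0 is the least significant bit).
   The names are hypothetical and used for illustration only. */
enum {
    DBG_ACTIVE_DR7   = 1u << 0,
    DBG_INSTRUMENTED = 1u << 1,
    DBG_MINIMAL      = 1u << 2,
    DBG_ALT_SYSCALL  = 1u << 5
};

/* Models the AND with 0x24: true when the thread is a pico (Minimal)
   thread or has the AltSyscall bit set. */
static bool takes_alt_syscall_path(uint8_t debug_active)
{
    return (debug_active & (DBG_MINIMAL | DBG_ALT_SYSCALL)) != 0;
}

/* True when the debug registers are also saved into the trap frame. */
static bool saves_debug_registers(uint8_t debug_active)
{
    return (debug_active & (DBG_ACTIVE_DR7 | DBG_INSTRUMENTED)) != 0;
}
```

Calling takes_alt_syscall_path(0x20) models a thread with only AltSyscall set, while a value such as 0x03 (ActiveDR7 and Instrumented) stays on the normal system call path.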



At the start of PsAltSystemCallDispatch, the current thread is stored in RAX and a bitwise AND is performed once again. This time only the third bit (bit 2), the Minimal bit, is checked. It appears to be the flag for a thread that belongs to a pico process, and the same check can be found in previous versions of the kernel. Now comes the interesting stuff. If this is indeed a pico process, execution passes to the function pointed to by the first member of the PsAltSystemCallHandlers array. Otherwise, it jumps to the function whose pointer is the second member of the array. I initially thought that I could simply overwrite the second member of the array with a pointer to my custom system call handler, but unfortunately it isn’t that simple.

The first pointer in the array points to PsPicoAltSystemCallDispatch, which simply calls PsPicoSystemCallDispatch, the same function that was called directly from KiSystemCall64 in previous versions of Windows. The second member (no special name in the symbols) doesn’t seem to be set to point to any function at all by ntoskrnl. Instead, there is a new function, PsRegisterAltSystemCallHandler, which is actually exported and has the following prototype:

NTSTATUS PsRegisterAltSystemCallHandler(PVOID HandlerFunction, ULONG HandlerIndex)

So maybe we can just call this function and register our own handler? Wrong. If a handler is already registered at the index you provide (or, basically, if the slot holds anything other than 0), the system will crash itself. If you try to register a handler at an index larger than 1, the system will also crash itself, because there are currently only 2 members allocated in the handlers array. So why does this function even exist? Well, if we look further into the xrefs of the array, we will see that the second member is also referenced in PsNotifyCoreDriversInitialized.

PVOID PsNotifyCoreDriversInitialized() {
	// Enter a critical region and take the registration lock.
	--KeGetCurrentThread()->KernelApcDisable;
	ExAcquirePushLockExclusiveEx(&PsAltSystemCallRegistrationLock, 0);
	if (!PsAltSystemCallHandlers[1]) {
		// No core driver has registered a handler in the second slot.
		LONG v1 = 0;
		ULONG signatureEnforcementData = 8;
		// Lock the slot by storing 1 if the query fails or none of the 0xA2 bits are set.
		if (SeCodeIntegrityQueryInformation(&signatureEnforcementData, 8, &v1) < 0 || !(signatureEnforcementData & 0xA2))
			PsAltSystemCallHandlers[1] = 1;
	}

	ExReleasePushLockEx(&PsAltSystemCallRegistrationLock, 0);
	KiLeaveCriticalRegionUnsafe(KeGetCurrentThread());
	PspPicoRegistrationDisabled = 1;
	unkQword1 = HaliQuerySystemInformation;
	unkQword2 = 8;
	// Kicks off PatchGuard initialization (see below).
	KeInitAmd64SpecificState();
	PspPicoProviderRanges = 0;
	memset(&PsKernelRangeList, 0, 0x140);
	PspKernelRanges = 0;
	return VslConnectSwInterrupt(0, 0);
}

This function is called from IopInitializeBootDrivers, which is invoked as part of the Windows boot process. PsNotifyCoreDriversInitialized is called after the Windows core drivers are loaded, and it checks whether any of them has set the alternative system call handler (using PsRegisterAltSystemCallHandler). If none has, the function checks whether code signing enforcement (DSE) is enabled and, if it is, sets the second member of the handler array to 1. After that, it releases PsAltSystemCallRegistrationLock and disables pico provider registration by setting PspPicoRegistrationDisabled to true. This means that we can’t use PsRegisterAltSystemCallHandler to change the function unless we disable signature enforcement.

Well, if we can’t use that, I guess we can manually overwrite the pointer and win? Wrong again. Near the end of PsNotifyCoreDriversInitialized there is a call to KeInitAmd64SpecificState, and anyone who has ever looked at the inner workings of PatchGuard can tell you that’s not good. This function checks whether a debugger is attached to the system and, if one isn’t, triggers a divide error (#DE) exception. That invokes the exception handler, which in turn calls KiFilterFiberContext, the main PatchGuard initialization routine. This means that the values inside the PsAltSystemCallHandlers array are protected by PatchGuard: try to modify them and watch your whole system crash with a CRITICAL_STRUCTURE_CORRUPTION bugcheck.
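The rules described in this section can be condensed into a small user-mode model. This is only a sketch of the behaviour as I understand it from the disassembly, not kernel code: every name starting with model_ is hypothetical, and a "would bugcheck" return value stands in for the real system crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALT_HANDLER_COUNT 2u

typedef int (*alt_handler_fn)(void *trap_frame);

/* Toy stand-in for PsAltSystemCallHandlers. The slots are uintptr_t
   because the kernel stores the non-pointer value 1 to lock slot 1. */
static uintptr_t g_handlers[ALT_HANDLER_COUNT];

typedef enum { REG_OK, REG_WOULD_BUGCHECK } reg_result;

/* Registration rules: an index past the end of the array or an
   already occupied slot crashes the real system; here it is only
   reported to the caller. */
static reg_result model_register(alt_handler_fn fn, unsigned index)
{
    if (index >= ALT_HANDLER_COUNT || g_handlers[index] != 0)
        return REG_WOULD_BUGCHECK;
    g_handlers[index] = (uintptr_t)fn;
    return REG_OK;
}

/* Relevant part of PsNotifyCoreDriversInitialized: an empty slot 1
   is filled with 1 when signature enforcement is active. */
static void model_core_drivers_initialized(bool enforcement_on)
{
    if (g_handlers[1] == 0 && enforcement_on)
        g_handlers[1] = 1;
}

/* Slot selection in PsAltSystemCallDispatch: the Minimal bit (0x04)
   selects the pico slot, everything else goes to slot 1. */
static unsigned model_select_slot(uint8_t debug_active)
{
    return (debug_active & 0x04) ? 0u : 1u;
}

/* Dummy handler used to exercise the model. */
static int sample_handler(void *trap_frame)
{
    (void)trap_frame;
    return 1;
}
```

Once model_core_drivers_initialized(true) has run, model_register(sample_handler, 1) reports REG_WOULD_BUGCHECK, which mirrors why a third-party driver loaded after boot cannot claim the second slot.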

In conclusion, it seems that at the moment the whole alternative system call handler mechanism is reserved for internal use: the first member of the handler array is reserved for pico providers and the second one for core Windows drivers, which are Microsoft-signed drivers designed to be loaded before any other driver. So the only way to make use of this mechanism right now is to bypass or disable PatchGuard and modify the pointer in the handlers array. Maybe Microsoft will decide to open up this feature to outsiders in the future, at least to ELAM drivers.

InfinityHook

The rise

InfinityHook is a project developed by Nick Peterson (everdox) that abuses an apparently old feature of Event Tracing for Windows (ETW). It allows you to hook not only system calls but basically every event in Windows that is tracked by ETW. The concept behind it is actually pretty simple. There can be multiple ETW loggers in the system, and one of the default ones is the Circular Kernel Context Logger. This logger can be reconfigured through the ZwTraceControl API function to log a range of kernel events.

The first part of the exploit is hooking ETW itself. Nick achieved this by changing the GetCpuClock pointer stored inside the WMI_LOGGER_CONTEXT structure. It typically points to one of three routines: EtwGetCycleCount, EtwpGetSystemTime, or PpmQueryTime. The function behind this pointer is usually called inside EtwpReserveTraceBuffer (it can also be called from some other functions, if configured that way). Now, every time ETW tries to log an event, it calls the custom GetCpuClock function. So if the Circular Kernel Context Logger is configured to log system calls, our custom function is called each time a system call is executed.

Now we can log all system calls, but what about hooking them? When logging is enabled, the pointer to the system function is stored on the stack before the event is logged and is called after ETW returns to KiSystemCall64. Because of that, you can walk the stack inside the custom GetCpuClock function and replace the pointer that KiSystemCall64 stored there. Your system call hook will then be called instead of the actual system function, so you can do whatever you want.
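The idea can be illustrated with a toy user-mode model. This is not the InfinityHook source: the stack is simulated with a plain array, the marker value is made up, and all names besides GetCpuClock are hypothetical. It only shows how a callback that runs between "save the target" and "call the target" can redirect the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS  8
#define FRAME_MARKER ((uintptr_t)0x1234u)  /* hypothetical marker value */

typedef int (*syscall_fn)(int);
typedef uint64_t (*clock_fn)(void);

/* Toy logger context; only the GetCpuClock member matters here. */
struct logger_context { clock_fn GetCpuClock; };

/* Simulated slice of the kernel stack. */
static uintptr_t g_stack[STACK_SLOTS];

static int real_service(int x)   { return x + 1; }
static int hooked_service(int x) { return x * 100; }

static uint64_t original_clock(void) { return 42; }

/* Replacement GetCpuClock: scan the simulated stack for the marker
   and swap the service pointer stored right after it. */
static uint64_t hooked_clock(void)
{
    for (size_t i = 0; i + 1 < STACK_SLOTS; i++) {
        if (g_stack[i] == FRAME_MARKER &&
            g_stack[i + 1] == (uintptr_t)real_service) {
            g_stack[i + 1] = (uintptr_t)hooked_service;
            break;
        }
    }
    return original_clock();
}

/* Simulated system call path: save the target on the "stack", let
   ETW timestamp the event, then call whatever pointer is saved. */
static int model_syscall(struct logger_context *logger, int arg)
{
    g_stack[3] = FRAME_MARKER;
    g_stack[4] = (uintptr_t)real_service;
    (void)logger->GetCpuClock();
    return ((syscall_fn)g_stack[4])(arg);
}
```

With GetCpuClock left at original_clock, model_syscall reaches real_service; after the pointer is replaced with hooked_clock, the same call ends up in hooked_service without the dispatcher itself being patched.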

The return

Even though Nick reported the issue to Microsoft, they said that it is not a real security issue, case closed. But surprise, surprise, Microsoft patched it in the new Windows Insider Builds. So what did they actually do to break the exploit?



They changed GetCpuClock into an index. Now, instead of calling the pointer stored inside the structure member, ETW calls the appropriate function based on the index. If the index is bigger than 3, it simply fastfails the system. An interesting case is a GetCpuClock value of 2. In that case, HalpTimerQueryHostPerformanceCounter is called through its pointer in the HalPrivateDispatch table. Since that table is not protected by PatchGuard, we could simply replace that pointer and change the GetCpuClock index to 2, and system calls would be hooked again. Well, that would have been true in the past, but Microsoft pulled a smart move and statically linked hal.dll into ntoskrnl.exe in Windows 10 20H1. Not only that, but they also protected the HAL dispatch tables with PatchGuard, so this method is not the way to go. The fix for InfinityHook still lies in the HAL, though. If GetCpuClock is 1, KeQueryPerformanceCounter is called, a frequently used routine that is also exported.

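LARGE_INTEGER KeQueryPerformanceCounter(PLARGE_INTEGER PerformanceFrequency) {
	// Decompiled pseudocode; the offsets are from Windows 10 version 2004.
	if (*(DWORD*)(HalpPerformanceCounter + 0xE4) == 5) {
		if (HalpTimerReferencePage) {
			// Argument that is later passed to the counter's query routine.
			ULONG_PTR unknown;
			if (*(DWORD*)(HalpPerformanceCounter + 0xE0) & 0x10000)
				unknown = *(ULONG_PTR*)(HalpPerformanceCounter + 0x48) + *(ULONG_PTR*)(HalpPerformanceCounter + 0x50) * KeGetPcr()->CurrentPrcb->Number;
			else
				unknown = *(ULONG_PTR*)(HalpPerformanceCounter + 0x48);

			// The indirect call through QueryCounter is the part we care about.
			// Assumption: 0xFFFFF780000003B8 is read as a 64-bit value (it lies in KUSER_SHARED_DATA),
			// and ">> 0x40" stands for taking the high half of a 128-bit product.
			ULONG64 result = *(ULONG64*)0xFFFFF780000003B8 + ((HalpPerformanceCounter->QueryCounter(unknown) * *(ULONG64*)(HalpTimerReferencePage + 8)) >> 0x40);
		} else {
			// ...
		}
	} else {
		// ...
	}
}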

In the snippet of KeQueryPerformanceCounter, you can see that HalpPerformanceCounter is a frequently referenced data structure. First, the function checks whether the value at HalpPerformanceCounter + 0xE4 equals 5. By default, it seems it does. If so, the function proceeds to check whether a reference page has been created and stored in HalpTimerReferencePage. If it has, some timer-related data is prepared as the first argument for a function call that occurs later. The next key part is that call, to the HalpPerformanceCounter->QueryCounter routine. A pointer to the routine, in my case HalpTscQueryCounterOrdered, is stored inside the QueryCounter member of the undocumented structure used by counters. Thankfully, this structure is located in the .data section and is not protected by PatchGuard, since the timer can be swapped out by the system. Now we can finally change the QueryCounter pointer so that it calls our own hook function. Everything else can basically stay the same as in the original InfinityHook. I have, however, changed some parts of the concept to hopefully make it a bit more reliable.
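The pointer swap itself can be sketched in user mode as follows. The structure layout is a heavily reduced, hypothetical stand-in for the undocumented timer structure, and forwarding to the original routine from inside the hook is my assumption about how to keep timekeeping intact; see the project source for the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*query_counter_fn)(void *context);

/* Hypothetical, reduced layout of the structure that
   HalpPerformanceCounter points to. The real one is undocumented
   and its offsets differ between builds. */
struct timer_model {
    void            *context;
    query_counter_fn QueryCounter;
};

static unsigned         g_hook_calls;
static query_counter_fn g_original_query;

/* Stand-in for the original routine (HalpTscQueryCounterOrdered in
   the text); it returns a fixed value to keep the model simple. */
static uint64_t tsc_query_model(void *context)
{
    (void)context;
    return 1000;
}

/* The hook does its own work first (the stack walk in the real
   driver), then forwards to the original routine. */
static uint64_t query_hook(void *context)
{
    g_hook_calls++;
    return g_original_query(context);
}

static void install_hook(struct timer_model *timer)
{
    g_original_query    = timer->QueryCounter;
    timer->QueryCounter = query_hook;
}

static void remove_hook(struct timer_model *timer)
{
    timer->QueryCounter = g_original_query;
}

/* Stand-in for KeQueryPerformanceCounter: it always goes through
   the QueryCounter pointer. */
static uint64_t query_performance_counter_model(struct timer_model *timer)
{
    return timer->QueryCounter(timer->context);
}
```

Saving the original pointer before overwriting it matters for two reasons: the hook has to return a sane counter value to every caller, and the driver needs the original value to restore the structure when it unloads.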

The source code for this project is available on my GitHub.

Conclusion

I’m not sure how I’d feel about Microsoft implementing official system call hooks and opening them up for anyone to use. That would mean that anybody with access to the kernel could monitor all system functions used by applications, so potentially sensitive data could be monitored or faked. At the same time, anti-malware software could put this to great use. However, even if Microsoft decides to keep system call hooking internal, there will always be ways to get around it. I’ve dug pretty deep into the system call handler and couldn’t find much of use, but that doesn’t mean there aren’t other ways. You can always combine multiple exploits to achieve a given goal. For this exploit, you could just buy a code signing certificate, load the driver on a fully operational system, and probably stay undetected by most anti-malware solutions (until your driver gets properly analyzed and blacklisted).

17 thoughts on “Hooking All System Calls In Windows 10 20H1”

    1. I have also tried manual mapping it with a vulnerable driver, it worked fine, so that shouldn’t be an issue. Without any crash information, it’s hard to say. If you are running it on Windows 10 20H1 or later and you crash, the only guess would be that some offsets are incorrect. You should be able to figure it out by debugging it a bit.

  • Epic, it is great for my project! I have a problem: when I found the shadow SSDT and looked up a syscall address, for example NtUserGetForegroundWindow, I got a BSOD: ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY.

    I attached the csrss process.

    CSRSS eprocess address found: 0x5435080
    InitKSDTSAddrX64: SSTKSDTS address found: 0x8EE086C0
    NtUserGetForegroundWindow function address found: 0xDD1FD42A

    typedef ULONG(*NTUSERGETFOREGROUNDWINDOW)(VOID);

    ULONG hwnd;
    NTUSERGETFOREGROUNDWINDOW NtUserGetForegroundWindow = (NTUSERGETFOREGROUNDWINDOW)function;
    hwnd = NtUserGetForegroundWindow();

    What is the problem?

    1. There are multiple issues here. You seem to be printing only a 4-byte address, and you most likely crash because the address for NtUserGetForegroundWindow is not valid. Are you sure that you are dereferencing the function table offset correctly? Looking at the output you provided, it seems like you aren’t dereferencing it at all.

    1. This works with Windows 10 version 2004 and upwards. I haven’t tested with the latest insider versions, but I doubt Microsoft patched it.

        1. On Windows 10 prior to version 2004 you can just use InfinityHook, you could get the hook using this method too, but it’s kind of meaningless. If I remember correctly, you can enable compression on the logger and that should redirect the execution path to a block that will call the hook. You can analyze the EtwpReserveTraceBuffer function yourself and check this.

  • Hello,

    I’m quite bad at reversing. Any idea what could have broken on .264+ versions? I tried both normal loading and manual mapping, but both result in a BSOD.

    I even tried to hardcode the halpPerformanceCounter address.

    The BSOD occurs after swapping the pointers:

    *oldFunction = *reinterpret_cast(halpPerformanceCounter + Offsets::counterQueryRoutine);
    *reinterpret_cast(halpPerformanceCounter + Offsets::counterQueryRoutine) = hookFunction;

    Thanks in advance!!!

    1. Offsets probably changed. What is the bugcheck code and parameters? Open up ntoskrnl.exe in a disassembler and check out KeQueryPerformanceCounter and look for differences.

      1. auto halpPerformanceCounter = Scanner::scanPattern(reinterpret_cast(keQueryPerformanceCounter),
        0x100, "\x48\x8B\x3D", "xxx");

        halpPerformanceCounter = *(uintptr_t*)RELATIVE_ADDR(halpPerformanceCounter, 7);

        If you need it:

        #define RELATIVE_ADDR(addr, size) ((PVOID)((PBYTE)(addr) + *(PINT)((PBYTE)(addr) + ((size) - (INT)sizeof(INT))) + (size)))

  • I’m sorry for spamming your blog. Is it the right place to spam my stuff?

    I hook on 0xFFFFF8071644BD50 as you can see in the image above. Offsets are the same.

    1. It is possible that the HAL is not configured correctly. You should check whether the first two conditions in KeQueryPerformanceCounter are satisfied (*(DWORD*)(HalpPerformanceCounter + 0xE4) == 5 && HalpTimerReferencePage).
      Breaking on this function will deadlock the system you are debugging, so you should check the values in memory and then place breakpoints on the part of the function that you want to be executed.
