MindshaRE: Walking the Windows Kernel with IDA Python
May 22, 2018 | Jasiel Spelman
When I attend security conferences, I enjoy talking to people about how they augment their own reverse engineering efforts. It is always beneficial to find out how others automate tedious tasks. One thing that often surprises me is that many people using IDA don't use the included APIs to augment their efforts. To try and change that, I'm going to start sharing some of my code and demonstrate some of the things you can accomplish with IDA and Python.
As an introduction to IDA Python, I'm going to show how you can enumerate the Windows System Call tables.
For those that don't know, all system calls on Windows are given an ID. This ID is a unique value that is used to specify the function you would like to call when performing a system call. These IDs can vary heavily across different versions of Windows and especially across service packs. As of Windows 10, they can vary across release branches. For normal applications this isn't a big deal, as the userland libraries always use the appropriate ID for the system you're on.
If you're analyzing an exploit or if you're attempting to directly make system calls yourself, this may not be the case. As a consequence, it is handy to know which IDs map to which functions for a given OS version. For a long time, referencing one of the tables that Mateusz Jurczyk hosts on his site was the easiest way, but if you want a version not present there, you'll need to know how to do it yourself.
I'll quickly explain how to enumerate the tables manually, then we'll go over automatically handling it with Python.
Manually Enumerating Windows System Call Tables
There are three important symbols for parsing the system call tables: the base of the table, the size of the table, and the number of bytes the arguments take on the stack. For ntoskrnl.exe, the names of these symbols are KiServiceTable, KiServiceLimit, and KiArgumentTable respectively. For win32k.sys, the names of these symbols are W32pServiceTable, W32pServiceLimit, and W32pArgumentTable. On 32-bit builds, these symbol names are prepended with an underscore.
As an example, let's look at Windows 7 64-bit. This is from ntoskrnl.exe version 6.1.7601.24117.
Based on this, we can see that there are 401 (0x191) system calls.
If we look at the table in Figure 2, we can manually map the functions to their IDs. Based on what we see above, NtMapUserPhysicalPagesScatter has an ID of 0x0000, NtWaitForSingleObject is 0x0001, NtCallbackReturn is 0x0002, and so forth.
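The manual mapping amounts to numbering the table entries in order. Here is a minimal sketch of that idea in plain Python, using only the three function names quoted above. The helper name `assign_ids` is hypothetical and not part of IDA.

```python
def assign_ids(function_names):
    """Pair each table entry with its system call ID (its index in the table)."""
    return {index: name for index, name in enumerate(function_names)}


# The first three entries of the Windows 7 64-bit table described above.
entries = [
    "NtMapUserPhysicalPagesScatter",
    "NtWaitForSingleObject",
    "NtCallbackReturn",
]

for syscall_id, name in sorted(assign_ids(entries).items()):
    print("0x%04x %s" % (syscall_id, name))
```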
There are two special cases we need to handle. If we are looking at win32k.sys, the ID will be the index of the function within the table plus 0x1000. Also, 64-bit builds of Windows 10 as of build 1607 need to be handled differently. In these builds, the system call table stores four-byte offsets to the functions rather than eight-byte addresses.
This is from ntoskrnl.exe version 10.0.17134.48:
Handling this just means that we need to read four bytes at a time and add each value to the base address of the image.
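To make the two special cases concrete, here is a rough sketch that runs outside IDA. It assumes the four-byte offsets are unsigned, little-endian, and relative to the image base. The helper names and the example values are hypothetical and are not taken from a real binary.

```python
import struct

WIN32K_ID_BASE = 0x1000  # win32k.sys IDs are the table index plus 0x1000


def syscall_id(index, is_win32k):
    """Return the system call ID for a given index within the table."""
    return index + WIN32K_ID_BASE if is_win32k else index


def decode_offsets(raw_table, image_base):
    """Turn a table of little-endian four-byte offsets into absolute addresses."""
    count = len(raw_table) // 4
    offsets = struct.unpack("<%dI" % count, raw_table[:count * 4])
    return [image_base + offset for offset in offsets]


# Hypothetical values purely for illustration.
example_base = 0x140000000
example_table = struct.pack("<3I", 0x1000, 0x2340, 0x5678)

for index, address in enumerate(decode_offsets(example_table, example_base)):
    print("0x%04x 0x%x" % (syscall_id(index, is_win32k=False), address))
```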
Automating Mapping Within IDA
Let's first go over the IDA functions we will need to call:
idaapi.get_imagebase - This function will return the base address within the module we're looking at.
idc.GetInputFile - This function will return the name of the file the IDB was loaded for.
idc.BADADDR - This is a constant value that maps to -1 as an unsigned integer (it can also be used to test whether we're in 32-bit mode or 64-bit mode)
idc.Name - This function will return the name of a given address.
idc.LocByName - The inverse of idc.Name, this function will return the address of a given name.
idc.Dword - This function will return the four-byte value at a given address.
idc.Qword - This function will return the eight-byte value at a given address.
idautils.DataRefsFrom - This function will enumerate through any data references from a given address.
We'll start off by ensuring we are looking at either ntoskrnl.exe or win32k.sys.
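A check along these lines might look like the sketch below. The file name is passed in as an argument so the logic can be tried outside IDA; inside IDA it would come from idc.GetInputFile. The helper name `symbols_for` is hypothetical, and the table and limit symbol names in the dictionary are the usual ones for these modules, so treat them as assumptions and confirm them in your own IDB.

```python
# (table base, table size, argument bytes) symbol names per module.
# Assumed names; verify them against the names window of your IDB.
SYMBOLS = {
    "ntoskrnl.exe": ("KiServiceTable", "KiServiceLimit", "KiArgumentTable"),
    "win32k.sys": ("W32pServiceTable", "W32pServiceLimit", "W32pArgumentTable"),
}


def symbols_for(input_file):
    """Return the three symbol names, or None if the file is not supported."""
    return SYMBOLS.get(input_file.lower())


# Inside IDA this would be: symbols_for(idc.GetInputFile())
for candidate in ("ntoskrnl.exe", "win32k.sys", "notepad.exe"):
    print(candidate, symbols_for(candidate))
```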
We can then determine which symbol names we need to use. Next, we need to test to see if we need to use the underscore variants:
LocByName will return BADADDR if the name does not exist, so we can use it to test if the symbol name exists with or without the underscore.
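One way to express that test is sketched below. The lookup function and the BADADDR constant are passed in as arguments so the logic runs outside IDA; inside IDA they would be idc.LocByName and idc.BADADDR. The helper name `resolve_symbol`, the stand-in lookup, and the address are all hypothetical.

```python
def resolve_symbol(name, loc_by_name, badaddr):
    """Try the bare name, then the underscore variant.

    Returns (name_that_resolved, address), or (None, badaddr) if neither exists.
    """
    for candidate in (name, "_" + name):
        address = loc_by_name(candidate)
        if address != badaddr:
            return candidate, address
    return None, badaddr


# Stand-ins for idc.LocByName and idc.BADADDR, mimicking a 32-bit build
# where only the underscore variant exists. The address is made up.
FAKE_BADADDR = 0xFFFFFFFF
fake_names = {"_KiServiceTable": 0x401000}


def fake_loc_by_name(name):
    return fake_names.get(name, FAKE_BADADDR)


# Inside IDA: resolve_symbol("KiServiceTable", idc.LocByName, idc.BADADDR)
print(resolve_symbol("KiServiceTable", fake_loc_by_name, FAKE_BADADDR))
```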
Now that we have the correct symbol names to use, let's grab the actual size of the table:
First we get the address with LocByName, then we grab the value at the address with Dword.
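As a sketch of that two-step lookup, the function below takes the name lookup and the four-byte reader as arguments; inside IDA these would be idc.LocByName and idc.Dword. The helper name `read_table_size` and the stand-in address are hypothetical, while 0x191 is the Windows 7 64-bit count mentioned earlier in this post.

```python
def read_table_size(limit_name, loc_by_name, read_dword, badaddr):
    """Return the number of table entries stored at the limit symbol."""
    limit_ea = loc_by_name(limit_name)
    if limit_ea == badaddr:
        raise ValueError("symbol not found: %s" % limit_name)
    return read_dword(limit_ea)


# Stand-ins so the sketch runs outside IDA. The address is made up.
FAKE_BADADDR = 0xFFFFFFFFFFFFFFFF
fake_names = {"KiServiceLimit": 0x140070000}
fake_memory = {0x140070000: 0x191}

# Inside IDA:
#   read_table_size(limit_name, idc.LocByName, idc.Dword, idc.BADADDR)
size = read_table_size(
    "KiServiceLimit",
    lambda name: fake_names.get(name, FAKE_BADADDR),
    lambda ea: fake_memory[ea],
    FAKE_BADADDR,
)
print("%d (0x%x) system calls" % (size, size))
```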
Last corner case to handle, the Windows 10 64-bit case:
DataRefsFrom will iterate through the data references at the base of the table. There should be one, unless we're looking at one of the newer versions of Windows 10. When looking at those newer Windows 10 builds, we'll just need to make sure we add the base address of the image, which we'll get with get_imagebase.
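The check can be sketched roughly as follows. It assumes that a missing data reference on the first entry means the table holds offsets rather than pointers, as described above. The two IDA calls are passed in as arguments (idautils.DataRefsFrom and idaapi.get_imagebase inside IDA); the helper name `entry_base` and all addresses are hypothetical.

```python
def entry_base(table_ea, data_refs_from, get_imagebase):
    """Return the value to add to each table entry to obtain an address.

    If the first entry carries a data reference it is already a pointer, so
    nothing needs to be added. Otherwise the entries are assumed to be
    offsets from the image base.
    """
    for _ in data_refs_from(table_ea):
        return 0
    return get_imagebase()


# Stand-ins so the sketch runs outside IDA. All addresses are made up.
FAKE_IMAGEBASE = 0x140000000
pointer_table_refs = lambda ea: iter([0x140100000])  # older builds: one reference
offset_table_refs = lambda ea: iter([])              # newer Windows 10: none

# Inside IDA:
#   entry_base(table_ea, idautils.DataRefsFrom, idaapi.get_imagebase)
print(hex(entry_base(0x140002000, pointer_table_refs, lambda: FAKE_IMAGEBASE)))
print(hex(entry_base(0x140002000, offset_table_refs, lambda: FAKE_IMAGEBASE)))
```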
At this point, all we need to do is read consecutive values starting from the table base. We can use Qword for 64-bit versions (outside of newer builds of Windows 10) and Dword for 32-bit versions.
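Putting the pieces together, the read loop could look something like the sketch below. It is not the code from the GitHub repository, only an illustration under the assumptions already stated: the reader, entry size, and base are chosen by the caller, and the win32k.sys case is handled by starting the IDs at 0x1000. The function name `enumerate_syscalls`, the stand-in memory, and the addresses are hypothetical.

```python
def enumerate_syscalls(table_ea, count, entry_size, read_entry, name_of,
                       base=0, first_id=0):
    """Yield (syscall_id, address, name) for each entry in the table."""
    for index in range(count):
        address = base + read_entry(table_ea + index * entry_size)
        yield first_id + index, address, name_of(address)


# Stand-in memory and names so the sketch runs outside IDA.
# The addresses are made up; the names are the ones quoted earlier.
fake_table_ea = 0x140002000
fake_memory = {
    fake_table_ea + 0: 0x140100000,
    fake_table_ea + 8: 0x140100400,
    fake_table_ea + 16: 0x140100800,
}
fake_names = {
    0x140100000: "NtMapUserPhysicalPagesScatter",
    0x140100400: "NtWaitForSingleObject",
    0x140100800: "NtCallbackReturn",
}

rows = list(enumerate_syscalls(
    fake_table_ea, 3, 8,
    fake_memory.__getitem__,             # idc.Qword inside IDA
    lambda ea: fake_names.get(ea, ""),   # idc.Name inside IDA
))
for syscall_id, address, name in rows:
    print("0x%04x 0x%x %s" % (syscall_id, address, name))
```

Inside IDA, a 32-bit IDB would pass an entry size of 4 with idc.Dword, and a newer 64-bit Windows 10 IDB would pass an entry size of 4 with idc.Dword and base set to the result of idaapi.get_imagebase. Newer IDAPython releases expose these helpers under different names, so pass whichever reader your version provides.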
Here's an example of what this can print out:
You can see a full copy of this code on our GitHub page here.
Reverse engineering software can be tedious at times, but automating tasks can take away some of that tedium. I hope you've enjoyed this blog post. Look out for future blog posts on IDA and Python. Until then, you can find me on Twitter at @WanderingGlitch, and follow the team for the latest in exploit techniques and security patches.