Taking Control of VMware Through the Universal Host Controller Interface: Part 2
August 15, 2019 | Guest Blogger
During this year’s Pwn2Own competition in Vancouver, the Fluoroacetate team demonstrated how they could escalate privileges by exploiting VMware Workstation to escape from the guest OS to the host OS. They exploited an out-of-bounds read/write vulnerability (ZDI-19-421) in the virtual USB 1.1 UHCI (Universal Host Controller Interface).
While this vulnerability affected a wide variety of VMware products, the analysis throughout this blog is based on VMware Workstation 15.0.3 using Fluoroacetate’s exploit. The vulnerability was patched in VMware Workstation 15.0.4 with VMSA-2019-0005.1.
To allow VMware guest machines to access USB devices, VMware installs a kernel device driver named uhci_hcd in the guest. “hcd” stands for “Host Controller Driver”. This driver allows the guest to communicate with the Host Controller Interface (HCI) on the host side, which is the hardware interface used by the host to communicate with the physical USB port. Communication is accomplished by sending or receiving USB Request Block (URB) packets to or from various endpoints defined by the USB device. Each endpoint of the USB device is intended either to receive packets from the host (OUT) or to send packets to the host (IN). The vulnerability is triggered by sending a specially crafted OUT packet to a certain endpoint known as the Bulk endpoint.
Packets handled by the
uhci_hcd driver are represented in memory by the
uhci_td (Transfer Descriptor) structure:
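For orientation, here is a minimal sketch of the hardware-visible part of a TD as the UHCI specification lays it out: four little-endian 32-bit words (link pointer, control/status, token, buffer pointer). The helper names `pack_td` and `unpack_td` and the sample values are illustrative assumptions, not taken from the exploit, and the Linux driver's `uhci_td` appends driver-private fields after these four words.

```python
import struct

# Hardware-defined part of a UHCI transfer descriptor: four little-endian
# 32-bit words. Driver-private fields that follow them are not modelled here.
TD_HW_FORMAT = "<IIII"  # link, status, token, buffer


def pack_td(link: int, status: int, token: int, buffer_addr: int) -> bytes:
    """Serialize the four hardware words of a TD (hypothetical helper)."""
    return struct.pack(TD_HW_FORMAT, link, status, token, buffer_addr)


def unpack_td(raw: bytes) -> dict:
    """Split the first 16 bytes of a TD back into its hardware words."""
    link, status, token, buffer_addr = struct.unpack(TD_HW_FORMAT, raw[:16])
    return {"link": link, "status": status, "token": token, "buffer": buffer_addr}


# Sample values only: link 0x1 sets the terminate bit, token 0xE1 is a bare OUT PID.
td = pack_td(link=0x1, status=0, token=0xE1, buffer_addr=0x1000)
fields = unpack_td(td)
```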
Note that the token field contains certain bit-aligned subfields not visible here. In particular, the lowest 8 bits indicate the “Packet ID”, which defines the type of packet. The top 11 bits (bits 21 through 31) form a length field named MaxLen.
In order to trigger this vulnerability, the guest must send a crafted TD structure that sets the Packet ID to OUT (0xE1). Additionally, the TD’s buffer length, indicated by the MaxLen subfield, has to be more than 0x40 bytes to overflow an object on the heap.
By attaching windbg to vmware-vmx.exe and triggering the vulnerability, we get the following access violation:
The call stack reveals a chain of functions that handle UHCI requests:
The memcpy call that crashes the process was in the middle of copying data from the TD’s buffer:
And this is what
memcpy has copied from the TD’s buffer to the heap:
Let’s see what the destination buffer size is:
The size of the buffer is 0x58 because vmware-vmx allocates the destination buffer with the size [number_of_TD_structures]*0x40+0x18. Since this time we only sent one TD structure, the buffer size is 1*0x40+0x18=0x58. Because we control the length used by the memcpy call, we can precisely determine how many bytes we want to copy. To do this, we set the MaxLen subfield in the OUT TD’s token field (bits 21 through 31) to the desired memcpy size minus one.
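The encoding described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the exploit's code: the helper names are hypothetical, the device-address and endpoint bits of the token are left at zero, and the overflow figure assumes the 0x18-byte header sits at the start of the destination buffer.

```python
USB_PID_OUT = 0xE1    # Packet ID for an OUT transfer
MAXLEN_SHIFT = 21     # MaxLen occupies bits 21..31 of the token
MAXLEN_MASK = 0x7FF   # 11 bits


def make_token(pid: int, length: int) -> int:
    """Build a token whose MaxLen encodes `length` bytes (stored as length - 1)."""
    if not 1 <= length <= MAXLEN_MASK:
        raise ValueError("length does not fit in the MaxLen subfield")
    return ((length - 1) << MAXLEN_SHIFT) | (pid & 0xFF)


def token_length(token: int) -> int:
    """Decode the transfer length back out of a token."""
    return ((token >> MAXLEN_SHIFT) & MAXLEN_MASK) + 1


def dest_buffer_size(num_tds: int) -> int:
    """Destination buffer size as given in the text: n * 0x40 + 0x18."""
    return num_tds * 0x40 + 0x18


token = make_token(USB_PID_OUT, 0x100)
# One TD leaves 0x40 bytes of data area after the assumed 0x18-byte header.
overflow = token_length(token) - (dest_buffer_size(1) - 0x18)
```

With a single TD and a requested copy of 0x100 bytes, this toy calculation puts 0xC0 bytes past the end of the data area.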
Clearly, with this we are able to overflow the heap. However, in addition to overflowing the heap, the exploit author was able to exploit this vulnerability to perform additional out-of-bounds writes. The function
NewURB() (located at
vmware_vmx+0x165710) gets called to handle incoming URB packets. Each time the function
NewURB() receives a TD, it adds the TD’s
MaxLen value to a variable known as cursor. The cursor variable points to where the function should write next when it receives a TD structure. In this way, the
MaxLen field can be used to partially control the destination address when processing subsequent TDs.
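To make the cursor behavior concrete, here is a toy model of it. It is a sketch under assumptions, not a reconstruction of `NewURB()`: it assumes every TD advances the cursor by its decoded length (MaxLen plus one) and that only OUT TDs copy data. The function name `simulate_cursor` is hypothetical.

```python
USB_PID_SOF = 0xA5  # standard USB Packet ID values
USB_PID_OUT = 0xE1


def simulate_cursor(tds, data_area_size):
    """Replay a TD chain against a toy cursor.

    tds: list of (pid, length, payload) tuples, where length is the decoded
    transfer length (MaxLen + 1). Returns one (offset, payload, out_of_bounds)
    tuple for every OUT TD.
    """
    cursor = 0
    writes = []
    for pid, length, payload in tds:
        if pid == USB_PID_OUT:
            chunk = payload[:length]
            out_of_bounds = cursor + len(chunk) > data_area_size
            writes.append((cursor, chunk, out_of_bounds))
        cursor += length  # assumption: every TD advances the cursor
    return writes


chain = [
    (USB_PID_SOF, 0x41, b""),   # copies nothing, moves the cursor to 0x41
    (USB_PID_OUT, 4, b"ABCD"),  # copy lands past a 0x40-byte data area
]
writes = simulate_cursor(chain, data_area_size=0x40)
```

In this model, a non-OUT TD with a large MaxLen acts purely as a "skip" instruction, which is what lets a later OUT TD write at a chosen distance from the buffer.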
In order to exploit this vulnerability, it’s necessary to prepare the layout of the heap of the vmware-vmx process. To perform the heap preparation tasks, the exploit mainly relies on the SVGA3D protocol on the front end (the guest side), which it uses to communicate with the host through the SVGA FIFO. On the back end (the host side), VMware handles the requests using the DX11Renderer component. The exploit code starts with the initialization phase, where it initializes SVGA FIFO memory, then allocates the SVGA3D object tables. It looks like Amat based the exploit primitives used here on research presented by Zisis Sialveras at Black Hat Europe 2018 [PDF].
The overall strategy for preparing the heap is as follows. The exploit tries to create “holes”, or islands of unallocated memory, each 0x158 bytes in size. That is exactly the size required for allocating five TDs together with a buffer header (5*0x40+0x18). The TDs will likely be allocated within one of these holes. Following each hole, the exploit tries to place a 0x150-byte structure called a “resource container”, representing data associated with a graphics surface. The plan is to corrupt the resource container that immediately follows the allocated TDs.
The exploit code prepares the heap using the following steps:
-- Define and bind a Context memory object with a size of 0x5000.
-- Define a memory object (
SPRAY_OBJ) with the size 0x1000, that the exploit repeatedly uses to bind with structures (e.g., shaders).
-- Define 2400 shaders with the size 0x158, binding them to
SPRAY_OBJ. After that, the exploit uses
SVGA_3D_CMD_SET_SHADER to spray the shaders in the host.
-- Iterate through the sprayed shaders and perform the following:
--- Deallocate each even-numbered shader.
--- Create a surface, to allocate a resource container having a size of 0x150. This allocation will usually be made in the hole that was just vacated by a shader. Additionally, the host will allocate an associated data buffer of size 0x160. Because of the difference in size, these data buffers will be located in a separate area of the Low-Fragmentation Heap (LFH). Each 0x150-byte resource container will contain a pointer to its associated 0x160-byte data buffer.
--- Create two more surfaces, to allocate two other resource containers having a size of 0x160. Because of their size, the resource containers allocated in this step will be located in memory near the 0x160-byte data buffers of the previous step. For this reason, these resource containers are known as the “adjacent” resource containers. The purpose of these “adjacent” resource containers will be explained below.
-- Deallocate all remaining shaders, to free blocks of size 0x158. These holes of size 0x158 will alternate with the resource containers of size 0x150.
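The intended end state of these steps can be pictured with a deliberately simplified model. The sketch below treats the heap as a flat list of slots and assumes every freed slot is reused in place, which the real Low-Fragmentation Heap does not guarantee; the function name `groom` and the slot labels are illustrative only.

```python
def groom(num_shaders: int = 8):
    """Toy model of the spray / free / refill sequence (not real LFH behavior)."""
    # Spray shader-sized blocks into consecutive slots.
    slots = [("shader", 0x158) for _ in range(num_shaders)]
    side_bucket = []  # stands in for the separate area holding 0x160-byte blocks

    # Free every even-numbered shader and refill its slot with a resource
    # container; each surface also brings a data buffer, and two further
    # surfaces add the "adjacent" containers next to it.
    for i in range(0, num_shaders, 2):
        slots[i] = ("resource_container", 0x150)
        side_bucket.append(("data_buffer", 0x160))
        side_bucket.extend([("adjacent_container", 0x160)] * 2)

    # Free the remaining shaders, leaving holes between the containers.
    for i in range(1, num_shaders, 2):
        slots[i] = ("hole", 0x158)
    return slots, side_bucket


slots, side_bucket = groom(8)
```

Under this idealized assumption, every hole except the last is followed directly by a resource container, which is the layout the exploit is hoping to find when the TD buffer is allocated.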
Out-of-Bounds Write Function
Before we highlight the general structure of the exploit, let’s describe the function
WriteOOB that triggers the vulnerability.
WriteOOB is called many times throughout the exploit for different purposes, such as leaking the vmware-vmx.exe and kernel32.dll base addresses, as well as the final code execution step.
The function’s parameters are as follows:
WriteOOB(void * data, size_t data_size, uint32_t offset)
The data parameter is a pointer to a buffer containing the data we intend to write to the host heap. The data_size parameter specifies the length of the data. Finally, the offset parameter specifies the location where we want to write the data, relative to the start of the resource container that will be corrupted.
The function first allocates and initializes the frame list and five TD structures. Recall that during heap massaging, we create holes of size 0x158. This function sends five TD structures, so the allocated buffer size on the heap will be
5*0x40+0x18=0x158. The hope is that this allocation will be made in the hole, so that immediately following the TDs there will be a resource container to corrupt.
Each TD structure is linked to the next TD structure using the
link field, except for the last TD structure, which is a terminating TD structure. For the first three TD structures, the
MaxLen subfield is set to 0x40. The Packet ID subfield for the first three TD structures is set to
USB_PID_SOF, so that the cursor will advance by 0x41 for each TD structure. The Packet ID for the fourth TD structure is also set to
USB_PID_SOF, but for this TD,
MaxLen is set to a value calculated from the
offset parameter. This advances the cursor by a controllable amount. In the fifth TD, the Packet ID is set to
USB_PID_OUT, in order to write the content of the
data buffer to the cursor position.
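The five-TD chain described above can be sketched as a small planning function. This is an assumption-laden illustration, not the exploit's code: `plan_write_oob` and its `cursor_distance` parameter (the total number of bytes the cursor must travel before the OUT write) are hypothetical, and how the real exploit derives the fourth TD's MaxLen from `offset` is not spelled out in the text.

```python
USB_PID_SOF = 0xA5
USB_PID_OUT = 0xE1
MAX_TRANSFER = 0x7FF  # keeps MaxLen below 0x7FF, which UHCI reserves for a null packet


def plan_write_oob(data: bytes, cursor_distance: int):
    """Return five TD descriptions; `maxlen` is the raw subfield (length - 1)."""
    fourth_advance = cursor_distance - 3 * 0x41  # three fixed TDs move 0x41 each
    if not 1 <= fourth_advance <= MAX_TRANSFER:
        raise ValueError("distance cannot be covered by the fourth TD")
    if not 1 <= len(data) <= MAX_TRANSFER:
        raise ValueError("payload does not fit in one OUT TD")

    plan = [{"pid": USB_PID_SOF, "maxlen": 0x40} for _ in range(3)]
    plan.append({"pid": USB_PID_SOF, "maxlen": fourth_advance - 1})
    plan.append({"pid": USB_PID_OUT, "maxlen": len(data) - 1})
    for index, td in enumerate(plan):
        # Every TD links to the next one except the last, which terminates.
        td["terminate"] = index == len(plan) - 1
    return plan


plan = plan_write_oob(b"\x00", cursor_distance=0x280)
skipped = sum(td["maxlen"] + 1 for td in plan[:4])
```

Splitting the skip across three fixed TDs and one variable TD mirrors the description above; the sketch simply checks that the four non-OUT TDs together cover the requested distance.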
Memory leak and bypassing ASLR
Now that the exploit primitives are all in place, the first order of business is to leak the base address of
vmware-vmx.exe. This is done by corrupting the pointer to the data buffer in the resource container immediately following the TDs. This pointer resides at offset 0x138 within the resource container. The exploit corrupts the least significant byte of the data pointer by replacing it with 0x00. When the corrupted pointer gets referenced, it no longer points to the data buffer. Instead, it points within one of the 0x160-byte “adjacent” resource containers that are located close to the data buffers. Within these resource containers there are some function pointers, so when the data is copied back to the guest, the
vmware-vmx.exe base address is revealed:
Let’s see how many bytes we need to move the cursor in order to patch the data pointer precisely:
· Initially, the cursor points to the beginning of a buffer with size 0x158, and considering that the first 0x18 bytes are reserved as a buffer header, we only have control over 0x140 bytes.
· 0x8 bytes are taken up by the heap block header of the following resource container.
· The offset to the data pointer in the resource container is 0x138.
This sums to 0x140+0x8+0x138=0x280, and this is the number of bytes the cursor has to move to point to the byte we intend to patch.
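The same sum can be written as a tiny helper that takes the field offset as a parameter. The constant and function names are hypothetical; the values are the ones listed above.

```python
BUFFER_SIZE = 0x158          # TD buffer allocated for five TDs
BUFFER_HEADER = 0x18         # reserved at the start of that buffer
HEAP_BLOCK_HEADER = 0x8      # header of the following heap block
DATA_POINTER_OFFSET = 0x138  # data pointer inside the resource container


def cursor_distance(field_offset: int) -> int:
    """Bytes the cursor must move to reach a field in the next resource container."""
    controllable = BUFFER_SIZE - BUFFER_HEADER  # 0x140
    return controllable + HEAP_BLOCK_HEADER + field_offset


distance_to_data_pointer = cursor_distance(DATA_POINTER_OFFSET)
```

Keeping the field offset as a parameter makes it easy to recompute the distance for other fields of the resource container.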
In order to write back the leaked function pointers to the guest, the exploit iterates over the 2400 sprayed surfaces and obtains the data from each one using
SVGA_3D_CMD_SURFACE_COPY. It continues iterating until it finds the leaked function pointers that reveal the
vmware-vmx.exe base address.
To find the
kernel32.dll base address, the exploit follows the same process and offsets used to find
vmware-vmx.exe, except for one minor detail. Instead of patching a single byte of the pointer, it overwrites the entire data pointer with
vmware_vmx_base_address+0x7D42D8, which is where the address of
Kernel32!MultiByteToWideCharStub is stored in the import address table. This reveals the
kernel32.dll base address.
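The pointer arithmetic behind both leaks is simple enough to sketch. The helper names and all sample addresses below are hypothetical; only the 0x7D42D8 import-table offset comes from the text, and it applies to the analyzed build only. The RVA of the leaked symbol inside its module is an input the attacker must know for the target version.

```python
IAT_SLOT_RVA = 0x7D42D8  # from the text; specific to the analyzed vmware-vmx.exe build


def zero_low_byte(pointer: int) -> int:
    """Effect of overwriting a pointer's least significant byte with 0x00."""
    return pointer & ~0xFF


def iat_slot_address(vmx_base: int) -> int:
    """Address of the import-table slot used for the kernel32.dll leak."""
    return vmx_base + IAT_SLOT_RVA


def module_base(leaked_pointer: int, symbol_rva: int) -> int:
    """Recover a module base from a leaked pointer and that symbol's known RVA."""
    base = leaked_pointer - symbol_rva
    if base & 0xFFFF:  # Windows maps modules at 64 KiB-aligned addresses
        raise ValueError("result is not 64 KiB aligned; wrong RVA for this build?")
    return base


# Made-up sample values for illustration.
patched = zero_low_byte(0x000001D2A4B3C7F0)
slot = iat_slot_address(0x7FF600000000)
base = module_base(0x7FFB1235E2B0, symbol_rva=0x1E2B0)
```

The alignment check is a cheap sanity test: a leaked value that does not yield a 64 KiB-aligned base most likely is not the expected pointer.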
Escape and Code Execution on the Host
To achieve code execution, the exploit once again overwrites a resource container on the heap. This time, the exploit overwrites 0x120 bytes of the resource container. This accomplishes three things:
1 - It writes the string
calc.exe to the resource container.
2 - It fills out certain necessary fields of the resource container.
3 - It overwrites a function pointer at offset 0x120 in the resource container, so that it instead points to WinExec.
This is what the corrupted resource container looks like after corruption:
The result is that when the guest calls
SVGA_3D_CMD_SURFACE_COPY on this corrupted resource container, the
WinExec function pointer will get called, passing the address of the
calc.exe string as the first parameter. The exploit must iterate through all 2400 surfaces to ensure that the corrupted resource container is used.
Summary of the exploit
To review the above material, we can summarize the exploit as follows:
-- Heap massaging:
--- Allocate 2400 shaders of size 0x158.
--- Deallocate alternate shaders of size 0x158.
--- For each deallocated shader, fill the hole with a resource container (e.g., surface) of size 0x150. Within this resource container there will be a pointer to an associated data buffer of size 0x160. Also create two more surfaces, allocating two resource containers of size 0x160 that will be adjacent to the data buffers.
-- Leaking vmware-vmx.exe base address (iterate 64 times until the address is found):
--- Use WriteOOB to corrupt a resource container of size 0x150 and patch the least significant byte of the pointer to its data buffer, so that it instead points to an adjacent 0x160-byte resource container. This memory contains some function pointers.
--- Iterate through the 2400 surfaces and write the data back to the guest using
SVGA_3D_CMD_SURFACE_COPY until leaked pointers are found.
-- Leaking kernel32.dll base address (iterate 64 times until the address is found):
--- Use WriteOOB to corrupt a resource container of size 0x150 and patch the pointer to its data buffer with the address of a kernel32.dll function in the import table of vmware-vmx.exe.
--- Iterate through the 2400 surfaces and write the data back to the guest using
SVGA_3D_CMD_SURFACE_COPY until the leaked pointer is found.
-- Escape from the guest and gain code execution (iterate 64 times, until we have execution):
--- Use WriteOOB to corrupt a resource container of size 0x150. Write the “calc.exe” string and patch a function pointer with the address of WinExec.
--- Trigger WinExec by iterating through the 2400 surfaces and writing them back to the guest using SVGA_3D_CMD_SURFACE_COPY.
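Each stage in the summary follows the same corrupt-then-scan pattern, which can be sketched as a generic retry loop. The callbacks are placeholders: `corrupt` stands in for a WriteOOB call, `read_surface` for a surface copy back to the guest, and `accept` for whatever check recognizes a useful result. All of these names are hypothetical, and the demo stubs exist only to exercise the loop.

```python
from typing import Callable, Optional


def retry_until_found(corrupt: Callable[[], None],
                      read_surface: Callable[[int], bytes],
                      accept: Callable[[bytes], Optional[int]],
                      attempts: int = 64,
                      surfaces: int = 2400) -> Optional[int]:
    """Corrupt, scan every surface, and repeat until something is accepted."""
    for _ in range(attempts):
        corrupt()
        for surface_id in range(surfaces):
            found = accept(read_surface(surface_id))
            if found is not None:
                return found
    return None


# Demo stubs: the third corruption attempt "hits" surface 7.
state = {"attempt": 0}


def demo_corrupt() -> None:
    state["attempt"] += 1


def demo_read(surface_id: int) -> bytes:
    hit = state["attempt"] == 3 and surface_id == 7
    return b"LEAK" if hit else b"\x00" * 4


def demo_accept(data: bytes) -> Optional[int]:
    return 0x1234 if data == b"LEAK" else None


result = retry_until_found(demo_corrupt, demo_read, demo_accept)
```

The bounded outer loop reflects the semi-brute-force nature of the approach: an individual attempt may land in the wrong hole, so the exploit simply tries again.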
VMware guest-to-host escapes can be performed reliably for certain memory corruption bugs. An exploit can gain code execution by adopting a semi-brute-force approach. It is still a challenge to find exploitable bugs in VMware, but once a vulnerability is found, it is not overly difficult to exploit. VMware SVGA provides a wide variety of operations and objects, such as resource containers and shaders. These are useful from an exploit perspective because of their adjustable size and the data and function pointers they store.