Analysis of a Parallels Desktop Stack Clash Vulnerability and Variant Hunting using Binary Ninja

September 09, 2021 | Reno Robert

Parallels Desktop uses a paravirtual PCI device called the “Parallels ToolGate” for communication between guest and host OS. This device is identified by Vendor ID 0x1AB8 and Device ID 0x4000 in a Parallels guest.

The guest driver provided as part of Parallels Tools and the host virtual device communicate using a ToolGate messaging protocol. To provide a summary, the guest driver prepares a message and writes the physical address of the message to TG_PORT_SUBMIT [PORT IO ADDRESS+0x8]. The host then maps the guest provided physical address, parses the message, and transfers control to the respective request handlers for further processing. Many of the bugs received by the ZDI program in Parallels are in these request handlers.

ToolGate Interface and Protocol Format

The ToolGate request format is a variable size structure that could span across multiple pages. The guest sends data to the host as inline bytes or pointers to paged buffers by writing the physical address of TG_PAGED_REQUEST structure to the IO port of the virtual device.

Figure 1 - Variable size TG_PAGED_REQUEST structure in guest memory

Figure 2 - Variable size TG_PAGED_BUFFER structure in TG_PAGED_REQUEST

The host then maps the page pointed to by the physical address, prepares a host buffer based on the information provided by the guest and then invokes the request handler. Request handlers may use inline bytes or paged buffers or both, for reading and writing data. These data buffers are accessed using a set of functions as roughly defined below:

TG_DataBuffer *TG_GetBuffer(TG_ReqBuffer *buf, uint32_t index, uint32 writable) uint32_t TG_ReadBuffer(TG_DataBuffer *buf, uint64_t offset, void *dst, size_t size) uint32_t TG_WriteBuffer(TG_DataBuffer *buf, uint64_t offset, void *src, size_t size)
void *TG_GetInlineBytes(TG_DataBuffer *buf)

Return of the Stack Clash Vulnerability

The STARLabs submission (ZDI-21-937) for Pwn2Own 2021 is an uncontrolled memory allocation vulnerability where a guest provided size value is allocated in the stack. If the size provided by the guest is more than the total size of the stack, it is possible to shift the Stack Pointer (RSP) into other regions of process memory (e.g., stack region of another thread).

Figure 3 - Normal stack operation (left) vs stack jumping due to large allocation (right)

When handling TG_REQUEST_INVSHARING (0x8420), Parallels Desktop does not validate the ByteCount value provided by the guest as part of TG_PAGED_BUFFER. When allocated in stack, this untrusted size value can be used for shifting the stack top of thread processing the ToolGate request into another victim thread to overwrite its contents. There is a guard page that is 4KB in size, however it is possible to jump over this small allocation without accessing it. Qualys published a detailed paper titled The Stack Clash on this bug class back in 2017, which also leads to various compiler mitigations to prevent such guard page jumps.

Below is the vulnerable section of code from Parallels Desktop 16.1.3. During the call to TG_ReadBuffer(), the stack memory of another thread can be overwritten with guest-controlled values.

Figure 4 - Vulnerability in TG_REQUEST_INVSHARING handling

Backward Compatibility and Compiler Mitigations

The most interesting question here is what happened to the stack clash compiler mitigation in Apple Clang? When a variable size allocation is made in stack like alloca(value) or char buffer[value], the Apple Clang compiler instruments the allocation with ___chkstk_darwin() to validate the size of allocation request. ___chkstk_darwin() essentially allocates a PAGE_SIZE of memory in stack, and then, for each page allocated, probes the new stack top if it is accessible. If the case guard page is reached, the probing step will fail leading to a safe crash. It is no longer possible to shift the stack pointer into an arbitrary location.

Figure 5 - Sample code with stack clash mitigation

It is clear that Parallels Desktop did not have this mitigation enabled because there is no call to ___chkstk_darwin() during the variable size stack allocation. At this point there are couple of open questions:

        -- Did Parallels disable the mitigation using -fno-stack-check compiler flag?
        -- Did they use an old build configuration which disabled the mitigation?

Mac OS otool can be used to get a fair amount of information regarding the build environment. Specifically, LC_VERSION_MIN_MACOSX can provide information regarding the macOS versions supported. Here is the output of otool -l prl_vm_app:

Figure 6 - LC_VERSION_MIN_MACOSX  information of prl_vm_app

The SDK used is 10.15 (Catalina), which is quite new. Moreover, Parallels has also set the minimal macOS version supported to 10.13, which makes it compatible with High Sierra. This aligns with the compatibility information provided in their KB article. Still, does backward compatibility for 10.13 disable the compiler mitigation? Here is a comparison of sample code compiled with and without -mmacosx-version-min=10.13:

Figure 7 - Backward compatibility with 10.13 disables ___chkstk_darwin() (right)

It is unclear if Parallels had ___chkstk_darwin() explicitly disabled using -fno-stack-check, but setting -mmacosx-version-min=10.13 has the same effect and silently drops the mitigation. The same behavior is also observed with -mmacosx-version-min=10.14 (Mojave). Interestingly, GCC has inlined the stack probe check instead of depending on an external library function. This is possibly due to an external dependency, as compiling in Apple Clang with the backward compatibility flags (macosx-version-min, target etc.) ended up removing the mitigation.

Saying that, Mojave (10.14.6) did not give any symbol errors when running executables with calls to ___chkstk_darwin(). One can find many such issues in High Sierra (10.13.6). It is noticed that, in older compilers (e.g. Apple LLVM version 10.0.1 on Mojave), the stack clash mitigation is not enabled by default unless the -fstack-check flag is explicitly provided. Therefore, the recent compilers seem to drop the mitigation entirely when compiled with macosx-version-min for any version less than 10.15. This can be worked around by providing both the -fstack-check and -mmacosx-version-min flags together. However, High Sierra compatibility is questionable. The highlight is that using macosx-version-min alone can make the bug exploitable even on latest versions of Mac OS.

Figure 8 - Fig 8. Apple Clang calls ___chkstk_darwin (left) vs GCC mitigation inlined (right)

Variant hunting using Binary Ninja

The next question is, are there other similar bugs in Parallels Desktop? How can we automate this process? The size of a stack frame is generally known during compile time. Moreover, any operations that shift the stack pointer can also be tracked. Binary Ninja has static data flow capability which keeps track of the stack frame offset at any point in a function. However, when there is a variable size allocation in stack, the stack offsets cannot be determined statically. This exact property can be used to look for variants of the bug.

Figure 9 - Fig 9. Known stack offset (left) vs Undetermined value after alloca() (right)

Figure 9 - Fig 9. Known stack offset (left) vs Undetermined value after alloca() (right)

Consider the index 88 in the above Low-Level IL, where RSP register is loaded.

Here, the new stack top is calculated using a guest-provided size and loaded into RSP. Binary Ninja provides get_possible_reg_values() and get_possible_reg_values_after() python APIs to fetch statically determined values for registers. The register values are also associated with type information (RegisterValueType). Here is the stack frame offset value in RSP before and after the load operation:

The RSP is always associated with StackFrameOffset RegisterValueType. However, when the RSP value is not known, it is marked as UndeterminedValue. With this value type information, search for all references to TG_ReadBuffer() where RSP value is undetermined. If RSP is undetermined before the call to TG_ReadBuffer(), its deduced that a variable size allocation was made in stack prior to the call.

The above query yielded 3 results; one was the Pwn2Own submission and 2 other previously unknown vulnerabilities.

      0x1001c6e35 - ZDI-21-1056 - TG_REQUEST_GL_CREATE_CONTEXT
      0x10025591a - ZDI-21-1055 - TG_REQUEST_DIRECT3D_CREATE_CONTEXT
      0x10080bcd9 - ZDI-21-937 - (Pwn2Own) - TG_REQUEST_INVSHARING

Figure 10 - Pwn2Own bug and its variants found using Binary Ninja

Figure 10 - Pwn2Own bug and its variants found using Binary Ninja

Conclusion

Our research has shown that, in some cases, compiling for backward compatibility might silently drop mitigations, thus making an entire bug class exploitable. Perhaps the vendors should consider releasing separate binaries in such cases?

We also took a look at how Binary Ninja’s static data flow capability and python API’s can be useful in automating bug finding tasks. If you find any such vulnerabilities, consider submitting it to our program. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.