CVE-2021-20226: A Reference-Counting Bug in the Linux Kernel io_uring Subsystem

April 22, 2021 | Lucas Leong

In June 2020, we received a Linux kernel submission detailing a reference-counting bug in the recently introduced io_uring subsystem. The bug leads to a use-after-free on any file structure, which can be leveraged for privilege escalation in the kernel. This bug was submitted by Ryota Shiga (@Ga_ryo_) of Flatt Security.

We believe that the vulnerability affected the Linux kernel from version 5.6 to 5.7 inclusive. The vulnerability has been assigned identifiers ZDI-21-001 and CVE-2021-20226.

The Vulnerability

Linux kernel 5.1 introduced a new asynchronous I/O feature called io_uring. This subsystem operates by batching I/O operation system calls, so that multiple I/O operations can be performed in one system call.
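
The batching idea can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not the io_uring interface: the names toy_ring, toy_queue() and toy_enter() are hypothetical, and a single in-process array stands in for the submission queue that the real subsystem shares between user space and the kernel.

```c
#include <assert.h>

#define TOY_RING_SIZE 8  /* power of two, so indices can be masked */

enum toy_op { TOY_OP_NOP, TOY_OP_ADD };

/* Hypothetical submission entry: one queued request. */
struct toy_sqe { enum toy_op op; int arg; };

struct toy_ring {
    struct toy_sqe sq[TOY_RING_SIZE];
    unsigned head;  /* next entry the consumer will read */
    unsigned tail;  /* next free slot for the producer */
    int total;      /* accumulated effect of TOY_OP_ADD requests */
    int calls;      /* how many times toy_enter() ran (the "system calls") */
};

/* Queue one request without entering the "kernel". Returns -1 if the ring is full. */
static int toy_queue(struct toy_ring *r, enum toy_op op, int arg)
{
    if (r->tail - r->head == TOY_RING_SIZE)
        return -1;
    r->sq[r->tail & (TOY_RING_SIZE - 1)] = (struct toy_sqe){ op, arg };
    r->tail++;
    return 0;
}

/* A single call drains everything queued so far and returns the number processed. */
static int toy_enter(struct toy_ring *r)
{
    int done = 0;

    assert(r->tail - r->head <= TOY_RING_SIZE);
    r->calls++;
    while (r->head != r->tail) {
        struct toy_sqe *e = &r->sq[r->head & (TOY_RING_SIZE - 1)];
        if (e->op == TOY_OP_ADD)
            r->total += e->arg;
        r->head++;
        done++;
    }
    return done;
}
```

Queuing three requests with toy_queue() and then calling toy_enter() once processes all three in that one call, which is the property the paragraph above describes.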

Linux kernel 5.6 has a flawed implementation of the IORING_OP_CLOSE operation. When a system call passes a files_struct to a kernel thread, io_grab_files() doesn’t increment the structure's reference counter at (1). As a result, a file structure can be freed while it is still in use and then accessed later.
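
The missing increment can be pictured with a minimal user-space model. This is a sketch of the general pattern only, not the kernel's code: toy_files, handoff_borrowed(), handoff_counted() and count_after_handoff() are hypothetical names, and a plain int replaces the kernel's atomic counter.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted descriptor table. */
struct toy_files { int count; };

/* Flawed handoff: the second user keeps the pointer but takes no reference. */
static struct toy_files *handoff_borrowed(struct toy_files *f)
{
    return f;
}

/* Sound handoff: the second user takes its own reference first. */
static struct toy_files *handoff_counted(struct toy_files *f)
{
    assert(f->count > 0);
    f->count++;
    return f;
}

/* Returns the count that any later check would observe once two users exist. */
static int count_after_handoff(int counted)
{
    struct toy_files files = { 1 };  /* the original owner's reference */
    struct toy_files *worker_view =
        counted ? handoff_counted(&files) : handoff_borrowed(&files);

    return worker_view->count;
}
```

With the flawed handoff the counter still reads 1 even though two contexts now use the table, so any code that trusts the counter to detect sharing draws the wrong conclusion.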

Exploitation

The map_lookup_elem() and map_update_elem() functions are good candidates for use in exploiting this bug.

The fdget() at (2) is an optimized function that doesn't increase the reference count if the current task is single-threaded. The returned file structure, f, can therefore be freed by a later IORING_OP_CLOSE. The __bpf_copy_key() function at (3) is actually a wrapper for copy_from_user(). This provides an opportunity to win the race: userfaultfd stalls the copy while the vulnerability is triggered. At this point, file structure f and its corresponding map are freed. The memory of the map can be reallocated with fake data at (4) and (5). Finally, we can read arbitrary memory at (6) and disclose it to user mode.
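
The interaction between the lookup shortcut and a concurrent close can be sketched in user space. This is a simplified model under stated assumptions, not kernel code: every name (toy_file, toy_table, toy_fdget(), toy_close(), held_file_is_stale()) is hypothetical, the table holds a single slot, and a freed flag replaces an actual free() so the stale state can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file  { int refs; bool freed; };
struct toy_table { int users; struct toy_file *slot; };

/* Result of a lookup: the file, plus whether a reference was taken for it. */
struct toy_fd { struct toy_file *file; bool took_ref; };

static void toy_file_put(struct toy_file *f)
{
    assert(f->refs > 0);
    if (--f->refs == 0)
        f->freed = true;  /* flag instead of free(), so the model stays inspectable */
}

/* Lookup with the shortcut: skip the increment when the table looks unshared. */
static struct toy_fd toy_fdget(struct toy_table *t)
{
    struct toy_fd r = { t->slot, false };

    if (r.file && t->users > 1) {
        r.file->refs++;
        r.took_ref = true;
    }
    return r;
}

/* Close: empty the slot and drop the reference the table was holding. */
static void toy_close(struct toy_table *t)
{
    struct toy_file *f = t->slot;

    t->slot = NULL;
    if (f)
        toy_file_put(f);
}

/* Look up a file, let another context close it, and report whether the
 * pointer still held by the first context now refers to a freed file. */
static bool held_file_is_stale(int table_users)
{
    struct toy_file file = { 1, false };
    struct toy_table table = { table_users, &file };
    struct toy_fd held = toy_fdget(&table);
    bool stale;

    toy_close(&table);  /* stands in for the worker's IORING_OP_CLOSE */
    stale = held.file->freed;
    if (held.took_ref)
        toy_file_put(held.file);
    return stale;
}
```

When the table's user count is 1, the lookup takes no reference and the close leaves the caller holding a freed file; when the count is 2, the extra reference keeps the file alive until the caller releases it.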

Here is an overview of the exploit timeline:

Figure 1 - The Exploit Timeline

The recvmsg() function is used for timing control. A fake bpf_map can be placed in the freed memory by spraying with setxattr(). An arbitrary write can then be achieved through map_update_elem(). This exploit method is restricted to a single-core environment because of the condition that fdget() checks.

Conclusion

New features mean new attack surfaces, and new attack surfaces often lead to new bugs being discovered. It will be interesting to see if any other vulnerabilities are found in this subsystem. Regardless, it was a great find by Ryota, and we appreciate his submission. If that name sounds familiar at all, Ryota also competed in the most recent Pwn2Own and won $30,000 demonstrating a different privilege escalation bug on Ubuntu. We look forward to seeing more from him in the future.

You can find me on Twitter @_wmliang_, and follow the team for the latest in exploit techniques and security patches.