CVE-2021-31440: An Incorrect Bounds Calculation in the Linux Kernel eBPF Verifier

May 27, 2021 | Lucas Leong

In April 2021, the ZDI received a Linux kernel submission that turned out to be an incorrect bounds calculation bug in the extended Berkeley Packet Filter (eBPF) verifier. This bug was submitted to the program by Manfred Paul (@_manfp) of the RedRocket CTF team (@redrocket_ctf). Manfred Paul had previously exploited two other eBPF verifier bugs, at Pwn2Own 2020 and Pwn2Own 2021 respectively.

This particular bug bypassed eBPF verification and resulted in an out-of-bounds (OOB) access in the Linux kernel. The researcher exploited this bug and demonstrated a Kubernetes container escape. The bug was recently patched and assigned CVE-2021-31440. Linux kernel versions 5.7 and later were affected.

The Vulnerability

After CVE-2020-8835, one significant change was added to the verifier, namely 32-bit bounds tracking. In addition to the 64-bit unsigned and signed minimum and maximum bounds, the verifier now tracks the 32-bit bounds u32_min_value, u32_max_value, s32_min_value and s32_max_value, which apply exclusively to the lower 32 bits of each tracked register.

However, a very similar mistake was reintroduced in __reg_combine_64_into_32(). This function uses the known bounds on a 64-bit register to infer bounds for the register’s lower 32 bits.

If the smin_value and smax_value at (1) are both within the range of signed 32-bit integers, the 32-bit signed bounds are updated accordingly. In all other cases, the 32-bit signed bounds remain in the “unbounded” state. This logic is correct. However, in the corresponding logic for unsigned bounds, the checks on umin_value and umax_value are performed separately at (2) and (3). This logic is incorrect. For example, consider what happens if a register has umin_value = 1 and umax_value = 1<<32. At (2), the verifier will set u32_min_value to 1, while the check at (3) fails and leaves u32_max_value unbounded. At runtime, the register’s actual value can be 1<<32, making the lower 32 bits equal to 0. This violates the correctness of the register’s bounds, which indicate that the minimum value of the lower 32 bits is 1.
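The flaw described above can be reproduced in a small Python model. This is only a sketch of the unsigned-bounds logic, not the kernel source: the function names are hypothetical, and the joint check is included purely as a contrast that mirrors the signed logic at (1), not as a copy of the upstream patch.

```python
U32_MAX = (1 << 32) - 1


def fits_u32(value):
    """True when an unsigned 64-bit value is representable in 32 bits."""
    return 0 <= value <= U32_MAX


def combine_unsigned_separate(umin, umax):
    """Model of the flawed logic: each 64-bit bound is checked on its own."""
    u32_min, u32_max = 0, U32_MAX      # start from the "unbounded" state
    if fits_u32(umin):                 # corresponds to check (2)
        u32_min = umin
    if fits_u32(umax):                 # corresponds to check (3)
        u32_max = umax
    return u32_min, u32_max


def combine_unsigned_joint(umin, umax):
    """Contrast: only narrow the 32-bit bounds when both 64-bit bounds fit."""
    u32_min, u32_max = 0, U32_MAX
    if fits_u32(umin) and fits_u32(umax):
        u32_min, u32_max = umin, umax
    return u32_min, u32_max


# The example from the text: umin_value = 1, umax_value = 1<<32.
umin, umax = 1, 1 << 32
runtime_value = 1 << 32                # allowed by the 64-bit bounds
low32 = runtime_value & U32_MAX        # lower 32 bits at runtime

separate = combine_unsigned_separate(umin, umax)
joint = combine_unsigned_joint(umin, umax)

print("separate checks:", separate, "contains low32:", separate[0] <= low32 <= separate[1])
print("joint check:    ", joint, "contains low32:", joint[0] <= low32 <= joint[1])
```

With separate checks, the inferred 32-bit range excludes the value the lower 32 bits actually hold at runtime; with the joint check, the range stays unbounded and remains sound.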

Notably, there was already a fix for the case of signed bounds at (1) from December 2020, but it missed the case of unsigned bounds.

Exploitation

The bug can be exploited as follows. Begin with these eBPF instructions:

These instructions set BPF_REG_2 to 1<<32. The two successive NEG instructions cause the verifier to lose track of all bounds for BPF_REG_2, while keeping its runtime value unchanged.
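The runtime half of that claim is easy to check with 64-bit two's-complement arithmetic. The sketch below models only the runtime effect of a 64-bit NEG (the helper name neg64 is hypothetical); it does not model how the verifier discards bounds.

```python
MASK64 = (1 << 64) - 1


def neg64(value):
    """Two's-complement negation of a 64-bit register value."""
    return (-value) & MASK64


reg2 = 1 << 32
once = neg64(reg2)        # intermediate value after the first NEG
twice = neg64(once)       # value after the second NEG

print(hex(reg2), hex(once), hex(twice))
```

Negating twice returns the original value, so the register still holds 1<<32 at runtime.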

Next:

This conditional branch tests whether BPF_REG_2 is greater than or equal to 1. For the true side of the branch, the verifier sets the register’s umin_value to 1. Furthermore, the verifier calls __reg_combine_64_into_32(), which sets u32_min_value to 1 as well. This is the branch that will be followed at runtime.

Next:

This second conditional branch tests whether BPF_REG_2 is less than or equal to 1, so for the true side of the branch, the verifier sets the register’s u32_max_value to 1. At this point, for the true path, u32_max_value and u32_min_value are both set to 1, meaning that the verifier believes that the value of the lower 32 bits is known to be exactly 1. Recall, though, that the runtime value of BPF_REG_2 has been set to 1<<32 from the beginning, so that the true value of the lower 32 bits is 0. Thus, the verifier has inferred incorrect bounds for the register.
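The two branches can be walked through in a simplified Python model of the verifier's unsigned bounds. Everything here is an illustrative assumption rather than kernel code: the RegBounds class and helper names are hypothetical, and the second comparison is modeled as a 32-bit compare, which is inferred from the fact that it updates u32_max_value.

```python
from dataclasses import dataclass

U32_MAX = (1 << 32) - 1
U64_MAX = (1 << 64) - 1


@dataclass
class RegBounds:
    """Unsigned bounds the modeled verifier tracks for one register."""
    umin: int = 0
    umax: int = U64_MAX
    u32_min: int = 0
    u32_max: int = U32_MAX


def combine_64_into_32_flawed(reg):
    """Flawed inference: umin and umax are checked separately."""
    reg.u32_min, reg.u32_max = 0, U32_MAX
    if reg.umin <= U32_MAX:
        reg.u32_min = reg.umin
    if reg.umax <= U32_MAX:
        reg.u32_max = reg.umax


def jge64_true_side(reg, imm):
    """True side of a 64-bit 'reg >= imm' branch."""
    reg.umin = max(reg.umin, imm)
    combine_64_into_32_flawed(reg)


def jle32_true_side(reg, imm):
    """True side of a 32-bit 'reg <= imm' branch (assumed 32-bit compare)."""
    reg.u32_max = min(reg.u32_max, imm)


runtime = 1 << 32            # actual value of BPF_REG_2
reg = RegBounds()            # all bounds lost after the two NEGs

first_taken = runtime >= 1                  # first branch at runtime
jge64_true_side(reg, 1)
second_taken = (runtime & U32_MAX) <= 1     # second branch at runtime
jle32_true_side(reg, 1)

believed = (reg.u32_min, reg.u32_max)
actual_low32 = runtime & U32_MAX
print("verifier believes low 32 bits in", believed, "but actual is", actual_low32)
```

Both branches are taken at runtime, yet the modeled verifier ends up with u32_min_value == u32_max_value == 1 while the lower 32 bits are really 0.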

Finally, we can use this to produce an out-of-bounds access:

The BPF_MOV32_REG extends the wrong knowledge to the whole 64-bit register. After two ALU (arithmetic) operations, the register value is believed by the verifier to be 0, but is actually 1. This situation is the same as in the exploit steps in CVE-2020-8835, and the later steps of that exploit can be reused here. The out-of-bounds read/write access is achieved by using the wrongly bounded register, followed by an arbitrary read/write, and privilege escalation to root.
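To make the divergence concrete, the sketch below runs the same operations on the value the verifier believes in (1) and on the real runtime value (1<<32). The article does not name the two ALU operations, so the ADD 1 followed by AND 1 used here is a hypothetical pair chosen only because it produces the described result; the constant 0x1000 is likewise an arbitrary illustration of how such a register could scale into an offset.

```python
U32_MAX = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mov32(value):
    """A 32-bit move keeps the lower 32 bits and zero-extends to 64 bits."""
    return value & U32_MAX


def run_sequence(value):
    """MOV32, then two hypothetical ALU operations: ADD 1 and AND 1."""
    value = mov32(value)
    value = (value + 1) & MASK64   # hypothetical first ALU operation
    value = value & 1              # hypothetical second ALU operation
    return value


believed = run_sequence(1)         # verifier's view: lower 32 bits are exactly 1
actual = run_sequence(1 << 32)     # runtime: lower 32 bits are 0

# Scaling by a constant: the verifier sees offset 0, the kernel uses 0x1000.
believed_offset = believed * 0x1000
actual_offset = actual * 0x1000

print("believed:", believed, "actual:", actual)
print("believed offset:", hex(believed_offset), "actual offset:", hex(actual_offset))
```

Because the verifier considers the offset to be 0, a pointer adjusted by this register passes verification while pointing outside the intended object at runtime.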

Demonstrating Exploitation

Since Kubernetes containers allow access to all system calls by default, a container escape can be achieved by exploiting this bug. The researcher set the current UID and GID to 0 and obtained the CAP_SYS_MODULE capability. This allowed the program to load an arbitrary kernel module outside of the container.

Conclusion

The eBPF module is still a good place for kernel bug hunting. It was also recently introduced to Windows. This attack surface is usable not only for local privilege escalation but also for container escapes. Those running systems affected by this bug should apply this mitigation, or even better, upgrade the kernel to an unaffected version. Thanks again to Manfred Paul (@_manfp) of the RedRocket CTF team (@redrocket_ctf) for submitting this bug. He’s submitted a few other reports to the program, and each has been great. We hope to see more from him in the future.

You can find me on Twitter @_wmliang_, and follow the team for the latest in exploit techniques and security patches.