Notes on the "slab: Introduce dedicated bucket allocator" series
Fri 22 March 2024

Today LWN published Hardening the kernel against heap-spraying attacks (paywalled), detailing a series of patches suggested by Kees Cook for the Linux kernel. Since LWN didn't point out why this isn't such a great idea, here we go.

The main idea of the series is to add a kmem_buckets_create function with an associated kmem_buckets_alloc, and to use them to isolate every structure used in exploits in its own set of buckets. xattr and memdup_user are used as examples.
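To make the idea concrete, here is a minimal userspace sketch of "one private set of size-class buckets per user". It is a toy model, not kernel code: every name in it (toy_buckets_create, toy_buckets_alloc, toy_buckets_owns, the size classes, the bump allocator) is made up for illustration, and it does not reproduce the real signatures of kmem_buckets_create or kmem_buckets_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#define NUM_CLASSES 4     /* size classes: 16, 32, 64 and 128 bytes */
#define SLAB_BYTES  1024  /* one backing region per size class */

struct toy_cache {
    unsigned char *slab;  /* backing memory for this size class */
    size_t obj_size;      /* object size served by this class */
    size_t used;          /* bytes handed out so far (bump allocation) */
};

struct toy_buckets {
    struct toy_cache classes[NUM_CLASSES];
};

/* Create a private set of size-class caches for a single user. */
static struct toy_buckets *toy_buckets_create(void)
{
    struct toy_buckets *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        b->classes[i].obj_size = (size_t)16 << i;
        b->classes[i].slab = malloc(SLAB_BYTES);
        if (!b->classes[i].slab) {
            while (i-- > 0)
                free(b->classes[i].slab);
            free(b);
            return NULL;
        }
    }
    return b;
}

/* Allocate from the smallest class of this bucket set that fits `size`. */
static void *toy_buckets_alloc(struct toy_buckets *b, size_t size)
{
    assert(b != NULL);
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        struct toy_cache *c = &b->classes[i];
        if (size > c->obj_size)
            continue;
        if (c->used + c->obj_size > SLAB_BYTES)
            return NULL;  /* this class is exhausted */
        void *p = c->slab + c->used;
        c->used += c->obj_size;
        return p;
    }
    return NULL;          /* larger than the biggest class */
}

/* Return 1 if `p` points into memory owned by this bucket set. */
static int toy_buckets_owns(const struct toy_buckets *b, const void *p)
{
    uintptr_t q = (uintptr_t)p;
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        uintptr_t start = (uintptr_t)b->classes[i].slab;
        if (q >= start && q < start + SLAB_BYTES)
            return 1;
    }
    return 0;
}

static void toy_buckets_destroy(struct toy_buckets *b)
{
    for (size_t i = 0; i < NUM_CLASSES; i++)
        free(b->classes[i].slab);
    free(b);
}
```

Two bucket sets created this way, say one for xattr and one for memdup_user, hand out same-sized objects from disjoint memory, so spraying through one never lands next to objects of the other. Note that every caller still has to be changed by hand to pass its own bucket set.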

The most glaring issue here is that the allocation sites of every interesting structure would have to be patched to make use of this. And there are a lot of them in the kernel: patching all of their call sites doesn't really scale. It's 2024, we should aim at killing techniques instead of playing whack-a-mole. Moreover, manually annotating structures isn't sustainable in the long run without having someone constantly looking at every newly added one. A better approach would be to automate the segregation à la AUTOSLAB, added to grsecurity 2.5 years ago.
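As a rough illustration of what "automated" means here, the sketch below lets a macro give each call site its own bookkeeping object, so nobody has to pick a bucket by hand. This is emphatically not how AUTOSLAB is implemented (that is a compiler plugin whose internals are not described here); the macro, the header trick, and all toy_* names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model, NOT grsecurity code. All names are hypothetical. */
struct toy_site_cache {
    const char *file;   /* where the call site lives */
    int line;
    size_t allocs;      /* number of objects served to this site */
};

/* Small header in front of each object recording which site owns it. */
union toy_header {
    struct toy_site_cache *site;
    max_align_t align_; /* keeps the object after the header aligned */
};

static void *toy_site_alloc(struct toy_site_cache *site, size_t size)
{
    union toy_header *h = malloc(sizeof(*h) + size);
    if (!h)
        return NULL;
    h->site = site;
    site->allocs++;
    return h + 1;
}

static struct toy_site_cache *toy_site_of(const void *obj)
{
    return ((const union toy_header *)obj - 1)->site;
}

static void toy_free(void *obj)
{
    if (obj)
        free((union toy_header *)obj - 1);
}

/*
 * Each expansion of this macro declares its own static cache, so every
 * call site is segregated automatically, without manual annotation.
 */
#define TOY_ALLOC(out, size)                                        \
    do {                                                            \
        static struct toy_site_cache toy_site_ =                    \
            { __FILE__, __LINE__, 0 };                              \
        (out) = toy_site_alloc(&toy_site_, (size));                 \
    } while (0)

/* Two unrelated (hypothetical) call sites allocating the same size. */
static void *toy_make_foo(void)
{
    void *p;
    TOY_ALLOC(p, 64);
    return p;
}

static void *toy_make_bar(void)
{
    void *p;
    TOY_ALLOC(p, 64);
    return p;
}
```

Objects returned by toy_make_foo and toy_make_bar are tagged with different caches even though both ask for 64 bytes, and a newly added call site gets its own cache for free. A real implementation would back each cache with separate slabs; the model only tracks ownership.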

Some comments from my analysis of the "randomized slab caches for kmalloc()" series apply as well:

  • The slabs need to be pinned, otherwise an attacker could feng-shui their way into having the whole slab freed, garbage-collected, and a slab for another type allocated at the same VA. Jann Horn and Matteo Rizzo have a nice set of patches, discussed a bit in this Project Zero blog post, for a feature called SLAB_VIRTUAL, implementing precisely this. Worryingly, it isn't mentioned in Cook's series.
  • I don't see any explicit guard pages between caches, so inter-cache overflows are possible.
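The first point can be illustrated with a toy model of address reuse. In the sketch below a page index stands in for a virtual address, and the "pinned" mode reserves an address for the first cache that used it. This is only a simplified model of the pinning idea, not the SLAB_VIRTUAL implementation, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#define TOY_PAGES 4

enum toy_owner { OWNER_NONE = 0, OWNER_CACHE_A, OWNER_CACHE_B };

struct toy_page_pool {
    enum toy_owner current[TOY_PAGES];  /* who uses the page right now */
    enum toy_owner reserved[TOY_PAGES]; /* first owner, kept after free */
    int pinned;                         /* honour reserved[] when set */
};

static void toy_pool_init(struct toy_page_pool *pool, int pinned)
{
    for (int i = 0; i < TOY_PAGES; i++) {
        pool->current[i] = OWNER_NONE;
        pool->reserved[i] = OWNER_NONE;
    }
    pool->pinned = pinned;
}

/* Hand `owner` the lowest free page it may use; return its index or -1. */
static int toy_page_get(struct toy_page_pool *pool, enum toy_owner owner)
{
    for (int i = 0; i < TOY_PAGES; i++) {
        if (pool->current[i] != OWNER_NONE)
            continue;                   /* page is in use */
        if (pool->pinned && pool->reserved[i] != OWNER_NONE &&
            pool->reserved[i] != owner)
            continue;                   /* address pinned to another cache */
        pool->current[i] = owner;
        pool->reserved[i] = owner;
        return i;
    }
    return -1;
}

/* Release a page; the reservation deliberately survives the free. */
static void toy_page_put(struct toy_page_pool *pool, int idx)
{
    assert(idx >= 0 && idx < TOY_PAGES);
    pool->current[idx] = OWNER_NONE;
}

/*
 * Model of the attack: cache A's slab is emptied and released, then
 * cache B asks for a new slab. Returns 1 if B lands on A's old address,
 * i.e. a dangling pointer into A would now alias an object of B.
 */
static int toy_cross_cache_reuse(int pinned)
{
    struct toy_page_pool pool;
    toy_pool_init(&pool, pinned);
    int a_page = toy_page_get(&pool, OWNER_CACHE_A);
    toy_page_put(&pool, a_page);
    int b_page = toy_page_get(&pool, OWNER_CACHE_B);
    return a_page == b_page;
}
```

In this model toy_cross_cache_reuse(0) returns 1 (the address is recycled across caches) and toy_cross_cache_reuse(1) returns 0 (the second cache is pushed to a different address). Dedicated buckets alone correspond to the unpinned case.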

Amusingly, as pointed out by spender, PAX_USERCOPY has had since ~2012 a GFP_USERCOPY flag to tag allocations whose size is controlled from userland in order to isolate them, which could be seen as a superset of this series. Now that compilers are smarter, Jann Horn's CONFIG_KMALLOC_SPLIT_VARSIZE is way better anyway, and yet it is routinely bypassed in all kinds of ways in Google's own kernelCTF.

I guess it could be a starting point for something better, like all structures with kernel pointers in one bucket and all structures with user-controlled data in another. But again, without a compiler plugin, it'll be excessively tedious at best, and uselessly bit-rotten at worst.

Finally, I'm also concerned by the lack of effectiveness/cost/coverage/overlap analysis, but maybe this is more of a "I got this idea, what do you people think?" than a "Please merge this" email, as I have no idea how the community around the Linux kernel works with their email-based workflows.