Title: Notes on the "slab: Introduce dedicated bucket allocator" series
Date: 2024-03-22 14:30

Today [LWN](https://en.wikipedia.org/wiki/LWN.net) published [Hardening the kernel against heap-spraying
attacks](https://lwn.net/Articles/965837/) (paywalled), detailing a [series of
patches](https://lore.kernel.org/lkml/20240304184252.work.496-kees@kernel.org/)
suggested by [Kees Cook](https://outflux.net/) for the Linux kernel. Since LWN
didn't point out why this isn't such a great idea, here we go.

The main idea of the series is to add a `kmem_buckets_create` function
with an associated `kmem_buckets_alloc`, and to use them to isolate
every structure used in exploits in its own set of buckets.
[`xattr`](https://lore.kernel.org/lkml/20240304184933.3672759-3-keescook@chromium.org/)
and
[`memdup_user`](https://lore.kernel.org/lkml/20240304184933.3672759-4-keescook@chromium.org/)
are used as examples.
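
To make the idea concrete, here is a userspace toy model of dedicated buckets. It is only a sketch: `toy_buckets_create`, `toy_buckets_alloc`, and every other name in it are made up for illustration and do not reproduce the signatures from the series. Each set of buckets gets its own backing memory per size class, so two objects of the same size requested from different sets never share a slab.

```c
/* Userspace toy model, NOT kernel code. Every name below is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CLASSES 4     /* size classes: 16, 32, 64 and 128 bytes */
#define SLAB_BYTES 4096  /* a single fixed slab per size class, for brevity */

struct toy_cache {
    unsigned char *slab; /* backing memory of this size class */
    size_t obj_size;     /* object size served by this class */
    size_t next;         /* index of the next never-used object */
};

struct toy_buckets {
    struct toy_cache classes[NR_CLASSES];
};

static void toy_buckets_destroy(struct toy_buckets *b)
{
    if (!b)
        return;
    for (size_t i = 0; i < NR_CLASSES; i++)
        free(b->classes[i].slab); /* free(NULL) is a no-op */
    free(b);
}

/* Create a private set of size-class caches. */
static struct toy_buckets *toy_buckets_create(void)
{
    struct toy_buckets *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    for (size_t i = 0; i < NR_CLASSES; i++) {
        b->classes[i].obj_size = (size_t)16 << i;
        b->classes[i].slab = malloc(SLAB_BYTES);
        if (!b->classes[i].slab) {
            toy_buckets_destroy(b);
            return NULL;
        }
    }
    return b;
}

/* Bump-allocate from the smallest size class that fits. */
static void *toy_buckets_alloc(struct toy_buckets *b, size_t size)
{
    for (size_t i = 0; i < NR_CLASSES; i++) {
        struct toy_cache *c = &b->classes[i];
        if (size > c->obj_size)
            continue;
        if ((c->next + 1) * c->obj_size > SLAB_BYTES)
            return NULL; /* slab exhausted; the toy never grows */
        return c->slab + c->obj_size * c->next++;
    }
    return NULL; /* larger than the biggest size class */
}

/* Return 1 if p points into one of b's slabs. */
static int toy_buckets_owns(const struct toy_buckets *b, const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    for (size_t i = 0; i < NR_CLASSES; i++) {
        uintptr_t start = (uintptr_t)b->classes[i].slab;
        if (addr >= start && addr < start + SLAB_BYTES)
            return 1;
    }
    return 0;
}

/* Return 1 if same-sized objects from two bucket sets live in different slabs. */
int demo_isolation(void)
{
    struct toy_buckets *generic = toy_buckets_create();
    struct toy_buckets *dedicated = toy_buckets_create();
    int isolated = -1; /* -1: setup failed */

    if (generic && dedicated) {
        void *victim = toy_buckets_alloc(generic, 64);
        void *spray = toy_buckets_alloc(dedicated, 64);
        isolated = victim && spray &&
                   toy_buckets_owns(generic, victim) &&
                   toy_buckets_owns(dedicated, spray) &&
                   !toy_buckets_owns(generic, spray);
    }
    toy_buckets_destroy(generic);
    toy_buckets_destroy(dedicated);
    return isolated;
}
```

Calling `demo_isolation()` from a `main` is enough to try it. Note what the model makes obvious: the isolation only exists for callers that were explicitly switched to the dedicated set.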

The most glaring issue here is that the allocation sites of every *interesting* structure
would have to be patched to make use of this. And there are [**a
lot**](https://lookerstudio.google.com/c/reporting/68b02863-4f5c-4d85-b3c1-992af89c855c/page/n92nD)
of them in the kernel: patching all of their
call sites doesn't really scale. We're in 2024; we should aim at killing techniques instead of playing whack-a-mole.
Moreover, manually annotating structures isn't sustainable in the long run without having someone constantly looking at every newly added one.
A better approach would be to automate the segregation à la [`AUTOSLAB`](https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game),
added 2.5 years ago in [grsecurity](https://grsecurity.net).

Some comments from my analysis of the ["randomized slab caches for kmalloc()"]({filename}/security/slab_caches_kmalloc.md) series apply as well:

- The slabs need to be pinned, otherwise an attacker could [feng-shui](https://en.wikipedia.org/wiki/Heap_feng_shui) their way
  into having the whole slab freed, garbage-collected, and a slab for
  another type allocated at the same VA. [Jann Horn](https://thejh.net/) and [Matteo Rizzo](https://infosec.exchange/@nspace) have a [nice
  set of patches](https://github.com/torvalds/linux/compare/master...thejh:linux:slub-virtual-upstream),
  discussed a bit in [this Project Zero blogpost](https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html),
  for a feature called [`SLAB_VIRTUAL`](https://github.com/torvalds/linux/commit/f3afd3a2152353be355b90f5fd4367adbf6a955e),
  implementing precisely this. Worryingly, it isn't mentioned in Cook's series.
- I don't see any explicit guard pages between caches, so inter-cache overflows are possible.
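
The pinning point can be illustrated with another userspace toy model. Again, this is a sketch under loud assumptions: all names are hypothetical, a "slab" is a single page, and the "page allocator" is a static array. When an emptied slab goes back to the shared pool, the next cache to ask for memory receives the very same address; when emptied slabs stay reserved for their cache, it doesn't.

```c
/* Userspace toy model, NOT kernel code. Every name below is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES 4
#define PAGE_BYTES 4096

static unsigned char pool[NR_PAGES][PAGE_BYTES]; /* stand-in for the page allocator */
static int owner[NR_PAGES];  /* 0: in the shared pool, else id of the owning cache */
static int in_use[NR_PAGES]; /* 1 while the page backs a live slab */

struct pin_cache {
    int id;     /* non-zero cache identifier */
    int pinned; /* 1: emptied slabs stay reserved for this cache */
};

static void *cache_get_slab(const struct pin_cache *c)
{
    /* First, reuse a slab that this cache emptied earlier and kept. */
    for (size_t i = 0; i < NR_PAGES; i++) {
        if (owner[i] == c->id && !in_use[i]) {
            in_use[i] = 1;
            return pool[i];
        }
    }
    /* Otherwise take the first page available in the shared pool. */
    for (size_t i = 0; i < NR_PAGES; i++) {
        if (owner[i] == 0) {
            owner[i] = c->id;
            in_use[i] = 1;
            return pool[i];
        }
    }
    return NULL; /* out of pages */
}

static void cache_put_slab(const struct pin_cache *c, void *slab)
{
    for (size_t i = 0; i < NR_PAGES; i++) {
        if ((void *)pool[i] != slab || owner[i] != c->id)
            continue;
        in_use[i] = 0;
        if (!c->pinned)
            owner[i] = 0; /* back to the shared pool: any cache may take it */
    }
}

/* Return 1 if the attacker's cache ends up at the victim's old address. */
int demo_cross_cache(int pinned)
{
    struct pin_cache victim_cache = { .id = 1, .pinned = pinned };
    struct pin_cache attacker_cache = { .id = 2, .pinned = pinned };

    memset(owner, 0, sizeof(owner));
    memset(in_use, 0, sizeof(in_use));

    void *victim_slab = cache_get_slab(&victim_cache);
    cache_put_slab(&victim_cache, victim_slab); /* whole slab freed */
    void *attacker_slab = cache_get_slab(&attacker_cache);
    return attacker_slab == victim_slab;
}
```

Here `demo_cross_cache(0)` models unpinned slabs and `demo_cross_cache(1)` models pinned ones. The model reserves whole pages per cache, which is a simplification of reserving virtual address ranges.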

Amusingly, as [pointed out by spender](https://twitter.com/spendergrsec/status/1765121132699476206),
`PAX_USERCOPY` has had a `GFP_USERCOPY` flag since ~2012 to tag
allocations whose size is controlled from user-land and isolate them, which could be
seen as a superset of this series. Now that compilers are smarter,
Jann Horn's [`CONFIG_KMALLOC_SPLIT_VARSIZE`](https://github.com/thejh/linux/blob/slub-virtual/MITIGATION_README)
is way better anyway, and yet is [routinely bypassed in all kinds of
ways](https://github.com/google/security-research/tree/master/pocs/linux/kernelctf)
in Google's own [kernelCTF](https://google.github.io/security-research/kernelctf/rules.html).

I guess it could be a starting point for something better, like all structures
with kernel pointers in one bucket and all structures with user-controlled data
in another. But again, without a compiler plugin, it'll be excessively tedious
at best, and uselessly bit-rotten at worst.

Finally, I'm also concerned by the lack of effectiveness/cost/coverage/overlap analysis,
but maybe this is more of a "I got this idea, what do you people think?" than a
"Please merge this" email, as I have no idea how the community around the Linux
kernel works with their email-based workflows.