Some notes on "Randomized slab caches for kmalloc()"
Mon 11 September 2023

Ruiqi Gong and Xiu Jianfeng got their Randomized slab caches for kmalloc() patch series merged upstream, and I've had enough discussions about it to warrant summarising them into a small blogpost.

The main idea is to have multiple slab caches, and pick one at random based on the address of code calling kmalloc() and a per-boot seed, to make heap-spraying harder. It's a great idea, but comes with some shortcomings for now:

  • Objects allocated via wrappers around kmalloc(), like sock_kmalloc, f2fs_kmalloc, aligned_kmalloc, … will all end up in the same slab cache, because the caller address used for the randomization is the wrapper's own kmalloc() call site rather than the wrapper's callers.
  • The slabs need to be pinned, otherwise an attacker could feng-shui their way into having the whole slab freed, garbage-collected, and a slab for another type allocated at the same VA. Jann Horn and Matteo Rizzo have a nice set of patches, discussed a bit in this Project Zero blogpost, for a feature called SLAB_VIRTUAL, implementing precisely this.
  • There are 16 caches by default, so an attacker's allocation still has a one-in-16 chance of ending up in the same slab cache as the target.
  • There are no guard pages between caches, so inter-cache overflows are still possible.
  • As pointed out by andreyknvl and minipli, fewer allocations hitting a given cache means less noise, so it might even help with some heap feng-shui.
  • minipli also pointed out that "randomized caches still freely mix kernel allocations with user controlled ones (xattr, keyctl, msg_msg, …). So even though merging is disabled for these caches, i.e. no direct overlap with cred_jar etc., other object types can still be targeted (struct pipe_buffer, BPF maps, its verifier state objects,…). It’s just a matter of probing which allocation index the targeted object falls into.", but I considered this out of scope, since addressing it is much more involved, although something like Jann Horn's CONFIG_KMALLOC_SPLIT_VARSIZE wouldn't significantly increase complexity.
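
To make the core mechanism concrete, here is a rough user-space model of the cache selection (a sketch, not the kernel code: hash_64() is approximated with the kernel's golden-ratio multiply, and the seed and caller addresses below are made up):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    /* Multiplier used by the kernel's hash_64(). */
    #define GOLDEN_RATIO_64 0x61C8864680B583EBull

    /* Model of hash_64(val, bits): top `bits` bits of a golden-ratio multiply. */
    static unsigned int hash_64(uint64_t val, unsigned int bits)
    {
        return (unsigned int)((val * GOLDEN_RATIO_64) >> (64 - bits));
    }

    /* 15 extra cache copies + the normal one = 16 caches, hence a 4-bit index. */
    static unsigned int cache_index(uint64_t caller, uint64_t seed)
    {
        return hash_64(caller ^ seed, 4);
    }

    int main(void)
    {
        srand((unsigned int)time(NULL));
        /* Stand-in for the per-boot seed. */
        uint64_t seed = ((uint64_t)rand() << 32) ^ (uint64_t)rand();

        /* Two hypothetical call sites: the attacker's allocation primitive and
         * the victim object's allocation site collide one time out of 16. */
        printf("attacker cache: %u\n", cache_index(0xffffffff81234567ull, seed));
        printf("victim   cache: %u\n", cache_index(0xffffffff81abcdefull, seed));

        /* Everything flowing through one wrapper (e.g. sock_kmalloc()) presents
         * the same caller address, so it always lands in the same cache. */
        for (int i = 0; i < 3; i++)
            printf("wrapper  cache: %u\n", cache_index(0xffffffff815555aaull, seed));
        return 0;
    }

In the actual kernel, the caller value is _RET_IP_ and the seed is random_kmalloc_seed, initialised once at boot; changing either one reshuffles which of the 16 caches every call site maps to.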

Also, while using code addresses as a source of entropy has historically been a great way to provide KASLR bypasses, hash_64(caller ^ random_kmalloc_seed, ilog2(RANDOM_KMALLOC_CACHES_NR + 1)) shouldn't trivially leak offsets.
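
As a toy illustration of why (again a user-space sketch, reusing the same golden-ratio multiply as hash_64()): the cache index is a 4-bit, massively many-to-one function of caller ^ random_kmalloc_seed, so even a perfect oracle for which cache an object landed in leaves an enormous set of candidate caller values:

    #include <stdio.h>
    #include <stdint.h>

    /* Multiplier used by the kernel's hash_64(). */
    #define GOLDEN_RATIO_64 0x61C8864680B583EBull

    static unsigned int hash_64(uint64_t val, unsigned int bits)
    {
        return (unsigned int)((val * GOLDEN_RATIO_64) >> (64 - bits));
    }

    int main(void)
    {
        /* Bucket 2^20 consecutive (caller ^ seed) candidates into the 16 possible
         * cache indices: each index keeps ~65k preimages even in this tiny sample,
         * so the index alone can't be inverted back to a code address. */
        unsigned long counts[16] = { 0 };

        for (uint64_t v = 0; v < (1ul << 20); v++)
            counts[hash_64(v, 4)]++;

        for (int i = 0; i < 16; i++)
            printf("index %2d: %lu candidates\n", i, counts[i]);
        return 0;
    }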

The segregation technique is a bit like a weaker version of grsecurity's AUTOSLAB, or a weaker kernel-land version of PartitionAlloc; but to be fair, making use-after-free exploitation harder (and significantly harder once pinning lands) with only ~150 lines of code and negligible performance impact is amazing and should be praised. Moreover, I wouldn't be surprised if this were backported to Google's KernelCTF soon, so we'll see whether my analysis is correct.