Ruiqi Gong and Xiu Jianfeng got their Randomized slab caches for kmalloc() patch series merged upstream, and I've had enough discussions about it to warrant summarising them into a small blogpost.
The main idea is to have multiple slab caches, and pick one at random based on
the address of code calling kmalloc() and a per-boot seed, to make heap-spraying harder.
It's a great idea, but comes with some shortcomings for now:
- Objects allocated via wrappers around kmalloc(), like sock_kmalloc, f2fs_kmalloc, aligned_kmalloc, … will all end up in the same slab cache, since the callsite kmalloc() sees is the wrapper itself. A possible improvement would be to mix the callsite address with the parent caller's address.
- The slabs need to be pinned, otherwise an attacker could feng-shui their way
into having the whole slab free'ed, garbage-collected, and have a slab for
another type allocated at the same VA. Jann Horn and Matteo Rizzo have a nice
set of patches,
discussed a bit in this Project Zero blogpost,
for a feature called
SLAB_VIRTUAL, implementing precisely this.
- There are 16 slab caches by default, so one chance out of 16 of ending up in the same slab cache as the target.
- There are no guard pages between caches, so inter-cache overflows are possible.
- As pointed out by andreyknvl and minipli, fewer allocations hitting a given cache means less noise, so it might even help with some heap feng-shui.
- minipli also pointed out that "randomized caches still freely mix kernel allocations with user controlled ones (xattr, keyctl, msg_msg, …). So even though merging is disabled for these caches, i.e. no direct overlap with cred_jar etc., other object types can still be targeted (struct pipe_buffer, BPF maps, its verifier state objects, …). It's just a matter of probing which allocation index the targeted object falls into.", but I considered this out of scope, since it's much more involved; albeit something like Jann Horn's CONFIG_KMALLOC_SPLIT_VARSIZE wouldn't significantly increase complexity.
Also, while code addresses as a source of entropy have historically been a great
way to provide KASLR bypasses, hash_64(caller ^
random_kmalloc_seed, ilog2(RANDOM_KMALLOC_CACHES_NR + 1)) shouldn't trivially
leak offsets, since only a handful of output bits survive the hash.
The segregation technique is a bit like a weaker version of grsecurity's AUTOSLAB, or a weaker kernel-land version of PartitionAlloc. To be fair though, making use-after-free exploitation harder, and significantly harder once pinning lands, with only ~150 lines of code and negligible performance impact, is amazing and should be praised. Moreover, I wouldn't be surprised if this was backported to Google's KernelCTF soon, so we should see if my analysis is correct.