Ruiqi Gong and Xiu Jianfeng got their "Randomized slab caches for kmalloc()" patch series merged upstream, and I've had enough discussions about it to warrant summarising them in a small blogpost.
The main idea is to have multiple slab caches, and to pick one at random based on the address of the code calling `kmalloc()` and a per-boot seed, to make heap spraying harder.
It's a great idea, but comes with some shortcomings for now:
- Objects allocated via wrappers around `kmalloc()`, like `sock_kmalloc`, `f2fs_kmalloc`, `aligned_kmalloc`, … will all end up in the same slab cache, since the cache is picked from the wrapper's own call site into `kmalloc()`, not from the wrapper's callers.
- The slabs need to be pinned, otherwise an attacker could feng-shui their way into having the whole slab freed, garbage-collected, and a slab for another type allocated at the same VA. Jann Horn and Matteo Rizzo have a nice set of patches, discussed a bit in this Project Zero blogpost, for a feature called `SLAB_VIRTUAL` implementing precisely this.
- There are 16 slabs by default, so one chance out of 16 to end up in the same slab cache as the target.
- There are no guard pages between caches, so inter-caches overflows are possible.
- As pointed out by andreyknvl and minipli, fewer allocations hitting a given cache means less noise, so it might even help with some heap feng-shui.
- minipli also pointed out that "randomized caches still freely mix kernel allocations with user controlled ones (`xattr`, `keyctl`, `msg_msg`, …). So even though merging is disabled for these caches, i.e. no direct overlap with `cred_jar` etc., other object types can still be targeted (`struct pipe_buffer`, BPF maps, its verifier state objects, …). It's just a matter of probing which allocation index the targeted object falls into.", but I considered this out of scope, since it's much more involved; albeit something like Jann Horn's `CONFIG_KMALLOC_SPLIT_VARSIZE` wouldn't significantly increase complexity.
Also, while code addresses as a source of entropy have historically been a great way to provide KASLR bypasses, `hash_64(caller ^ random_kmalloc_seed, ilog2(RANDOM_KMALLOC_CACHES_NR + 1))` shouldn't trivially leak offsets.
The segregation technique is a bit like a weaker version of grsecurity's AUTOSLAB, or a weaker kernel-land version of PartitionAlloc. But to be fair, making use-after-free exploitation harder (and significantly harder once pinning lands) with only ~150 lines of code and negligible performance impact is amazing and should be praised. Moreover, I wouldn't be surprised if this was backported into Google's KernelCTF soon, so we should see if my analysis is correct.