FineIBT is a proposal by Intel's Joao Moreira for a fine-grained forward-edge CFI scheme. It was presented at the Linux Security Summit 2021.
One of Intel CET's shortcomings, as highlighted in grsecurity's "Close, but No Cigar: On the Effectiveness of Intel's CET Against Code Reuse Attacks" blogpost, is that every function is a valid indirect call/jmp target. This isn't a theoretical issue, since it was explicitly called out in Qualys' Baron Samedit exploit.
Intel CET works like this, with the endbr64 instruction marking valid targets:
<main>:
...
mov rax, <bar>
call *rax
...
<bar>:
endbr64
...
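To make the limitation concrete, here is a toy Python model of that check. It is only a sketch: the landing sites and their names are made up for illustration, and the only thing it models is "does the target start with endbr64?".

```python
# Toy model of coarse-grained IBT: an indirect call/jmp is accepted as long
# as the instruction at the target is endbr64, whatever the call site expected.

# Hypothetical landing sites; True means "starts with endbr64".
LANDING_SITES = {
    "bar": True,        # the function <main> meant to call
    "system": True,     # unrelated function, but still a function entry
    "bar+0x10": False,  # middle of a function: no endbr64 there
}


def ibt_allows(target):
    """Return True if an indirect branch to `target` passes the IBT check."""
    return LANDING_SITES[target]


def valid_targets():
    """Targets reachable from *any* indirect call site."""
    return sorted(name for name, ok in LANDING_SITES.items() if ok)


if __name__ == "__main__":
    for target in LANDING_SITES:
        print(target, "allowed" if ibt_allows(target) else "blocked")
    print("valid from every call site:", valid_targets())
```

The check knows nothing about the call site, so a corrupted pointer that lands on any function entry is as acceptable as the intended one.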
The main improvement of FineIBT is to cluster functions and pointers by prototype, to reduce the number of valid targets for a given call/jmp. This isn't a new idea: it was already described in 2003 in pax-future.txt, implemented in PaX' RAP in 2015, and in Microsoft's XFG in 2019.
This is done by embedding a hash of the target's type and checking it at runtime. This has the nice advantage of not depending on LTO. For FineIBT, it looks like this:
<main>:
...
mov rax, <bar>
mov r11, 0xcafecafe
call *rax
...
call <bar_oep> # direct calls can skip the prologue.
<bar>:
endbr64
xor 0xcafecafe, r11 # this has the nice side-effect of nuking r11.
je bar_oep
hlt
bar_oep:
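The same logic can be sketched in Python. This is a model, not the real thing: the hash function (32 bits of SHA-256 over a prototype string) is an assumption picked for illustration, as are the helper names `type_hash`, `make_function` and `indirect_call`.

```python
import hashlib


class Halt(Exception):
    """Stands in for the hlt executed on a hash mismatch."""


def type_hash(prototype):
    # Assumption: 32 bits of SHA-256 over the prototype string. The hash
    # actually used by the scheme is not specified here.
    digest = hashlib.sha256(prototype.encode()).digest()
    return int.from_bytes(digest[:4], "little")


def make_function(name, prototype):
    """Build a callable modelling the FineIBT prologue of `name`."""
    own_hash = type_hash(prototype)

    def entry(r11):
        # xor <own_hash>, r11 ; je <name>_oep ; hlt
        if (r11 ^ own_hash) != 0:
            raise Halt(f"{name}: type hash mismatch")
        return f"{name}_oep"

    return entry


def indirect_call(target, pointer_prototype):
    # mov r11, <hash of the pointer's type> ; call *rax
    return target(type_hash(pointer_prototype))


if __name__ == "__main__":
    bar = make_function("bar", "void(int)")
    system = make_function("system", "int(const char *)")

    print(indirect_call(bar, "void(int)"))   # reaches bar_oep
    try:
        indirect_call(system, "void(int)")   # pointer type doesn't match
    except Halt as exc:
        print("halted:", exc)
```

Only functions sharing the pointer's prototype survive the xor, which is what shrinks the set of valid targets per call site.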
The loader checks that all DSOs support FineIBT, and if so enables it via a flag stored in fs:0x48, making the prologue look like this:
<bar>:
endbr64
xor 0xcafecafe, r11
je bar_oep
testb 0x11, fs:0x48
jne bar_oep
hlt
bar_oep:
Unfortunately, this means that an attacker with an arbitrary r/w primitive will be able to disable FineIBT, which is a bit weird, since the threat model for CFI is usually "arbitrary r/w". Moreover, this adds two instructions per function, hurting performance and binary size even more.
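A toy Python model of that prologue shows why a single write is enough. It follows the listing literally (a non-zero result of the testb skips the hlt); which value of the byte actually means "enforcing" is an assumption here, and the thread-local block is just a bytearray.

```python
class Halt(Exception):
    """Stands in for the hlt at the end of the prologue."""


FLAG_OFFSET = 0x48  # fs:0x48 in the listing
FLAG_MASK = 0x11    # immediate of the testb


def prologue(r11, own_hash, tls):
    """Follow the listing: xor / je / testb / jne / hlt."""
    if (r11 ^ own_hash) == 0:
        return "bar_oep"                  # je bar_oep: hashes match
    if (tls[FLAG_OFFSET] & FLAG_MASK) != 0:
        return "bar_oep"                  # jne bar_oep: mismatch ignored
    raise Halt("type hash mismatch")      # hlt


if __name__ == "__main__":
    tls = bytearray(0x100)                # toy thread-local block, all zero
    good, bad = 0xCAFECAFE, 0xDEADBEEF

    print(prologue(good, 0xCAFECAFE, tls))  # legitimate call
    try:
        prologue(bad, 0xCAFECAFE, tls)
    except Halt:
        print("mismatch caught")

    tls[FLAG_OFFSET] = 0x11                 # the attacker's single write
    print(prologue(bad, 0xCAFECAFE, tls))   # same bad call now goes through
```

If the real encoding has the opposite polarity, the attack is the same with the opposite write: the decision lives in writable memory either way.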
The overhead of this scheme, in both run time and binary size, is somewhere between negligible and a dozen percent, depending on the workload.
There are a couple of prototypes floating around, in llvm/ld, glibc, musl, … The whole thing is still a work in progress, with open questions like how to handle C++ constructs such as vtables and polymorphism. Rereading PaX' RAP would likely help a lot.
A couple of things/improvements/details aren't mentioned, but I guess they might be in the next iterations, since they're already being discussed in private circles:
- keyed hashing for binary diversification
- no endbr64 instructions in functions that should never be indirectly called, and this, cross-DSO.
- hash value range to segregate functions even more, like for exceptions-related magic (setjmp/longjmp/…).
- type diversification to restrict valid targets even more, especially for common/dangerous function types.
- getting rid of the fs:0x48 hack completely.
- hlt vs. ud2: while the former only takes one byte, the latter works in kernel-land as well, and is used by clang and gcc to implement __builtin_trap. Moreover, there are fewer chances of having a handler on SIGILL than on SIGSEGV.
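As an illustration of the keyed hashing item, here is a minimal Python sketch. It is entirely an assumption about what such a scheme could look like: HMAC-SHA256 truncated to 32 bits, with a made-up per-build key and the hypothetical helper `keyed_type_hash`; nothing here comes from the proposal itself.

```python
import hashlib
import hmac


def keyed_type_hash(key, prototype):
    """32-bit type hash that depends on a per-build secret key.

    Assumption: HMAC-SHA256 truncated to 4 bytes; the construction is
    illustrative, not the one used by any real implementation.
    """
    mac = hmac.new(key, prototype.encode(), hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "little")


if __name__ == "__main__":
    prototype = "void(int)"
    # Hypothetical keys, one per build of the same source.
    for key in (b"build-A", b"build-B"):
        print(key.decode(), hex(keyed_type_hash(key, prototype)))
```

With different keys, the same prototype gets a different hash in each build, so a value learned from one binary is of no use against another.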