Artificial truth

The more you see, the less you believe.

Paper Notes: FineIBT
Wed 07 December 2022

FineIBT is a proposal by Intel's Joao Moreira for a fine-grained forward-edge CFI scheme. It was presented at the Linux Security Summit 2021.

One of Intel CET's shortcomings, as highlighted in grsecurity's "Close, but No Cigar: On the Effectiveness of Intel's CET Against Code Reuse Attacks" blog post, is that every function is a valid indirect call/jmp target. This isn't a theoretical issue, since it was explicitly called out in Qualys' Baron Samedit exploit.

Intel CET works like this, with the endbr64 instruction marking valid targets:

<main>
...
mov rax, <bar>
call *rax
...

<bar>:
endbr64
...
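
To make the shortcoming concrete, here is a toy Python model of that check. It is only a sketch: the `Function` class, the function names and the prototypes are all made up for illustration, and the only thing it captures is that a coarse-grained check looks at the endbr64 marker and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """Toy stand-in for a function in a CET-enabled binary (hypothetical)."""
    name: str
    prototype: str
    has_endbr64: bool = True


def coarse_ibt_allows(target: Function) -> bool:
    """Coarse-grained check: the indirect branch only has to land on endbr64."""
    return target.has_endbr64


# Hypothetical functions with unrelated prototypes.
log_message = Function("log_message", "void(const char *)")
run_command = Function("run_command", "int(const char *, char *const *)")
local_helper = Function("local_helper", "void(void)", has_endbr64=False)

# A pointer meant to hold a void(const char *) callback can be redirected to
# any endbr64-marked function, whatever its prototype.
assert coarse_ibt_allows(log_message)
assert coarse_ibt_allows(run_command)
assert not coarse_ibt_allows(local_helper)
```

The prototype field is never consulted, which is exactly the problem.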

The main improvement of FineIBT is to cluster functions and pointers by prototype, reducing the number of valid targets for a given call/jmp. This isn't a new idea: it was already described in 2003 in pax-future.txt, implemented in PaX' RAP in 2015 and in Microsoft's XFG in 2019. This is done by embedding a hash of the target's type and checking it at runtime. This has the nice advantage of not depending on LTO. For FineIBT, it looks like this:

<main>
...
mov rax, <bar>
mov r11, 0xcafecafe
call *rax
...
call <bar_oep>  # direct calls can skip the prologue.

<bar>:
endbr64
xor 0xcafecafe, r11  # this has the nice side-effect of nuking r11.
je bar_oep
hlt
bar_oep:
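
The same logic can be sketched as a small Python model. This is an illustration under assumptions, not the actual implementation: CRC32 over a prototype string is only a stand-in for whatever type hash the compiler really emits, and `type_hash`, `fineibt_prologue`, `indirect_call` and `CfiViolation` are names invented here.

```python
import zlib

MASK32 = 0xFFFFFFFF


class CfiViolation(Exception):
    """Models reaching the hlt at the end of the prologue."""


def type_hash(prototype: str) -> int:
    """Hypothetical 32-bit type hash; CRC32 is only a placeholder."""
    return zlib.crc32(prototype.encode()) & MASK32


def fineibt_prologue(callee_prototype: str, r11: int) -> int:
    """Model of `xor <hash>, r11; je bar_oep; hlt`.

    Returns the value left in r11, which is 0 whenever the check passes.
    """
    r11 ^= type_hash(callee_prototype)
    if r11 != 0:
        raise CfiViolation(f"type mismatch entering {callee_prototype}")
    return r11


def indirect_call(pointer_prototype: str, callee_prototype: str) -> int:
    """The caller loads the hash of the pointer's type into r11, then calls."""
    r11 = type_hash(pointer_prototype)
    return fineibt_prologue(callee_prototype, r11)
```

Calling `indirect_call("void(const char *)", "void(const char *)")` passes and leaves r11 zeroed, while a pointer of one prototype aimed at a function of another raises `CfiViolation`. Since the hash only depends on the prototype, each translation unit can compute it on its own, which is why no LTO is needed.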

The loader checks that all DSOs support FineIBT, and if so enables it via a flag stored in fs:0x48, making the prologue look like this:

<bar>:
endbr64
xor 0xcafecafe, r11
je bar_oep
testb 0x11, fs:0x48
jne bar_oep
hlt
bar_oep:

Unfortunately, this means that an attacker with an arbitrary r/w primitive will be able to disable FineIBT, which is a bit weird, since the threat model for CFI is usually "arbitrary r/w". Moreover, this adds two instructions per function, hurting performance and binary size even more.
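
Here is a toy Python model of that bypass. It follows the listing above literally, so a mismatch is tolerated whenever `flag & 0x11` is non-zero; the exact meaning of those bits isn't spelled out in my notes, so treat the polarity as an assumption. The `tls` bytearray standing in for the thread-local block and the `prologue` helper are invented for the example.

```python
FS_FLAG_OFFSET = 0x48  # offset used in the listing
FLAG_MASK = 0x11       # mask used by testb in the listing


class CfiViolation(Exception):
    """Models reaching the hlt at the end of the prologue."""


def prologue(tls: bytearray, callee_hash: int, r11: int) -> str:
    """Model of the flag-aware prologue; returns the label that is reached."""
    if (r11 ^ callee_hash) == 0:         # xor + je bar_oep
        return "bar_oep"
    if tls[FS_FLAG_OFFSET] & FLAG_MASK:  # testb + jne bar_oep
        return "bar_oep"
    raise CfiViolation("hash mismatch")  # hlt


tls = bytearray(0x100)  # toy thread-local block, flag byte cleared

# With the flag byte cleared, a wrong hash is caught.
try:
    prologue(tls, 0xCAFECAFE, 0xDEADBEEF)
    caught = False
except CfiViolation:
    caught = True
assert caught

# A single attacker-controlled write to the flag byte turns the check off.
tls[FS_FLAG_OFFSET] = 0x01
assert prologue(tls, 0xCAFECAFE, 0xDEADBEEF) == "bar_oep"
```

One writable byte per thread ends up guarding the whole scheme.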

The overhead of this scheme is somewhere between negligible and a dozen percent, depending on the workload, both in terms of performance and of binary size.

There are a couple of prototypes floating around, in llvm/ld, glibc, musl, … The whole thing is still a work in progress, with open questions like how to handle C++ constructs such as vtables and polymorphism. Rereading PaX' RAP would likely help a lot.

A couple of things/improvements/details aren't mentioned, but I guess they might be during the next iterations, since they're already being discussed in private circles:

  • keyed hashing for binary diversification
  • no endbr64 instructions in functions that should never be indirectly called, including across DSOs.
  • hash value range to segregate functions even more, like for exceptions-related magic (setjmp/longjmp/…).
  • type diversification to restrict valid targets even more, especially for common/dangerous function types.
  • getting rid of the fs:0x48 hack completely.
  • hlt vs. ud2: while the former only takes one byte, the latter works in kernel-land as well, and is used by clang and gcc to implement __builtin_trap. Moreover, there are fewer chances of having a handler on SIGILL than on SIGSEGV.
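
To give an idea of what the first and fourth bullets could look like, here is a sketch of a keyed type hash with an optional diversification tag. Everything in it is an assumption of mine rather than something from the proposal: HMAC-SHA256 truncated to 32 bits, the `keyed_type_hash` name, the key values and the `tag` parameter are purely illustrative.

```python
import hashlib
import hmac


def keyed_type_hash(key: bytes, prototype: str, tag: str = "") -> int:
    """Hypothetical keyed 32-bit type hash.

    `key` is a per-build secret, so two builds of the same code embed
    different constants. `tag` is an optional label that splits a common
    prototype into several smaller classes of valid targets.
    """
    message = prototype.encode() + b"\x00" + tag.encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little")


proto = "void(const char *)"

# Binary diversification: same prototype, different builds.
build_a = keyed_type_hash(b"build-A", proto)
build_b = keyed_type_hash(b"build-B", proto)

# Type diversification: same prototype and build, different tags.
logging_cb = keyed_type_hash(b"build-A", proto, tag="logging")
exec_cb = keyed_type_hash(b"build-A", proto, tag="exec")
```

With a keyed hash, a constant leaked from one build tells the attacker nothing about another one, and tags keep a dangerous function out of the (usually huge) class of functions sharing a common prototype.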