Title: Paper notes: RetSpill
Date: 2024-01-18 16:45

- Full title: RetSpill: Igniting User-Controlled Data to Burn Away Linux Kernel Protections
- PDF: [ACM](https://dl.acm.org/doi/10.1145/3576915.3623220) —
  [mirror](https://kylebot.net/papers/retspill.pdf) —
  [local mirror]({static}/files/papers/retspill.pdf)
- Authors: [Kyle "kylebot" Zeng](https://kylebot.net/),
  [Ruoyu Wang](https://ruoyuwang.me/),
  [Yan Shoshitaishvili](https://yancomm.net/),
  and [Adam Doupé](https://adamdoupe.com/) from [Shellphish](https://shellphish.net/),
  along with [Zhenpeng Lin](https://zplin.me/),
  [Kangjie Lu](https://www-users.cse.umn.edu/~kjlu/),
  [Xinyu Xing](http://xinyuxing.org/) and
  [Tiffany Bao](https://www.tiffanybao.com/).

The idea of the paper is to use user-controlled data that is copied by design
into kernel-land when exercising syscalls to store a [ROP](https://en.wikipedia.org/wiki/Return-oriented_programming)-chain, via 4 main avenues:

- Valid Data directly copied onto the kernel stack for performance reasons, like when
  calling `poll`;
- Preserved Registers, saved on the kernel stack at syscall entry and restored
  upon returning from kernel-land to userland;
- Calling Convention compliant functions will save/restore registers, and
  apparently, system call handlers are calling convention compliant
  even though the kernel is already taking care of those,
  and syscalls can [only be called from userland](https://www.kernel.org/doc/html/latest/process/adding-syscalls.html?highlight=syscall_define#do-not-call-system-calls-in-the-kernel).
  But even if the syscall handlers weren't compliant, registers still contain
  userland values when they're called, and sub-functions might store/restore
  those registers, since those do need to be compliant;
- Uninitialized Memory, since the per-thread kernel stack is reused between syscalls,
  and not erased (unless `PAX_MEMORY_STACKLEAK` is used).
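
To make the Preserved Registers venue a bit more concrete, here is a small
Python sketch of where those values end up on the kernel stack. The field order
is the one of `struct pt_regs` on x86_64; the `NOT_FREE` set and the
`controlled_regions` helper are assumptions made for illustration, not
something taken from the paper.

```python
# Field order of struct pt_regs on x86_64, lowest address first
# (as declared in arch/x86/include/asm/ptrace.h).
PT_REGS_FIELDS = [
    "r15", "r14", "r13", "r12", "rbp", "rbx", "r11", "r10", "r9", "r8",
    "rax", "rcx", "rdx", "rsi", "rdi", "orig_rax", "rip", "cs", "eflags",
    "rsp", "ss",
]

# Assumption for this sketch: slots treated as not freely choosable.
# The `syscall` instruction overwrites rcx and r11, rax/orig_rax are tied to
# the syscall number and return value, and the trailing return frame is left
# out to stay conservative.
NOT_FREE = {"r11", "rax", "rcx", "orig_rax", "rip", "cs", "eflags", "rsp", "ss"}


def controlled_regions(fields=PT_REGS_FIELDS, not_free=NOT_FREE):
    """Return [(byte_offset, [register names])] for each contiguous run."""
    regions = []
    current = None  # (offset, names) of the run being built, if any
    for index, name in enumerate(fields):
        if name in not_free:
            current = None
            continue
        if current is None:
            current = (index * 8, [])  # every slot is one 8-byte qword
            regions.append(current)
        current[1].append(name)
    return regions


for offset, names in controlled_regions():
    print(hex(offset), names)
```

Under those assumptions, this yields three separate runs of 6, 3 and 3 qwords.
The last two overlap with the syscall arguments (`rdi`, `rsi`, `rdx`, `r10`,
`r8`, `r9`), so they are only as free as the chosen syscall allows.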

Then, only a [KASLR](https://en.wikipedia.org/wiki/KASLR) leak,
a CFHP (control-flow hijacking primitive)
and an `add rsp, X; ret`-like gadget are required to [ROP all the things](https://www.youtube.com/watch?v=FoUWHfh733Y).
Nowadays, most™ CFHPs are created by corrupting the heap to hijack function
pointers, and since every kernel thread shares the same heap,
once it is properly shaped, the control flow hijacking primitive can likely
be triggered again and again from different threads.
Moreover, changing the exploit is simply a matter of re-invoking a syscall with
different data spill, instead of having to reshape the heap every single time.
One doesn't have to worry about crashes (enabling lame bruteforcing), since no
major Linux distribution (except CentOS, kudos) has `panic_on_oops` enabled,
so having a ROP-chain crash is no big deal, because the CFHP is still on the
heap, one syscall away.
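
Picking the stack-shifting gadget boils down to arithmetic on offsets. A rough
sketch, where the `pick_stack_shift` helper and its inputs are made up for
illustration, and where offsets are assumed to be relative to `rsp` at the
moment the hijacked gadget starts executing:

```python
def pick_stack_shift(gadgets, region_start, region_size, chain_size):
    """Return [(gadget_address, offset_in_region)] for gadgets landing in the region.

    gadgets maps a gadget address to its total shift: the number of bytes
    added to rsp before the gadget's final `ret` pops its return address.
    region_start and region_size describe the spilled data, in bytes,
    relative to rsp when the gadget starts executing.
    """
    usable = []
    for address, shift in sorted(gadgets.items()):
        offset = shift - region_start
        # The landing spot must be inside the region, on a qword boundary.
        if offset < 0 or offset % 8 != 0:
            continue
        # The whole chain must fit between the landing spot and the region end.
        if offset + chain_size > region_size:
            continue
        usable.append((address, offset))
    return usable
```

The returned offset tells where in the spilled data the first gadget address
has to be written for a given stack-shifting gadget.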

Since the space afforded to store gadgets might be too small, one trick is to
invoke `do_task_dead` at the end of every ROP-chain to terminate it gracefully,
and trigger the CFHP again and again.
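
A sketch of what closing a stage could look like, assuming a hypothetical
`pack_stage` helper, a spill region of `capacity` bytes, and an
already-derandomized address for `do_task_dead`. The gadget values themselves
are left opaque on purpose:

```python
import struct


def pack_stage(qwords, capacity, do_task_dead):
    """Append do_task_dead to one ROP stage and pack it as little-endian qwords.

    qwords is the list of 64-bit values (gadget addresses and their operands)
    for this stage, capacity is the size in bytes of the controlled region.
    """
    chain = list(qwords) + [do_task_dead]
    if len(chain) * 8 > capacity:
        raise ValueError("stage needs %d bytes, region only has %d"
                         % (len(chain) * 8, capacity))
    return struct.pack("<%dQ" % len(chain), *chain)
```

Each stage then costs one more trigger of the CFHP, which is cheap as long as
the heap stays shaped.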

Mitigation-wise: 

- [SMEP](https://en.wikipedia.org/wiki/Control_register#SMEP), 
  [SMAP](https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention) and
  [KPTI](https://en.wikipedia.org/wiki/Kernel_page-table_isolation) are irrelevant.
- [RANDKSTACK](https://pax.grsecurity.net/docs/randkstack.txt) mitigates data spillage from Preserved Registers and Uninitialized Memory,
  but since it only provides 5 bits of randomness, a `ret`-sled is enough
  to bypass it (25.44% of the time if using gadgets from Preserved Registers or Uninitialized Memory, 100% otherwise),
  and in the absence of `panic_on_oops` it can quickly be bruteforced anyway.
- [STACKLEAK](https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Sanitize_kernel_stack),
  [STRUCTLEAK](https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Forcibly_initialize_local_variables_copied_to_userland),
  and [CONFIG\_INIT\_STACK\_\*](https://lwn.net/Articles/823152/)
  only mitigate data spillage from Uninitialized Memory.
- [FG-KASLR](https://lwn.net/Articles/824307/) is [useless](https://lkmidas.github.io/posts/20210205-linux-kernel-pwn-part-3/#gathering-useful-gadgets)
  since it doesn't randomize everything, leaving a couple (`42631` according to
  the paper) of gadgets at non-randomized addresses, which are enough to perform
  arbitrary reads and derandomize everything.
- [KCFI](https://lore.kernel.org/lkml/202210010918.4918F847C4@keescook/T/#u)
  and [IBT](https://www.intel.com/content/www/us/en/developer/articles/technical/technical-look-control-flow-enforcement-technology.html)
  also (currently) don't cover everything, but don't really matter much here
  anyway, since we only care about backward-edges, and as for the CFHP:
  - There [are ways](https://i.blackhat.com/USA-22/Wednesday/US-22-Jin-Monitoring-Surveillance-Vendors.pdf#page=35)
    to obtain one in the presence of perfect forward-edge CFI with a heap corruption.
  - Using `__x86_indirect_thunk_rdi` makes it possible to turn a forward-edge control-flow transition into a backward-edge one.
- Shadow stack and perfect CFI are a pipe dream that would mitigate RetSpill,
  but [PaX' RAP](https://pax.grsecurity.net/docs/PaXTeam-H2HC15-RAP-RIP-ROP.pdf)
  is really close to it, likely making it insanely hard, with its type-based
  CFI, and its changing-on-every-syscall/task/… register-stored cookie paired
  with unreadable kernel stacks for backward edge, on top of CFI.
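
As for how "quickly" the `RANDKSTACK` bruteforce goes: taking the 25.44%
success rate from the list above at face value, and assuming attempts are
independent (a fresh random offset per syscall, and a failed attempt merely
oopses), a back-of-the-envelope estimate looks like this. The helper names are
made up for the example.

```python
import math


def expected_attempts(p):
    """Mean number of tries until the first success, each succeeding with probability p."""
    return 1.0 / p


def attempts_for_confidence(p, confidence):
    """Smallest n such that at least one of n tries succeeds with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


print(round(expected_attempts(0.2544), 2))
print(attempts_for_confidence(0.2544, 0.99))
```

So roughly 4 tries on average, and 16 to be 99% sure, under that independence
assumption.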

To showcase how cool all of this is, the paper comes with a semi-automated tool
outputting the address of a stack-shifting gadget and a function that performs
data spillage, invokes the triggering system call, and yields a root shell via a
classic `commit_creds(init_cred)` + returning back to user space. It works by:

- taking full snapshots of a vm to locate the syscall leading to CFHP by using
  a binary-search-like heuristic;
- mutating userland inputs (registers, `copy_from_user`/`get_user`
  parameters, …), continuing the execution of the vm,
  marking them as user-controllable data if the CFHP still
  happens after modifications, and doing taint analysis to find how to modify
  them;
- generating a ROP-chain, which isn't that easy, given that:
  - it's done over discrete controlled regions
  - there are some constraints, like "`eax` contains the syscall number",
    or "`edx` comes from both *Preserved Registers* and *Calling Convention*
    spillages".
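
The first step above can be pictured as a plain binary search over the
syscalls issued by the proof-of-concept. This is only a sketch: the
`hijacked_after` oracle is hypothetical (think "restore a clean snapshot,
replay the first `n` syscalls, check whether the CFHP fired"), it is assumed to
be monotonic, and the actual heuristic in the paper's tool may well differ.

```python
def first_hijacking_syscall(num_syscalls, hijacked_after):
    """Return the zero-based index of the syscall triggering the CFHP, or None.

    hijacked_after(n) must return True if the CFHP has been observed once the
    first n syscalls of the PoC have been replayed from a clean snapshot.
    """
    if not hijacked_after(num_syscalls):
        return None  # the PoC never reaches the CFHP
    lo, hi = 0, num_syscalls  # invariant: clean after lo, hijacked after hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hijacked_after(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1
```

Each call to the oracle costs one snapshot restore, so a PoC with `n` syscalls
needs about `log2(n)` of them instead of `n`.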

Of course, given that some authors are [angr](https://angr.io/) developers,
[angrop](https://github.com/angr/angrop) was used to knit the ROP-chains, and
the results are pretty impressive:

> The abundance of data spillage allows 20 out of 22 proof-of-concept programs
that manifest CFHP to be semi-automatically turned into full privilege escalation exploits.

To kill this technique, the authors suggest:

1. *Preserved Register*: `RANDKSTACK` helps, but storing userspace registers
   somewhere else than on the stack would be even better, e.g. in `task_struct`.
2. *Uninitialized Memory*: enable `STACKLEAK`/`STRUCTLEAK`/`CONFIG_INIT_STACK_*`,
   but the performance impact is pretty steep.
3. *Calling Convention* and *Valid Data*: an improved version of `RANDKSTACK`,
   adding a random offset at the bottom of each stack frame, between `rsp` and user data.
   This technique also mitigates spillage from Preserved Registers and Uninitialized Memory,
   with an average performance overhead of 0.61%.

Like all good papers it comes [with code](https://github.com/sefcom/RetSpill).

Amusingly:

- RetSpill completely bypasses OpenBSD's
  [MAP\_STACK](https://isopenbsdsecu.re/mitigations/map_stack/) mitigation,
  should it ever be implemented in kernel-land;
- The [Organizers](https://org.anize.rs/) CTF team
  [used](https://org.anize.rs/0CTF-2021-finals/pwn/kernote)
  the [`pt_regs`](https://elixir.bootlin.com/linux/latest/ident/pt_regs) structure
  to store their ROP chain for [0CTF/TCTF 2021
  Finals](https://ctftime.org/event/1357)'s
  [Kernote](https://ctftime.org/task/17461) pwn challenge.