Paper notes - Chosen-Instruction Attack Against Commercial Code Virtualization Obfuscators

PDF: 2afff887eae2ec3b9485b82e38df4451bfe9783f4b026be82c67eab032968530

Commercial virtualization-based obfuscators are hard to… well… devirtualize in a generic way. The idea of the paper is to use code like this:

void KnowledgeLeaking() {
  VIRTUALIZER_START // VM macro
  __asm(
    "cmpxchg eax, eax;" // anchor
    "mov rax, 0x1337;"  // knowledge leaking code
    "cmpxchg eax, eax;" // anchor
  );
  VIRTUALIZER_END // VM macro
}

and to throw it repeatedly at virtualizers, with the anchor being an instruction that isn't or can't be virtualized by the virtualizer (atomic swaps, syscall, cpuid, …), meaning that it is left unobfuscated (format-preserving). This construction also forces the VM to start, suspend/terminate, let the anchor execute natively, resume/restart, and finally terminate, effectively creating a "self-contained" obfuscation of the "knowledge leaking code".
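To make the "throw it repeatedly" part concrete, here is a minimal sketch of a generator that wraps each candidate instruction between two anchors. It is not the authors' tooling: `make_stub`, `make_stubs` and the `KnowledgeLeaking_<n>` naming are made up for illustration, and the underscore spelling of the VM macros is an assumption.

```python
# Hypothetical probe generator: one C function per instruction to leak.
ANCHOR = "cmpxchg eax, eax;"  # anchor instruction, as in the snippet above

# Macro names (VIRTUALIZER_START / VIRTUALIZER_END) are an assumption.
TEMPLATE = """void KnowledgeLeaking_{index}() {{
  VIRTUALIZER_START // VM macro
  __asm(
    "{anchor}" // anchor
    "{instruction};" // knowledge leaking code
    "{anchor}" // anchor
  );
  VIRTUALIZER_END // VM macro
}}
"""


def make_stub(index, instruction, anchor=ANCHOR):
    """Return the C source of one probe wrapping `instruction` between two anchors."""
    return TEMPLATE.format(index=index, anchor=anchor, instruction=instruction)


def make_stubs(instructions, anchor=ANCHOR):
    """Return one probe per instruction, numbered in order."""
    return [make_stub(i, ins, anchor) for i, ins in enumerate(instructions)]


if __name__ == "__main__":
    for stub in make_stubs(["mov rax, 0x1337", "xor edx, ecx", "nop"]):
        print(stub)
```

Each generated function would then be compiled, protected, and traced separately, so that every trace contains exactly one known instruction between two recognizable anchors.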

Afterwards, backward and forward slicing can be used on the trace of the function, since all the input/output registers/memory values of the knowledge leaking code are known. Moreover, by using a nop instruction as the knowledge leaking code, the context-switch instructions can be precisely identified.
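As a toy illustration of the slicing step, here is a minimal backward slice over a recorded trace. The trace format (`Step` with read/write sets), the location names like `vm_rax`, and the instructions themselves are invented for the example; a real trace would need proper handling of partial register writes and memory aliasing.

```python
from typing import NamedTuple


class Step(NamedTuple):
    text: str          # disassembly, for display only
    writes: frozenset  # locations (registers / memory cells) written
    reads: frozenset   # locations read


def step(text, writes=(), reads=()):
    """Small helper to build a Step from plain iterables."""
    return Step(text, frozenset(writes), frozenset(reads))


def backward_slice(trace, outputs):
    """Return the indices of the steps that the `outputs` locations depend on.

    Assumes every write fully overwrites its destination.
    """
    live = set(outputs)
    kept = []
    for i in range(len(trace) - 1, -1, -1):
        s = trace[i]
        if s.writes & live:
            kept.append(i)
            live -= s.writes  # the value is produced here...
            live |= s.reads   # ...from these inputs
    kept.reverse()
    return kept


# Made-up trace: the 0x1337 constant travels through a virtual register,
# interleaved with junk that never reaches rax.
TRACE = [
    step("mov rbx, 0x1337", writes=["rbx"]),
    step("mov rcx, 0x42", writes=["rcx"]),
    step("add rcx, rcx", writes=["rcx"], reads=["rcx"]),
    step("mov [vm_rax], rbx", writes=["vm_rax"], reads=["rbx"]),
    step("xor rdx, rdx", writes=["rdx"]),
    step("mov rax, [vm_rax]", writes=["rax"], reads=["vm_rax"]),
]

if __name__ == "__main__":
    for i in backward_slice(TRACE, {"rax"}):
        print(i, TRACE[i].text)
```

Slicing backward from the known output location (`rax` here) drops the junk instructions and keeps only the steps that actually compute the leaked value.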

This makes it possible to leak what the paper calls "mapping rules": instruction → corresponding obfuscated code, with the associated additional transformation strategy (like xor edx, ecx → nand(or(edx, ~ecx), or(~edx, ecx))); the main hypothesis being that the different transformation strategies can be enumerated.
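A candidate mapping rule can be sanity-checked by comparing it against the original instruction's semantics. The sketch below does so by brute force over 8-bit operands, as a stand-in for an SMT query. This is enough here because the operators are purely bitwise, so each bit behaves independently of the operand width. The helper names and the candidate rules are assumptions for illustration, not rules taken from the paper.

```python
from itertools import product

WIDTH = 8
MASK = (1 << WIDTH) - 1


def not_(a):
    return ~a & MASK


def or_(a, b):
    return (a | b) & MASK


def and_(a, b):
    return a & b & MASK


def nand(a, b):
    return not_(and_(a, b))


def nor(a, b):
    return not_(or_(a, b))


def equivalent(original, candidate):
    """Exhaustively compare two binary operations on all WIDTH-bit operand pairs."""
    return all(
        original(a, b) == candidate(a, b)
        for a, b in product(range(MASK + 1), repeat=2)
    )


def xor(a, b):
    return a ^ b


def rule_nand(a, b):
    # xor a, b -> nand(or(a, ~b), or(~a, b))
    return nand(or_(a, not_(b)), or_(not_(a), b))


def rule_nor(a, b):
    # xor a, b -> nor(nor(a, b), and(a, b))
    return nor(nor(a, b), and_(a, b))


def rule_broken(a, b):
    # Same as rule_nand with the outer gate swapped to nor: always 0.
    return nor(or_(a, not_(b)), or_(not_(a), b))


if __name__ == "__main__":
    for rule in (rule_nand, rule_nor, rule_broken):
        print(rule.__name__, equivalent(xor, rule))
```

The two valid rewritings are accepted and the variant with the wrong outer gate is rejected, which is the kind of check that keeps a mis-extracted rule out of the rule set.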

The authors threw their machinery at VMProtect, Code Virtualizer, Themida, and Obsidium. They found 760 anchor instructions and extracted 1915 customized mapping rules, validated with Z3.

Surprisingly, they didn't write a pattern-matching-based deobfuscator ("[…] designed to assist analysts in extracting knowledge from commercial VM-based obfuscators, rather than directly simplifying virtualized malware. We leave it to future work."), but produced a benchmark to see which instructions other tools like Syntia, VMHunt and generic-deobfuscator could successfully devirtualize.

Amusingly:

The different levels of obfuscation (i.e., white, black, and red) provided by Code Virtualizer and Themida only change the numbers of inserted junk instructions but will not influence the complexity of the mapping rules between original instructions and kernel virtualized instructions.

The code has of course been published, and the paper was part of the 29th Network and Distributed System Security Symposium.

Artificial truth

Sat 01 October 2022