Commercial virtualization-based obfuscators are hard to… well… devirtualize in a generic way. The idea of the paper is to use code like this:
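void KnowledgeLeaking() {
    VIRTUALIZER_START // VM macro (Code Virtualizer SDK)
    __asm__ volatile(
        ".intel_syntax noprefix\n\t"
        "cmpxchg eax, eax\n\t"  // anchor
        "mov rax, 0x1337\n\t"   // knowledge leaking code
        "cmpxchg eax, eax\n\t"  // anchor
        ".att_syntax prefix"
        ::: "rax", "cc");       // rax and the flags are clobbered
    VIRTUALIZER_END // VM macro
}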
and to throw it repeatedly at virtualizers, with the anchor being an
instruction that isn't (or can't be) virtualized by the virtualizer, like atomic swaps,
syscall, cpuid, …, meaning that it won't be obfuscated (format-preserving).
This construction also forces the VM to be started, suspended (or terminated)
so that each anchor runs natively, resumed (or restarted), and finally terminated,
effectively creating a "self-contained" obfuscation of the "knowledge leaking
code".
Afterwards, backward and forward slicing can be applied to the execution trace
of the function, since all the input and output register/memory values of the
knowledge leaking code are known. Moreover, by using a single nop as the
knowledge leaking code, the context-switch instructions can be precisely identified.
This makes it possible to leak what the paper calls "mapping rules": original
instruction → corresponding obfuscated code, along with the transformation strategy
applied (like xor edx, ecx → nand(or(edx, ~ecx), or(~edx, ecx))), the
main hypothesis being that the different transformation strategies can be
enumerated.
The authors threw their machinery at VMProtect, Code Virtualizer, Themida, and Obsidium. They found 760 anchor instructions and extracted 1915 customized mapping rules, validated with Z3.
Surprisingly, they didn't write a pattern-matching-based deobfuscator ("[…] designed to assist analysts in extracting knowledge from commercial VM-based obfuscators, rather than directly simplifying virtualized malware. We leave it to future work."), but produced a benchmark measuring which instructions other tools like Syntia, VMHunt and generic-deobfuscator can successfully devirtualize.
Amusingly:
The different levels of obfuscation (i.e., white, black, and red) provided by Code Virtualizer and Themida only change the numbers of inserted junk instructions but will not influence the complexity of the mapping rules between original instructions and kernel virtualized instructions.
The code has of course been published, and the paper was part of the 29th Network and Distributed System Security Symposium.