Artificial truth

The more you see, the less you believe.


IDAPython vs. r2pipe
Sat 08 September 2018

This week, I'm at r2con 2018, meeting some friends, making new ones, attending great talks, … But I'm also spending some time looking at a particular "real world" binary that shall remain unnamed for now. This binary contains encrypted strings, and since I have some free time, I wrote not one but two scripts to decrypt them: one using r2pipe, the other using IDAPython.

The decryption function is always called like this:

[0x000e1944]> pd 2 @ 0x000e193d
  0x000e193d      c7042470ea10.  mov dword [esp], 0x10ea70
  0x000e1944      e8976bf5ff     call fcn.000384e0
[0x000e1944]>

The offset of the encrypted string is stored at the top of the stack, and the decryption function (fcn.000384e0) is called. The function's graph looks like this:

[0x000313b0]> s 0x384e0
[0x000384e0]> af
[0x000384e0]> VV

              ┌────────────────────┐
                0x384e0           
              └────────────────────┘
                      
                      └──────────────┐
          ┌─────────────────────────┐ 
            0x38596                 
           0x00038596 call 0x38380  
          └─────────────────────────┘ 
                      ┌──────────────┘
                      
             ┌────────────────────┐
               0x384f9           
             └────────────────────┘
                      
    ┌────────────────┘ └───────┐ ┌───────────────────────┐
                                                      
┌────────────────────┐  ┌────────────────────┐           
  0x38517               0x38527                      
└────────────────────┘  └────────────────────┘           
                                                      
                   ┌──────────┘ └──────────┐            
         ┌────────────────────┐  ┌────────────────────┐ 
           0x38520               0x3852b            
         └────────────────────┘  └────────────────────┘ 
                                                      
     ┌────────────┘ └───────────────────────────────────┘
     
┌───────────────────────────────────────┐
 [0x38539]                             
 0x00038543 call dword [reloc.malloc]  
 0x00038553 call dword [reloc.malloc]  
 0x00038569 call 0x37d30               
 0x00038574 call 0x37cc0               
└───────────────────────────────────────┘

The interesting part is 0x37cc0, because it really looks like an XOR-decryption loop:

 [0x000e1944]> pdf @ 0x37cc0
 (fcn) fcn.00037cc0 43
   fcn.00037cc0 (int arg_ch);
           ; arg int arg_ch @ esp+0xc
           ; CALL XREF from fcn.000384e0 (0x38574)
           0x00037cc0      56             push esi
           0x00037cc1      31d2           xor edx, edx
           0x00037cc3      53             push ebx
           0x00037cc4      8b4c240c       mov ecx, dword [arg_ch]
           0x00037cc8      0fb601         movzx eax, byte [ecx]
           0x00037ccb      89c6           mov esi, eax
           0x00037ccd      8d5801         lea ebx, [eax + 1]
       ┌─> 0x00037cd0      8d0432         lea eax, [edx + esi]
          0x00037cd3      83e00f         and eax, 0xf
          0x00037cd6      0fb680204010.  movzx eax, byte [eax + 0x104020]
          0x00037cdd      30440a01       xor byte [edx + ecx + 1], al
          0x00037ce1      83c201         add edx, 1
          0x00037ce4      39da           cmp edx, ebx
       └─< 0x00037ce6      75e8           jne 0x37cd0
           0x00037ce8      5b             pop ebx
           0x00037ce9      5e             pop esi
           0x00037cea      c3             ret

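This is what it looks like via r2dec. The output is only approximate: per the disassembly above, the loop continues while edx != ebx, and eax is loaded from the byte at eax + 0x104020 rather than set to that address.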

[0x000e1944]> s 0x00037cc0
[0x00037cc0]> pdd
void fcn_00037cc0 () {
    edx = 0;
    ecx = *(arg_ch);
    eax = ecx;
    esi = eax;
    ebx = eax + 1;
    do {
        eax = edx + esi;
        eax &= 0xf;
        eax = eax + 0x104020;
        *(edx + ecx + 1) ^= al;
        edx += 1;
    } while (edx == ebx);
}
[0x00037cc0]>
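
Read straight off the disassembly above, the loop can be transcribed to Python roughly as follows. This is a minimal sketch rather than the binary's code: `xor_decrypt` and the sample table are invented for illustration. Note that the `jne` exits once `edx == ebx`, with `ebx` being the length plus one, so the native loop touches one byte more than the length; that extra byte is presumably a terminator, which is an assumption.

```python
def xor_decrypt(blob, table):
    """Emulate fcn.00037cc0 on a bytes-like blob (hypothetical helper).

    blob[0] is the length byte; the encrypted bytes start at blob[1].
    table is the key table (located at 0x104020 in the binary).
    """
    length = blob[0]                      # movzx eax, byte [ecx]
    out = bytearray(blob)
    for i in range(length + 1):           # loop exits when edx == length + 1
        if i + 1 >= len(out):             # stay inside the buffer we were given
            break
        key = table[(i + length) & 0xf]   # and eax, 0xf ; table lookup
        out[i + 1] ^= key                 # xor byte [edx + ecx + 1], al
    return bytes(out[1:1 + length])


# Round trip with a made-up table: XOR is its own inverse.
table = bytes(range(0x10, 0x20))
plain = b"hello"
enc = bytes([len(plain)]) + bytes(
    c ^ table[(i + len(plain)) & 0xf] for i, c in enumerate(plain)
)
print(xor_decrypt(enc, table))
```

Only the low four bits of the index are kept, so 16 bytes of the table are enough.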

So the plan is to:

  1. Find every callsite for the function fcn.000384e0
  2. Get its argument pushed on the stack
  3. Emulate the decryption routine in Python
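
Steps 1 and 2 boil down to decoding two instructions. As a rough illustration, independent of both IDA and radare2, the sketch below decodes them from raw bytes. `call_target` and `stack_argument` are hypothetical helper names, and the sketch assumes the argument is always set by the instruction immediately preceding the call, which will not hold for every callsite.

```python
import struct

MOV_ESP_IMM32 = b"\xc7\x04\x24"   # mov dword [esp], imm32
CALL_REL32 = 0xE8                 # call rel32


def call_target(code, call_addr):
    """Return the destination of the `call rel32` encoded in code."""
    if len(code) < 5 or code[0] != CALL_REL32:
        return None
    (rel,) = struct.unpack_from("<i", code, 1)      # signed displacement
    return (call_addr + 5 + rel) & 0xFFFFFFFF       # relative to next insn


def stack_argument(code_before_call):
    """Return the immediate if the 7 bytes before the call are
    `mov dword [esp], imm32`, else None."""
    insn = code_before_call[-7:]
    if len(insn) != 7 or not insn.startswith(MOV_ESP_IMM32):
        return None
    (imm,) = struct.unpack_from("<I", insn, 3)
    return imm


# Bytes of the callsite shown earlier. The last byte of the mov is
# truncated in the listing; it is taken to be 00 here because the
# immediate 0x10ea70 is encoded on 32 bits.
mov = bytes.fromhex("c7042470ea1000")
call = bytes.fromhex("e8976bf5ff")

print(hex(call_target(call, 0x000e1944)))
print(hex(stack_argument(mov)))
```

Both scripts below delegate this decoding to the disassembler, which is the saner choice; the sketch only shows what is being looked for.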

IDAPython

import idautils
import idc
import idaapi

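# XOR key table; the index is masked with 0xf, so only 16 bytes are used
table = idaapi.get_many_bytes(0x00104020, 255)
# Assumes fcn.000384e0 has been renamed to "decrypt_string" in the database
decrypt_str_addr = idc.get_name_ea_simple("decrypt_string")

for addr in idautils.CodeRefsTo(decrypt_str_addr, 0):
    # Addresses of the instructions that set up the call's arguments
    arg_addr = idaapi.get_arg_addrs(addr)
    if arg_addr is None:
        continue

    print("%s %s %s" % (hex(addr), idc.generate_disasm_line(addr, 0), hex(arg_addr[0])))

    ea = idaapi.get_fileregion_ea(arg_addr[0])
    data_addr = idc.GetOperandValue(ea, 1)  # immediate stored at [esp]

    key = idaapi.get_byte(data_addr)  # first byte: string length, also the key offset
    b = idaapi.get_many_bytes(data_addr, 256)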

    out = ""
    for i in range(key):
        ret = ord(table[(i + key) & 0xf])
        out += chr(ret ^ ord(b[i + 1]))
    print(out)

r2pipe

import r2pipe

def get_previous_mov_esp(r, offset):
    """ Since instructions aren't aligned in x86
    and radare2's analysis is often "suboptimal",
    we're simply bruteforcing the offset until
    we find a good looking™ instruction.
    """
    for i in range(1, 20):
        opcodes = r.cmdj("pdj -%d @%s" % (i, offset))
        for opcode in reversed(opcodes):
            if opcode['opcode'].startswith("mov dword [esp], 0x"):
                return opcode
    print("Error at %s" % offset)


def main():
    r = r2pipe.open('my_bin.so')
    table = r.cmdj('pxj 256 @ 0x00104020')  # read the decryption table

    # The seek matters, since `/r $$` below uses the current offset;
    # `af` is only cosmetic
    r.cmd('s 0x000384e0')  # seek to the decryption function
    r.cmd('af')  # create a function at 0x000384e0

    # The `/r` command is to search (`/` like in vim) for _r_eferences
    # The `$$` variable contains the current offset
    # `~[1]` is a filter to get the second column of the output
    for ref in r.cmd('/r $$~[1]').split():  # `/r` doesn't support json yet™
        argument = get_previous_mov_esp(r, ref)
        if argument is None:
            continue
        offset = argument['val']
        print("Offset %s for call at %s" % (hex(offset), ref))
        data = r.cmdj('pxj 256 @ %s' % offset)  # read what's at the offset

        out = ""
        for i in range(data[0]):
            ret = table[(i + data[0]) & 0xf]
            out += chr(ret ^ data[i + 1])
        print(out)

main()

Comparison

IDAPython is a Python 2.7 wrapper on top of IDA script. While its API is known to be awkward (juggling between CamelCase and snake_case for everything, its "sometimes you need to pass a context but sometimes you don't" approach, the "epydoc with most of the functions without description is enough" motto), there are countless examples floating around the internet showing how to use it for everything.

r2pipe is a magical pipe where you throw r2 commands, and results come out. It's well known that radare2 commands might be "a bit" daunting, but since they are all recursively self-documented with ?*, it's just a matter of bruteforcing keywords, like ?*~references, to find the right commands. Worst case, if you don't want to learn some r2-fu, you can always use r2pipe-api, which has a more conventional programming interface, with things like r.at('sym.imp.setenv').disasm(16).

The hardest part, in my opinion, was dealing with the absence of thorough analysis, like not being able to ask radare2 for the value of a function's first argument.

The two scripts both run instantly, and took the same time to write. The radare2 one finds 1330 decrypted strings, while the IDA one finds 1360. The difference is likely due to IDA's ability to propagate values through the control flow for constructs like this one, where my ghetto-wannabe-analysis-by-bruteforce doesn't.

[0x000d6e6a]> pd 3 @ 0x000d6e62
            0x000d6e62      a180c81000     mov eax, dword [0x10c880]
            0x000d6e67      890424         mov dword [esp], eax
            0x000d6e6a      e87116f6ff     call 0x384e0
[0x000d6e6a]>
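
One way to narrow that gap, sketched here under stated assumptions rather than tested against the real binary, is to follow a single register hop when walking backwards from the call. `resolve_argument` and `read_dword` are hypothetical names; the sketch works on disassembly strings like the ones `pdj` returns in its `opcode` field, and it ignores any other write to the register between the load and the store.

```python
import re

IMM = re.compile(r"^mov dword \[esp\], (0x[0-9a-f]+)$")
REG = re.compile(r"^mov dword \[esp\], (e[a-z]{2})$")
LOAD = re.compile(r"^mov (e[a-z]{2}), dword \[(0x[0-9a-f]+)\]$")


def resolve_argument(opcodes, read_dword):
    """Walk backwards from the call and resolve its stack argument.

    opcodes: disassembly strings preceding the call, oldest first.
    read_dword: callback returning the 32-bit value stored at an address.
    """
    wanted_reg = None
    for text in reversed(opcodes):
        if wanted_reg is None:
            m = IMM.match(text)
            if m:                        # direct: mov dword [esp], 0x...
                return int(m.group(1), 16)
            m = REG.match(text)
            if m:                        # indirect: remember the register
                wanted_reg = m.group(1)
        else:
            m = LOAD.match(text)
            if m and m.group(1) == wanted_reg:
                return read_dword(int(m.group(2), 16))
    return None


# Made-up memory content standing in for the binary's data section.
fake_memory = {0x10c880: 0x10f000}
ops = ["mov eax, dword [0x10c880]", "mov dword [esp], eax"]
print(hex(resolve_argument(ops, fake_memory.get)))
```

With r2pipe, `read_dword` could be built on the `pxj 4 @ addr` command the script already uses, combining the four bytes in little-endian order. This only helps if the pointer is initialized statically; if it is filled in by a relocation or at runtime, reading the file will not give the final value.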