Artificial truth

The more you see, the less you believe.

[archives] [latest] | [homepage] | [atom/rss]

Fixing snuffleupagus' sloppy-comparison on array_keys for PHP7.4
Sat 23 February 2019 — download

Writing code for Snuffleupagus is often fun, and sometimes frustrating because digging inside php7's internals isn't necessarily super-straightworward. System administrators only care about the destination (an hardened php7+ stack) and not so much about the journey (the tears-despair-and-anger-powered process of writing working features and fixing bugs); but I know that some people are wondering how snuffleupagus is working internally, why certain features are missing, how come that I have so many anecdotes about PHP internals, why am I a grumpy person … and also, failures: A finished product usually doesn't tell much about failures, this is unfortunate, because they are usually a great source of cool stories.

Hence this blogpost, that will likely be the first of a series digging into the details and the processes of implementing features and fixing annoying interesting bugs.

Php has a this weird type called Array, that is actually an ordered map, meaning that it could be used like a classic array, but also as a hashtable or a dictionary. Should you want to use it as a dictionary, php also provides you a nice array_keys function:

  • if called with your array as sole argument, it'll returns all the keys.
  • otherwise, it'll filter on keys for which a value matches the one passed as second parameter. You can also pass a third one, to enable strict comparison.

Because Snuffleupagus has this handy feature to automatically enforce strict comparison as much as possible, everyone expects that it will modify the behaviour of this function accordingly, which is excatly what it does. Well, except for PHP7.4, hence this blogpost.

The testsuite is checking that the following code returns every single key, even with the strict mode activated:

<?php
$toto = [
        "toto" => 1,
        "titi" => 2,
        "tata" => 3,
];

var_dump(array_keys($toto));

Everything was working well for PHP7.0, PHP7.1, PHP7.2, PHP7.3, but not for PHP7.4, for which the displayed value was:

array(0) {
}

The code to hook array_keys uses some boilerplate, but comes down to the following function:

static void array_handler(INTERNAL_FUNCTION_PARAMETERS, const char* name,
                          size_t size, zif_handler orig_handler,
                          const char* spec) {
  zif_handler handler;
  zval func_name;
  zval params[3];
  zval *value, *array = NULL;
  zend_bool strict = 1;

  memset(&params, 0, sizeof(params));

  zend_parse_parameters(ZEND_NUM_ARGS(), spec, &value, &array, &strict);

  ZVAL_COPY(&params[0], value);
  if (array) {
    ZVAL_COPY(&params[1], array);
    ZVAL_BOOL(&params[2], 1);
  } else {
    ZVAL_BOOL(&params[2], 0);
  }

  ZVAL_STRING(&func_name, name);

  handler = zend_hash_str_find_ptr(SNUFFLEUPAGUS_G(sp_internal_functions_hook),
                                   name, size);
  zend_internal_function* func =
      zend_hash_str_find_ptr(CG(function_table), name, size);
  func->handler = handler;

  call_user_function(CG(function_table), NULL, &func_name, return_value, 3,
                     params);

  func->handler = orig_handler;
}

Tl;dr The function is parsing the parameters, forcing the third one (strict mode) to always be 1, retrieving the pointer to the original array_keys function and finally calling it.

I looked a bit at the commits between PHP7.3 and PHP7.4 with git log -Sarray_keys but didn't find anything immediately obvious there. I wasn't convinced that the bugs was in PHP, but rather that we did something wrong on Snuffleupagus' side, that worked by pure luck combined with a reproducible-enough memory layout that somehow made things work.

Rubber duck time! There is something wrong with the memset: there shouldn't be a & there, since we want to nullify the whole array, not the pointer to it. Also, why bother conditionally setting &params[2], since we always want it to be 1. The changes are looking like this:

--- a/src/sp_sloppy.c
+++ b/src/sp_sloppy.c
@@ -43,20 +43,16 @@ static void array_handler(INTERNAL_FUNCTION_PARAMETERS, const char* name,
   zval func_name;
   zval params[3];
   zval *value, *array = NULL;
-  zend_bool strict = 1;
+  zend_bool strict;

-  memset(&params, 0, sizeof(params));
+  memset(params, 0, sizeof(params));

   zend_parse_parameters(ZEND_NUM_ARGS(), spec, &value, &array, &strict);

   ZVAL_COPY(&params[0], value);
+       ZVAL_BOOL(&params[2], 1);
   if (array) {
     ZVAL_COPY(&params[1], array);
-    ZVAL_BOOL(&params[2], 1);
-  } else {
-    // if there is no array as parameter, don't set strict mode.
-    // check php's implementation for details.
-    ZVAL_BOOL(&params[2], 0);
   }

   ZVAL_STRING(&func_name, name);
jvoisin@grimhilde 23:49 ~/dev/snuffleupagus   

make debug, again, … yay, the test is failing, we have a reproducer! Quick, to the gdb-mobile (it's like the bat-mobile, but less cool, and with a GNU instead of a bat)! I added an __asm__ ("int3"); right before call_user_function to raise a SIGTRAP that I'll be able to easily catch in GDB, ran make debug to build Snuffleupagus then the testsuite, that obviously failed the test with a SIGTRAP. But we don't care, we're only interested in the src/tests/sloppy_comparison/sloppy_comparison_array_keys.sh file that it just generated for it: I edited it to prefix it with gdb --args, launched it, hit r, and voilà, we're in gdb, with our failing test:

gdb-peda$ r
Starting program: /usr/bin/php7.2 -n -c /home/jvoisin/dev/snuffleupagus/src/tmp-php.ini -d output_handler= -d open_basedir= -d disable_functions= -d output_buffering=Off -d error_reporting=32767 -d display_errors=1 -d display_startup_errors=1 -d log_errors=0 -d html_errors=0 -d track_errors=0 -d report_memleaks=1 -d report_zend_debug=0 -d docref_root= -d docref_ext=.html -d error_prepend_string= -d error_append_string= -d auto_prepend_file= -d auto_append_file= -d ignore_repeated_errors=0 -d precision=14 -d memory_limit=128M -d log_errors_max_len=0 -d opcache.fast_shutdown=0 -d opcache.file_update_protection=0 -d zend.assertions=1 -d extension_dir=/home/jvoisin/dev/snuffleupagus/src/modules/ -d extension=snuffleupagus.so -d sp.configuration_file=/home/jvoisin/dev/snuffleupagus/src/tests/sloppy_comparison/config/sloppy_comparison.ini -f /home/jvoisin/dev/snuffleupagus/src/tests/sloppy_comparison/sloppy_comparison_array_keys.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGTRAP, Trace/breakpoint trap.
[----------------------------------registers-----------------------------------]
RAX: 0x555555c868a0 --> 0x10000000001 
RBX: 0x555555c186a0 --> 0x0 
RCX: 0x7379 ('ys')
RDX: 0x555555715ac0 (push   r15)
RSI: 0x7ffff2d1d4be --> 0x3733333232390000 ('')
RDI: 0x555555c867c2 --> 0x55feaaaaaaaaaa00 
RBP: 0x7fffffffa330 --> 0x7fffffffa350 --> 0x7fffffffa370 --> 0x555555c868a0 --> 0x10000000001 
RSP: 0x7fffffffa1d0 --> 0x7ffff2d1d4af --> 0x72726100627a7c7a ('z|zb')
RIP: 0x7ffff2d1a3d7 (<array_handler+1011>:  lea    rcx,[rbp-0x30])
R8 : 0x55555590b248 --> 0xfff00268fff00279 
R9 : 0x0 
R10: 0x1 
R11: 0x7ffff6057c40 --> 0xfffd4250fffd3f7f 
R12: 0x7ffff327d2a0 --> 0x800000000000002 
R13: 0x7ffff321d120 --> 0x0 
R14: 0x7ffff321d030 --> 0x7ffff328d180 --> 0x55555589ffe5 (<execute_ex+821>:    mov    r13,QWORD PTR [r14+0x8])
R15: 0x7ffff328d180 --> 0x55555589ffe5 (<execute_ex+821>:   mov    r13,QWORD PTR [r14+0x8])
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7ffff2d1a3cb <array_handler+999>:  mov    rdx,QWORD PTR [rbp-0xc0]
   0x7ffff2d1a3d2 <array_handler+1006>: mov    QWORD PTR [rax+0x30],rdx
   0x7ffff2d1a3d6 <array_handler+1010>: int3   
=> 0x7ffff2d1a3d7 <array_handler+1011>: lea    rcx,[rbp-0x30]
   0x7ffff2d1a3db <array_handler+1015>: mov    rdx,QWORD PTR [rbp-0x140]
   0x7ffff2d1a3e2 <array_handler+1022>: lea    rax,[rbp-0x40]
   0x7ffff2d1a3e6 <array_handler+1026>: mov    r9d,0x1
   0x7ffff2d1a3ec <array_handler+1032>: mov    r8,rcx
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffa1d0 --> 0x7ffff2d1d4af --> 0x72726100627a7c7a ('z|zb')
0008| 0x7fffffffa1d8 --> 0x7ffff2d1a496 (<zif_sp_array_keys>:   push   rbp)
0016| 0x7fffffffa1e0 --> 0xa ('\n')
0024| 0x7fffffffa1e8 --> 0x7ffff2d1d4b4 ("array_keys")
0032| 0x7fffffffa1f0 --> 0x7ffff321d0a0 --> 0x0 
0040| 0x7fffffffa1f8 --> 0x7ffff321d120 --> 0x0 
0048| 0x7fffffffa200 --> 0x1007ffff3281260 
0056| 0x7fffffffa208 --> 0x1407 
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGTRAP
array_handler (execute_data=0x7ffff321d120, return_value=0x7ffff321d0a0, name=0x7ffff2d1d4b4 "array_keys", size=0xa, 
    orig_handler=0x7ffff2d1a496 <zif_sp_array_keys>, spec=0x7ffff2d1d4af "z|zb") at /home/jvoisin/dev/snuffleupagus/src/sp_sloppy.c:68
68    call_user_function(CG(function_table), NULL, &func_name, return_value, 3,
gdb-peda$ c

Step-step-step-… err, it seems that I have no debug symbols. Fortunately, I can easily install them:

$ apt install php7.2-common-dbgsym
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 php7.2-common-dbgsym : Depends: php7.2-common (= 7.2.3-1ubuntu1) but 7.2.15-0ubuntu0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
zsh: exit 100   apt install php7.2-common-dbgsym
$

Wellp, apparently I can't, but it's ok, we'll just step our way to the function. As said previously, GDB isn't good at finding PHP's symbols. Fortunately, it manages to get Snuffleupagus' ones, meaning that we're able to place a breakpoint on sp_zend_execute_internal, which is the hook for zend_execute_internal, the function is charge of executing internal (as opposed to user-defined) php functions. There is no need to set the breakpoint before running into our SIGTRAP, otherwise odds are that we'll break on uninteresting functions, like rand(), since it's called before array_keys in our reproducer.

So, breakpoint hit:

gdb-peda$ c
Continuing.
[----------------------------------registers-----------------------------------]
RAX: 0x7ffff2d0e6cc (<sp_zend_execute_internal>:    push   rbp)
RBX: 0x0 
RCX: 0x3 
RDX: 0x7ffff321d0d0 --> 0x0 
RSI: 0x7ffff321d0d0 --> 0x0 
RDI: 0x7ffff321d230 --> 0x0 
RBP: 0x7fffffffa070 --> 0x7fffffffa320 --> 0x0 
RSP: 0x7fffffffa060 --> 0x7ffff321d0d0 --> 0x0 
RIP: 0x7ffff2d0e6dc (<sp_zend_execute_internal+16>: mov    rax,QWORD PTR [rbp-0x8])
R8 : 0x1 
R9 : 0x7fffffffa0a8 --> 0x0 
R10: 0x20 (' ')
R11: 0x555555c186a0 --> 0x0 
R12: 0x555555c868a0 --> 0x10000000001 
R13: 0x7fffffffa180 --> 0x38 ('8')
R14: 0x7ffff321d230 --> 0x0 
R15: 0x3
EFLAGS: 0x206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7ffff2d0e6d0 <sp_zend_execute_internal+4>: sub    rsp,0x10
   0x7ffff2d0e6d4 <sp_zend_execute_internal+8>: mov    QWORD PTR [rbp-0x8],rdi
   0x7ffff2d0e6d8 <sp_zend_execute_internal+12>:    mov    QWORD PTR [rbp-0x10],rsi
=> 0x7ffff2d0e6dc <sp_zend_execute_internal+16>:    mov    rax,QWORD PTR [rbp-0x8]
   0x7ffff2d0e6e0 <sp_zend_execute_internal+20>:    mov    rdi,rax
   0x7ffff2d0e6e3 <sp_zend_execute_internal+23>:    call   0x7ffff2d0de9b <is_in_eval_and_whitelisted>
   0x7ffff2d0e6e8 <sp_zend_execute_internal+28>:    mov    rax,QWORD PTR [rip+0x2120a9]        # 0x7ffff2f20798 <orig_zend_execute_internal>
   0x7ffff2d0e6ef <sp_zend_execute_internal+35>:    test   rax,rax
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffa060 --> 0x7ffff321d0d0 --> 0x0 
0008| 0x7fffffffa068 --> 0x7ffff321d230 --> 0x0 
0016| 0x7fffffffa070 --> 0x7fffffffa320 --> 0x0 
0024| 0x7fffffffa078 --> 0x5555557e6ab2 (<zend_call_function+2386>: mov    r11,QWORD PTR [rsp+0x10])
0032| 0x7fffffffa080 --> 0x7fffffffa0b0 --> 0x1 
0040| 0x7fffffffa088 --> 0x7fffffffa0e0 --> 0x7ffff3273160 --> 0x55555589ffe5 (<execute_ex+821>:    mov    r13,QWORD PTR [r14+0x8])
0048| 0x7fffffffa090 --> 0x555555c186a0 --> 0x0 
0056| 0x7fffffffa098 --> 0x5555557ffa8b (cmp    eax,0xffffffff)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, sp_zend_execute_internal (execute_data=0x7ffff321d230, return_value=0x7ffff321d0d0)
    at /home/jvoisin/dev/snuffleupagus/src/sp_execute.c:210
210   is_in_eval_and_whitelisted(execute_data);
gdb-peda$ 

Since we have the debug symbols here, we'll step with ni, to step instruction by instruction here, instead of line by line (I may or may not have yelled at GDB for an uncomfortable amount of time because I had no clue why it was jumping around in my listing instead of stepping correctly.). Step-step-step-step-step… and suddenly, a call rax. A call to a register usually indicate that the binary is about to perform a call via a function pointer, which is what we're looking for here, since PHP is storing its functions into a hashtable.

Since I'm a lazy bastard, instead of properly reversing the function to see if it's the right one, I did the following:

gdb-peda$ py ["call" in line and print(line) for line in gdb.execute("disassemble $rip,+2048", to_string=True).splitlines()]
   0x0000555555715b43:  call   0x5555557f8fd0 <_array_init>
   0x0000555555715b97:  call   0x555555806f50 <_zend_hash_next_index_insert_new>
   0x0000555555715cf1:  call   0x555555806f50 <_zend_hash_next_index_insert_new>
   0x0000555555715d0c:  call   0x55555563dd95 <zend_wrong_parameters_count_error>
   0x0000555555715d41:  call   0x5555557ebc80 <zend_is_identical>
   0x0000555555715d70:  call   0x55555563de87 <zend_wrong_parameter_type_error>
   0x0000555555715df0:  call   0x5555557f8fd0 <_array_init>
   0x0000555555715dfe:  call   0x5555558030e0 <zend_hash_real_init>
   0x0000555555715feb:  call   0x5555557f8aa0 <zend_parse_arg_bool_slow>
   0x000055555571601e:  call   0x5555557f3710 <compare_function>
   0x0000555555716083:  call   0x55555563a170 <memcmp@plt>
   0x00005555557160cd:  call   0x5555557ee4a0 <zendi_smart_strcmp>
   0x00005555557160ea:  call   0x555555638030 <__stack_chk_fail@plt>
   0x000055555571615a:  call   0x5555557f8fd0 <_array_init>
   0x0000555555716168:  call   0x5555558030e0 <zend_hash_real_init>
   0x00005555557162bf:  call   0x5555557f8fd0 <_array_init>
gdb-peda$ 

and

$ ~/dev/php-src git grep -A 128 'PHP_FUNCTION(array_keys)' -- '*.c' | grep -v '/\*' | grep -oE '[a-z0-9_-]+\('
zend_hash_num_elements(
array_init(
fast_is_identical_function(
zend_hash_next_index_insert_new(
fast_equal_check_function(
zend_hash_next_index_insert_new(
array_init_size(
zend_hash_real_init_packed(
zend_hash_get_current_key_zval_ex(
zend_hash_internal_pointer_end_ex(
zend_hash_get_current_key_zval_ex(
$ ~/dev/php-src   

Except zend_hash_num_elements which is a macro, it seems that we've got the right function. I marked this location with set $array_keys = 0x0000555555715ac0. An easiest solution would have been to simply ask readelf for the offset of array_keys in the php7 binary, unfortunately, it seems that the symbol isn't exported:

$ readelf -s /usr/bin/php7.2 | grep -i array_keys
$ 

Anyway, time to step-step-step-step again until something suspicious arrises.

   0x555555715af9:  mov    rbx,rdi
   0x555555715afc:  jne    0x555555715d60
   0x555555715b02:  cmp    esi,0x1
=> 0x555555715b05:  je     0x555555715dd8
   0x555555715b0b:  cmp    esi,0x2
   0x555555715b0e:  je     0x555555715c18
   0x555555715b14:  movzx  eax,BYTE PTR [rdi+0x78]
   0x555555715b18:  cmp    al,0x3
                                                              JUMP is NOT taken

This is suspicious, why is the jump not taken? To make it more obvious, lets look at the prologue of the function:

gdb-peda$ disassemble 0x555555715ac0,$rip+24
Dump of assembler code from 0x555555715ac0 to 0x555555715b1e:
   0x0000555555715ac0:  push   r15
   0x0000555555715ac2:  push   r14
   0x0000555555715ac4:  push   r13
   0x0000555555715ac6:  push   r12
   0x0000555555715ac8:  mov    r13,rsi
   0x0000555555715acb:  push   rbp
   0x0000555555715acc:  push   rbx
   0x0000555555715acd:  sub    rsp,0x68
   0x0000555555715ad1:  mov    esi,DWORD PTR [rdi+0x2c]
   0x0000555555715ad4:  mov    rax,QWORD PTR fs:0x28
   0x0000555555715add:  mov    QWORD PTR [rsp+0x58],rax
   0x0000555555715ae2:  xor    eax,eax
   0x0000555555715ae4:  mov    BYTE PTR [rsp+0x2f],0x0
   0x0000555555715ae9:  lea    eax,[rsi-0x1]
   0x0000555555715aec:  cmp    eax,0x2
   0x0000555555715aef:  ja     0x555555715d00
   0x0000555555715af5:  cmp    BYTE PTR [rdi+0x58],0x7
   0x0000555555715af9:  mov    rbx,rdi
   0x0000555555715afc:  jne    0x555555715d60
   0x0000555555715b02:  cmp    esi,0x1
=> 0x0000555555715b05:  je     0x555555715dd8
   0x0000555555715b0b:  cmp    esi,0x2
   0x0000555555715b0e:  je     0x555555715c18
   0x0000555555715b14:  movzx  eax,BYTE PTR [rdi+0x78]
   0x0000555555715b18:  cmp    al,0x3
   0x0000555555715b1a:  jne    0x555555715c10
End of assembler dump.
gdb-peda$ 

Still not obvious? What about with the prologue of the function in C:

/* {{{ proto array array_keys(array input [, mixed search_value[, bool strict]])
   Return just the keys from the input array, optionally only for the specified search_value */
PHP_FUNCTION(array_keys)
{
    zval *input,                /* Input array */
         *search_value = NULL,  /* Value to search for */
         *entry,                /* An entry in the input array */
           new_val;             /* New value */
    zend_bool strict = 0;       /* do strict comparison */
    zend_ulong num_idx;
    zend_string *str_idx;
    zend_array *arrval;
    zend_ulong elem_count;

    ZEND_PARSE_PARAMETERS_START(1, 3)
        Z_PARAM_ARRAY(input)
        Z_PARAM_OPTIONAL
        Z_PARAM_ZVAL(search_value)
        Z_PARAM_BOOL(strict)
    ZEND_PARSE_PARAMETERS_END();
    arrval = Z_ARRVAL_P(input);
    elem_count = zend_hash_num_elements(arrval);

The comparisons of esi to 1 and 2 and al to 3 are really looking like the function's arguments count check.

We only passed a single parameter to the array_keys function, and it seems that despite the fact that we passed a single argument to the function, the corresponding jump is not taken. Why‽

Well, it's likely because of this line in Snuffleupagus:

call_user_function(CG(function_table), NULL, &func_name, return_value, 3, params); 

Despite setting only the first parameter in the params array, we're telling call_user_function that the function &func_name (in our case, array_keys) is taking 3 arguments: our array, an empty value, and the value 1. No wonder the call the array_keys isn't outputting anything: out array doesn't contain any value strictly equal to nil.

The fix is trivial: lie about the number of parameter, which is exactly was has been done in this commit. It took me roughly 3h to find the bug, push a fix and write this blogpost.