Writing code for Snuffleupagus is often fun, and sometimes frustrating because digging inside php7's internals isn't necessarily super-straightworward. System administrators only care about the destination (an hardened php7+ stack) and not so much about the journey (the tears-despair-and-anger-powered process of writing working features and fixing bugs); but I know that some people are wondering how snuffleupagus is working internally, why certain features are missing, how come that I have so many anecdotes about PHP internals, why am I a grumpy person … and also, failures: A finished product usually doesn't tell much about failures, this is unfortunate, because they are usually a great source of cool stories.
Hence this blogpost, that will likely be the first of a series digging into the
details and the processes of implementing features and fixing annoying interesting bugs.
Php has a this weird type called Array,
that is actually an ordered map, meaning that it could be used like a classic
array, but also as a hashtable or a dictionary. Should you want to use it as a
dictionary, php also provides you a nice
array_keys
function:
- if called with your array as sole argument, it'll returns all the keys.
- otherwise, it'll filter on keys for which a value matches the one passed as second parameter. You can also pass a third one, to enable strict comparison.
Because Snuffleupagus has this handy feature to automatically enforce strict comparison as much as possible, everyone expects that it will modify the behaviour of this function accordingly, which is excatly what it does. Well, except for PHP7.4, hence this blogpost.
The testsuite is checking that the following code returns every single key, even with the strict mode activated:
<?php
$toto = [
"toto" => 1,
"titi" => 2,
"tata" => 3,
];
var_dump(array_keys($toto));
Everything was working well for PHP7.0, PHP7.1, PHP7.2, PHP7.3, but not for PHP7.4, for which the displayed value was:
array(0) {
}
The code to hook array_keys uses some boilerplate, but comes down to the
following function:
static void array_handler(INTERNAL_FUNCTION_PARAMETERS, const char* name,
size_t size, zif_handler orig_handler,
const char* spec) {
zif_handler handler;
zval func_name;
zval params[3];
zval *value, *array = NULL;
zend_bool strict = 1;
memset(¶ms, 0, sizeof(params));
zend_parse_parameters(ZEND_NUM_ARGS(), spec, &value, &array, &strict);
ZVAL_COPY(¶ms[0], value);
if (array) {
ZVAL_COPY(¶ms[1], array);
ZVAL_BOOL(¶ms[2], 1);
} else {
ZVAL_BOOL(¶ms[2], 0);
}
ZVAL_STRING(&func_name, name);
handler = zend_hash_str_find_ptr(SNUFFLEUPAGUS_G(sp_internal_functions_hook),
name, size);
zend_internal_function* func =
zend_hash_str_find_ptr(CG(function_table), name, size);
func->handler = handler;
call_user_function(CG(function_table), NULL, &func_name, return_value, 3,
params);
func->handler = orig_handler;
}
Tl;dr The function is parsing the parameters, forcing the third one (strict
mode) to always be 1, retrieving the pointer to the original array_keys
function and finally calling it.
I looked a bit at the commits between PHP7.3 and PHP7.4 with git log
-Sarray_keys but didn't find anything immediately obvious there.
I wasn't convinced that the bugs was in PHP, but rather that we did something
wrong on Snuffleupagus' side, that worked by pure luck combined with a
reproducible-enough memory layout that somehow made things work.
Rubber duck time! There
is something wrong with the memset: there shouldn't be a & there, since we want
to nullify the whole array, not the pointer to it. Also, why bother
conditionally setting ¶ms[2], since we always want it to be 1. The
changes are looking like this:
--- a/src/sp_sloppy.c
+++ b/src/sp_sloppy.c
@@ -43,20 +43,16 @@ static void array_handler(INTERNAL_FUNCTION_PARAMETERS, const char* name,
zval func_name;
zval params[3];
zval *value, *array = NULL;
- zend_bool strict = 1;
+ zend_bool strict;
- memset(¶ms, 0, sizeof(params));
+ memset(params, 0, sizeof(params));
zend_parse_parameters(ZEND_NUM_ARGS(), spec, &value, &array, &strict);
ZVAL_COPY(¶ms[0], value);
+ ZVAL_BOOL(¶ms[2], 1);
if (array) {
ZVAL_COPY(¶ms[1], array);
- ZVAL_BOOL(¶ms[2], 1);
- } else {
- // if there is no array as parameter, don't set strict mode.
- // check php's implementation for details.
- ZVAL_BOOL(¶ms[2], 0);
}
ZVAL_STRING(&func_name, name);
jvoisin@grimhilde 23:49 ~/dev/snuffleupagus
make debug, again, … yay, the test is failing, we have a reproducer!
Quick, to the gdb-mobile (it's like the bat-mobile, but less cool, and with a GNU instead of a bat)!
I added an __asm__ ("int3"); right before call_user_function to
raise a SIGTRAP that I'll be able to easily catch in GDB,
ran make debug to build Snuffleupagus then the testsuite, that obviously
failed the test with a SIGTRAP. But we don't care, we're only interested in
the src/tests/sloppy_comparison/sloppy_comparison_array_keys.sh file that it
just generated for it: I edited it to prefix it with gdb --args, launched it,
hit r, and voilà, we're in gdb, with our failing test:
gdb-peda$ r
Starting program: /usr/bin/php7.2 -n -c /home/jvoisin/dev/snuffleupagus/src/tmp-php.ini -d output_handler= -d open_basedir= -d disable_functions= -d output_buffering=Off -d error_reporting=32767 -d display_errors=1 -d display_startup_errors=1 -d log_errors=0 -d html_errors=0 -d track_errors=0 -d report_memleaks=1 -d report_zend_debug=0 -d docref_root= -d docref_ext=.html -d error_prepend_string= -d error_append_string= -d auto_prepend_file= -d auto_append_file= -d ignore_repeated_errors=0 -d precision=14 -d memory_limit=128M -d log_errors_max_len=0 -d opcache.fast_shutdown=0 -d opcache.file_update_protection=0 -d zend.assertions=1 -d extension_dir=/home/jvoisin/dev/snuffleupagus/src/modules/ -d extension=snuffleupagus.so -d sp.configuration_file=/home/jvoisin/dev/snuffleupagus/src/tests/sloppy_comparison/config/sloppy_comparison.ini -f /home/jvoisin/dev/snuffleupagus/src/tests/sloppy_comparison/sloppy_comparison_array_keys.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGTRAP, Trace/breakpoint trap.
[----------------------------------registers-----------------------------------]
RAX: 0x555555c868a0 --> 0x10000000001
RBX: 0x555555c186a0 --> 0x0
RCX: 0x7379 ('ys')
RDX: 0x555555715ac0 (push r15)
RSI: 0x7ffff2d1d4be --> 0x3733333232390000 ('')
RDI: 0x555555c867c2 --> 0x55feaaaaaaaaaa00
RBP: 0x7fffffffa330 --> 0x7fffffffa350 --> 0x7fffffffa370 --> 0x555555c868a0 --> 0x10000000001
RSP: 0x7fffffffa1d0 --> 0x7ffff2d1d4af --> 0x72726100627a7c7a ('z|zb')
RIP: 0x7ffff2d1a3d7 (<array_handler+1011>: lea rcx,[rbp-0x30])
R8 : 0x55555590b248 --> 0xfff00268fff00279
R9 : 0x0
R10: 0x1
R11: 0x7ffff6057c40 --> 0xfffd4250fffd3f7f
R12: 0x7ffff327d2a0 --> 0x800000000000002
R13: 0x7ffff321d120 --> 0x0
R14: 0x7ffff321d030 --> 0x7ffff328d180 --> 0x55555589ffe5 (<execute_ex+821>: mov r13,QWORD PTR [r14+0x8])
R15: 0x7ffff328d180 --> 0x55555589ffe5 (<execute_ex+821>: mov r13,QWORD PTR [r14+0x8])
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x7ffff2d1a3cb <array_handler+999>: mov rdx,QWORD PTR [rbp-0xc0]
0x7ffff2d1a3d2 <array_handler+1006>: mov QWORD PTR [rax+0x30],rdx
0x7ffff2d1a3d6 <array_handler+1010>: int3
=> 0x7ffff2d1a3d7 <array_handler+1011>: lea rcx,[rbp-0x30]
0x7ffff2d1a3db <array_handler+1015>: mov rdx,QWORD PTR [rbp-0x140]
0x7ffff2d1a3e2 <array_handler+1022>: lea rax,[rbp-0x40]
0x7ffff2d1a3e6 <array_handler+1026>: mov r9d,0x1
0x7ffff2d1a3ec <array_handler+1032>: mov r8,rcx
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffa1d0 --> 0x7ffff2d1d4af --> 0x72726100627a7c7a ('z|zb')
0008| 0x7fffffffa1d8 --> 0x7ffff2d1a496 (<zif_sp_array_keys>: push rbp)
0016| 0x7fffffffa1e0 --> 0xa ('\n')
0024| 0x7fffffffa1e8 --> 0x7ffff2d1d4b4 ("array_keys")
0032| 0x7fffffffa1f0 --> 0x7ffff321d0a0 --> 0x0
0040| 0x7fffffffa1f8 --> 0x7ffff321d120 --> 0x0
0048| 0x7fffffffa200 --> 0x1007ffff3281260
0056| 0x7fffffffa208 --> 0x1407
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGTRAP
array_handler (execute_data=0x7ffff321d120, return_value=0x7ffff321d0a0, name=0x7ffff2d1d4b4 "array_keys", size=0xa,
orig_handler=0x7ffff2d1a496 <zif_sp_array_keys>, spec=0x7ffff2d1d4af "z|zb") at /home/jvoisin/dev/snuffleupagus/src/sp_sloppy.c:68
68 call_user_function(CG(function_table), NULL, &func_name, return_value, 3,
gdb-peda$ c
Step-step-step-… err, it seems that I have no debug symbols. Fortunately, I can easily install them:
$ apt install php7.2-common-dbgsym
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
php7.2-common-dbgsym : Depends: php7.2-common (= 7.2.3-1ubuntu1) but 7.2.15-0ubuntu0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
zsh: exit 100 apt install php7.2-common-dbgsym
$
Wellp, apparently I can't, but it's ok, we'll just step our way to the function.
As said previously, GDB isn't good at finding PHP's symbols. Fortunately,
it manages to get Snuffleupagus' ones, meaning that we're able to place a
breakpoint on sp_zend_execute_internal, which is the hook for
zend_execute_internal, the function is charge of executing internal (as
opposed to user-defined) php functions.
There is no need to set the breakpoint before running into our SIGTRAP,
otherwise odds are that we'll break on uninteresting functions, like rand(),
since it's called before array_keys in our reproducer.
So, breakpoint hit:
gdb-peda$ c
Continuing.
[----------------------------------registers-----------------------------------]
RAX: 0x7ffff2d0e6cc (<sp_zend_execute_internal>: push rbp)
RBX: 0x0
RCX: 0x3
RDX: 0x7ffff321d0d0 --> 0x0
RSI: 0x7ffff321d0d0 --> 0x0
RDI: 0x7ffff321d230 --> 0x0
RBP: 0x7fffffffa070 --> 0x7fffffffa320 --> 0x0
RSP: 0x7fffffffa060 --> 0x7ffff321d0d0 --> 0x0
RIP: 0x7ffff2d0e6dc (<sp_zend_execute_internal+16>: mov rax,QWORD PTR [rbp-0x8])
R8 : 0x1
R9 : 0x7fffffffa0a8 --> 0x0
R10: 0x20 (' ')
R11: 0x555555c186a0 --> 0x0
R12: 0x555555c868a0 --> 0x10000000001
R13: 0x7fffffffa180 --> 0x38 ('8')
R14: 0x7ffff321d230 --> 0x0
R15: 0x3
EFLAGS: 0x206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x7ffff2d0e6d0 <sp_zend_execute_internal+4>: sub rsp,0x10
0x7ffff2d0e6d4 <sp_zend_execute_internal+8>: mov QWORD PTR [rbp-0x8],rdi
0x7ffff2d0e6d8 <sp_zend_execute_internal+12>: mov QWORD PTR [rbp-0x10],rsi
=> 0x7ffff2d0e6dc <sp_zend_execute_internal+16>: mov rax,QWORD PTR [rbp-0x8]
0x7ffff2d0e6e0 <sp_zend_execute_internal+20>: mov rdi,rax
0x7ffff2d0e6e3 <sp_zend_execute_internal+23>: call 0x7ffff2d0de9b <is_in_eval_and_whitelisted>
0x7ffff2d0e6e8 <sp_zend_execute_internal+28>: mov rax,QWORD PTR [rip+0x2120a9] # 0x7ffff2f20798 <orig_zend_execute_internal>
0x7ffff2d0e6ef <sp_zend_execute_internal+35>: test rax,rax
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffa060 --> 0x7ffff321d0d0 --> 0x0
0008| 0x7fffffffa068 --> 0x7ffff321d230 --> 0x0
0016| 0x7fffffffa070 --> 0x7fffffffa320 --> 0x0
0024| 0x7fffffffa078 --> 0x5555557e6ab2 (<zend_call_function+2386>: mov r11,QWORD PTR [rsp+0x10])
0032| 0x7fffffffa080 --> 0x7fffffffa0b0 --> 0x1
0040| 0x7fffffffa088 --> 0x7fffffffa0e0 --> 0x7ffff3273160 --> 0x55555589ffe5 (<execute_ex+821>: mov r13,QWORD PTR [r14+0x8])
0048| 0x7fffffffa090 --> 0x555555c186a0 --> 0x0
0056| 0x7fffffffa098 --> 0x5555557ffa8b (cmp eax,0xffffffff)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, sp_zend_execute_internal (execute_data=0x7ffff321d230, return_value=0x7ffff321d0d0)
at /home/jvoisin/dev/snuffleupagus/src/sp_execute.c:210
210 is_in_eval_and_whitelisted(execute_data);
gdb-peda$
Since we have the debug symbols here, we'll step with ni, to step instruction
by instruction here, instead of line by line (I may or may not have yelled at GDB for an
uncomfortable amount of time because I had no clue why it was jumping around in
my listing instead of stepping correctly.). Step-step-step-step-step… and
suddenly, a call rax. A call to a register usually indicate that the binary
is about to perform a call via a function pointer, which is what we're looking
for here, since PHP is storing its functions into a hashtable.
Since I'm a lazy bastard, instead of properly reversing the function to see if it's the right one, I did the following:
gdb-peda$ py ["call" in line and print(line) for line in gdb.execute("disassemble $rip,+2048", to_string=True).splitlines()]
0x0000555555715b43: call 0x5555557f8fd0 <_array_init>
0x0000555555715b97: call 0x555555806f50 <_zend_hash_next_index_insert_new>
0x0000555555715cf1: call 0x555555806f50 <_zend_hash_next_index_insert_new>
0x0000555555715d0c: call 0x55555563dd95 <zend_wrong_parameters_count_error>
0x0000555555715d41: call 0x5555557ebc80 <zend_is_identical>
0x0000555555715d70: call 0x55555563de87 <zend_wrong_parameter_type_error>
0x0000555555715df0: call 0x5555557f8fd0 <_array_init>
0x0000555555715dfe: call 0x5555558030e0 <zend_hash_real_init>
0x0000555555715feb: call 0x5555557f8aa0 <zend_parse_arg_bool_slow>
0x000055555571601e: call 0x5555557f3710 <compare_function>
0x0000555555716083: call 0x55555563a170 <memcmp@plt>
0x00005555557160cd: call 0x5555557ee4a0 <zendi_smart_strcmp>
0x00005555557160ea: call 0x555555638030 <__stack_chk_fail@plt>
0x000055555571615a: call 0x5555557f8fd0 <_array_init>
0x0000555555716168: call 0x5555558030e0 <zend_hash_real_init>
0x00005555557162bf: call 0x5555557f8fd0 <_array_init>
gdb-peda$
and
$ ~/dev/php-src git grep -A 128 'PHP_FUNCTION(array_keys)' -- '*.c' | grep -v '/\*' | grep -oE '[a-z0-9_-]+\('
zend_hash_num_elements(
array_init(
fast_is_identical_function(
zend_hash_next_index_insert_new(
fast_equal_check_function(
zend_hash_next_index_insert_new(
array_init_size(
zend_hash_real_init_packed(
zend_hash_get_current_key_zval_ex(
zend_hash_internal_pointer_end_ex(
zend_hash_get_current_key_zval_ex(
$ ~/dev/php-src
Except zend_hash_num_elements which is a macro, it seems that we've got the
right function. I marked this location with set $array_keys = 0x0000555555715ac0.
An easiest solution would have been to simply ask readelf for
the offset of array_keys in the php7 binary, unfortunately, it seems that the
symbol isn't exported:
$ readelf -s /usr/bin/php7.2 | grep -i array_keys
$
Anyway, time to step-step-step-step again until something suspicious arrises.
0x555555715af9: mov rbx,rdi
0x555555715afc: jne 0x555555715d60
0x555555715b02: cmp esi,0x1
=> 0x555555715b05: je 0x555555715dd8
0x555555715b0b: cmp esi,0x2
0x555555715b0e: je 0x555555715c18
0x555555715b14: movzx eax,BYTE PTR [rdi+0x78]
0x555555715b18: cmp al,0x3
JUMP is NOT taken
This is suspicious, why is the jump not taken? To make it more obvious, lets look at the prologue of the function:
gdb-peda$ disassemble 0x555555715ac0,$rip+24
Dump of assembler code from 0x555555715ac0 to 0x555555715b1e:
0x0000555555715ac0: push r15
0x0000555555715ac2: push r14
0x0000555555715ac4: push r13
0x0000555555715ac6: push r12
0x0000555555715ac8: mov r13,rsi
0x0000555555715acb: push rbp
0x0000555555715acc: push rbx
0x0000555555715acd: sub rsp,0x68
0x0000555555715ad1: mov esi,DWORD PTR [rdi+0x2c]
0x0000555555715ad4: mov rax,QWORD PTR fs:0x28
0x0000555555715add: mov QWORD PTR [rsp+0x58],rax
0x0000555555715ae2: xor eax,eax
0x0000555555715ae4: mov BYTE PTR [rsp+0x2f],0x0
0x0000555555715ae9: lea eax,[rsi-0x1]
0x0000555555715aec: cmp eax,0x2
0x0000555555715aef: ja 0x555555715d00
0x0000555555715af5: cmp BYTE PTR [rdi+0x58],0x7
0x0000555555715af9: mov rbx,rdi
0x0000555555715afc: jne 0x555555715d60
0x0000555555715b02: cmp esi,0x1
=> 0x0000555555715b05: je 0x555555715dd8
0x0000555555715b0b: cmp esi,0x2
0x0000555555715b0e: je 0x555555715c18
0x0000555555715b14: movzx eax,BYTE PTR [rdi+0x78]
0x0000555555715b18: cmp al,0x3
0x0000555555715b1a: jne 0x555555715c10
End of assembler dump.
gdb-peda$
Still not obvious? What about with the prologue of the function in C:
/* {{{ proto array array_keys(array input [, mixed search_value[, bool strict]])
Return just the keys from the input array, optionally only for the specified search_value */
PHP_FUNCTION(array_keys)
{
zval *input, /* Input array */
*search_value = NULL, /* Value to search for */
*entry, /* An entry in the input array */
new_val; /* New value */
zend_bool strict = 0; /* do strict comparison */
zend_ulong num_idx;
zend_string *str_idx;
zend_array *arrval;
zend_ulong elem_count;
ZEND_PARSE_PARAMETERS_START(1, 3)
Z_PARAM_ARRAY(input)
Z_PARAM_OPTIONAL
Z_PARAM_ZVAL(search_value)
Z_PARAM_BOOL(strict)
ZEND_PARSE_PARAMETERS_END();
arrval = Z_ARRVAL_P(input);
elem_count = zend_hash_num_elements(arrval);
The comparisons of esi to 1 and 2 and al to 3 are really looking like
the function's arguments count check.
We only passed a single parameter to the array_keys function, and it seems
that despite the fact that we passed a single argument to the function,
the corresponding jump is not taken. Why‽
Well, it's likely because of this line in Snuffleupagus:
call_user_function(CG(function_table), NULL, &func_name, return_value, 3, params);
Despite setting only the first parameter in the params array, we're telling
call_user_function that the function &func_name (in our case, array_keys)
is taking 3 arguments: our array, an empty value, and the value 1. No wonder
the call the array_keys isn't outputting anything: out array doesn't contain
any value strictly equal to nil.
The fix is trivial: lie about the number of parameter, which is exactly was has been done in this commit. It took me roughly 3h to find the bug, push a fix and write this blogpost.