Artificial truth

The more you see, the less you believe.


Playing with Weggli
Thu 14 October 2021

Felix Wilhelm from Google's Project Zero recently released weggli:

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.

Oblivion, an avid CodeQL user, was of course interested, so we spent an evening on IRC drinking beer and trying to come up with interesting queries to run, mostly against the Linux kernel.


To find kmalloc multiplication overflows:

$ weggli --unique -R 'a!=^[A-Z_]+$' 'kmalloc($a * _);' ~/linux

Since this commit, binary expressions are commutative, meaning that the query will match if at least one operand of the multiplication isn't an all-caps identifier, that is, isn't a constant.
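To make the bug class concrete, here is a minimal userland sketch of what such a multiplication can do. The helper names unchecked_alloc_size and checked_alloc_size are made up for this post; kernel code would rather rely on existing helpers such as kmalloc_array().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the size a kmalloc(n * size)-style call would request.
 * Unsigned multiplication silently wraps around on overflow. */
size_t unchecked_alloc_size(size_t n, size_t size)
{
    return n * size;
}

/* Hypothetical helper: returns 0 and stores the product in *out on success,
 * or -1 if n * size does not fit in a size_t. */
int checked_alloc_size(size_t n, size_t size, size_t *out)
{
    if (size != 0 && n > SIZE_MAX / size)
        return -1;
    *out = n * size;
    return 0;
}
```

With n = SIZE_MAX / 2 + 1 and size = 2, the unchecked version asks for 0 bytes, while the checked one reports the overflow.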

In this one, the idea is to find overflows happening only in the allocation, but not in the usage:

$ weggli --unique 'kmalloc($a + _); memcpy(_, _, $a);' ~/linux

A classic mistake in C is to use sizeof(ptr) instead of sizeof(type of the pointed thing):

$ weggli -R 'func=^mem' --unique '$a * _; $func(_ , _, sizeof($a));' ~/linux

Unfortunately, there is currently no way to tell weggli that the first argument of $func shouldn't be &a; it's possible to use something like -R 'b!=&', but it sucks.
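For those who haven't been bitten by the sizeof mistake yet, here is a small sketch of the difference. struct packet and both function names are invented for the example; the functions only return the length that a memcpy(dst, p, ...) would be given.

```c
#include <stddef.h>

struct packet {            /* hypothetical structure, for illustration only */
    char payload[64];
};

/* Length used by memcpy(dst, p, sizeof(p)): the size of the pointer itself,
 * typically 4 or 8 bytes. */
size_t copy_len_buggy(const struct packet *p)
{
    return sizeof(p);
}

/* Length used by memcpy(dst, p, sizeof(*p)): the size of the whole structure. */
size_t copy_len_fixed(const struct packet *p)
{
    return sizeof(*p);
}
```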

Copy functions like memcpy and its friends should always copy up to the size of the target, not the source. Unfortunately, it's not uncommon to see the latter, and this query finds such cases:

$ weggli --unique -R 'func=co?py' -R 'size=sizeof|strlen' '$func($dest, $src, $size($src));' ~/linux
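As a point of comparison, this is roughly what bounding a copy by the destination looks like. copy_bounded is a hypothetical helper written for this post, not something taken from the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, bounded by the size of the
 * destination rather than the length of the source. Always NUL-terminates
 * when dst_size > 0 and returns the number of characters copied. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;
    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;   /* truncate instead of overflowing dst */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

A plain memcpy(dst, src, strlen(src)) would instead happily write past the end of a dst that is smaller than src.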

Variants to match on structures are also producing interesting results:

$ weggli --unique -R 'func=co?py' '$func($dest, $src, $src->$len);' ~/linux
$ weggli --unique -R 'func=co?py' '$func($dest, $src->$buf, $src->$len);' ~/linux

We tried various approaches to find trivial double-frees, like:

$ weggli --unique '{
    free($a);
    NOT: goto _;
    NOT: break;
    NOT: continue;
    NOT: return;
    NOT: $a = _;
    free($a);
}' ~/linux

but didn't manage to make anything elegant, since there is no way to formulate that we don't want any break, goto _, … between the two frees, or at least that the two are reachable.

Variable length arrays are risky and error-prone: if the requested length exceeds the available stack space, the stack overflows, and the possibilities of error checking are… suboptimal. So here's how to find them:

$ weggli --unique '_ $func(_ $len) {
NOT: _ = $buf[$len];
NOT: $buf[$len] = _;
_ $buf[$len];
}' ~/linux
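For reference, here is a small sketch of the pattern this query is after, next to one possible alternative. The function names and the MAX_LEN bound are arbitrary choices made for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 256  /* arbitrary bound chosen for this example */

/* The pattern the query looks for: len comes from the caller and directly
 * sizes a stack array, with no way to notice that the stack is too small.
 * len must be greater than zero. */
size_t fill_vla(size_t len)
{
    char buf[len];          /* variable length array */

    memset(buf, 'A', len);
    return sizeof(buf);     /* evaluated at run time for a VLA */
}

/* One possible alternative: a fixed-size buffer and an explicit check. */
int fill_fixed(size_t len)
{
    char buf[MAX_LEN];

    if (len > sizeof(buf))
        return -1;          /* the error can be reported to the caller */
    memset(buf, 'A', len);
    return 0;
}
```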

Stupid things like free'ing stack-allocated variables:

$ weggli --unique '$a = alloca(_); free($a);' ~/target

Shady-looking side-effects:

$ weggli --unique -R '$op=\+\+|--' 'if ( _ && _ $op)' ~/linux
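Why is this shady? Because && short-circuits, the side effect only happens some of the time. A tiny sketch, with a made-up function name:

```c
/* Hypothetical example: budget is only decremented when enabled is non-zero,
 * because && does not evaluate its right operand otherwise. */
int budget_after_check(int enabled, int budget)
{
    if (enabled && budget--) {
        /* ... do the actual work here ... */
    }
    return budget;
}
```

Note that the decrement also happens when budget is already 0: the condition is false, yet budget ends up at -1.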

Unspecified parameter order evaluation with side-effects in the mix:

$ weggli --unique '$f($a++, $b++)' ~/linux
$ weggli --unique '$f(++$a, ++$b)' ~/linux
$ weggli --unique '$f($a--, $b--)' ~/linux
$ weggli --unique '$f(--$a, --$b)' ~/linux
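To illustrate why this matters, here is a sketch where the result depends on which argument gets evaluated first. All the names are invented for the example, and the side effects are wrapped in function calls so that the behaviour is merely unspecified, not undefined.

```c
static int counter;

static int next_id(void)
{
    return ++counter;
}

static int sub(int a, int b)
{
    return a - b;
}

/* Either -1 or 1, depending on which argument the compiler evaluates first. */
int order_dependent(void)
{
    counter = 0;
    return sub(next_id(), next_id());
}

/* Always -1: the order is made explicit with separate statements. */
int order_explicit(void)
{
    int first, second;

    counter = 0;
    first = next_id();
    second = next_id();
    return sub(first, second);
}
```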

Division by zero:

$ weggli --unique '$a = 0; _ / $a' ~/linux

Same condition:

$ weggli --unique 'if ($a); else if ($a);' ~/linux

Sizeof void:

$ weggli --unique 'void * $a; sizeof(*$a)' ~/linux

Potential information leaks to userland, where the copied structure might not be fully initialized or might contain kernel pointers:

$ weggli --unique '{
    NOT: $a = memdup_user(_);
    NOT: memset($a);
    NOT: memset($a->$b);
    copy_to_user(_, $a, sizeof(*$a));
}' ~/linux
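The usual way to avoid leaking uninitialized bytes is to zero the whole structure before filling it, which is why the query excludes memset. A userland sketch, where struct reply and reply_init are made up for the example:

```c
#include <stddef.h>
#include <string.h>

struct reply {            /* hypothetical structure, for illustration only */
    char tag;             /* usually followed by padding bytes */
    long value;
};

/* Zero the whole structure first, padding included, then fill the fields.
 * Assigning the fields alone would leave the padding bytes untouched, and
 * whatever was on the stack or heap before would be copied out with them. */
void reply_init(struct reply *r, char tag, long value)
{
    memset(r, 0, sizeof(*r));
    r->tag = tag;
    r->value = value;
}
```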

To find KASLR bypasses like this one:

$ weggli -R 'a=addr' 'dev_info($a);' ~/dev/linux

Not accounting for the terminating NUL byte when allocating a string via snprintf:

$ weggli --unique '$a = snprintf(0, 0, _); malloc($a);' ~/target
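For comparison, a sketch of the correct idiom: snprintf(NULL, 0, ...) returns the length without the terminating NUL, so the allocation needs one extra byte. format_id is a hypothetical helper written for this post.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format an integer into a freshly allocated string.
 * The caller is responsible for freeing the result. */
char *format_id(int id)
{
    int len = snprintf(NULL, 0, "id=%d", id);
    char *s;

    if (len < 0)
        return NULL;
    s = malloc((size_t)len + 1);   /* malloc(len) would be one byte short */
    if (s == NULL)
        return NULL;
    snprintf(s, (size_t)len + 1, "id=%d", id);
    return s;
}
```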

Not reading snprintf's manpage:

$ weggli --unique '$pos = snprintf(_ + $pos);' ~/target
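What the manpage says is that snprintf returns the length it would have written had the buffer been large enough, not the length it actually wrote. Blindly accumulating that value into an offset can move it past the end of the buffer. Here is a sketch of an append that clamps the offset; append_clamped is a made-up helper, not an existing API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append s to buf at offset pos and return the new
 * offset, clamped so that it never points past the terminating NUL. */
size_t append_clamped(char *buf, size_t size, size_t pos, const char *s)
{
    int n;

    if (size == 0 || pos >= size - 1)
        return pos;                      /* no room left */
    n = snprintf(buf + pos, size - pos, "%s", s);
    if (n < 0)
        return pos;                      /* output error: leave pos alone */
    if ((size_t)n >= size - pos)
        return size - 1;                 /* truncated: n is the wanted length */
    return pos + (size_t)n;
}
```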

Since weggli supports C++, here is a dumb one to find type-confusion frees:

$ weggli --cpp --unique '$a = new _; $b = (_) $a; delete $b;' ~/target_cpp


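There are also queries that we would have liked to write but couldn't, so the flags used below are hypothetical. Overflow in format string, which can't be expressed since there is no way to state constraints between variables or to manipulate string literals: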

$ weggli --unique --constrain '$a>$b' '$buf[$b]; scanf("%$as", $buf);' ~/target

Trivial double-free detection, which can't be expressed since there is no way to state that statements must be reachable:

$ weggli --unique --followup 'free($a); free($a);' ~/target

String literals again, plus a wildcard for the number of arguments:

$ weggli --unique -R 'a=addr' -R 'b=0x%' 'dev_info(_, $b, ..., $a);' ~/target


We found a couple of bugs, but since the goal was to play around, we didn't spend time triaging or reporting them. Weggli is pretty cool, kind of in-between grep and CodeQL. It still comes with some shortcomings: some by design, like the absence of interprocedural semantics and control-flow notions, others because it's still a young project, but Felix is (still?) enthusiastic about adding missing features!