Playing with Weggli
source link: https://dustri.org/b/playing-with-weggli.html
Felix Wilhelm from Google's Project Zero recently released weggli:
weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Oblivion, an avid CodeQL user, was of course interested, so we spent an evening on IRC drinking beer and trying to come up with interesting queries to run, mostly against the Linux kernel.
Queries
To find kmalloc multiplication overflows:
$ weggli --unique -R 'a!=^[A-Z_]+$' 'kmalloc($a * _);' ~/linux
Since this commit, binary expressions are commutative, meaning that the query will match if at least one operand of the multiplication isn't an all-caps identifier (typically a constant).
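To make the bug class concrete, here is a minimal userland sketch of what that query is hunting for. The helper names are made up for illustration, and the 32-bit size is an assumption chosen so the wraparound is easy to see; in actual kernel code, helpers such as kmalloc_array() or array_size() take care of the checked multiplication.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: size computation as in kmalloc(n * elem),
 * truncated to 32 bits to mimic a 32-bit size type. */
uint32_t alloc_size_unchecked(uint32_t n, uint32_t elem)
{
    return (uint32_t)((uint64_t)n * elem);
}

/* Hypothetical helper: same computation, but refuses to wrap. */
bool alloc_size_checked(uint32_t n, uint32_t elem, uint32_t *out)
{
    if (elem != 0 && n > UINT32_MAX / elem)
        return false; /* n * elem does not fit in 32 bits */
    *out = alloc_size_unchecked(n, elem);
    return true;
}
```

With n = 0x40000001 and 4-byte elements, the unchecked version asks for 4 bytes while the code that follows believes it has room for about a billion elements.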
In this one, the idea is to find overflows happening only in the allocation, but not in the usage:
$ weggli --unique 'kmalloc($a + _); memcpy(_, _, $a);' ~/linux
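The addition variant can be sketched the same way. HDR_LEN and both function names are hypothetical; the point is only that the allocation size wraps while the length used for the copy does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 8 /* hypothetical header size */

/* Mirrors kmalloc(len + HDR_LEN): size_t arithmetic wraps silently. */
size_t total_unchecked(size_t len)
{
    return len + HDR_LEN;
}

/* Rejects lengths for which len + HDR_LEN would wrap. */
bool total_checked(size_t len, size_t *out)
{
    if (len > SIZE_MAX - HDR_LEN)
        return false;
    *out = len + HDR_LEN;
    return true;
}
```

If len is SIZE_MAX - 1, the unchecked sum is 6, so the buffer is tiny but a later memcpy of len bytes is not.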
A classic mistake in C is to use sizeof(ptr) instead of sizeof(type of the pointed thing):
$ weggli -R 'func=^mem' --unique '$a * _; $func(_ , _, sizeof($a));' ~/linux
Unfortunately, there is currently no way to tell weggli that the first argument of $func shouldn't be &a. It's possible to use something like -R 'b!=&', but it sucks.
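For readers who haven't been bitten by it yet, the mistake looks roughly like this. struct session and the two helpers are invented for the example; they only return the length that a memset(s, 0, ...) call would be given.

```c
#include <stddef.h>

struct session {          /* hypothetical structure */
    char key[64];
    int id;
};

/* What memset(s, 0, sizeof(s)) would clear: the size of the pointer. */
size_t wipe_len_buggy(struct session *s)
{
    return sizeof(s);
}

/* What was meant: the size of the pointed-to object. */
size_t wipe_len_fixed(struct session *s)
{
    return sizeof(*s);
}
```

On a typical 64-bit machine the buggy version wipes 8 bytes and leaves the rest of the key in memory.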
Copy functions like memcpy and its friends should always copy up to the size of the target, not the source. Unfortunately, it's not uncommon to see the latter, which this query finds:
$ weggli --unique -R 'func=co?py' -R 'size=sizeof|strlen' '$func($dest, $src, $size($src));' ~/linux
Variants to match on structures are also producing interesting results:
$ weggli --unique -R 'func=co?py' '$func($dest, $src, $src->$len);' ~/linux
$ weggli --unique -R 'func=co?py' '$func($dest, $src->$buf, $src->$len);' ~/linux
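As a point of comparison, here is a small sketch of a copy that is bounded by the destination. copy_bounded is a hypothetical helper written for this post, not something from the kernel (where strscpy() plays a similar role for strings).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, never writing more than
 * dst_size bytes, and always NUL-terminating. Returns bytes copied,
 * not counting the terminator. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;
    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;   /* clamp to the destination, not the source */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

The queries above flag the opposite shape, where the third argument is derived from the source alone.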
We tried various approaches to find trivial double-frees, like:
$ weggli --unique '{
kfree($a);
NOT: goto _;
NOT: break;
NOT: continue;
NOT: return;
NOT: $a = _;
kfree($a);
}' ~/linux
but didn't manage to make anything elegant, since there is no way to formulate that we don't want any break, goto _, … between the two frees, or at least that the two are reachable.
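The NOT: $a = _; line in that query is there to skip the usual defensive idiom, sketched below with a made-up struct ctx: once the pointer is cleared, a second release is harmless.

```c
#include <stdlib.h>

struct ctx {        /* hypothetical owner of a heap buffer */
    char *buf;
};

/* Frees the buffer and clears the pointer, so that a second call on the
 * same ctx is harmless: free(NULL) is a no-op. */
void ctx_release(struct ctx *c)
{
    free(c->buf);
    c->buf = NULL;
}
```

This is userland C; we assume kfree() behaves the same way on NULL, which is what makes the idiom common in kernel code too.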
Variable length arrays are risky and prone to errors; if the length is more than the stack size, a stack overrun will occur, and the possibilities of error checking are… suboptimal. So here's how to find them:
$ weggli --unique '_ $func(_ $len) {
NOT: _ = $buf[$len];
NOT: $buf[$len] = _;
_ $buf[$len];
}' ~/linux
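A minimal sketch of what this matches, next to a bounded alternative. MAX_LEN and the function names are assumptions for the example, and the first function needs a compiler that supports VLAs (they are optional since C11).

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 256 /* hypothetical upper bound */

/* The pattern the query looks for: an array sized by a parameter.
 * Returns sizeof(buf), which for a VLA is computed at run time.
 * len must be non-zero. */
size_t scratch_vla(size_t len)
{
    char buf[len];          /* stack usage is controlled by the caller */

    memset(buf, 0, len);
    return sizeof(buf);
}

/* A bounded alternative: fixed-size buffer plus an explicit check. */
int scratch_bounded(size_t len)
{
    char buf[MAX_LEN];

    if (len > sizeof(buf))
        return -1;          /* reject instead of growing the stack */
    memset(buf, 0, len);
    return 0;
}
```

The two NOT: lines in the query are there so that plain indexing such as buf[len] = x isn't mistaken for a declaration.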
Stupid things like free'ing stack-allocated variables:
$ weggli --unique '$a = alloca(_); free($a);' ~/target
Shady-looking side-effects:
$ weggli --unique -R '$op=\+\+|--' 'if ( _ && _ $op)' ~/linux
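Why this is shady: the right-hand side of && is only evaluated when the left-hand side is true, so the increment happens on some paths and not on others. A tiny, made-up illustration:

```c
/* Hypothetical example: the increment only happens when `enabled` is
 * non-zero, because && short-circuits. */
int check_and_count(int enabled, int *hits)
{
    if (enabled && (*hits)++)
        return 1;
    return 0;
}
```

Calling it with enabled set to 0 leaves the counter untouched, which is rarely what the author of such a line had in mind.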
Unspecified argument evaluation order with side effects in the mix:
$ weggli --unique '$f($a++, $b++)' ~/linux
$ weggli --unique '$f(++$a, ++$b)' ~/linux
$ weggli --unique '$f($a--, $b--)' ~/linux
$ weggli --unique '$f(--$a, --$b)' ~/linux
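The queries look for increments and decrements; the sketch below swaps them for calls to a made-up next_id() sharing a counter, simply because that makes the ordering problem observable. All names are hypothetical.

```c
static int last_id; /* hypothetical global counter */

void reset_ids(void) { last_id = 0; }
int next_id(void) { return ++last_id; }
int combine(int a, int b) { return a * 10 + b; }

/* The order of the two next_id() calls is unspecified, so right after
 * reset_ids() this returns either 12 or 21. */
int combine_unspecified(void)
{
    return combine(next_id(), next_id());
}

/* Sequenced through locals: always 12 right after reset_ids(). */
int combine_sequenced(void)
{
    int first = next_id();
    int second = next_id();

    return combine(first, second);
}
```

Which of the two results the first function gives is up to the compiler, and may change with the optimisation level.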
Division by zero:
$ weggli --unique '$a = 0; _ / $a' ~/linux
Same condition:
$ weggli --unique 'if ($a); else if ($a);' ~/linux
Sizeof void:
$ weggli --unique 'void * $a; sizeof(*$a)' ~/linux
Structures copied to userland without being fully initialized first, which may leak uninitialized data or kernel pointers:
$ weggli --unique '{
NOT: $a = memdup_user(_);
NOT: memset($a);
NOT: memset($a->$b);
copy_to_user(_, $a, sizeof(*$a));
}' ~/linux
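The memset the query insists on matters because of padding and forgotten fields. A userland sketch, with an invented struct reply standing in for whatever would be handed to copy_to_user:

```c
#include <stddef.h>
#include <string.h>

struct reply {      /* hypothetical structure handed to userland */
    char type;      /* usually followed by padding bytes */
    long value;
};

/* Zero the whole object first, padding included, then fill the fields.
 * Without the memset, padding and any field we forget to set would
 * carry whatever was in that memory before. */
void reply_init(struct reply *r, char type, long value)
{
    memset(r, 0, sizeof(*r));
    r->type = type;
    r->value = value;
}
```

Assigning every field by hand is not enough on its own, since sizeof(*r) bytes get copied, not just the named members.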
To find KASLR bypasses like this one:
$ weggli -R 'a=addr' 'dev_info($a);' ~/dev/linux
Not accounting for the terminating 0 when allocating a string via snprintf:
$ weggli --unique '$a = snprintf(0, 0, _); malloc($a);' ~/target
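For reference, the usual two-pass idiom looks like the sketch below; format_id is a hypothetical helper. snprintf returns the length without the terminator, hence the + 1 that the query checks is missing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed "id=<n>" string, or NULL. */
char *format_id(int id)
{
    int len = snprintf(NULL, 0, "id=%d", id); /* length without the NUL */
    char *s;

    if (len < 0)
        return NULL;
    s = malloc((size_t)len + 1);              /* +1 for the terminator */
    if (s == NULL)
        return NULL;
    snprintf(s, (size_t)len + 1, "id=%d", id);
    return s;
}
```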
Not reading snprintf's manpage:
$ weggli --unique '$pos = snprintf(_ + $pos);' ~/target
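What the manpage says, and what is easy to miss: snprintf returns the number of characters it would have written given enough space, so a position computed from its return value can end up past the end of the buffer. A sketch with two made-up helpers; the kernel has scnprintf() for the clamped behaviour.

```c
#include <stddef.h>
#include <stdio.h>

/* snprintf() returns the length it *wanted* to write, which can be
 * larger than the buffer it was given. */
int reported_length(const char *s)
{
    char tmp[4];

    return snprintf(tmp, sizeof(tmp), "%s", s);
}

/* Hypothetical helper: appends s at buf + pos and returns the new
 * position, clamped so that it never moves past the terminator. */
size_t append_str(char *buf, size_t size, size_t pos, const char *s)
{
    int n;

    if (size == 0 || pos >= size)
        return pos;
    n = snprintf(buf + pos, size - pos, "%s", s);
    if (n < 0)
        return pos;
    if ((size_t)n >= size - pos)
        return size - 1;    /* output was truncated */
    return pos + (size_t)n;
}
```

Feeding an 8-character string to reported_length gives 8 even though only 3 characters and a terminator fit in tmp.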
Since weggli supports C++, here is a dumb one to find type-confusion frees:
$ weggli --cpp --unique '$a = new _; $b = (_) $a; delete $b;' ~/target_cpp
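Limitations
A few things we couldn't express; the options and syntax used in the queries of this section are made up, to show what we would have liked to write.
Overflow in format string, since there is no way to express constraints between variables or to manipulate string literals:
$ weggli --unique --constrain '$a>$b' '$buf[$b]; scanf("%$as", $buf);' ~/target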
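The C side of that one, as a sketch: the field width given to %s doesn't include the terminator, so even a width equal to the buffer size is one byte too many. WORD_LEN and read_word are hypothetical, and sscanf stands in for scanf to keep the example self-contained.

```c
#include <stdio.h>

#define WORD_LEN 8 /* hypothetical buffer size */

/* Reads one whitespace-delimited word from input into out.
 * The field width must be WORD_LEN - 1: %7s stores up to 7 characters
 * and then appends the NUL. Returns 1 on success, 0 otherwise. */
int read_word(const char *input, char out[WORD_LEN])
{
    return sscanf(input, "%7s", out) == 1;
}
```

The width lives inside a string literal, which is exactly the part a weggli query can't currently reason about.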
Trivial double-free detection, since there is no way to express that statements must be reachable:
$ weggli --unique --followup 'free($a); free($a);' ~/target
String literal again, and wildcard for the number of arguments:
$ weggli --unique -R 'a=addr' -R 'b=0x%' 'dev_info(_, $b, ..., $a);' ~/target
Conclusion
We found a couple of bugs, but since the goal was to play around, we didn't spend time triaging or reporting them. Weggli is pretty cool, kind of in-between grep and CodeQL. It still comes with some shortcomings: some by design, like the absence of interprocedural semantics and control-flow notions, others because it's still a young project, but Felix is (still?) enthusiastic about adding missing features!