

Fuzzing Zig Code Using AFL++
source link: https://www.ryanliptak.com/blog/fuzzing-zig-code/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Fuzzing Zig Code Using AFL++
2021-09-19 - Programming - Zig - FuzzingAfter using code coverage information and real-world files to improve an audio metadata parser I am writing in Zig, the next step was to fuzz test it in order to ensure that crashes, memory leaks, etc were ironed out as much as possible.
The problem was that I had no idea how to fuzz Zig code. While Zig uses LLVM and therefore in theory has access to libFuzzer
, the necessary integration with SanitizerCoverage
has yet to be implemented (see also this comment on a closed PR), so I figured I would try to to find another avenue in the meantime.
Treating zig code as a black box
I thought I’d look into trying afl++
which has support for fuzzing ‘black box’ binaries, meaning it has modes that are intended to allow fuzzing binaries for which no source code is available. This wouldn’t be ideal, but it’d at least be a start. To try this, I wrote a fuzz.zig
and compiled it as an executable with libc linked (linking libc seemed to be necessary for this to work):
const std = @import("std");
const audiometa = @import("audiometa");
pub fn main() !void {
// Setup an allocator that will detect leaks/use-after-free/etc
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
// this will check for leaks and crash the program if it finds any
defer std.debug.assert(gpa.deinit() == false);
const allocator = &gpa.allocator;
// Read the data from stdin
const stdin = std.io.getStdIn();
const data = try stdin.readToEndAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(data);
var stream_source = std.io.StreamSource{ .buffer = std.io.fixedBufferStream(data) };
// Try to parse the data
var metadata = try audiometa.metadata.readAll(allocator, &stream_source);
defer metadata.deinit();
}
Note: afl++
passes the fuzzed data in via stdin
by default
With this, I tried a few of the black-box options that afl++
has:
- Binary rewriters were a no-go. I tried retrowrite and E9AFL but they both choked on the Zig-compiled binary.
- QEMU mode (
-Q
) would crash immediately on any input; didn’t investigate why this is. - FRIDA mode (
-O
) worked without any fiddling required.
afl++
as ‘hangs’ rather than ‘crashes.’ I’m not sure exactly why that is.And with that, I was off to the races. There was a heavy runtime penalty to running in this mode, but it was able to catch many problems that were subsequently solved:
- I wasn’t checking for possible text data size underflows
- I wasn’t protecting against out-of-bounds reads when checking UTF-16 BOMs
- A few more data size underflow/index out-of-bounds protections were needed elsewhere
- I wasn’t handling malformed extended ID3v2 headers safely
- There was a bug in the Zig standard library where the
std.unicode
functions that allocated memory would fail to free the memory if they returned an error
Despite the success, I felt that things could be improved.
Treating zig code as a static library
Normally, afl++
relies on compiling source code with its own patched compilers in order to instrument the fuzzed binary. This approach wouldn’t work for Zig code, but I noticed that afl++
has a ‘LTO (link time optimization) mode’ that instruments the binary at link-time rather than compile-time (with the caveat that the objects must be compiled with LTO enabled). Fortunately, Zig has support for compiling with LTO enabled via the -flto
flag.
So, my idea was to compile the Zig code as a static library with LTO enabled, and then use the afl-clang-lto
compiler to compile a normal C program that calls the Zig library. This ended up looking like:
const std = @import("std");
const audiometa = @import("audiometa");
// export the zig function so that it can be called from C
export fn fuzz_zig_main() void {
// code omitted--it's the same as the previous example,
// but with the try's swapped out for catch unreachable's
}
// fuzz_lib.h
void fuzz_zig_main();
// fuzz.c
#include "fuzz_lib.h"
int main() {
fuzz_zig_main();
return 0;
}
I was then able to compile the Zig portion as a static library:
- with LTO (passing
-flto
or settingLibExeObjStep.want_lto = true
) - with compiler_rt bundled to avoid
undefined symbol: __zig_probe_stack
linker errors (passing-fcompiler-rt
or settingLibExeObjStep.bundle_compiler_rt = true
)
and then compile the C portion via afl-clang-lto
and link in the Zig portion:
$ afl-clang-lto -o fuzz.o -c fuzz.c
$ afl-clang-lto -o fuzz fuzz.o -Lzig-out/lib -laudiometa-fuzz
afl-llvm-lto++3.15a by Marc "vanHauser" Heuse <[email protected]>
AUTODICTIONARY: 10 strings found
[+] Instrumented 3426 locations with no collisions (on average 88 collisions would be in afl-gcc/vanilla AFL) (non-hardened mode).
This resulting binary could then be fuzzed as normal:
$ afl-fuzz -i path/to/inputs -o path/to/outputs -- ./fuzz
This hugely improved execution speed–it went from around 500/sec to around 9000/sec. This seemed great, but I still thought there were some unnecessary steps involved.
Treating zig code as an object
Instead of linking the Zig code in as a static library, I wondered if it was possible to compile the Zig code to an object file and then use afl-clang-lto
to transform the object file into an executable, thereby getting the instrumentation without having to compile any C code. It turns out this is very possible if you:
- Export a
callconv(.C)
main symbol (i.e.export fn main()
) to act as the entry point (without this,afl-clang-lto
will compain about anundefined symbol: main
) - Call your Zig code from the exported main
- And use
zig build-obj -lto
to build the object file
Here’s an example with some contrived and intentionally buggy code:
const std = @import("std");
fn cMain() callconv(.C) void {
main() catch unreachable;
}
comptime {
@export(cMain, .{ .name = "main", .linkage = .Strong });
}
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
// this will check for leaks and crash the program if it finds any
defer std.debug.assert(gpa.deinit() == false);
const allocator = &gpa.allocator;
const stdin = std.io.getStdIn();
const data = try stdin.readToEndAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(data);
if (data.len == 0) return;
switch (data[0]) {
0 => {
// alloc without free
_ = try allocator.alloc(u8, 10);
},
1 => {
// returning an error
return error.BadInput;
},
else => {},
}
}
Note: Zig’s error/stack traces don’t seem to work right in the afl
-instrumented binaries, so for debugging purposes it’s helpful to compile a second executable with the Zig compiler that can run the crash-inducing outputs to give you relevant stack traces. This example code could be simplified a bit by using export fn main()
instead of the more verbose @export
, but using @export
and pub fn main()
in the manner shown above allows the same code to be compiled either for fuzzing or for debugging without any modifications.
To build:
$ zig build-obj -flto fuzz.zig
$ afl-clang-lto -o fuzz fuzz.o
And then run the fuzzer:
$ afl-fuzz -i input -o output -- ./fuzz
total execs : 113k │ total crashes : 13.5k (2 unique)
We can also verify that the resulting crash files trigger the buggy code as expected:
$ cat 'output/default/crashes/id:000000,sig:06,src:000000,time:2,op:havoc,rep:4' | ./fuzz
error(gpa): memory address 0x7ffff7ffb000 leaked:
$ cat 'output/default/crashes/id:000001,sig:06,src:000000,time:8,op:havoc,rep:8' | ./fuzz
thread 2903735 panic: attempt to unwrap error: BadInput
Note: For some reason this method doesn’t work if you link libc, as you start getting undefined symbol: __zig_probe_stack
linker errors. Normally this would be fixed by -fcompiler-rt
but it seems like there might currently be a bug with -fcompiler-rt
when used with zig build-obj
. If you need to link libc for your fuzz tests, you’ll probably need to (for now at least) use the ‘compile as a static library’ method talked about previously instead.
Integrating with build.zig
There are probably better ways to do this, but here’s what I was able to come up with:
const std = @import("std");
pub fn build(b: *std.build.Builder) !void {
// The object file
const fuzz_obj = b.addObject("fuzz-obj", "fuzz.zig");
fuzz_obj.setBuildMode(.Debug);
fuzz_obj.want_lto = true;
// Setup the output name
const fuzz_executable_name = "fuzz";
const fuzz_exe_path = try std.fs.path.join(b.allocator, &[_][]const u8{ b.cache_root, fuzz_executable_name });
// We want `afl-clang-lto -o path/to/output path/to/object.o`
const fuzz_compile = b.addSystemCommand(&[_][]const u8{ "afl-clang-lto", "-o" });
// Add the output path to afl-clang-lto's args
fuzz_compile.addArg(fuzz_exe_path);
// Add the path to the object file to afl-clang-lto's args
fuzz_compile.addArtifactArg(fuzz_obj);
// Install the cached output to the install 'bin' path
const fuzz_install = b.addInstallBinFile(.{ .path = fuzz_exe_path }, fuzz_executable_name);
// Add a top-level step that compiles and installs the fuzz executable
const fuzz_compile_run = b.step("fuzz", "Build executable for fuzz testing using afl-clang-lto");
fuzz_compile_run.dependOn(&fuzz_compile.step);
fuzz_compile_run.dependOn(&fuzz_install.step);
}
With this, running
zig build fuzz
Would build an executable named fuzz
and put it into the ‘bin’ install path (zig-out/bin
by default) that can then be used with afl-fuzz
(note that the compile step requires afl-clang-lto
to be installed on the system).
It’s also possible with this setup to easily build a second Zig executable (with the same code) for debugging the crashes as mentioned above. To do this, you could add the following to the build.zig
:
// Compile a companion exe for debugging crashes
const fuzz_debug_exe = b.addExecutable("fuzz-debug", "fuzz.zig");
fuzz_debug_exe.setBuildMode(.Debug);
// Only install fuzz-debug when the fuzz step is run
const install_fuzz_debug_exe = b.addInstallArtifact(fuzz_debug_exe);
fuzz_compile_run.dependOn(&install_fuzz_debug_exe.step);
This will build a fuzz-debug
executable and install it next to the fuzz
executable. When the fuzzer detects a bug, you can then get a proper stack trace by running the offending input through fuzz-debug
:
$ cat 'output/default/crashes/id:000000,sig:06,src:000000,time:2,op:havoc,rep:4' | ./zig-out/bin/fuzz-debug
error(gpa): memory address 0x7ffff7ff8000 leaked:
/home/ryan/Programming/zig/tmp/fuzz/fuzz.zig:25:36: 0x205ec3 in main (fuzz-debug)
_ = try allocator.alloc(u8, 10);
^
/home/ryan/Programming/zig/zig/build/lib/zig/std/start.zig:510:37: 0x229a3a in std.start.callMain (fuzz-debug)
const result = root.main() catch |err| {
^
...
A complete example can be found here:
Wrapping up
Hopefully the methods detailed here can serve as a stop-gap until Zig gets more fuzzing capabilities built-in. Funnily enough, the slower FRIDA mode I used initially may have caught all of the bugs in my audio metadata parsing library (or at least all of the low-hanging ones), as after the speedups from the static library/object file methods I haven’t been able to trigger any more crashes.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK