Fuzzing Zig Code Using AFL++

2021-09-19 - Programming - Zig - Fuzzing

After using code coverage information and real-world files to improve an audio metadata parser I am writing in Zig, the next step was to fuzz test it in order to ensure that crashes, memory leaks, etc. were ironed out as much as possible.

The problem was that I had no idea how to fuzz Zig code. While Zig uses LLVM and therefore in theory has access to libFuzzer, the necessary integration with SanitizerCoverage has yet to be implemented (see also this comment on a closed PR), so I figured I would try to find another avenue in the meantime.

Treating Zig code as a black box

I decided to try afl++, which supports fuzzing ‘black box’ binaries: it has modes intended for fuzzing binaries for which no source code is available. This wouldn’t be ideal, but it’d at least be a start. To try this, I wrote a fuzz.zig and compiled it as an executable with libc linked (linking libc seemed to be necessary for this to work):

const std = @import("std");
const audiometa = @import("audiometa");

pub fn main() !void {
    // Setup an allocator that will detect leaks/use-after-free/etc
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // this will check for leaks and crash the program if it finds any
    defer std.debug.assert(gpa.deinit() == false);
    const allocator = &gpa.allocator;

    // Read the data from stdin
    const stdin = std.io.getStdIn();
    const data = try stdin.readToEndAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(data);
    var stream_source = std.io.StreamSource{ .buffer = std.io.fixedBufferStream(data) };

    // Try to parse the data
    var metadata = try audiometa.metadata.readAll(allocator, &stream_source);
    defer metadata.deinit();
}

Note: afl++ passes the fuzzed data in via stdin by default

With this, I tried a few of the black-box options that afl++ has:

Binary rewriters were a no-go. I tried retrowrite and E9AFL but they both choked on the Zig-compiled binary.
QEMU mode (-Q) would crash immediately on any input; I didn’t investigate why.
FRIDA mode (-O) worked without any fiddling required.

Note: In FRIDA mode, bugs were marked by afl++ as ‘hangs’ rather than ‘crashes.’ I’m not sure exactly why that is.

And with that, I was off to the races. There was a heavy runtime penalty to running in this mode, but it was able to catch many problems that were subsequently solved:

I wasn’t checking for possible text data size underflows
I wasn’t protecting against out-of-bounds reads when checking UTF-16 BOMs
A few more data size underflow/index out-of-bounds protections were needed elsewhere
I wasn’t handling malformed extended ID3v2 headers safely
There was a bug in the Zig standard library where the std.unicode functions that allocated memory would fail to free the memory if they returned an error
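
To make the first kind of bug in the list above concrete, here is a minimal sketch of guarding a size calculation against underflow. It is a generic illustration, not the actual fix from the parser: remainingTextSize and its parameters are hypothetical names, and it assumes a Zig version in which std.math.sub and the error-returning std.testing functions are available.

```zig
const std = @import("std");

// Hypothetical helper, not taken from the audiometa source:
// returns how many bytes of text remain after a fixed-size prefix.
fn remainingTextSize(frame_size: usize, prefix_len: usize) error{Overflow}!usize {
    // std.math.sub returns error.Overflow instead of panicking (in safe
    // build modes) when prefix_len is larger than frame_size
    return std.math.sub(usize, frame_size, prefix_len);
}

test "malformed sizes are rejected instead of underflowing" {
    try std.testing.expectEqual(@as(usize, 9), try remainingTextSize(10, 1));
    try std.testing.expectError(error.Overflow, remainingTextSize(0, 1));
}
```

A plain `frame_size - prefix_len` on malformed input would trip Zig's integer overflow safety check in a Debug build, which is the kind of crash a fuzzer reports.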

Despite the success, I felt that things could be improved.

Treating Zig code as a static library

Normally, afl++ relies on compiling source code with its own patched compilers in order to instrument the fuzzed binary. This approach wouldn’t work for Zig code, but I noticed that afl++ has an ‘LTO (link time optimization) mode’ that instruments the binary at link-time rather than compile-time (with the caveat that the objects must be compiled with LTO enabled). Fortunately, Zig has support for compiling with LTO enabled via the -flto flag.

So, my idea was to compile the Zig code as a static library with LTO enabled, and then use the afl-clang-lto compiler to compile a normal C program that calls the Zig library. This ended up looking like:

const std = @import("std");
const audiometa = @import("audiometa");

// export the zig function so that it can be called from C
export fn fuzz_zig_main() void {
    // code omitted--it's the same as the previous example,
    // but with the try's swapped out for catch unreachable's
}

// fuzz_lib.h
// Entry point exported by the Zig static library
void fuzz_zig_main(void);

// fuzz.c
#include "fuzz_lib.h"

int main(void) {
    fuzz_zig_main();
    return 0;
}
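
For reference, the body omitted from fuzz_zig_main above might look roughly like the following. This is only a sketch that applies the described substitution (each try replaced with catch unreachable) to the earlier fuzz.zig; it assumes the same audiometa package and the same std APIs as that example.

```zig
const std = @import("std");
const audiometa = @import("audiometa");

// export the zig function so that it can be called from C
export fn fuzz_zig_main() void {
    // Setup an allocator that will detect leaks/use-after-free/etc
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // this will check for leaks and crash the program if it finds any
    defer std.debug.assert(gpa.deinit() == false);
    const allocator = &gpa.allocator;

    // Read the data from stdin; an exported C-ABI function cannot return
    // a Zig error, so failures are turned into crashes instead
    const stdin = std.io.getStdIn();
    const data = stdin.readToEndAlloc(allocator, std.math.maxInt(usize)) catch unreachable;
    defer allocator.free(data);
    var stream_source = std.io.StreamSource{ .buffer = std.io.fixedBufferStream(data) };

    // Try to parse the data
    var metadata = audiometa.metadata.readAll(allocator, &stream_source) catch unreachable;
    defer metadata.deinit();
}
```

Note that with catch unreachable, any error returned by the parser shows up as a crash in safe build modes, so it may be preferable to handle expected parse errors differently.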

I was then able to compile the Zig portion as a static library:

with LTO (passing -flto or setting LibExeObjStep.want_lto = true)
with compiler_rt bundled to avoid undefined symbol: __zig_probe_stack linker errors (passing -fcompiler-rt or setting LibExeObjStep.bundle_compiler_rt = true)

and then compile the C portion via afl-clang-lto and link in the Zig portion:

$ afl-clang-lto -o fuzz.o -c fuzz.c
$ afl-clang-lto -o fuzz fuzz.o -Lzig-out/lib -laudiometa-fuzz
afl-llvm-lto++3.15a by Marc "vanHauser" Heuse <mh@mh-sec.de>
AUTODICTIONARY: 10 strings found
[+] Instrumented 3426 locations with no collisions (on average 88 collisions would be in afl-gcc/vanilla AFL) (non-hardened mode).

This resulting binary could then be fuzzed as normal:

$ afl-fuzz -i path/to/inputs -o path/to/outputs -- ./fuzz

This hugely improved execution speed: it went from around 500 executions/sec (in FRIDA mode) to around 9000 executions/sec. This seemed great, but I still thought there were some unnecessary steps involved.
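
For the static library step described above, a build.zig along these lines could set the two options mentioned (want_lto and bundle_compiler_rt). It is a sketch under assumptions: it uses the same std.build API as the rest of this post, the library name is taken from the -laudiometa-fuzz flag, and the audiometa package path is a hypothetical placeholder.

```zig
const std = @import("std");

pub fn build(b: *std.build.Builder) void {
    // "audiometa-fuzz" matches the -laudiometa-fuzz flag passed to afl-clang-lto
    const fuzz_lib = b.addStaticLibrary("audiometa-fuzz", "fuzz.zig");
    fuzz_lib.setBuildMode(.Debug);
    // Hypothetical path: point this at wherever the audiometa package root lives
    fuzz_lib.addPackagePath("audiometa", "src/audiometa.zig");
    // Compile with LTO so that afl-clang-lto can instrument the code at link time
    fuzz_lib.want_lto = true;
    // Bundle compiler_rt to avoid the undefined symbol: __zig_probe_stack error
    fuzz_lib.bundle_compiler_rt = true;
    // Installs the library into the 'lib' install path (zig-out/lib by default)
    fuzz_lib.install();
}
```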

Treating Zig code as an object

Instead of linking the Zig code in as a static library, I wondered if it was possible to compile the Zig code to an object file and then use afl-clang-lto to transform the object file into an executable, thereby getting the instrumentation without having to compile any C code. It turns out this is very possible if you:

Export a callconv(.C) main symbol (i.e. export fn main()) to act as the entry point (without this, afl-clang-lto will complain about an undefined symbol: main)
Call your Zig code from the exported main
And use zig build-obj -flto to build the object file

Here’s an example with some contrived and intentionally buggy code:

const std = @import("std");

fn cMain() callconv(.C) void {
    main() catch unreachable;
}

comptime {
    @export(cMain, .{ .name = "main", .linkage = .Strong });
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // this will check for leaks and crash the program if it finds any
    defer std.debug.assert(gpa.deinit() == false);
    const allocator = &gpa.allocator;

    const stdin = std.io.getStdIn();
    const data = try stdin.readToEndAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(data);

    if (data.len == 0) return;

    switch (data[0]) {
        0 => {
            // alloc without free
            _ = try allocator.alloc(u8, 10);
        },
        1 => {
            // returning an error
            return error.BadInput;
        },
        else => {},
    }
}

Note: Zig’s error/stack traces don’t seem to work right in the afl-instrumented binaries, so for debugging purposes it’s helpful to compile a second executable with the Zig compiler and run the crash-inducing inputs found by the fuzzer through it to get relevant stack traces. This example code could be simplified a bit by using export fn main() instead of the more verbose @export, but using @export and pub fn main() in the manner shown above allows the same code to be compiled either for fuzzing or for debugging without any modifications.

To build:

$ zig build-obj -flto fuzz.zig
$ afl-clang-lto -o fuzz fuzz.o

And then run the fuzzer:

$ afl-fuzz -i input -o output -- ./fuzz

total execs : 113k │ total crashes : 13.5k (2 unique)

We can also verify that the resulting crash files trigger the buggy code as expected:

$ cat 'output/default/crashes/id:000000,sig:06,src:000000,time:2,op:havoc,rep:4' | ./fuzz
error(gpa): memory address 0x7ffff7ffb000 leaked: 
$ cat 'output/default/crashes/id:000001,sig:06,src:000000,time:8,op:havoc,rep:8' | ./fuzz
thread 2903735 panic: attempt to unwrap error: BadInput

Note: For some reason this method doesn’t work if you link libc, as you start getting undefined symbol: __zig_probe_stack linker errors. Normally this would be fixed by -fcompiler-rt but it seems like there might currently be a bug with -fcompiler-rt when used with zig build-obj. If you need to link libc for your fuzz tests, you’ll probably need to (for now at least) use the ‘compile as a static library’ method talked about previously instead.

Integrating with `build.zig`

There are probably better ways to do this, but here’s what I was able to come up with:

const std = @import("std");

pub fn build(b: *std.build.Builder) !void {
    // The object file
    const fuzz_obj = b.addObject("fuzz-obj", "fuzz.zig");
    fuzz_obj.setBuildMode(.Debug);
    fuzz_obj.want_lto = true;

    // Setup the output name
    const fuzz_executable_name = "fuzz";
    const fuzz_exe_path = try std.fs.path.join(b.allocator, &[_][]const u8{ b.cache_root, fuzz_executable_name });

    // We want `afl-clang-lto -o path/to/output path/to/object.o`
    const fuzz_compile = b.addSystemCommand(&[_][]const u8{ "afl-clang-lto", "-o" });
    // Add the output path to afl-clang-lto's args
    fuzz_compile.addArg(fuzz_exe_path);
    // Add the path to the object file to afl-clang-lto's args
    fuzz_compile.addArtifactArg(fuzz_obj);

    // Install the cached output to the install 'bin' path
    const fuzz_install = b.addInstallBinFile(.{ .path = fuzz_exe_path }, fuzz_executable_name);

    // Make sure the executable has been built before it is installed
    fuzz_install.step.dependOn(&fuzz_compile.step);

    // Add a top-level step that compiles and installs the fuzz executable
    const fuzz_compile_run = b.step("fuzz", "Build executable for fuzz testing using afl-clang-lto");
    fuzz_compile_run.dependOn(&fuzz_compile.step);
    fuzz_compile_run.dependOn(&fuzz_install.step);
}

With this, running

$ zig build fuzz

would build an executable named fuzz and put it into the ‘bin’ install path (zig-out/bin by default), where it can then be used with afl-fuzz (note that the compile step requires afl-clang-lto to be installed on the system).

It’s also possible with this setup to easily build a second Zig executable (with the same code) for debugging the crashes as mentioned above. To do this, you could add the following to the build.zig:

// Compile a companion exe for debugging crashes
const fuzz_debug_exe = b.addExecutable("fuzz-debug", "fuzz.zig");
fuzz_debug_exe.setBuildMode(.Debug);

// Only install fuzz-debug when the fuzz step is run
const install_fuzz_debug_exe = b.addInstallArtifact(fuzz_debug_exe);
fuzz_compile_run.dependOn(&install_fuzz_debug_exe.step);

This will build a fuzz-debug executable and install it next to the fuzz executable. When the fuzzer detects a bug, you can then get a proper stack trace by running the offending input through fuzz-debug:

$ cat 'output/default/crashes/id:000000,sig:06,src:000000,time:2,op:havoc,rep:4' | ./zig-out/bin/fuzz-debug
error(gpa): memory address 0x7ffff7ff8000 leaked: 
/home/ryan/Programming/zig/tmp/fuzz/fuzz.zig:25:36: 0x205ec3 in main (fuzz-debug)
            _ = try allocator.alloc(u8, 10);
                                   ^
/home/ryan/Programming/zig/zig/build/lib/zig/std/start.zig:510:37: 0x229a3a in std.start.callMain (fuzz-debug)
            const result = root.main() catch |err| {
                                    ^
...

A complete example can be found here:

https://github.com/squeek502/zig-fuzzing-example

Wrapping up

Hopefully the methods detailed here can serve as a stop-gap until Zig gets more fuzzing capabilities built-in. Funnily enough, the slower FRIDA mode I used initially may have caught all of the bugs in my audio metadata parsing library (or at least all of the low-hanging ones), as after the speedups from the static library/object file methods I haven’t been able to trigger any more crashes.
