Tweag Fellowship: Fuzzing Nix #2

Pamplemousse · July 23, 2021, 5:29pm

Six weeks in, time for an update!

More information about the topic and goals of this project: https://discourse.nixos.org/t/tweag-fellowship-fuzzing-nix-0 .
Previous update can be found at: https://discourse.nixos.org/t/tweag-fellowship-fuzzing-nix-1 .

Progress

Overall, progress felt slower than during the first three weeks.

First bug

I caught the first bug using a fuzzing harness exercising the code instrumented with ASan.
As far as I can tell, it did not have serious security implications, but it is nice to see our efforts starting to pay off!

The underlying issue received an easy fix via the pull request "libexpr: Fix read out-of-bound on the heap" (Pull Request #5011, NixOS/nix), which got merged quickly.

Building the fuzzing binaries with `meson`

In the previous update, I mentioned the difficulties I had adapting the current build system to my use case.

After a couple of days trying, without success, to build the components needed for fuzzing with make, I found https://github.com/NixOS/nix/pull/3160 , which introduces meson as a replacement.
Standing on its shoulders, I managed to get something working within a couple of hours.

Hence, I plan to keep working on top of this PR for the time being, and humbly help it approach a “production-ready” state.

More crashes

After setting up dedicated hardware, I was able to run a fuzzing session for a longer time.
Interestingly, it managed to find hundreds of crashers in a couple of hours, after which I stopped (paused) it, as there is no point in gathering more than can humanly be triaged.

A “crasher” (sometimes “crash file”, or simply “crash”) is a file containing the Data that causes the fuzz target to fail (either from critical memory corruption, i.e. segfaults, or from corruption detected by ASan).
And although libFuzzer does its best to avoid redundant crashers based on the coverage they produce (only one of two crashers exercising the same code path would be reported), a single bug could still produce several crashers.

I spent some time investigating the crashers obtained earlier, prioritising the fourteen produced by running the fuzzer with ASan on the test expressions.

Sadly, this made it apparent that all of them were due to a faulty fuzz target, which offers me a great transition to the next section.

Difficulties encountered

Side effects of the fuzz target

Reminders

fuzz target - a function that accepts an array of bytes and does something interesting with these bytes using the API under test ²

Using libFuzzer, our fuzz target is implemented as the body of the LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) function.
Being “in-process”, libFuzzer calls this function thousands of times in a row, providing different Data (and Size accordingly).

To keep the process deterministic, LLVMFuzzerTestOneInput should avoid mutating any global state: otherwise, a subsequent call may depend on state left behind by an earlier one, and its behaviour cannot be reproduced in isolation.
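To make the reminder above concrete, here is a minimal sketch of a fuzz target that keeps all of its state local to the call. It is not the harness used for Nix: count_open_parens is a hypothetical stand-in for the API under test, and only the LLVMFuzzerTestOneInput signature comes from libFuzzer.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical stand-in for the API under test (NOT the Nix API):
// counts the '(' characters in the input.
static size_t count_open_parens(const std::string &input) {
    size_t n = 0;
    for (char c : input)
        if (c == '(')
            ++n;
    return n;
}

// libFuzzer entry point. Everything the call needs is rebuilt from
// Data/Size, so no call can observe what an earlier call did.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    // Copy the raw bytes into an object owned by this call only.
    const std::string input(reinterpret_cast<const char *>(Data), Size);
    const size_t n = count_open_parens(input);
    assert(n <= Size);  // sanity check on the toy API
    return 0;           // the fuzz target reports success with 0
}
```

With a target shaped like this, replaying a single crasher file should behave the same whether it is run alone or after other inputs.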

Symptoms

Take two minimized crashers:

$ cat map.bug
map

$ cat seq.bug
builtins.seq

Passing them one by one through the fuzzer¹ does not trigger any error:

$ ASAN_OPTIONS=detect_leaks=false ./buildir/fuzz/parse_eval-fuzzer-with-asan -detect_leaks=0 map.bug 2>/dev/null && echo "success"
success

$ ASAN_OPTIONS=detect_leaks=false ./buildir/fuzz/parse_eval-fuzzer-with-asan -detect_leaks=0 seq.bug 2>/dev/null && echo "success"
success

But ASan reports a memory violation when the fuzzer runs them consecutively (the order does not matter):

$ ASAN_OPTIONS=detect_leaks=false ./buildir/fuzz/parse_eval-fuzzer-with-asan -detect_leaks=0 map.bug seq.bug 2>&1 | head -n 9 | tail -n 5
Running: map.bug
Executed map.bug in 26 ms
Running: seq.bug
=================================================================
==19174==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7fffffff2e48 at pc 0x7ffff5279eb5 bp 0x7fffffff2d90 sp 0x7fffffff2d88

$ ASAN_OPTIONS=detect_leaks=false ./buildir/fuzz/parse_eval-fuzzer-with-asan -detect_leaks=0 seq.bug map.bug 2>&1 | head -n 9 | tail -n 5
Running: seq.bug
Executed seq.bug in 23 ms
Running: map.bug
=================================================================
==19300==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7fffffff3168 at pc 0x7ffff5296f58 bp 0x7fffffff3060 sp 0x7fffffff3058

It is as if the first run of LLVMFuzzerTestOneInput left behind some global state that causes the second run to fail…
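The pattern can be modelled with a toy target, sketched below. This is purely illustrative and says nothing about where the real leak lives: g_seen, toy_target_ok, reset_state and first_failure are all hypothetical names, and the "failure" is a returned flag rather than an ASan report.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <string>
#include <vector>

// Hypothetical global state, standing in for whatever the real harness leaks.
static std::vector<std::string> g_seen;

// Simulates starting a fresh process.
static void reset_state() { g_seen.clear(); }

// Toy target: succeeds only if no earlier call left data in the global.
static bool toy_target_ok(const uint8_t *Data, size_t Size) {
    const bool ok = g_seen.empty();
    g_seen.emplace_back(reinterpret_cast<const char *>(Data), Size);
    return ok;
}

// Feeds the inputs back to back, as when several files are passed to the
// fuzzer binary. Returns the index of the first failing input, or -1.
static int first_failure(const std::vector<std::string> &inputs) {
    for (size_t i = 0; i < inputs.size(); ++i) {
        const auto *bytes = reinterpret_cast<const uint8_t *>(inputs[i].data());
        if (!toy_target_ok(bytes, inputs[i].size()))
            return static_cast<int>(i);
    }
    return -1;
}

// Each input passes alone; any pair fails on its second element.
static void demo() {
    reset_state();
    assert(first_failure(std::vector<std::string>{"map"}) == -1);
    reset_state();
    assert(first_failure(std::vector<std::string>{"seq"}) == -1);
    reset_state();
    assert(first_failure(std::vector<std::string>{"map", "seq"}) == 1);
    reset_state();
    assert(first_failure(std::vector<std::string>{"seq", "map"}) == 1);
}
```

In this model the content of the inputs is irrelevant; only the fact that a previous call happened matters, which matches the "order does not matter" observation.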

What now?

I am still in the process of trying to understand this bug better, and finding a way to fix it.

It’s recognized that the evaluation function (EvalState::eval) is not reentrant, so any global state involved in the evaluation is one of our primary suspects.
Any other global state that gets mutated also belongs on this list.
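One possible way to narrow down such suspects, shown here only as a sketch and not as what the Nix harness does, is to snapshot a suspected global before the target body runs and compare it afterwards. The names g_suspect, target_body and mutates_suspect are hypothetical, and a plain std::string stands in for the real state.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical global suspected of carrying state across runs.
static std::string g_suspect = "initial";

// Hypothetical target body: inputs starting with 'm' mutate the global.
static void target_body(const uint8_t *Data, size_t Size) {
    if (Size > 0 && Data[0] == 'm')
        g_suspect += "!";
}

// Runs the body once and reports whether the suspect changed, then
// restores the snapshot so the next call starts from the same state.
static bool mutates_suspect(const uint8_t *Data, size_t Size) {
    const std::string before = g_suspect;  // snapshot
    target_body(Data, Size);
    const bool changed = (g_suspect != before);
    g_suspect = before;                    // restore
    assert(g_suspect == before);
    return changed;
}
```

This only works for state that can be copied and compared cheaply; anything else would need a more targeted check.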

Future plans

In its current buggy state, the harness is useless, so fixing it is a priority.
Once that is done, I intend to resume where I left off:

Run long fuzzing sessions;
Triage and patch bugs;
Enrich the fuzzing toolset (build fuzzers with different sanitizers).

Stay tuned!

cyplo · August 1, 2021, 6:42am

Really nice writeup, keep up the good work !