Tweag Fellowship: Fuzzing Nix #3

Pamplemousse · August 13, 2021, 7:00pm

Another three weeks have passed, here is the update!

More information about the topic and goals of this project: https://discourse.nixos.org/t/tweag-fellowship-fuzzing-nix-0 .
Previous updates can be found at:

Progress… and difficulties encountered

The most depressing period so far: no progress on previous issues, and I even encountered strong pain points when trying to explore different directions!

New fuzz target

After failing to figure out where the side effects of the harness exercising parsing and evaluation came from, I decided to move on and fuzz a different part of the codebase.
I aimed for a simpler fuzz target, to at least have a working example, so I targeted Store::parseStorePath.

The fuzz target is pretty straightforward. I ran it in a fuzzing session lasting less than a day, which (un)fortunately did not produce any crashers.
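
For readers unfamiliar with libFuzzer, here is a minimal sketch of the shape such a target takes. It is not the target from the fork: `parseStorePathStandIn` is a hypothetical, much simplified stand-in for Store::parseStorePath, so that the example compiles without the Nix libraries, and the checks it performs are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for Store::parseStorePath, so that the
// sketch compiles without the Nix libraries. The real function differs.
static std::string parseStorePathStandIn(std::string_view path)
{
    constexpr std::string_view storeDir = "/nix/store/";
    constexpr std::size_t hashLen = 32;

    if (path.substr(0, storeDir.size()) != storeDir)
        throw std::invalid_argument("path is not in the store directory");
    std::string_view base = path.substr(storeDir.size());
    if (base.size() < hashLen + 2 || base[hashLen] != '-')
        throw std::invalid_argument("store path has no '<hash>-<name>' part");
    return std::string(base);
}

// Entry point that libFuzzer calls with each generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    std::string_view input(reinterpret_cast<const char *>(data), size);
    try {
        parseStorePathStandIn(input);
    } catch (const std::invalid_argument &) {
        // Rejecting malformed input is expected behaviour, not a bug.
    }
    return 0;
}
```

Only the expected "invalid input" exception is caught; anything else (another exception type, a sanitizer report, a crash) is left to surface as a finding.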

At least, I have an operational target!

Harnessing the daemon

Encouraged by this meager success, I got interested in targeting another critical component of nix (alongside parsing and evaluation): the daemon.

Daemon?

the Nix daemon […] is a required component in multi-user Nix installations. It performs build actions and other operations on the Nix store on behalf of non-root users. ¹

The daemon receives data over a Unix domain socket, and performs the appropriate actions.
Note that the fuzz target could directly call the library functions that run once the data has been received, but I thought it would be more beneficial to fuzz the way the daemon handles the protocol itself (especially as there is no protocol specification).

The harness

The produced fuzz target thus needs to:

Simulate a client: Send data (received as a parameter of the fuzz target) to the daemon;
Receive data using processConnection.

So far, the harness creates two threads: one for the client, the other for the daemon, communicating through Unix sockets. What’s left is for the former to send data to the latter…
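
The two-thread setup can be sketched as follows. This is an illustration under assumptions, not the harness from the fork: `daemonSide`, `clientSide` and `runHarness` are hypothetical names, and the daemon side here only collects the bytes it receives, where the real harness would run processConnection on its end of the socket.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <thread>
#include <vector>

// Stand-in for the daemon side: reads from its end of the socket until the
// client closes the connection. The real harness would run
// processConnection here instead (it is not called in this sketch).
static std::vector<uint8_t> daemonSide(int fd)
{
    std::vector<uint8_t> received;
    uint8_t buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        received.insert(received.end(), buf, buf + n);
    close(fd);
    return received;
}

// Client side: writes the whole fuzz input, then closes its end so that the
// daemon side sees an end-of-file.
static void clientSide(int fd, const uint8_t *data, size_t size)
{
    size_t sent = 0;
    while (sent < size) {
        ssize_t n = write(fd, data + sent, size - sent);
        if (n <= 0)
            break;  // give up on error; good enough for a sketch
        sent += static_cast<size_t>(n);
    }
    close(fd);
}

// Connects both sides through a pair of Unix domain sockets and runs each
// side in its own thread. Returns what the daemon side received.
static std::vector<uint8_t> runHarness(const uint8_t *data, size_t size)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        throw std::runtime_error("socketpair failed");

    std::vector<uint8_t> received;
    std::thread daemon([&] { received = daemonSide(fds[1]); });
    std::thread client([&] { clientSide(fds[0], data, size); });
    client.join();
    daemon.join();
    return received;
}
```

Running the two sides concurrently matters once the input is larger than the socket buffer: a single thread that writes everything before reading could block forever.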

Structuring `Data`

The most naive approach would be to have the client forward the raw byte array to the daemon, let it interpret what it receives and act correspondingly.
However, the daemon input is highly structured. It expects: a magic byte, the client version, extra parameters, a sequence of operations and their arguments, and an end-of-file.

Counting on random mutations of Data to produce sequences of bytes that respect this structure is pretty optimistic.
Thankfully, libFuzzer provides two means to increase the odds with structure-aware fuzzing:

Define a custom mutator: to manually mutate the Data fed to the fuzz target;
Use libprotobuf-mutator to produce mutations respecting a given structure definition.

Implementing either of these solutions felt like diving into another rabbit hole, so I left this idea aside for now, to focus my efforts on a more tangible outcome.
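
To give an idea of the first option, here is a minimal sketch of a libFuzzer custom mutator that keeps a fixed header intact and only mutates what follows. `LLVMFuzzerCustomMutator` is the hook libFuzzer looks for; everything else is an assumption made for illustration: `kHeader` is a hypothetical placeholder rather than the daemon's actual magic value, and the mutation strategy (append a byte or flip a bit) is deliberately naive.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// Hypothetical placeholder for the fixed bytes that open a connection; the
// real value and width used by the daemon are not reproduced here.
static const uint8_t kHeader[] = {'M', 'A', 'G', 'I', 'C'};
static const size_t kHeaderSize = sizeof kHeader;

// Optional hook recognised by libFuzzer. It mutates the input in place and
// returns the new size, which must not exceed maxSize. The same seed must
// give the same mutation, hence the seeded generator.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size,
                                          size_t maxSize, unsigned int seed)
{
    if (maxSize < kHeaderSize)
        return size;  // no room for the header: leave the input untouched

    std::mt19937 rng(seed);

    // Always start from a valid header, whatever the input contained.
    std::memcpy(data, kHeader, kHeaderSize);
    if (size < kHeaderSize)
        size = kHeaderSize;

    const size_t payload = size - kHeaderSize;
    if (payload == 0 || (size < maxSize && rng() % 4 == 0)) {
        // Grow the payload by one random byte when there is room.
        if (size < maxSize) {
            data[size] = static_cast<uint8_t>(rng());
            ++size;
        }
    } else {
        // Flip one random bit somewhere after the header.
        const size_t pos = kHeaderSize + rng() % payload;
        data[pos] ^= static_cast<uint8_t>(1u << (rng() % 8));
    }
    return size;
}
```

A real mutator for the daemon would need to understand the rest of the structure as well (version, parameters, operations), which is where most of the work lies.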

Integration to OSS-fuzz

OSS-fuzz offers open source projects the capability to run fuzzers on a dedicated infrastructure, making use of Google’s computing power, (hypothetically) without too much hassle.

Contribute

To contribute nix to the OSS-fuzz projects, we need:

One or several fuzz targets. This was the motivation behind our simple target;
A build.sh script.
Somewhat already dealt with as we keep scripts to build the fuzzers in our fork of nix.
However, those scripts use meson, which is not (yet?) used upstream;
A Dockerfile for build reproducibility.

“Reproducibility”

OSS-fuzz champions Docker for reproducibility.

An easy-to-use Docker image is provided to simplify toolchain distribution. This also simplifies our support for a variety of Linux distributions and provides a reproducible and secure environment for fuzzer building and execution. ²

However, it appears that their use of Docker only provides “build time” reproducibility.

Packages that are installed via Dockerfile or built as part of build.sh are not available on the bot runtime environment (where the fuzz targets run). ³

All build artifacts needed during fuzz target execution should be inside the $OUT directory. Only those artifacts are archived and used on the bots. Everything else is ignored (e.g. artifacts in $WORK, $SRC, etc) and hence is not available in the execution environment.

We strongly recommend static linking because it just works. […] ⁴

Another thing to note is that they provide an image with a clang toolchain: gcr.io/oss-fuzz-base/base-builder, which is based on Ubuntu 16.04…
Uh-oh.
I didn’t see any other project’s image use a different base, so I assume OSS-fuzz requires the Dockerfile to be based on theirs.

So, here is what needs to be done, in an Ubuntu 16.04 Docker container: install nix’s dependencies, and build the statically linked fuzzers.

Status

After failing to install nix dependencies from the Ubuntu package repositories, I wondered if I could leverage the flake.nix that we use for development anyway. I made the Dockerfile install nix from a tarball, for the root user. ⁵

I can successfully nix develop from the container.

I have yet to update the build of the fuzzers to produce statically linked binaries.