Another three weeks have passed; here is the update!
More information about the topic and goals of this project: https://discourse.nixos.org/t/tweag-fellowship-fuzzing-nix-0 .
Previous updates can be found at:
- https://discourse.nixos.org/t/tweag-fellowship-fuzzing-nix-1 ;
- https://discourse.nixos.org/t/tweag-fellowship-fuzzing-nix-2 .
Progress… and difficulties encountered
The most depressing period so far: no progress on previous issues, and I even encountered strong pain points when trying to explore different directions!
New fuzz target
After failing to figure out where the side effects of the harness exercising parsing and evaluation came from, I decided to move on and fuzz a different part of the codebase.
I aimed to make a simpler fuzz target, to at least have a working example, so I targeted Store::parseStorePath.
The fuzz target is pretty straightforward, and I used it to run a fuzzing session lasting less than a day, which (un)fortunately did not produce any crash.
At least, I have an operational target!
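For reference, a libFuzzer target of this kind has roughly the following shape. This is a minimal sketch, not the target from the fork: parseStorePathStub is a hypothetical stand-in (with an assumed /nix/store/ prefix check) so that the example compiles without the nix libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Store::parseStorePath, so the sketch compiles
// without the nix libraries. It only checks a prefix and throws otherwise.
static std::string parseStorePathStub(const std::string & path)
{
    const std::string storeDir = "/nix/store/";
    if (path.compare(0, storeDir.size(), storeDir) != 0)
        throw std::invalid_argument("path is not in the store directory");
    return path.substr(storeDir.size());
}

// libFuzzer entry point: called once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size)
{
    std::string input(reinterpret_cast<const char *>(data), size);
    try {
        parseStorePathStub(input);
    } catch (const std::exception &) {
        // Rejecting malformed input is expected behaviour, not a finding.
    }
    return 0;
}
```

In a real target the stub would be replaced by the call to Store::parseStorePath, and the file would be built with clang's -fsanitize=fuzzer so that libFuzzer supplies main.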
Harnessing the daemon
Encouraged by this meager success, I got interested in targeting another critical component of nix (alongside parsing and evaluation): the daemon.
Daemon?
the Nix daemon […] is a required component in multi-user Nix installations. It performs build actions and other operations on the Nix store on behalf of non-root users. 1
The daemon receives data over a Unix domain socket, and performs the appropriate actions.
Note that one could directly make the fuzz target use the library functions that are called once the data has been received, but I thought it could be more beneficial to fuzz the way the daemon handles the protocol itself (especially as there is no protocol specification).
The harness
The produced fuzz target thus needs to:
- Simulate a client: Send data (received as a parameter of the fuzz target) to the daemon;
- Receive data using processConnection.
So far, the harness creates two threads: one for the client, the other for the daemon, communicating through Unix sockets. What’s left is for the former to send data to the latter…
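The two-thread layout described above can be sketched as follows. This is only an illustration under assumptions: daemonSide is a hypothetical stand-in that drains the socket where the real harness would drive processConnection, and runExchange and clientSide are names made up for this example.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>

// Hypothetical stand-in for the daemon side: drains the socket until EOF.
// In the real harness this is where processConnection would be driven.
static std::string daemonSide(int fd)
{
    std::string received;
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        received.append(buf, static_cast<size_t>(n));
    close(fd);
    return received;
}

// Client side: forwards the fuzzer-provided bytes, then closes its end so
// that the reader sees an end-of-file.
static void clientSide(int fd, const uint8_t * data, size_t size)
{
    size_t sent = 0;
    while (sent < size) {
        ssize_t n = write(fd, data + sent, size - sent);
        if (n <= 0) break;
        sent += static_cast<size_t>(n);
    }
    close(fd);
}

// Runs one exchange over a connected pair of Unix domain sockets and returns
// what the daemon side received.
std::string runExchange(const uint8_t * data, size_t size)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return {};

    std::string received;
    std::thread daemon([&] { received = daemonSide(fds[1]); });
    std::thread client(clientSide, fds[0], data, size);

    client.join();
    daemon.join();
    return received;
}
```

Using socketpair avoids having to create a socket file on disk for every input; a fuzz target would call something like runExchange from LLVMFuzzerTestOneInput.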
Structuring Data
The most naive approach would be to have the client forward the raw byte array to the daemon, let it interpret what it receives and act correspondingly.
However, the daemon input is highly structured; it expects: a magic byte, the client version, extra parameters, a sequence of operations and their arguments, and an end-of-file.
Leaving it to luck that random mutations of Data will produce sequences of bytes respecting this structure is pretty optimistic.
Thankfully, libFuzzer provides two means to increase the odds with structure-aware fuzzing:
- Define a custom mutator: to manually mutate the Data fed to the fuzz target;
- Use libprotobuf-mutator to produce mutations respecting a given structure definition.
Implementing either of these solutions felt like diving into another rabbit hole, so I left this idea on the side for now, to focus my efforts on a more tangible outcome.
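To give an idea of what the first option involves, here is a minimal sketch of a libFuzzer custom mutator. It is not something implemented in this project, and the input layout is an assumption made up for illustration: the first kHeaderSize bytes stand for a fixed handshake that should survive mutation, and only the bytes after it are touched.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical layout, for illustration only: the first kHeaderSize bytes
// stand for the fixed handshake part of an input and are never mutated.
constexpr size_t kHeaderSize = 8;

// Optional libFuzzer hook: when defined, libFuzzer uses it to mutate inputs.
// It rewrites data in place and returns the new size (at most maxSize).
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t * data, size_t size,
                                          size_t maxSize, unsigned int seed)
{
    std::minstd_rand rng(seed);

    // Too short to hold a header: leave the input alone.
    if (size <= kHeaderSize)
        return size;

    // Overwrite one payload byte, keeping the header intact.
    std::uniform_int_distribution<size_t> pickOffset(kHeaderSize, size - 1);
    data[pickOffset(rng)] = static_cast<uint8_t>(rng() & 0xff);

    // Sometimes grow the payload by one random byte if room is left.
    if (size < maxSize && (rng() & 1) != 0) {
        data[size] = static_cast<uint8_t>(rng() & 0xff);
        ++size;
    }
    return std::min(size, maxSize);
}
```

A realistic mutator for the daemon would need to understand operations and their arguments rather than a fixed-size prefix, which is where most of the effort would go.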
Integration to OSS-fuzz
OSS-fuzz offers open source projects the capability to run fuzzers on a dedicated infrastructure, making use of Google’s computing power, (hypothetically) without too much hassle.
Contribute
To contribute nix to the OSS-fuzz projects, we need:
- One or several fuzz targets. This is the motivation behind our simple target;
- A build.sh script. Somewhat already dealt with, as we keep scripts to build the fuzzers in our fork of nix. However, those scripts use meson, which is not (yet?) used upstream;
- A Dockerfile for build reproducibility.
“Reproducibility”
OSS-fuzz champions Docker for reproducibility.
An easy-to-use Docker image is provided to simplify toolchain distribution. This also simplifies our support for a variety of Linux distributions and provides a reproducible and secure environment for fuzzer building and execution. 2
However, it appears that their use of Docker only provides “build time” reproducibility.
Packages that are installed via Dockerfile or built as part of build.sh are not available on the bot runtime environment (where the fuzz targets run). 3
All build artifacts needed during fuzz target execution should be inside the $OUT directory. Only those artifacts are archived and used on the bots. Everything else is ignored (e.g. artifacts in $WORK, $SRC, etc) and hence is not available in the execution environment.
We strongly recommend static linking because it just works. […] 4
Another thing to note is that they provide an image with a clang toolchain: gcr.io/oss-fuzz-base/base-builder, which is based on Ubuntu 16.04…
Uh-oh.
I didn’t see any other project’s image use a different base, so I assume OSS-fuzz requires the Dockerfile to use theirs.
So, here is what needs to be done, in an Ubuntu 16.04 Docker container: install nix’s dependencies, and build the statically linked fuzzers.
Status
After failing to install nix’s dependencies from the Ubuntu package repositories, I wondered if I could leverage the flake.nix that we use for development anyway. I made the Dockerfile install nix from a tarball, for the root user. 5
I can successfully run nix develop from the container.
I have yet to update the build of the fuzzers to produce statically linked binaries.
Future plans
Only three weeks are left before the end of the fellowship!
I intend to focus 100% on bringing nix to OSS-fuzz.
Stay tuned!
1: nix daemon - Nix Reference Manual
3: Fuzzer environment | OSS-Fuzz
4: Fuzzer environment | OSS-Fuzz
5: Again, I assume it needs to use the root user, as no other project uses a different one.