Nix-casync, a more efficient way to store and substitute Nix store paths

This blog post introduces nix-casync, a HTTP binary cache using the casync mechanism internally to efficiently store NAR files in a deduplicated fashion, and provides an outlook on how to use it to speed up substitution.


Yea, we have autoOptimise on the client side. Would be nice to have something similar on the server side :).

Keep up the good work.

I tested nix-casync (rev 25eb0e59e23fd580187cab3c8e7860d9c0044e0c) on a bunch of nar files from In, we store all nar files in ZFS pools, using the builtin lz4 compression in ZFS (with default settings). I compared how nix-casync performs (storage-wise) to that, and also tried out how the zstd compression of ZFS performs (by copying all nar files into a new, zstd-enabled, ZFS pool).

I didn’t look into the performance in terms of throughput and CPU usage, because I ran most things with nice/ionice and also ran multiple disk-intensive tasks at the same time as the nix-casync ingestion. I know (from the HTTP log of nix-casync) that many nar files took multiple minutes to ingest. It would probably be interesting to run a performance test under more controlled circumstances, especially against a large existing chunk store.

This was the data set i tested with (all “real-world” nar files, a subset of the store):

number of nar files 671652
total nar file size (MB) 8012630

After ingesting this into nix-casync, I ended up with 34219994 chunks in the chunk store. I then summed up the sizes of all zstd-compressed chunks, and also of the uncompressed sizes of all chunks to get a feeling for how much deduplication vs zstd-compression matters.

size in MB compress ratio
zfs+lz4 3816761 2.10
zfs+zstd 2978241 2.69
nix-casync (uncompressed) 2487089 3.22
nix-casync (zstd) 1223264 6.55

As you can see, nix-casync managed to achieve an impressive compression rate of 6.55!


Thanks for sharing this data! This is really helpful :slight_smile:

Do you have a breakdown of the (uncompressed) chunk sizes you ended up with?

I’m wondering how much different chunking parameters would affect the compression ratio and average chunk size…
Right now, the chunking parameters are not configurable, but it’s part on my TODO list.

There’s also some more possible, but more complicated to develop improvements:

  • Right now, we just chunk .nar files in as blobs. We could however “parse” them more and actually persist the file store. This would help the chunker to always slice on file boundaries (which might not necessarily happen today)
  • We might not deduplicate some chunks as (only) references to other nix store paths differ in the blob. If we replace store paths with some more generic identifiers before chunking (and restore them back on assembly), we might be able to deduplicate those chunks

The ingestion isn’t very smart right now. It loads everything into a tempfile, then runs a (theoretically parallelized) chunker on that tempfile. We could either do the chunking while still receiving the file (which is not very well exposed from the underlying library), or ramp up the concurrency (at the cost of higher CPU usage during chunking).

It’s probably fine to keep this as-is for now, while we’re still juggling with parameters, and casync-powered substitution isn’t implemented yet :wink:

1 Like

For reference, further nix-casync benchmarks are being coordinated here: Benchmark / Performance Testing · Issue #2 · flokli/nix-casync · GitHub

Hosted by Flying Circus.