Nix-casync, a more efficient way to store and substitute Nix store paths

This blog post introduces nix-casync, an HTTP binary cache that internally uses the casync mechanism to store NAR files efficiently in a deduplicated fashion, and gives an outlook on how it can be used to speed up substitution.


Yeah, we have autoOptimise on the client side. It would be nice to have something similar on the server side :).

Keep up the good work.

I tested nix-casync (rev 25eb0e59e23fd580187cab3c8e7860d9c0044e0c) on a bunch of nar files from nixbuild.net. In nixbuild.net, we store all nar files in ZFS pools, using the built-in lz4 compression in ZFS (with default settings). I compared how nix-casync performs (storage-wise) to that, and also tried out how the zstd compression of ZFS performs (by copying all nar files into a new, zstd-enabled, ZFS pool).

I didn’t look into the performance in terms of throughput and CPU usage, because I ran most things with nice/ionice and also ran multiple disk-intensive tasks at the same time as the nix-casync ingestion. I know (from the HTTP log of nix-casync) that many nar files took multiple minutes to ingest. It would probably be interesting to run a performance test under more controlled circumstances, especially against a large existing chunk store.

This was the data set I tested with (all “real-world” nar files, a subset of the nixbuild.net store):

number of nar files          671652
total nar file size (MB)    8012630

After ingesting this into nix-casync, I ended up with 34219994 chunks in the chunk store. I then summed up the sizes of all zstd-compressed chunks, and also of the uncompressed sizes of all chunks to get a feeling for how much deduplication vs zstd-compression matters.

                           size in MB   compression ratio
zfs+lz4                       3816761                2.10
zfs+zstd                      2978241                2.69
nix-casync (uncompressed)     2487089                3.22
nix-casync (zstd)             1223264                6.55

As you can see, nix-casync managed to achieve an impressive compression ratio of 6.55!
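The numbers above also let us separate how much of that comes from deduplication alone versus from zstd on top of it. A quick back-of-the-envelope check (sizes in MB, copied from the table; this is just arithmetic, not part of nix-casync):

```go
package main

import "fmt"

func main() {
	// sizes in MB, taken from the table above
	totalNar := 8012630.0   // sum of all uncompressed nar files
	chunksRaw := 2487089.0  // chunk store, uncompressed
	chunksZstd := 1223264.0 // chunk store, zstd-compressed

	dedup := totalNar / chunksRaw      // savings from chunk deduplication alone
	compress := chunksRaw / chunksZstd // additional savings from zstd
	fmt.Printf("dedup %.2fx * zstd %.2fx = %.2fx total\n",
		dedup, compress, dedup*compress)
}
```

So roughly a 3.22x reduction comes from deduplication and a further ~2x from compressing the deduplicated chunks, multiplying to the 6.55x total.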


Thanks for sharing this data! This is really helpful :slight_smile:

Do you have a breakdown of the (uncompressed) chunk sizes you ended up with?

I’m wondering how much different chunking parameters would affect the compression ratio and average chunk size…
Right now, the chunking parameters are not configurable, but making them so is on my TODO list.
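To produce such a breakdown, something like this minimal sketch could work, assuming the chunk store is a directory with one file per chunk (as in casync's .castr layout); the `chunks` path and the power-of-two buckets are just illustrative choices:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// bucket returns the power-of-two size class a chunk of the given size
// falls into.
func bucket(size int64) int64 {
	b := int64(1)
	for b < size {
		b *= 2
	}
	return b
}

func main() {
	root := "chunks" // hypothetical path to the chunk store directory
	hist := map[int64]int{}
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, ierr := d.Info()
		if ierr != nil {
			return ierr
		}
		hist[bucket(info.Size())]++
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// print buckets in ascending order
	keys := make([]int64, 0, len(hist))
	for k := range hist {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		fmt.Printf("<= %8d bytes: %d chunks\n", k, hist[k])
	}
}
```

Note this histograms the on-disk (compressed) chunk sizes; for uncompressed sizes the chunks would have to be decompressed first.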

There are also some further possible, but more complicated to develop, improvements:

  • Right now, we just chunk .nar files as opaque blobs. We could, however, “parse” them further and persist the individual files they contain. This would let the chunker always cut on file boundaries (which doesn’t necessarily happen today)
  • Some chunks might fail to deduplicate only because the references to other nix store paths embedded in the blob differ. If we replaced store paths with some more generic identifiers before chunking (and restored them on assembly), we might be able to deduplicate those chunks too
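The second idea could look roughly like this. This is only a minimal sketch, not nix-casync's actual code: the regexp over nixbase32 hashes and the numeric placeholders are assumptions, and the rewrite is deliberately length-preserving so byte offsets (and thus chunk boundaries) stay stable:

```go
package main

import (
	"fmt"
	"regexp"
)

// storePath matches the 32-character nixbase32 hash in a store path.
var storePath = regexp.MustCompile(`/nix/store/([0-9abcdfghijklmnpqrsvwxyz]{32})-`)

// rewrite replaces each distinct store-path hash with a fixed-width
// placeholder and records the mapping so assembly can restore it.
func rewrite(blob []byte) ([]byte, map[string]string) {
	mapping := map[string]string{}
	n := 0
	out := storePath.ReplaceAllFunc(blob, func(m []byte) []byte {
		hash := string(m[len("/nix/store/") : len(m)-1])
		ph, ok := mapping[hash]
		if !ok {
			// placeholder has the same length as the hash, so the
			// blob's byte offsets are unchanged
			ph = fmt.Sprintf("%032d", n)
			mapping[hash] = ph
			n++
		}
		return []byte("/nix/store/" + ph + "-")
	})
	return out, mapping
}

// restore is the inverse, run when assembling the nar from chunks.
func restore(blob []byte, mapping map[string]string) []byte {
	inv := map[string]string{}
	for h, ph := range mapping {
		inv[ph] = h
	}
	return storePath.ReplaceAllFunc(blob, func(m []byte) []byte {
		ph := string(m[len("/nix/store/") : len(m)-1])
		if h, ok := inv[ph]; ok {
			return []byte("/nix/store/" + h + "-")
		}
		return m
	})
}

func main() {
	in := []byte("/nix/store/abcdfghijklmnpqrsvwxyz0123456789-foo-1.0/bin/foo")
	out, m := rewrite(in)
	fmt.Printf("%s\n", out)
	fmt.Printf("%s\n", restore(out, m))
}
```

The per-nar mapping would of course have to be stored alongside the chunk list, so substitution can undo the rewrite.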

The ingestion isn’t very smart right now: it loads everything into a tempfile, then runs a (theoretically parallelized) chunker on that tempfile. We could either do the chunking while still receiving the file (which the underlying library doesn’t expose very well), or ramp up the concurrency (at the cost of higher CPU usage during chunking).

It’s probably fine to keep this as-is for now, while we’re still juggling parameters and casync-powered substitution isn’t implemented yet :wink:

For reference, further nix-casync benchmarks are being coordinated here: Benchmark / Performance Testing · Issue #2 · flokli/nix-casync · GitHub
