This blog post introduces nix-casync, an HTTP binary cache that internally uses the casync mechanism to store NAR files in a deduplicated fashion, and gives an outlook on how it could be used to speed up substitution.
Yeah, we have autoOptimise on the client side. Would be nice to have something similar on the server side :).
Keep up the good work.
I tried out nix-casync (commit 25eb0e59e23fd580187cab3c8e7860d9c0044e0c) on a bunch of nar files from nixbuild.net. In nixbuild.net, we store all nar files in ZFS pools, using ZFS's built-in lz4 compression (with default settings). I compared how nix-casync performs storage-wise against that, and also tried out how ZFS's zstd compression performs (by copying all nar files into a new, zstd-enabled ZFS pool).
I didn’t look into the performance in terms of throughput and CPU usage, because I ran most things with nice/ionice and also ran multiple disk-intensive tasks at the same time as the nix-casync ingestion. I know (from the HTTP log of nix-casync) that many nar files took multiple minutes to ingest. It would probably be interesting to run a performance test under more controlled circumstances, especially against a large existing chunk store.
This was the data set I tested with (all “real-world” nar files, a subset of the nixbuild.net store):
| metric | value |
| --- | --- |
| number of nar files | 671652 |
| total nar file size (MB) | 8012630 |
After ingesting this into nix-casync, I ended up with 34219994 chunks in the chunk store. I then summed up the sizes of all zstd-compressed chunks, and also the uncompressed sizes of all chunks, to get a feeling for how much deduplication vs. zstd compression matters.
| | size in MB | compress ratio |
| --- | --- | --- |
As you can see, nix-casync managed to achieve an impressive compression ratio.
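For anyone who wants to reproduce this kind of accounting on their own chunk store, here is a minimal Go sketch that walks a directory of zstd-compressed chunk files and sums their compressed and uncompressed sizes. The assumed on-disk layout (one zstd-compressed file per chunk below a single directory) and the klauspost/compress dependency are my own assumptions, not necessarily how nix-casync lays things out.

```go
// chunkstats.go – a rough sketch (not the nix-casync code) that walks a
// chunk store directory and sums compressed vs. uncompressed chunk sizes.
// Assumption: every chunk is an individual zstd-compressed file below the
// given directory; the exact on-disk layout of nix-casync may differ.
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"

	"github.com/klauspost/compress/zstd"
)

func main() {
	root := os.Args[1] // path to the chunk store

	dec, err := zstd.NewReader(nil)
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	var compressed, uncompressed, count int64
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		compressed += info.Size()

		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		// Decompress only to count plaintext bytes; the data itself is discarded.
		if err := dec.Reset(f); err != nil {
			return err
		}
		n, err := io.Copy(io.Discard, dec)
		if err != nil {
			return err
		}
		uncompressed += n
		count++
		return nil
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("chunks:       %d\n", count)
	fmt.Printf("compressed:   %d MB\n", compressed/(1000*1000))
	fmt.Printf("uncompressed: %d MB\n", uncompressed/(1000*1000))
	fmt.Printf("zstd ratio:   %.2f\n", float64(uncompressed)/float64(compressed))
}
```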
Thanks for sharing this data! This is really helpful.
Do you have a breakdown of the (uncompressed) chunk sizes you ended up with?
I’m wondering how much different chunking parameters would affect the compression ratio and average chunk size…
Right now, the chunking parameters are not configurable, but it’s on my TODO list.
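In case someone wants to experiment before that lands: here is a rough Go sketch of re-chunking a single nar file with different min/avg/max parameters and comparing the resulting chunk counts and mean sizes. It assumes the NewChunker/Next API of github.com/folbricht/desync (the chunker nix-casync builds on, as far as I know); both the exact signatures and the parameter sets below should be treated as assumptions and double-checked against the library version in use.

```go
// rechunk.go – a quick experiment (not part of nix-casync) to see how
// different chunking parameters change the chunk size distribution of a
// single nar file.
package main

import (
	"fmt"
	"os"

	"github.com/folbricht/desync"
)

// chunkSizes runs the content-defined chunker over one file with the given
// min/avg/max chunk sizes (in bytes) and returns all resulting chunk sizes.
func chunkSizes(path string, min, avg, max uint64) ([]int, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	c, err := desync.NewChunker(f, min, avg, max)
	if err != nil {
		return nil, err
	}

	var sizes []int
	for {
		_, data, err := c.Next()
		if err != nil {
			return nil, err
		}
		if len(data) == 0 { // end of stream
			return sizes, nil
		}
		sizes = append(sizes, len(data))
	}
}

func main() {
	nar := os.Args[1]

	// casync's documented defaults are min=16KiB / avg=64KiB / max=256KiB;
	// the second set is just an arbitrary "smaller chunks" variant to compare.
	params := [][3]uint64{
		{16 << 10, 64 << 10, 256 << 10},
		{4 << 10, 16 << 10, 64 << 10},
	}
	for _, p := range params {
		sizes, err := chunkSizes(nar, p[0], p[1], p[2])
		if err != nil {
			panic(err)
		}
		total := 0
		for _, s := range sizes {
			total += s
		}
		mean := 0
		if len(sizes) > 0 {
			mean = total / len(sizes)
		}
		fmt.Printf("min=%d avg=%d max=%d -> %d chunks, mean size %d bytes\n",
			p[0], p[1], p[2], len(sizes), mean)
	}
}
```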
There are also some further improvements possible, though they’re more complicated to develop:
- Right now, we just chunk .nar files as blobs. We could however “parse” them further and persist the contained files individually. This would help the chunker always slice on file boundaries (which doesn’t necessarily happen today).
- Some chunks might not deduplicate because the blobs differ only in their references to other Nix store paths. If we replace store paths with more generic identifiers before chunking (and restore them on assembly), we might be able to deduplicate those chunks as well (see the sketch below).
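To make the second idea a bit more concrete, here is a toy Go sketch of rewriting store path hashes to same-length placeholders before chunking, while remembering the originals so they can be restored when the nar is assembled again. The placeholder scheme and the example path are made up for this sketch; a real implementation would also need to handle self-references and keep the mapping per nar.

```go
// pathrewrite.go – a toy illustration (not how nix-casync works today) of
// replacing Nix store path hashes with same-length placeholders before
// chunking, and remembering the originals so they can be restored on assembly.
package main

import (
	"fmt"
	"regexp"
)

// A store path hash is 32 characters of Nix's base32 alphabet, followed by
// a dash and the package name.
var storeHash = regexp.MustCompile(`[0-9abcdfghijklmnpqrsvwxyz]{32}-`)

// rewrite replaces every store path hash with a numbered, same-length
// placeholder and records the original hashes for later restoration.
func rewrite(blob []byte) (rewritten []byte, hashes []string) {
	i := 0
	rewritten = storeHash.ReplaceAllFunc(blob, func(m []byte) []byte {
		hashes = append(hashes, string(m[:32]))
		// Same length as the original hash, so file offsets stay unchanged.
		placeholder := fmt.Sprintf("%032d", i)
		i++
		return []byte(placeholder + "-")
	})
	return rewritten, hashes
}

func main() {
	// Hypothetical blob contents containing a reference to another store path.
	blob := []byte("/nix/store/0c4s1x8hfvjbqvjkz6qmbjfrhg6rf0aq-glibc-2.33/lib/libc.so.6")
	out, hashes := rewrite(blob)
	fmt.Printf("rewritten: %s\n", out)
	fmt.Printf("original hashes: %v\n", hashes)
}
```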
The ingestion isn’t very smart right now. It loads everything into a tempfile, then runs a (theoretically parallelized) chunker on that tempfile. We could either do the chunking while still receiving the file (which is not very well exposed from the underlying library), or ramp up the concurrency (at the cost of higher CPU usage during chunking).
It’s probably fine to keep this as-is for now, while we’re still juggling with parameters and casync-powered substitution isn’t implemented yet.
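For the record, here is roughly what chunking while still receiving the upload could look like, sketched with a plain net/http handler. The endpoint path, the port, and the chunkAndStore placeholder are invented for this example; it is not how the current ingestion path works.

```go
// streamingingest.go – a sketch of chunking a nar upload while the HTTP body
// is still being received, instead of spooling it into a tempfile first.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"log"
	"net/http"
)

// chunkAndStore stands in for a content-defined chunker that consumes the
// stream incrementally and writes each resulting chunk to the chunk store.
func chunkAndStore(r io.Reader) error {
	_, err := io.Copy(io.Discard, r) // placeholder: a real chunker would go here
	return err
}

func handleNarUpload(w http.ResponseWriter, r *http.Request) {
	// TeeReader lets us hash the nar while the chunker consumes the same
	// bytes, so nothing has to be buffered on disk first.
	h := sha256.New()
	body := io.TeeReader(r.Body, h)

	if err := chunkAndStore(body); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "ingested nar, sha256 %x\n", h.Sum(nil))
}

func main() {
	http.HandleFunc("/nar/", handleNarUpload)
	log.Fatal(http.ListenAndServe(":9000", nil))
}
```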
For reference, further nix-casync benchmarks are being coordinated here: Benchmark / Performance Testing · Issue #2 · flokli/nix-casync · GitHub