Switch cache.nixos.org to ZSTD to fix slow NixOS updates / nix downloads?

It appears that NixOS updates on servers and machines with reasonably fast Internet are much slower than they could be, because cache.nixos.org packages are LZMA-compressed by default.

Repro

When I do a NixOS 22.05 → 22.11 update on my laptop, nix prints:

these 2656 paths will be fetched (7396.08 MiB download, 28788.08 MiB unpacked):

It proceeds to download at only ~100 Mbit/s. This is much slower than my connection would permit, since in many locations, 1 Gbit/s and even 10 Gbit/s are available.

Despite pushing only ~12 MB/s, the nix process consumes around 100% CPU (varying between 70% and 130% in htop).

Investigation

I suspect LZMA decompression is the culprit. Attaching to the nix process using sudo gdb -p [PID], I see:

Thread 5 (LWP 2294539 "nix-build"):
#0  0x00007feffc87422b in lzma_decode () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
Full `gdb` thread output:
(gdb) thread apply all bt

Thread 5 (LWP 2294539 "nix-build"):
#0  0x00007feffc87422b in lzma_decode () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
#1  0x00007feffc875edd in lzma2_decode () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
#2  0x00007feffc86d666 in decode_buffer () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
#3  0x00007feffc868009 in block_decode () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
#4  0x00007feffc869b70 in stream_decode () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
#5  0x00007feffc860be3 in lzma_code () from target:/nix/store/w3sdhqiazzp4iy40wc2g85mv0grg1cx0-xz-5.2.7/lib/liblzma.so.5
#6  0x00007feffcecbd3d in xz_filter_read () from target:/nix/store/x83nrgbl489c90nnrg84dsmwpmy11cv5-libarchive-3.6.1-lib/lib/libarchive.so.13
#7  0x00007feffcec1856 in __archive_read_filter_ahead () from target:/nix/store/x83nrgbl489c90nnrg84dsmwpmy11cv5-libarchive-3.6.1-lib/lib/libarchive.so.13
#8  0x00007feffceef6f0 in archive_read_format_raw_read_data () from target:/nix/store/x83nrgbl489c90nnrg84dsmwpmy11cv5-libarchive-3.6.1-lib/lib/libarchive.so.13
#9  0x00007feffcec1090 in archive_read_data () from target:/nix/store/x83nrgbl489c90nnrg84dsmwpmy11cv5-libarchive-3.6.1-lib/lib/libarchive.so.13
#10 0x00007feffe12d240 in nix::ArchiveDecompressionSource::read(char*, unsigned long) () from target:/nix/store/1qxf5i4na4a4cdykhxki2wyal82kl0zb-nix-2.11.0/lib/libnixutil.so
#11 0x00007feffe164b29 in nix::Source::drainInto(nix::Sink&) () from target:/nix/store/1qxf5i4na4a4cdykhxki2wyal82kl0zb-nix-2.11.0/lib/libnixutil.so
#12 0x00007feffe12d18e in std::_Function_handler<void (nix::Source&), nix::makeDecompressionSink(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&)::{lambda(nix::Source&)#1}>::_M_invoke(std::_Any_data const&, nix::Source&) () from target:/nix/store/1qxf5i4na4a4cdykhxki2wyal82kl0zb-nix-2.11.0/lib/libnixutil.so
#13 0x00007feffe16ff9f in void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, nix::VirtualStackAllocator, boost::coroutines2::detail::push_coroutine<bool>::control_block::control_block<nix::VirtualStackAllocator, nix::sourceToSink(std::function<void (nix::Source&)>)::SourceToSink::operator()(std::basic_string_view<char, std::char_traits<char> >)::{lambda(boost::coroutines2::detail::pull_coroutine<bool>&)#1}>(boost::context::preallocated, nix::VirtualStackAllocator&&, nix::sourceToSink(std::function<void (nix::Source&)>)::SourceToSink::operator()(std::basic_string_view<char, std::char_traits<char> >)::{lambda(boost::coroutines2::detail::pull_coroutine<bool>&)#1}&&)::{lambda(boost::context::fiber&&)#1}> >(boost::context::detail::transfer_t) [clone .lto_priv.0] () from target:/nix/store/1qxf5i4na4a4cdykhxki2wyal82kl0zb-nix-2.11.0/lib/libnixutil.so
#14 0x00007feffdb2018f in make_fcontext () from target:/nix/store/1qxf5i4na4a4cdykhxki2wyal82kl0zb-nix-2.11.0/lib/libboost_context.so.1.79.0
#15 0x0000000000000000 in ?? ()

...

Meanwhile, checking in htop confirms that this thread is indeed the one with high CPU usage:

nix-build <nixpkgs/nixos> --no-out-link -A system -I nixpkgs=/etc/nixos/nixpkgs

    PID△USER       PRI  NI  VIRT   RES   SHR S CPU%  MEM%    DISK R/W   TIME+  Command
2294539 root        20   0 2165M 1138M 15792 R  67.8  2.4   32.37 M/s  0:04.59 │  │  │                 ├─ nix-build <nixpkgs/nixos> --no-out-link -A

(Seeing threads – gdb LWPs – in htop requires unchecking F2 → Display options → Hide userland process threads.)

LZMA decoding is slow

From the above, I conclude that LZMA decoding of the downloaded binary packages already consumes roughly 70–130% of a CPU core at a ~100 Mbit/s download speed, so downloading at 1 Gbit/s or 10 Gbit/s transfer speeds is impossible: decompression becomes the bottleneck.

On NixOS machines with fast Internet, this makes updates take 10x or more longer than necessary.

ZSTD binary caches?

The most obvious solution seems to be to use zstd instead of LZMA. Would that be a solution?

zstd decompresses about 10x faster than LZMA, according to Gregory Szorc's Digital Home | Better Compression with Zstandard:

(Chart from the article: decompression speed of zstd vs. other algorithms.)

That article is now 4 years old, and zstd's decompression has been optimized further since then (see the CHANGELOG), so the factor might be even larger by now.
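As a quick local sanity check (a sketch, not a rigorous benchmark; `big.nar` is a placeholder for any reasonably large NAR or tarball at hand), zstd's built-in benchmark mode can be compared against timed xz runs:

# zstd's built-in benchmark prints compression and decompression speed at level 19
zstd -b19 big.nar
# xz at a high level for comparison, then its decompression speed
time xz -9 -k -c big.nar > big.nar.xz
time xz -dc big.nar.xz > /dev/null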

Nix already supports zstd

zstd support for Nix was added in April 2021 and released with nix >= 2.4:

Multi-threaded compression and compression level controls were added as well:
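For illustration, a minimal sketch of what using this looks like against a nix >= 2.4 binary cache store (the file:// cache path and the level 19 are placeholders; cache.nixos.org / Hydra would set the equivalent on their side):

# Copy the current system closure to a local binary cache, compressing NARs with
# zstd instead of the default xz, via the compression / compression-level
# store-URI parameters. (Requires the nix-command experimental feature if not enabled.)
nix copy --to 'file:///tmp/zstd-cache?compression=zstd&compression-level=19' \
  $(readlink -f /run/current-system)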

What other distros do

Other distributions already did this switch (feel free to edit in, or mention in replies, further ones you know about), in time order:

  • Fedora: Switched to zstd for .rpms since Fedora 31, released October 2019. (source)
  • Arch Linux: Switched from xz (which is LZMA) to zstd in December 2019. (source)

    Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup.

  • Ubuntu: Switched to zstd for .debs since Ubuntu 21.10. (source)
32 Likes

What would make sense is for Hydra to start uploading new NARs using zstd, and slowly we’d get there.

Cachix is getting zstd support very soon too 🙂

8 Likes

This might be a show stopper then, as nix_2_3 is still in nixpkgs and a valid option for a lot of people.

2 Likes

How difficult is it to backport zstd to Nix 2.3?

3 Likes

Removal of nix_2_3 was proposed in nixVersion.nix_2_3: Remove by mweinelt · Pull Request #204371 · NixOS/nixpkgs · GitHub, but a vocal minority is opposed, given the UX changes and supposed “beta” quality after 2.3.

1 Like

The PR for the zstd support is quite small so a backport looks easy:

But I do not know if this is based on other changes that 2.3 doesn’t have – maybe somebody else knows?

I agree that if backporting zstd support is easy and trivially enables zstd for new packages, that sounds like a great idea.

1 Like

Yes, to be clear, I don’t think it is useful to convert old packages to zstd.

Given that Nix rebuilds everything when a fundamental library like the libc is upgraded, we’d get there a lot faster than “slowly” anyway.

4 Likes

“vocal”? I found a single comment, complaining about unhandled PRs and unhelpful comments without linking those issues/PRs.

I’m totally with Sandro… Just merge it…

4 Likes

There are currently 4 thumbs-ups on andir’s post, with more coming in, so it doesn’t look like something one should just discard.

I also buy the argument that some stuff doesn’t work in newer nix. I’ve hit workflow-breaking regressions myself as well, and they directly impacted the ability of my company to build stuff.

I do agree that it would help productivity and decision-making if issues were stated directly (e.g. with a link to the issue), so that they can be worked on and so that the acceptance criteria for advancing the minimum supported version are known, giving everyone a better chance of moving past e.g. nix 2.3. Vague references are more difficult to work with and result in subjective opinions instead of clear steps forward.

However, I still think that

How difficult is it to backport zstd to Nix 2.3?

should be answered if possible since that could solve the problem straight away.

8 Likes

Here is the naive backport without much manual testing (in Nix tradition):

Note: this is incomplete; there have been many changes to how sinks and sources work in Nix. It isn’t infeasible, but it requires a bit more work.

8 Likes

I don’t know if there is some history behind this passive-aggressive comment, but in my experience Sandro has always been very responsive and thorough when reviewing PRs, providing good and thoughtful suggestions and undoubtedly is one of the main contributors to nixpkgs.

14 Likes

These numbers make it look like the difference in compression ratio between LZMA and zstd is insignificant, but I’m not so sure of that.

cache.nixos.org seems to use xz at the highest level (-9).

Fedora used xz with level -2, which is significantly less effective, so there was no loss for them in switching.

The article for Arch Linux does not state which xz level was used, but since they claim they only lost 0.8%, I assume they did not use -9 either.
The source for Ubuntu doesn’t reveal the xz level used either.

I made a little benchmark with qemu, which has quite a big NAR file. The xz-compressed NAR coming from cache.nixos.org is 117M in size, and the zstd-compressed one (-19, the highest regular level) is 129M.
That is a difference of around 10%.
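(A sketch of how this comparison can be reproduced; `<hash>` and `<filehash>` are placeholders for the store-path hash and the NAR file hash from the .narinfo:)

# The narinfo for the store path points at the xz-compressed NAR
curl -s "https://cache.nixos.org/<hash>.narinfo"
curl -sO "https://cache.nixos.org/nar/<filehash>.nar.xz"
# Decompress (keeping the .xz), recompress with zstd -19, compare sizes
xz -dk "<filehash>.nar.xz"
zstd -19 "<filehash>.nar" -o qemu.nar.zst
ls -l "<filehash>.nar.xz" qemu.nar.zst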

Therefore by optimizing for the people sitting behind a fast connection, we will make the experience worse for the people who have to work with limited bandwidth. As a frequent traveler, I’m often in that position. That’s why I care.

Though I have to admit, a 10% loss sounds acceptable to gain 10x speeds elsewhere. Still, it would be nice to have actual numbers here and see what the difference would be in the compressed closure size of a whole NixOS system, for example.
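A rough sketch of how those whole-system numbers could be gathered on a running NixOS machine, mirroring xz -9 vs zstd -19 (slow, but straightforward):

# Sum compressed sizes over the whole system closure, once for xz and once for zstd
total_xz=0; total_zstd=0
for p in $(nix-store -qR /run/current-system); do
  nix-store --dump "$p" > /tmp/path.nar              # serialize the store path as a NAR
  total_xz=$((   total_xz   + $(xz   -9  -c /tmp/path.nar | wc -c) ))
  total_zstd=$(( total_zstd + $(zstd -19 -c /tmp/path.nar | wc -c) ))
done
echo "xz: $total_xz bytes, zstd: $total_zstd bytes"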

10 Likes

Another option is something like GitHub - vasi/pixz: Parallel, indexed xz compressor. A quick benchmark, tarring up some version of CrossOver I had lying around on the Mac (969 MiB of executables):

Tool       CTime      DTime   Size
xz -9      5min       10s     207M
zstd -19   40s        1s      254M
pixz -9    1min       2s      217M

Not sure how easy that’d be to integrate into a library but this shows the potential.

Would it be feasible for Hydra to upload multiple files with different compression algorithms and allow the user to choose? Retains compat with 2.3 without a backport if it defaults to xz, but requires double the space for Hydra.
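For context (as far as I understand the format): each store path is described by a single .narinfo that names exactly one NAR URL and compression, so serving multiple variants would presumably mean extending or duplicating those entries. Abridged example with placeholder hashes and sizes:

StorePath: /nix/store/<hash>-example-1.0
URL: nar/<filehash>.nar.xz
Compression: xz
FileHash: sha256:<filehash>
FileSize: <compressed bytes>
NarHash: sha256:<narhash>
NarSize: <uncompressed bytes>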

2 Likes

I doubt that this is doable. We don’t even use separateDebugInfo = true; on each package because that’d imply a ~30% (IIRC) increase in storage size of a single nixpkgs build.

1 Like

FWIW that is kind of different. Increasing a single derivation’s output file size burdens all users with increased storage requirements, regardless of the NAR compression used. Increasing the space required to store more than one compressed variant of a NAR only burdens the cache’s storage. Might still be too much of a burden, but I think your comparison doesn’t apply.

1 Like

No, the debug outputs aren’t normally downloaded. EDIT: I believe the concerns really were mainly about the AWS expenses.

3 Likes

I believe the concerns really were mainly about the AWS expenses.

In case that was a response to my comment, then that’s correct. I’m aware that downloads of debug-outputs only happen with e.g. environment.enableDebugInfo = true;, but we still have to store it in S3.

1 Like

Regarding pixz:

Any parallelism speedup it brings is also available to zstd (with pzstd from the same package). That is, pzstd remains ~13x faster than pixz.

For example, I benchmarked this on a 6-core/12-thread Xeon E5-1650, with invocations like pzstd -v file --stdout > /dev/null:

               user(s)   system(s)    CPU   total(s)   throughput
10 GiB with zstd 1.5.0
  compression
      zstd     18.61s    3.05s       128%     16.792    607 MB/s
      pzstd    37.91s    2.87s      1067%      3.821   2668 MB/s
  decompression
      zstd      5.37s    0.10s        99%      5.482   1859 MB/s
      pzstd    11.24s    2.20s      1035%      1.299   7848 MB/s

So parallelism brings > 2 GB/s compression and > 8 GB/s decompression, making it suitable even for 10/100 Gbit/s networking and a single fast SSD.
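(For reproducibility, a sketch of the invocation forms used; the test file name is a placeholder, and only the decompression form above was quoted verbatim:)

zstd       file -o file.zst           # plain zstd (add -T<n> for multi-threaded compression)
pzstd      file -o file.pzst          # parallel zstd from the same package
time zstd  -d  file.zst  --stdout > /dev/null
time pzstd -dv file.pzst --stdout > /dev/null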

I used zstd -T8 in my testing; it was already as parallel as reasonably possible on this machine.
Sorry for not being explicit about that. I didn’t know about the pzstd alias, else I would’ve just written that for clarity.

>2 GB/s compression sounds very high, too high. Are you using /dev/zero as the source here? Extremely low or extremely high entropy test data isn’t very useful for evaluating performance on medium or mixed entropy data, in my experience, as programs like zstd tend to have specific fast paths for those kinds of data.