Switch cache.nixos.org to ZSTD to fix slow NixOS updates / nix downloads?

I encountered two issues with pixz: one about compression ratio, and one about using all available cores:

Does anybody know how I can get pixz’s compression side to use all cores? On my 6-core/12-thread machine, it uses only 400% CPU for the chromium store path, which drops over the course of the compression to 300% and then to 200%. The average usage according to `time` is then 270%.

Passing -p6 or -p12 doesn’t seem to change anything about that.
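
Roughly what I tried (a sketch; the tarball name and the `command time` wrapper are my choices, the store path is the one from the table below):

```bash
# Create the input tarball from the chromium store path used in the table:
tar c /nix/store/620lqprbzy4pgd2x4zkg7n19rfd59ap7-chromium-unwrapped-108.0.5359.98 > chromium.tar

# Neither of these got past ~400% CPU on my 6-core/12-thread machine:
command time pixz -9 -p6  < chromium.tar > chromium.tar.xz
command time pixz -9 -p12 < chromium.tar > chromium.tar.xz
```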


I have now added benchmarks of pixz 1.0.7 to the table (the runs exhibit the above-mentioned problem).

I have also added the `maxres` outputs from `command time`, showing the maximum RAM in MB needed for compression and decompression:

                             |--------------------- compression ------------------------------------| |------ decompression ------|
                                                                      per-core   total                          total
                       size  user(s) system(s)  CPU  total(m)  throughput throughput  maxres  total(s) throughput maxres  comments
chromium: tar c /nix/store/620lqprbzy4pgd2x4zkg7n19rfd59ap7-chromium-unwrapped-108.0.5359.98     
  uncompressed         473Mi
  compression (note `--ultra` is given to enable zstd levels > 19), zstd v1.5.0, XZ utils v5.2.5
    xz -9              102M  216.34   0.54s     99%  3:37.07   2.29 MB/s  2.29 MB/s   691 MB   6.227     79 MB/s   67 MB
    pixz -9            137M  216.98   1.71s    271%  1:20.57   2.28 MB/s  6.19 MB/s  2951 MB   2.551    194 MB/s  657 MB  did not use all cores consistently, for both compression and decompression
    zstd -19           113M  176.42   0.56s    100%  2:56.66   2.81 MB/s  2.81 MB/s   241 MB   0.624    794 MB/s   10 MB
    zstd -19  --long   111M  200.84   0.52s    100%  3:21.07   2.46 MB/s  2.46 MB/s   454 MB   0.686    722 MB/s  133 MB
    zstd -22           108M  210.77   0.74s    100%  3:31.44   2.35 MB/s  2.35 MB/s  1263 MB   0.716    692 MB/s  133 MB
    zstd -22  --long   108M  214.96   0.64s    100%  3:35.53   2.30 MB/s  2.30 MB/s  1263 MB   0.716    692 MB/s  133 MB  bit-identical to above for this input
    pzstd -19          114M  270.05   1.20s   1064%    25.47   1.83 MB/s 19.83 MB/s  1641 MB   0.244   2032 MB/s  564 MB
    pzstd -22          108M  224.17   0.66s    100%  3:44.80   2.21 MB/s  2.21 MB/s  1392 MB   0.721    687 MB/s  245 MB  single-threaded comp/decomp!
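
For completeness, the numbers above were gathered roughly like this (a sketch, not my exact invocations; GNU `time`’s default output includes user/system/elapsed/%CPU and `maxresident`, and the output file names are placeholders):

```bash
# Compression; -c writes to stdout, zstd levels > 19 need --ultra:
command time xz    -9          -c chromium.tar > chromium.tar.xz
command time pixz  -9             < chromium.tar > chromium-pixz.tar.xz
command time zstd  -19         -c chromium.tar > chromium-19.tar.zst
command time zstd  -19 --long  -c chromium.tar > chromium-19-long.tar.zst
command time zstd  -22 --ultra -c chromium.tar > chromium-22.tar.zst
command time pzstd -19         -c chromium.tar > chromium-p19.tar.zst

# Decompression to /dev/null:
command time xz   -d -c chromium.tar.xz     > /dev/null
command time zstd -d -c chromium-19.tar.zst > /dev/null
```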

Oddly, pixz produces a much worse compression ratio than any of the other approaches.

xz/pixz need roughly 5x more RAM for decompression than zstd.

pzstd needs disproportionately much RAM for decompression. I suspect this is because with the invocation `pzstd -d ... > /dev/null`, outputs from the various threads need to be buffered in memory to write them in order into the output pipe.
However, pzstd does this even when writing outputs to a regular file with -o.

I also checked how much RAM plain zstd needs to decompress the pzstd outputs; there is no difference compared to decompressing the zstd outputs.
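
Those checks were along these lines (again a sketch; file names assumed):

```bash
# pzstd decompressing to a pipe vs. to a regular file via -o:
command time pzstd -d -c chromium-p19.tar.zst > /dev/null
command time pzstd -d -o restored.tar chromium-p19.tar.zst

# plain single-threaded zstd decompressing the pzstd-produced archive:
command time zstd  -d -c chromium-p19.tar.zst > /dev/null
```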


For the chromium derivation in my table above, single-threaded zstd has 10x higher decompression throughput than single-threaded xz, and for pzstd vs pixz it’s also 10x.

Summary of my findings so far

On this chromium tar:

  • single-threaded:
    • zstd -19 wins against xz -9 on decompression speed (10x) and decompression memory usage (5x)
    • zstd -22 still wins against xz -9 on decompression speed (same 10x), but loses on decompression memory usage (it needs ~2x the RAM)
    • xz -9 wins on best compression ratio: by 10% against zstd -19 and by 5% against zstd -22
  • multi-threaded:
    • pixz -9 loses against pzstd -19 on all metrics
    • decompression memory usage can apparently be reduced 10x by decompressing 6 zstd -19 archives independently, rather than using pzstd -d to decompress 1 zstd -19 archive (see the sketch below). That seems wrong, at least when writing to regular files. I filed a zstd issue.
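
A sketch of that last point (hypothetical file names; each single-threaded `zstd -d` needed only ~10 MB maxres in the table above, so six of them in parallel should stay around ~60 MB total):

```bash
# Decompress 6 independent zstd -19 archives in parallel, one single-threaded
# zstd process per archive, instead of one pzstd process on one big archive:
for f in part1.tar.zst part2.tar.zst part3.tar.zst part4.tar.zst part5.tar.zst part6.tar.zst; do
  zstd -d -c "$f" > "${f%.zst}" &
done
wait
```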

Please point out if you spot any mistakes!
