GitHub - elfshaker/elfshaker: elfshaker stores binary objects efficiently


This seems to be a very interesting optimisation that we could perhaps apply to the Nix store.


There’s a similar tool called bup and there’s been some discussion & testing.

https://github.com/NixOS/nixpkgs/issues/89380

For reference, on Btrfs using zstd:1 compression and auto-optimise-store = true:

Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       54%       75G         138G         350G
none       100%       46G          46G         135G
zstd        31%       29G          92G         214G

In what is probably a typical scenario, hardlinking already saves ~60% (350GB → 138GB). Compression via zstd then saves another ~45% (138GB → 75GB). As far as I can tell, the remaining space to be saved is the 46GB that did not compress, which represents ~60% of the on-disk usage. These are most likely the binary files that Btrfs skipped compression on, so they could be compacted with something like elfshaker or bup.
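As a quick sanity check, the percentages can be recomputed from the table above. This is only a sketch: the variable names are mine, and it takes the post's reading of the columns at face value (Referenced as the size before hardlinking, Uncompressed as the size after hardlinking but before compression).

```python
# Values in GB, copied from the table above (TOTAL and "none" rows).
referenced = 350      # assumed: logical size before hardlink dedup
uncompressed = 138    # assumed: size after hardlinking, before compression
on_disk = 75          # actual disk usage after zstd:1 compression
incompressible = 46   # the "none" row: data stored without compression


def saving(before, after):
    """Fraction of space saved when going from `before` to `after`."""
    return 1 - after / before


hardlink_saving = saving(referenced, uncompressed)
zstd_saving = saving(uncompressed, on_disk)
uncompressed_share = incompressible / on_disk

print(f"hardlinking saves {hardlink_saving:.0%}")
print(f"zstd saves a further {zstd_saving:.0%}")
print(f"uncompressed data is {uncompressed_share:.0%} of on-disk usage")
```

Under those assumptions, hardlinking is the larger of the two savings, and the uncompressed 46GB is the largest remaining target.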

One thing I think is important to note is that elfshaker still works best when different versions of the same binary change only slightly, so they had to compile LLVM with specific options to achieve that level of compaction. It sounds like this change is what allows them to go from a ~20% standard compression ratio to 0.01%.

It works particularly well for our presented use case because storing pre-link object files has these properties:

  • There are many files,
  • Most of them don’t change very often so there are a lot of duplicate files,
  • When they do change, the deltas of the binaries are not huge.

We achieve this in manyclangs by compiling object code with the -ffunction-sections and -fdata-sections compiler flags. This has the effect that if you ‘insert’ a function into a translation unit, the insertion does not cause all of the addresses to change across the whole object file.


I’m not quite sure how this could be applied, but an access time of around 1s sounds like too much for any use case.

elfshaker does not provide transparent on-demand access. One could probably use this tool by coupling elfshaker snapshots to NixOS or profile generations, but store paths not available in the currently active elfshaker snapshot would have to be re-downloaded in the case of a nix run, even if they are present in one of the inactive snapshots.
If your use case is keeping a lot of generations and you use the on-demand functionality of nix (including nix-shell, per-project environments etc.) very sparingly, it might be pluggable into the nixos-rebuild script.
Plus: elfshaker doesn’t seem to have a way to remove snapshots, so you will have endless fun with garbage collection.


ZFS dedup gets good results too, notionally better than hardlinking because the files don’t have to be entirely identical; they only need to have blocks in common.
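To illustrate the difference, here is a minimal sketch of fixed-size block deduplication. It is not how ZFS is implemented: the block size, the helper names and the two example "files" are all made up for the illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative only; not necessarily what ZFS would use


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of `data` (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def unique_blocks(files: list[bytes]) -> int:
    """Count distinct blocks across all files, i.e. what a deduplicating store keeps."""
    seen = set()
    for data in files:
        seen.update(block_hashes(data))
    return len(seen)


# Two hypothetical files that share their first three blocks and differ in the last.
common = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
file_v1 = common + b"D" * BLOCK_SIZE
file_v2 = common + b"E" * BLOCK_SIZE

whole_file_identical = file_v1 == file_v2  # hardlinking needs this to be true
total_blocks = len(block_hashes(file_v1)) + len(block_hashes(file_v2))
stored_blocks = unique_blocks([file_v1, file_v2])

print(f"identical files: {whole_file_identical}")
print(f"blocks written: {total_blocks}, blocks stored after dedup: {stored_blocks}")
```

Note that fixed-size blocks only match when the shared data sits at the same block-aligned offsets in both files. Inserting a few bytes near the start of a file shifts every later block, so nothing after that point deduplicates.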