Introducing Attic, a self-hostable Nix Binary Cache server

Another thing I don’t understand: Why would length changes matter? The benchmarks are doing content-defined chunking (CDC). They deduplicate even in the presence of length changes of store paths.
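To make that concrete, here is a toy sketch of CDC in Python. It is not what Attic or the benchmarks actually run: the window size, the mask, the use of blake2b as the boundary hash and the helper names are all assumptions picked for illustration. The point is only that boundaries depend on nearby bytes, not on absolute offsets, so an insertion disturbs the chunks around it and the rest line up again.

```python
import hashlib
import random

WINDOW = 16    # bytes hashed to decide on a boundary (toy value, an assumption)
MASK = 0x3FF   # cut when the low 10 hash bits are zero: roughly 1 KiB average chunks

def cdc_chunks(data: bytes) -> list:
    """Cut after every position whose trailing WINDOW bytes hash to a value matching MASK."""
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        window = data[max(0, end - WINDOW):end]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")
        if h & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def new_chunks(old: bytes, new: bytes) -> int:
    """Count chunks of `new` that are not already stored for `old`."""
    stored = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    return sum(hashlib.sha256(c).digest() not in stored for c in cdc_chunks(new))

def random_bytes(seed: int, size: int) -> bytes:
    """Deterministic pseudo-random stand-in for file contents."""
    return random.Random(seed).getrandbits(8 * size).to_bytes(size, "little")

if __name__ == "__main__":
    original = random_bytes(0, 64 * 1024)
    mid = len(original) // 2
    # Simulate a store path that grew by 5 bytes in the middle of the file.
    longer = original[:mid] + b"-1.2." + original[mid:]
    print("chunks in original:", len(cdc_chunks(original)))
    print("chunks that must be stored again:", new_chunks(original, longer))
```

With this scheme, only chunks whose boundary windows overlap the inserted bytes can change, so the number of chunks stored again is bounded by the insert length plus the window size, no matter how large the file is.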

Further, would zeroing out store paths really bring a significant improvement in deduplication?

Most references to store paths should be in the section of binaries or .so files that is used for dynamic linking, e.g. this for chromium:

# grep --text --byte-offset -R -P '/nix/store/.{32}-' /nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110 | cut -d: -f-2        
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/libvk_swiftshader.so:6537
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/libvulkan.so.1:15569
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/libVkLayer_khronos_validation.so:8277
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/libVkICD_mock_icd.so:4477
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chrome_crashpad_handler:0
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chrome_crashpad_handler:9905
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/libEGL.so:8649
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/libGLESv2.so:83361
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chromium:20499185
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chromium:21989989
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chromium:23409090
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chromium:32017337
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chromium:267909201
/nix/store/3gpnym3j1aiys0isd6vqvzhr7fg87bv5-chromium-unwrapped-111.0.5563.110/libexec/chromium/chromium:359225482

Here we can see that the locations of store paths are compact, which means that under CDC, they will not contribute significantly to stored file size.

I think a key thing that works against CDC for such binaries is that absolute references to labels, string constants, etc. are spread throughout the file. If the position of a static string in the binary changes, the pointers to it change all across the binary, and CDC ends up finding different chunks.
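A rough way to see this effect is a toy experiment in Python. Everything here is made up for illustration: the "binary" is random bytes with an 8-byte absolute pointer every 256 bytes, and the chunker is a naive CDC with assumed parameters, not the one from the benchmarks. It compares a single local insertion against a version where every pointer is off by 16 because its target moved.

```python
import hashlib
import random
import struct

WINDOW, MASK = 16, 0x3FF  # toy CDC parameters (assumptions): ~1 KiB average chunks

def cdc_chunks(data: bytes) -> list:
    """Cut after every position whose trailing WINDOW bytes hash to a value matching MASK."""
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        window = data[max(0, end - WINDOW):end]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")
        if h & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the chunks of `new` that are already stored for `old`."""
    stored = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    chunks = cdc_chunks(new)
    return sum(hashlib.sha256(c).digest() in stored for c in chunks) / len(chunks)

def fake_binary(string_addr: int, size: int = 64 * 1024, stride: int = 256) -> bytes:
    """Random 'code' with an 8-byte absolute pointer every `stride` bytes (hypothetical layout)."""
    buf = bytearray(random.Random(0).getrandbits(8 * size).to_bytes(size, "little"))
    for off in range(0, size - 8, stride):
        buf[off:off + 8] = struct.pack("<Q", string_addr + off)
    return bytes(buf)

if __name__ == "__main__":
    base = fake_binary(0x400000)
    mid = len(base) // 2
    # Case 1: 16 bytes inserted in one place, nothing else changes.
    local_edit = base[:mid] + b"X" * 16 + base[mid:]
    # Case 2: same length, but every pointer moved by 16 because its target shifted.
    moved_target = fake_binary(0x400010)
    print("shared after local insert: ", shared_fraction(base, local_edit))
    print("shared after pointer shift:", shared_fraction(base, moved_target))
```

In the second case the changed bytes are small but land in nearly every chunk, so most chunks get a new hash even though the two files are almost identical byte for byte.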

Thus I would not be surprised if “decompilation” into assembly, which turns addresses back into named labels, would improve the deduplication significantly. However, then it’s unclear how to get back to the original bit-identical binary.
