Improve deduplication with late binding?

Problem statement:

If/when Nix switches to CAS (content-addressed) storage, binaries that reference different dependencies but are otherwise identical will still not be deduplicated.

Example:
curl uses openssl. openssl gets a patch release which results in a small change in the .so file but nothing else.
This results in curl rebuilding. The resulting binary will (should) be the same, except for the reference to openssl being updated.
The checksum is different and everybody needs to install the new version; the unchanged supporting files will be hardlinked, but the binaries will have a few different bytes.
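To make the problem concrete, here is a toy Python sketch. It assumes the store addresses each file by a plain SHA-256 of its bytes, which simplifies whatever scheme a CAS Nix would actually use. The "binary" and the store paths are made up. The two builds differ only in the embedded openssl path, so only a few dozen bytes change, yet the hashes differ and whole-file deduplication cannot apply.

```python
import hashlib

# Hypothetical store paths: the 32-character hashes are made up, but they have
# the same length as real Nix store-path hashes.
OLD_DEP = b"/nix/store/" + b"a" * 32 + b"-openssl-1.0.0/lib"
NEW_DEP = b"/nix/store/" + b"b" * 32 + b"-openssl-1.0.1/lib"

def fake_binary(dep_path: bytes) -> bytes:
    """Stand-in for a compiled curl: identical 'code' plus one embedded RPATH."""
    code = b"\x7fELF" + b"\x90" * 4096
    return code + b"\x00RPATH=" + dep_path + b"\x00"

def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions that differ (both inputs have equal length here)."""
    return sum(x != y for x, y in zip(a, b))

old_bin = fake_binary(OLD_DEP)
new_bin = fake_binary(NEW_DEP)
print(differing_bytes(old_bin, new_bin), "of", len(old_bin), "bytes differ")  # 33 of 4169 bytes differ
print(hashlib.sha256(old_bin).hexdigest() == hashlib.sha256(new_bin).hexdigest())  # False
```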

In “regular” Linux distros, simply updating the openssl package would be sufficient. On low-powered devices, that is preferable, since nothing else has to be rebuilt or re-downloaded.

Solutions

If curl were somehow able to decide which openssl to use at runtime, the curl package would be unchanged and its dependents would not need rebuilding. There could be a separate wrapper package that wraps binaries and libraries, perhaps by patching ld.so. This separate wrapper would be tiny: all dependent packages could skip a rebuild by only updating their own wrappers, while the bigger wrapped packages remain unchanged and are not rebuilt.
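The rebuild savings can be sketched by comparing what has to change today with what would change under late binding. This is a toy Python model: the reverse-dependency graph is hypothetical. It also assumes that every transitive dependent needs a regenerated wrapper, because its closure now points at the new openssl, while the wrapped build itself stays put.

```python
from collections import deque

# Hypothetical reverse-dependency graph: package -> packages that link against it.
DEPENDENTS = {
    "openssl": ["curl", "python"],
    "curl": ["git", "nix"],
    "python": [],
    "git": [],
    "nix": [],
}

def transitive_dependents(pkg: str) -> set[str]:
    """Everything that depends on pkg, directly or indirectly (breadth-first)."""
    seen: set[str] = set()
    queue = deque(DEPENDENTS.get(pkg, []))
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(DEPENDENTS.get(p, []))
    return seen

def plan(changed: str, late_binding: bool) -> dict[str, set[str]]:
    affected = transitive_dependents(changed)
    if late_binding:
        # Wrapped builds stay as they are; only their tiny wrappers are regenerated.
        return {"rebuild": {changed}, "rewrap": affected}
    # Today: every transitive dependent gets a new store path and is rebuilt.
    return {"rebuild": {changed} | affected, "rewrap": set()}

for mode in (False, True):
    p = plan("openssl", late_binding=mode)
    print("late binding" if mode else "today", sorted(p["rebuild"]), sorted(p["rewrap"]))
# today ['curl', 'git', 'nix', 'openssl', 'python'] []
# late binding ['openssl'] ['curl', 'git', 'nix', 'python']
```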

Another option might be to generate diffs between all builds of the same version. It wouldn’t help with skipping rebuilds, and local disk space would still be impacted, but there would be less to download.
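Here is a minimal Python sketch of the diff idea. It assumes the two builds have the same length. That holds when only fixed-length store-path hashes changed, but not in general; real binary-diff tools such as bsdiff or xdelta also handle insertions and deletions. The delta is simply the runs of changed bytes, so for the curl example it would be tiny compared with the binary.

```python
def make_delta(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Runs of (offset, replacement bytes) where new differs from old.

    Assumes len(old) == len(new); this sketch does not handle insertions.
    """
    if len(old) != len(new):
        raise ValueError("this sketch only handles equal-length builds")
    delta: list[tuple[int, bytes]] = []
    i = 0
    while i < len(new):
        if old[i] == new[i]:
            i += 1
            continue
        start = i
        while i < len(new) and old[i] != new[i]:
            i += 1
        delta.append((start, new[start:i]))
    return delta

def apply_delta(old: bytes, delta: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the new file from the locally present old build plus the delta."""
    out = bytearray(old)
    for offset, repl in delta:
        out[offset:offset + len(repl)] = repl
    return bytes(out)

old = b"prefix /nix/store/aaaa-openssl-1.0.0/lib suffix"
new = b"prefix /nix/store/bbbb-openssl-1.0.1/lib suffix"
delta = make_delta(old, new)
print(delta)  # [(18, b'bbbb'), (35, b'1')]
assert apply_delta(old, delta) == new
```

Only the delta would need to be downloaded. The client still writes out the full new file, which is why local disk usage does not improve.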


Would this wrapper query the derivation or the store in some way to discover the information it needs? In a sense, this is an impurity, though perhaps a useful one. Or is there a way to make this pure?

Is this basically overriding LD_LIBRARY_PATH (or ld.so), or making it “Nix-aware”?

Well, each executable could be wrapped by a small compiled program that loads the wrappee and applies the hardcoded dependencies. Then it wouldn’t need to query anything; it would be like an “ld.so script” (or dyld).

These wrappers would be in a separate package, which would need to be recreated, but the wrapped packages can remain unchanged. I imagine a similar thing is possible for libraries.
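Here is a rough sketch of such a wrapper. It is written in Python for readability, although the idea above is a small compiled binary. It assumes glibc's dynamic loader, whose `--library-path` option takes the place of `LD_LIBRARY_PATH`. It also assumes that the wrapped binary uses DT_RUNPATH, which that search path overrides, rather than DT_RPATH, which glibc searches first. Finally, it assumes the new openssl keeps the same soname. All store paths and the `loader_argv`/`main` names are hypothetical.

```python
import os
import sys

# Hypothetical paths, baked in when the wrapper package is generated.
# Nothing is looked up at runtime, so the wrapper stays pure.
LOADER = "/nix/store/HASH1-glibc/lib/ld-linux-x86-64.so.2"
WRAPPEE = "/nix/store/HASH2-curl-1.0.0/bin/curl"       # untouched original build
LIB_DIRS = ["/nix/store/HASH3-openssl-1.0.1/lib"]      # the swapped-in dependency

def loader_argv(args: list[str]) -> list[str]:
    """Build the argv that runs WRAPPEE through ld.so with a fixed library path."""
    return [LOADER, "--library-path", ":".join(LIB_DIRS), WRAPPEE, *args]

def main() -> None:
    """Entry point of the generated wrapper: replace this process with ld.so."""
    argv = loader_argv(sys.argv[1:])
    os.execv(argv[0], argv)
```

When openssl moves, only the strings at the top change. That is why regenerating the wrapper package should be cheap compared with rebuilding curl.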

As I understand it, it would be like:

openssl 1.0.0 is built. We build curl 1.0.0 with openssl 1.0.0. Later, we build openssl 1.0.1, which has no API incompatibility. We then switch pkgs.curl from curl 1.0.0 to curl 1.0.0 built with openssl 1.0.0 but wrapped to use openssl 1.0.1.

This seems doable while keeping purity, but the override would need to be specified manually. Also, if someone only has the incompatible openssl 0.9.0 compiled, they will need to build both openssl 1.0.0 and openssl 1.0.1. This still looks like an interesting idea for the most commonly used libraries.
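The scenario above can be modelled in a few lines of Python. The compatibility rule (same major.minor version means drop-in) is a hypothetical stand-in for the manually specified override, and the function names are made up. The model shows why both versions are needed: the unchanged curl build still references the openssl it was built against, while the wrapper loads the newer one.

```python
def parse(version: str) -> tuple[int, ...]:
    return tuple(int(x) for x in version.split("."))

def abi_compatible(built_against: str, target: str) -> bool:
    # Hypothetical rule: same major.minor means drop-in compatible.
    # In practice this would be the manual override described above.
    return parse(built_against)[:2] == parse(target)[:2]

def openssl_builds_needed(built_against: str, target: str, installed: set[str]) -> set[str]:
    """Which openssl versions must be built or fetched to use a late-bound curl."""
    if not abi_compatible(built_against, target):
        raise ValueError(f"curl must be rebuilt against openssl {target}")
    # The unchanged curl still references built_against, so that version must
    # be present alongside the one the wrapper actually loads.
    return {built_against, target} - installed

print(sorted(openssl_builds_needed("1.0.0", "1.0.1", {"0.9.0"})))  # ['1.0.0', '1.0.1']
```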

If I understand right, this building involves downloading from the binary cache, which is, for most users, much easier than building from source.

Indeed, it will be downloaded from the binary cache if available.