Content-addressed Nix − call for testers

I decided to experiment with ca-derivations on my little 900-derivation build graph today. The first realization (hah) was that you really want “ca-specific-schema.sql: add index on RealisationsRefs(referrer)” (trofi, NixOS/nix PR #5366), otherwise build times for a non-negligible number of drvs go through the roof. With that patch applied to master, I got:

  • standard build: 2:16s
  • standard build, --offline: 2:13s
  • CA build: 2:48s
  • CA build, --offline: 2:32s

Not too bad, I’d say! It makes sense that substitution introduces quite a bit more overhead for CA drvs, since they both induce more requests in total and those requests end up interleaved with building.

1 Like

A similar suggestion that should be compatible with more existing build scripts is to first transform input shared objects into stub libraries that contain nothing but the symbol table. The stub references could then be replaced with the actual libraries in a second derivation, which would be cheap to rebuild. Apparently LLVM now has a tool llvm-ifs that can produce ELF stub libraries as well as Apple .tbd stub files.
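A rough shell sketch of what that two-stage split could look like (the llvm-ifs option name is from memory of its documentation, so double-check it, and the library and store paths are purely illustrative):

# Stage 1: build against a symbol-table-only stub instead of the real library,
# so this derivation no longer depends on the library’s exact contents.
# (--output-elf per the llvm-ifs docs; verify with `llvm-ifs --help`.)
llvm-ifs --output-elf=stubs/libfoo.so /nix/store/...-foo-1.0/lib/libfoo.so
cc -o app app.o -Lstubs -lfoo

# Stage 2, a separate and cheap derivation: point the binary back at the
# real library output (patchelf --set-rpath is standard patchelf).
patchelf --set-rpath /nix/store/...-foo-1.0/lib app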

5 Likes

Well, in the case of a compiler being rebuilt: if the new compiler binary has a different hash and therefore a different store path, it may still be (and hopefully will be) functionally equivalent to the old one. So everything that uses that compiler directly will be rebuilt, but nothing that merely depends on those packages, because their outputs come out bit-for-bit identical.

Not sure how common a situation this is though, because most compilers will have a huge number of direct reverse dependencies…

New compiler versions do usually come with new optimisations; those optimisations get applied and produce different binaries.

If it’s a major release, certainly. If it only fixes bugs, then possibly only a small subset will be affected.

I think the real issue with the compiler “use case” is that most programs that transitively depend on a compiler most likely also depend on it directly.

1 Like

My current big problem with content-addressed derivations is how Nix itself indirectly mixes variability into the output via the input hash.

My expectation goes like this: I compile GNU Make, Nix gives it an input-addressed hash Ai as $out. GNU Make hardcodes Ai/lib inside the binary as a constant string, because GNU tools are insane. Then Nix comes along and rewrites it to a content-addressed Ac/lib. If I change something insignificant in the build script, the output stays the same, gets rewritten to Ac/lib, the rebuild gets cut short, the day is saved.

Reality is much more disappointing. When I recompile GNU Make with my insignificant change, Nix gives it an input hash Bi, GNU Make hardcodes Bi/lib, but now the linker decides to place it at some different position in the binary’s rodata because… Ai and Bi were different, I suppose? And if they’re placed differently, then no rewriting will save me: one new, different Bc incoming. And here I am, recompiling all of GNU Make’s reverse build dependencies like in the old input-addressed days =(
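For the curious, the hardcoded self-reference and where the linker placed it are easy to inspect; a minimal sketch, assuming a ./result symlink from nix build:

# Print every embedded /nix/store string together with its file offset
# (-t x makes GNU strings prefix each match with its offset in hex).
strings -t x ./result/bin/make | grep /nix/store

Comparing that output between the Ai and Bi builds shows whether only the hash changed or the whole string moved.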

4 Likes

Very interesting problem, so the linker is applying some sort of sorting?

Can we always give the build the same path, of the same length as the input hash, but not available outside the sandbox?

Then path rewriting must succeed for builds to work, and since all the inputs are CA, all paths remain unchanged if you’re just compiling a small change.

1 Like

Also, there’s now a --shrinkwrap option to patchelf, which might help and should probably be used in CA builds?

It gathers all the transitive dependencies of a binary (so including those of the libraries it uses) and pins them, with their full paths, into the binary’s ELF header. It speeds up loading, but might it also solve that linker-ordering problem?
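Presumably the effect would show up like this (the --shrinkwrap flag name is taken from this post, --print-needed is standard patchelf, and the binary path is illustrative):

# Dynamic dependencies before pinning: plain sonames such as libc.so.6.
patchelf --print-needed ./result/bin/hello
# Pin the transitive dependencies into the binary (flag name as described above).
patchelf --shrinkwrap ./result/bin/hello
# Afterwards the entries should be absolute /nix/store/... paths.
patchelf --print-needed ./result/bin/hello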

1 Like

Can we always give the build the same path, of the same length as the input hash, but not available outside the sandbox?

Sure, but that’d break non-sandboxed builds. =)

--shrinkwrap

Not related to the string constants in rodata.

I think this is something to solve at the linker, autotools, or project-patch level, not in Nix per se. Builds with different prefixes shouldn’t differ that much. If only I knew what determines the rodata string ordering; I need to look deeper into that.
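One way to dig into it: dump the strings in .rodata from two builds, mask the store hashes, and diff; if anything besides the hash differs, the linker really did move things around. A sketch (the hash length and binary paths are illustrative):

# readelf -p prints each string in .rodata with its offset; the sed masks
# the 32-character store hash so only layout differences remain visible.
dump_rodata() {
  readelf -p .rodata "$1" | sed -E 's|[0-9a-z]{32}-|<hash>-|g'
}
diff <(dump_rodata A/bin/make) <(dump_rodata B/bin/make)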

2 Likes

Yes, I agree.

It sounds like in this case

GNU Make hardcodes Ai/lib inside the binary as a constant string because GNU tools are insane.

Perhaps just getting rid of the self-reference here somehow is the best option.

3 Likes

Actually, would it? We could make the path for a build be /nix/store/0000000...000-name-version and then it would only be a problem if 2 builds were making the same name-version at the same time.

True, locking these paths and serializing such builds does avoid the problem.

Oh no, even Nix itself does it.

1 Like

Found a slide deck hinting that linker parallelization is gonna rain on our parade: https://llvm.org/devmtg/2017-10/slides/Ueyama-lld.pdf

To summarize my problem so far:

  1. applications hardcode self-references into binaries as constant strings
  2. to get stable results, hash rewriting in content-addressed derivations requires the linker to maintain a stable order of strings even in the face of $out hashes changing (see the sketch at the end of this post)
  3. the world wants new, fast, parallel linkers like lld or mold that shard the work and process strings in parallel
  4. even if they merge the results back in a stable order based on, say, a hash of each string, that order is going to be different for different $out values, so we’re screwed
  5. (most depressing to me) even if we go to all the linker writers and they listen to us… what will we say? Please maintain a stable order based on… what, exactly?

This is gonna bite us and I’d really like to hear we have a way out.

EDIT: and I’m afraid the way out is patching all linkers for all languages to “link slow” =(
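To make point 2 concrete, here is a toy shell illustration (the 16-character “hashes” are made up): rewriting can substitute one fixed-length hash for another in place, but if the strings around it moved too, the normalised outputs still differ.

# Two pretend build outputs embedding their own output hash, with the
# self-reference placed at a different offset in each.
printf 'HEADER|aaaaaaaaaaaaaaaa-make/lib|DATA' > A
printf 'HEADER|DATA|bbbbbbbbbbbbbbbb-make/lib' > B

# Normalise both by rewriting the embedded hash to a fixed placeholder,
# roughly what CA hash rewriting does to self-references before hashing.
sed 's/aaaaaaaaaaaaaaaa/0000000000000000/' A > A.norm
sed 's/bbbbbbbbbbbbbbbb/0000000000000000/' B > B.norm

# The hashes are gone, yet the files still differ because of the layout.
cmp A.norm B.norm || echo "still different: rewriting cannot undo reordering"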

4 Likes

but during the build we can provide a stable $out name that we later rewrite, no?

1 Like

We can, but it’s not clear to me whether we should, as that invites extra complexity and trade-offs.

1 Like

@t184256 thanks for your astute observations, but I am still optimistic. Self-reference rewriting was, to me, always a best effort for legacy apps. I am stretched in a few directions at once, but I have a long-term interest in seeing what can be done to make self-references less needed across the board, in trying to steer upstream projects in that direction, etc.

I have also had a long-term interest in stuff like Capsicum/CloudABI/WASI, and I think that dovetails nicely: for example, what if we just gave every executable a directory descriptor for each of its “GNU install dirs” (libdir, bindir, etc.)?

We Nix people need to break out of our bubble.

  • On the one hand, we need to entice more upstream devs, e.g. with things like “[RFC 0109] Nixpkgs Generated Code Policy” (Ericson2314, NixOS/rfcs PR #109), so Nixpkgs demonstrates how to make good dev environments for fast-evolving software using idiomatic language-specific package ecosystems. Onboarding them will increase our weight.

  • On the other hand, we need to work more with like-minded projects (content-addressing networks like IPFS, Software Heritage, the WASI people, etc.) and try to foster non-Nix-specific standards so we are minimally bespoke. This allows us to punch above our weight.

It’s a long, slow process, but if we play our cards right, the open-source ecosystem will just naturally be funneled our way, and there won’t be this constant accumulation of new friction from designs made by others in ignorance of problems like this.

10 Likes

I’m a relatively novice Nix user who recently tried enabling __contentAddressed on a tree of roughly 1000 packages internal to my company. Mostly it seemed to work well, and some basic testing showed about a 20% reduction in churn over time from using CA store paths.
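For anyone who wants to try the same experiment, opting a package in looks roughly like this (a minimal sketch using the attributes documented for the ca-derivations feature; the package itself is purely illustrative, not one of ours):

# Minimal sketch of a floating content-addressed derivation.
stdenv.mkDerivation {
  pname = "hello-ca";            # illustrative package, not from this thread
  version = "1.0";
  src = ./.;

  __contentAddressed = true;     # ask Nix to content-address the output
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
}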

However, I’m pretty regularly hitting the “Bad file descriptor” crash that a number of others in this thread have mentioned, to the point that I don’t think this is a change we could currently roll out in production. This machine runs NixOS, and I’ve confirmed that both the daemon and the client are the same Nix version:

$ sudo lsof -p $(pgrep nix-daemon) | grep nix$
nix-daemo 1404 root  txt    REG                8,2  3901464   17180875 /nix/store/q3zmh3is1hxpvnw0w8bm1wvis6q8aijv-nix-2.6.1/bin/nix

$ ls -la $(which nix)
lrwxrwxrwx 1 root root 61 Jan  1  1970 /home/ciadmin/.nix-profile/bin/nix -> /nix/store/q3zmh3is1hxpvnw0w8bm1wvis6q8aijv-nix-2.6.1/bin/nix

And I believe these are both by default using the same nix.conf, and so have the same flags enabled:

$ cat /etc/nix/nix.conf | grep exp
experimental-features = nix-command flakes ca-derivations

It does seem to be spurious; a retry will always get it past whatever point it crashed at.

Are there any other possibilities for what could be going on here?

I’ve been able to mitigate this issue by giving the nix daemon a high FD limit:

systemd.services.nix-daemon.serviceConfig = {
  LimitNOFILE = lib.mkForce 131072;
};

…which works fine on my system of about 3000 packages. I’d like to set it to infinity but GNU patch doesn’t like that and I haven’t bothered investigating yet.

3 Likes