Why does the NixOS infrastructure have to be hosted in a centralized way?

You might have heard of a tool that does this on purpose, called “autoconf” :stuck_out_tongue_winking_eye:

Yes I had, but I couldn’t remember where I had read it. Thanks for the reminder; for everybody else: @fzakaria’s article Speeding up ELF relocations for store-based systems shows what’s possible, and the results are much better than prelink(8).

Yeah, in the long run the (install)checkPhase needs to be moved into a separate derivation (like #209870 did for gcc), and nix-build’s scheduler needs to learn validations like Ninja has. Then the checks could run after relinking. It would also let us run the checkPhase for cross-compiled builds, using either qemu or a smaller cluster of native builders. Some more details.
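
To make the shape concrete, here’s a hand-wavy sketch of the kind of split I mean (names invented, and unlike #209870 the test derivation here just rebuilds from source rather than reusing the build tree):

```nix
{ stdenv, bison }:

{
  # the build proper, with the test suite disabled so nothing downstream waits on it
  build = bison.overrideAttrs (_: { doCheck = false; });

  # a sibling derivation whose only job is to run the test suite; a Ninja-style
  # "validation" edge would force it to be realised whenever `build` is,
  # without putting it on the critical path of the build graph
  tests = stdenv.mkDerivation {
    pname = "${bison.pname}-tests";
    inherit (bison) version src nativeBuildInputs;
    doCheck = true;
    # only the exit status of checkPhase matters here
    installPhase = "touch $out";
  };
}
```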


Never heard of it! Is it any good? :melting_face:

(But yeah, I was concentrating too much on stuff you could actually do from within C and C++ and missed the elephant in the room…)

Just linking the other two parts here for anyone curious:

It’s very cool that we can still have a meaningful form of “dynamic linking” even in our hermetic hash‐addressed world, and then do the dynamic loading statically. And a little sad that we have hitherto basically not taken advantage of these possibilities at all!

Yes, for sure checkPhase being part of the main derivations is terrible anyway. The idea of validations making sure checks run without getting in the way of the critical path of the builds is brilliant; wish I’d thought of that before!

Running the tests for every package in Nixpkgs still seems like it’d make even a one‐line glibc security patch pretty painful, though. I’m not sure all this machinery is quite enough to free us from the tyranny of staging for that case, as much as I think it’d produce a better system in general. (Though I realize the irony of saying this in a topic about decentralization and multirepos.)

Thanks for sharing your wisdom about this! I am pleased to know that I am not the only one to have independently derived basically the exact system I’d like to see for this, and it makes me more confident that it’s a good path to go down. I’m not sure Nixpkgs could survive the kind of surgery it would involve, but I think there are clear wins any new system could gain here.


Thanks @amjoseph and @emily, reading your conversation was enjoyable and inspiring.

How does this work with pkgsBuildHost/nativeBuildInputs? AFAIU you propose that if a = mkDerivation { ... buildInputs = [ b ]; } then a be implicitly split into a.__stubs and a.__relinked, where a.__stubs is the actual build and only depends on b.__stubs. However, if a = mkDerivation { ... nativeBuildInputs = [ c ]; }, we have to use c.__relinked whose hash is unstable?
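
To spell the concern out with your hypothetical outputs (neither __stubs nor __relinked exists today, so this is illustration only):

```nix
# purely hypothetical: __stubs and __relinked are the proposed outputs, not real ones
{ stdenv, b, c }:

stdenv.mkDerivation {
  pname = "a";
  version = "0.1";
  src = ./.;
  # link-time dependency: b.__stubs ought to be enough here, so a.__stubs
  # keeps a stable hash even when b is relinked
  buildInputs = [ b ];
  # c has to actually run at build time, so it seems we would need
  # c.__relinked, whose hash changes whenever anything below it changes
  nativeBuildInputs = [ c ];
}
```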

Honestly, if we did this on a granular opt-in basis (I don’t see why we couldn’t), this sounds OK? For a maintainer, it’s not much different from having to update the srcs, and as for out-of-tree users, they could always “unpin” the derivation by removing outputHash in an overlay if their compute hours are cheaper than their human hours.

Taking notes: “FODs in non-leaf nodes to pin down the graph indeed are a thing”

A self-deduplifying back-end at cache.nixos.org doesn’t involve changes to nix either :)

Anytime you realize the build-derivation, you must also realize the test-derivation concurrently with any referrers of the build-derivation.

The word “concurrently” is what distinguishes this from a simple wrapper.

Gosh yes. I wonder if this could fit into tvix’s lazy “we don’t realize the output until we read it” paradigm, but it does seem like these “tests” are their own concept. OTOH, I don’t see how this is better than simply moving all checks to passthru and building better tooling to collect and build them, without changing anything about the language or about .drvs.
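
For reference, the “checks in passthru” shape already exists as the passthru.tests convention; a minimal sketch (somePackage is just a placeholder):

```nix
# a minimal sketch of the existing passthru.tests convention; `somePackage`
# stands in for any package already in scope
somePackage.overrideAttrs (old: {
  # keep the expensive suite off the critical path of the main build
  doCheck = false;
  passthru = (old.passthru or { }) // {
    tests = (old.passthru.tests or { }) // {
      # a sibling derivation that does run the upstream suite; tooling can
      # collect and build everything under passthru.tests separately
      upstreamSuite = somePackage.overrideAttrs (_: { doCheck = true; });
    };
  };
})
```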

This. I’m rather hopeful that having “a” formatter enforced by the CI will, over time, prove to reduce the number of conflicts, even if it makes a few locally suboptimal decisions…

The infra is not the most worrying lever of power, or even a major one.

Yes, let’s get rid of these python-updates while we’re at it :) On a serious note: while I think that all decisions that can be made locally should be made locally, there are problems that require making decisions that are global (causing global conflicts); your stub thing and cross-by-default are examples already at hand. It’s not just about “power levers”, because we’re trying to strike a balance here between scarce availability/expert time, long-term maintainability, and still being useful in various consumption scenarios. And yes, I think occasionally removing cruft can help with the former.


Yeah, I don’t think there’s any getting around changes in build tools resulting in mass compilations. You might still be able to benefit from content‐addressed short‐circuiting behaviour to some extent, but glibc was probably a bad example because it’s unlikely we can come up with any truly satisfying, compromise‐free scheme that doesn’t result in a ton of builds every time glibc changes in any way. There’s still a lot of room for stuff that would need to go to staging now but would result in dramatically less CPU burned under this scheme, though.

I also personally have hopes for a system where we can track dependencies on a more granular level closer to files than entire packages and therefore hopefully achieve even fewer rebuilds, which I think would multiply the potential benefits of a scheme like this. But that, I think, is probably beyond what we could reasonably turn Nix into.

Well, there is an obvious way out, it’s just slightly embarrassing: we pin a “pre-cached nixpkgs revision” and take pkgsBuildHost from there.
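
Something like this, I mean (the revision and hash are placeholders, and it goes through .override instead of a proper pkgsBuildHost splice, so treat it as a sketch):

```nix
# sketch only: a real version would re-splice pkgsBuildHost properly
let
  # some already-cached nixpkgs revision; URL and hash are placeholders
  pinned = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<pre-cached-rev>.tar.gz";
    sha256 = "...";
  }) { };
in
somePackage.override {
  # take the build-time toolchain from the pinned, pre-built set, so a tooling
  # bump in the live tree doesn't trigger a mass rebuild here
  stdenv = pinned.stdenv;
}
```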

This would be very interesting. I think that Bazel’s approach, where cc_{library,executable} suddenly float up to the same layer as language-agnostic “packages”, is very unsatisfying, because it means that if a project uses Bazel it essentially can’t be consumed in any other way. This contrasts with e.g. “a meson project whose authors test and deploy it using nix”, which is normally still just a meson project. I hope we can improve dynamism/granularity without compromising this property.


That falls under “giving up” for me, I’m afraid :slight_smile:

I have a lot of thoughts and ideas about this, but unfortunately this particular margin is much too small to contain them!


It may not seem like it, but the tests are just a tiny fraction of the CPU cycles needed to do a mass rebuild of staging. Tests seem like a big burden because they often use only one CPU core (enableParallelChecking is awesome when it works, though) and because they block (run in series with) any downstream dependencies.

Think of it this way: For a fresh rebuild you can’t start building gcc until the (big slow) bison test suite has finished. For a security fix with separate compile/relink derivations you can run both the bison tests and the gcc tests at the same time.

Yeah I’m not sure of that either. And in any case I don’t think I am that surgeon.

This is a great question.

And yes, that’s the (start of the) answer!

In the long run something like multiversion packages will be cleaner. Especially if the package meta attrset has a causes-stubs-change field (only necessary for mass-rebuild-triggering libraries, of course). Then the pinning can be automated.
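
i.e. something along these lines (the meta field is made up, as is everything else about this package):

```nix
# hypothetical meta field; only relevant for widely-used .so-producing packages
{ stdenv }:

stdenv.mkDerivation {
  pname = "some-library";
  version = "1.2.3";
  src = ./.;
  meta = {
    # signals that a change to this package invalidates downstream stubs,
    # so the pin bump can be automated
    causes-stubs-change = true;
  };
}
```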

Yes that’s the plan. It’s only needed for packages that produce .so outputs and are widely depended upon.

Sure, but there are a few aspects of the way it formats that aren’t just aesthetics. Minimizing the number of lines that change (because git diffs are line-oriented, not token-oriented) is important. The traditional formatting of ] ++ lib.optionals foo [ was chosen to minimize the number of changed lines, not because it looks aesthetically pleasing.
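
Concretely, with the traditional layout, adding an entry to either list is a one-line diff and never rewrites the surrounding lines:

```nix
# illustration only: with this layout, adding an entry to either list touches
# exactly one line, and the `] ++ lib.optionals ... [` line itself never changes
{ lib, stdenv, zlib, systemd }:
{
  buildInputs = [
    zlib
  ] ++ lib.optionals stdenv.hostPlatform.isLinux [
    systemd
  ];
}
```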

A much older example is the GNU coding standards, which put the { on a separate line. It looks hideous on the screen, but it’s worth putting up with it because in a language with optional curly braces (i.e. not Rust) turning a one-statement body into a two-statement body can cause a conflict avalanche if you don’t do this.

Right, I think there is a fundamental difference here where I see the fact that the package set is built as a consistent atomic unit with the same versions for everything (modulo the unfortunate exceptions where we have to carry multiple versions) as one of the core strengths over other distributions that I’d like to maintain. So “just build stuff against the old OpenSSL but link to the new one”, “don’t bother rebuilding just because build tools got bumped”, etc. don’t really appeal to me. (And this of course ties in to what this thread was originally about…)

FWIW I understand that Git diff size/reducing likelihood of conflicts has been one of the top, if not the top priority of the formatter work. Of course whether that’s been fully achieved is another matter, but it’s definitely not been ignored; diff size comes up constantly in discussion from what I’ve seen.
