Why does the NixOS infrastructure have to be hosted in a centralized way?

AWS? Surprised.

There was some work towards supporting IPFS which would maybe also solve this,
but indeed, even just manual mirrors would work great.

Correction: I’m not sure what Hydra runs on, but the binary cache itself is on S3, if I read NixOS Foundation Financial Summary: A Transparent Look into 2023 correctly

2 Likes

There’s mention of Hetzner and Equinix Metal in there too. Hetzner is providing an AArch64 machine. To me, none of that is “overpriced cloud nonsense” (a ridiculous statement anyway).

Of course the NixOS foundation has the prerogative to use whichever infra works for them, but the point above was that running a buildfarm does not have to be tens of thousands of euros per month - using a server you own can be more cost-effective and performant in some cases (especially since most individuals would not be getting thousands in free hosting / bandwidth donated to them). After all, “cloud” is just someone else’s computer :slight_smile: And we’d have more opportunity to decentralise the infra if people are able to accurately assess the costs and options.

running a buildfarm does not have to be tens of thousands of euros per month

Agreed, it would be interesting to see where greater efficiency can be found within the existing infrastructure.

especially since most individuals would not be getting thousands in free hosting / bandwidth donated to them

Not really sure what you mean by this?

“cloud” is just someone else’s computer :slight_smile:

This is an incredibly reductionist description of what cloud computing is…

1 Like

I’m very confident that the infrastructure isn’t going anywhere, and that’s a big part of why I want to make sure our financials are transparent. The main goal is to make sure we consistently have two years of funds to cover any scenario. Regardless of direct funding, there are multiple teams collaborating to ensure we have solid fallback options if needed.

This includes a lot of ongoing work by the Infra team, currently split into a few areas. There’s the long-term S3 cache effort, where it would be interesting to explore decentralization to see if it can help make Nix more sustainable. We also have an experiment running outside of AWS (@edef / @fricklerhandwerk), and I’m working with the AWS open source team to extend our credits (I hope to have an update on this in the coming weeks).

Just to reiterate, any work or help is greatly appreciated!

8 Likes

Just brainstorming here; you have been warned :slightly_smiling_face:

Would it be reasonable to create a decentralized build system based on a cryptocurrency?

This could give those who use their hardware to build and distribute Nix packages an incentive of sorts. Generally, relying on others’ good will is not sustainable long-term. It would also create a public ledger that provides some form of reputation and history, which may mitigate some security concerns.

Perhaps users can opt to only download binaries built by reputable builders. Perhaps they can opt to pay a fee that would give the package priority and be awarded to the first builder. Perhaps builds could be verified through other builders, for which they would receive a small award.

I realize that the world is getting tired of new coins, but I see a potentially valid use case. My only concern is that the coin must serve an actual purpose, and be safeguarded against financial speculation.

1 Like

What would be the most effective way for the community to participate in this regard, such that we’re not doing work that’s already in motion?

@waffle8946 you could study @flokli’s work on the deduplicating content-addressed store for Tvix. Eventually we’d want to plug that behind the CDN, but someone needs to build a prototype for that and experiment with performance.

1 Like

I appreciate you asking! And I just noticed I left this message as a draft! (apologies)

On this topic there are three related potential threads:

  1. S3 Cache Long Term effort which @edef is leading with support from @fricklerhandwerk and a few others. (noticed frickler responded above)
  2. S3 Cache AWS Credits - I’m leading that to get 12 additional months of runway for us
  3. Infra team involvement @hexa
4 Likes

None of this has anything to do with finance or cryptocurrency.

The moonshot is fixing the way we deal with shared libraries. We pay an immense disk-space and bandwidth cost every time we rebuild a library which is depended upon by other packages. It is an insane cost. No other distro (except Guix) pays this cost, and it’s getting increasingly impractical.

I’ve said a bit more about how to fix this elsewhere but it’s still in the oven and not done cooking yet.

So long as its symbol table doesn’t change, modifying glibc shouldn’t trigger rebuilds of things downstream of it. It also shouldn’t change the contents of packages downstream of it, except for a tiny intermediate “indirection table”. The only thing that should change in the downstream outpaths is a pointer to that indirection table.

This goes way beyond CA derivations (but works best in combination with them) and better compression. It’s not the same thing as pkgs.replaceDependency but can be used instead of it if the dependency’s symbol table hasn’t changed.
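For reference, here’s roughly what that existing escape hatch looks like today (a sketch; the patched openssl and the patch file are made up for illustration):

```nix
# Sketch of today's pkgs.replaceDependency: rewrite store-path references in
# a closure without rebuilding it. The patch file here is hypothetical.
let
  patchedOpenssl = pkgs.openssl.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./hypothetical-cve-fix.patch ];
  });
in
pkgs.replaceDependency {
  drv = pkgs.curl;                # closure whose references get rewritten
  oldDependency = pkgs.openssl;   # what it currently points at
  newDependency = patchedOpenssl; # ABI-compatible replacement
}
```

As the post says, this is only sound when the replacement keeps the same symbol table; the stubs approach below is about making that guarantee checkable.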

3 Likes

I’ve wanted to apply the same approach to shared libraries, but I’m wondering how you’d intend to accomplish it without CA derivations? You still need a way to turn the new glibc sources into a stubs output, and short‐circuit the rebuild of downstream packages based on the actual hash of that output, right? Dealing with header file changes is also tricky.

LLVM already has a pretty comprehensive‐looking tool for creating stub shared library files, and of course macOS has been natively doing this kind of thing with .tbd files for years. Though I think you would still need to do a minimal amount of O(n) work across all packages by rewriting the paths in the resulting binaries from the stubs to whatever actual library version you want them to be linked to in the final system. (That’s a lot better than rebuilding the world for a glibc security update, though.)

3 Likes

I’ve thus far presumed that the store hash is based on inputs + content of the derivation (damaged by too much git?).
So the hash of a derivation will not change if I modify the build phases (w/o input changes)?

Build phases, etc. should be considered inputs here; Nix expressions eventually produce a .drv file with the builder, inputs, and variables, and the hash covers all the information in the .drv file. The key point is that it’s not a hash of the build output.
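A tiny illustration (a made-up example, not from the thread): instantiating this file with nix-instantiate prints a .drv path that changes as soon as the build script string changes, even though no input package changed.

```nix
# demo.nix - the args end up verbatim in the .drv file, so editing the
# echoed string changes the derivation hash, without any "input" changing.
derivation {
  name = "demo";
  system = builtins.currentSystem;
  builder = "/bin/sh";
  args = [ "-c" "echo hello > $out" ];  # change "hello" -> new .drv path
}
```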

1 Like

Exactly. Levers of power attract unpleasant people and bad behavior. Eliminating those levers is not easy, but the benefits are worth it.

The infra is not the most worrying lever of power, or even a major one. The fact that nixpkgs’ design makes maintaining long-lived forks extremely painful is probably the biggest.[^1] Some of the obstacles to forking nixpkgs include:

  • Constant treewide commits cause horrendous merge conflicts. These treewides are often for trivial reasons, like English grammar nitpicks. Nobody is weighing the benefit of this stuff against the cost of causing merge conflicts.
  • The traditional nixpkgs formatting style was designed to minimize merge conflicts, but the new autoformatter is amplifying them instead. This is not “aesthetics”, and it is a serious enough problem that “mumble mumble RFC process” is not a justification. Making merge conflicts (even) worse will make nixpkgs (even) more centralized, which will – in the long run – make the political battles even more fierce.

  • No static types in the Nix language means we have to rely on expensive build success/failure as the only CI, and we can’t check any meaningful assurances given to out-of-tree uses – which means that moving leaf packages out-of-tree means giving up any CI. [^2]

[^1]: I always thought it was madness to try to maintain a long-lived nixpkgs fork, but apparently there is now a group attempting to do that.

[^2]: With a real type system like Haskell’s, typechecking nixpkgs would provide a guarantee of things like “for all well-typed out-of-tree use of this expression, the resulting use will be well-typed”. Right now none of our CI has the ability to express this kind of “forall” quantification – we’re stuck at the bottom level of the arithmetical hierarchy.

PS, why don’t discourse footnotes work here anymore?

5 Likes

[Citation needed]. The traditional style was mostly whatever people were feeling like. Like the current style, everything is a tradeoff between different desirable properties. The RFC text describes the approach taken to reach compromise on these, and why. Also note that one will mainly notice conflicts caused by the new style, and not the many merge conflict situations this style avoids (e.g. last-element insertions due to trailing commas).

[Citation needed]. I think that you are over-emphasizing the (real and non-trivial) effect of merge conflicts, especially regarding centralization. Moreover, I think you are conflating merge conflicts caused by the migration to the new style (which is mostly one-off) with systemic merge conflicts caused by it.


I briefly looked into it back when I was Discourse admin; IIRC it requires some plugin functionality. At any rate, it wasn’t as trivial to enable as flipping an option, unfortunately.

7 Likes

Thanks, I did not know about this! Yes, llvm-ifs --output-elf is probably 95% of what is needed. We will probably need to do additional normalization (e.g. sorting symbols, maybe additional filtering), since bit-exact linkable stubs are a high priority for us but probably not as important to LLVM.

Yes; that stubs.so gets built by a Fixed Output Derivation.

We’re accustomed to FODs being used for fetching stuff over the network, but they can be used for other things. You can run llvm-ifs --output-elf=$out/stubs.so on .so files in another derivation’s outpath, from within an FOD.
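A minimal sketch of such an FOD, assuming llvm-ifs is on the build’s PATH (the package attribute and hash are placeholders):

```nix
# Hypothetical fixed-output derivation extracting linkable stubs from glibc.
# Because it's an FOD, its outputHash must be supplied up front.
stubs = pkgs.stdenvNoCC.mkDerivation {
  name = "glibc-stubs";
  nativeBuildInputs = [ pkgs.llvmPackages.llvm ];  # assumed to ship llvm-ifs
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = pkgs.lib.fakeHash;  # placeholder; paste the real hash after one build
  buildCommand = ''
    mkdir -p $out
    for so in ${pkgs.glibc}/lib/lib*.so*; do
      # some lib*.so files are linker scripts rather than ELF; skip those
      llvm-ifs --output-elf="$out/$(basename "$so")" "$so" || true
    done
  '';
};
```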

Of course then you have to manually copy the hash into your .nix expression, which is a drag. If you have floating output CA-derivations (FLOCADs?) and use those instead of an FOD you don’t have to do this. So floating output CA-derivations make this much more convenient.
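For reference, a floating-output CA derivation looks like the FOD above minus the fixed hash (a sketch; this needs the experimental ca-derivations feature):

```nix
# Same stubs builder, but content-addressed with a floating output:
# Nix computes the output hash after the build instead of checking a fixed one.
# Requires `experimental-features = ca-derivations` in nix.conf.
stubs = pkgs.stdenvNoCC.mkDerivation {
  name = "glibc-stubs";
  __contentAddressed = true;
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  nativeBuildInputs = [ pkgs.llvmPackages.llvm ];
  buildCommand = ''
    mkdir -p $out
    # ... same llvm-ifs loop as in the FOD sketch above ...
  '';
};
```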

Yes, every derivation which has .so dependencies turns into two separate derivations, which I’ll call compile and relink (this can be automated inside of stdenv.mkDerivation).

  • The compile derivation looks exactly like what we have today, except that it has only the stubs.so of its library dependencies as inputs. So it gets rebuilt only when the symbol table changes in a way that doesn’t get normalized away.

  • The relink derivation takes the compile derivation as an input, and simply uses patchelf to change any references to the stubs derivation so they point to the real dependency.

When you upgrade a library (say, glibc) in a way that doesn’t change the symbol table, the stubs FOD won’t change, so none of the compile derivations will get rebuilt. These derivations are the ones that involve significant build effort. All of the relink derivations will get rebuilt, but those are trivial – they just run patchelf.
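As a rough sketch of the relink half (all names illustrative; stubs is the FOD from above, and a full version would also patch the ELF interpreter):

```nix
# Hypothetical relink step: copy the compile derivation's output, then point
# RPATH entries at the real glibc instead of the stubs it was linked against.
# This is the only work redone when glibc changes in an ABI-compatible way.
relink = compiled: pkgs.runCommand "${compiled.name}-relinked"
  { nativeBuildInputs = [ pkgs.patchelf ]; } ''
    cp -r ${compiled} $out
    chmod -R u+w $out
    find $out -type f | while read -r f; do
      old=$(patchelf --print-rpath "$f" 2>/dev/null) || continue  # skip non-ELF
      new=$(echo "$old" | sed "s|${stubs}|${pkgs.glibc}|g")
      if [ "$new" != "$old" ]; then
        patchelf --set-rpath "$new" "$f"
      fi
    done
  '';
```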

These patchelf runs are what you noticed:

To avoid bloating the binary cache, the straightforward approach is to mark all of the relink derivations with allowSubstitutes=false and exclude them from cache.nixos.org. That’s a very crude sledgehammer, but it works today with no changes to the nix interpreter. There are better solutions but they take longer to describe or need new interpreter features.
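Concretely that’s just a couple of attributes on each relink derivation (a sketch; preferLocalBuild is an extra hint I’d assume you’d want too):

```nix
# Keep the (trivial, patchelf-only) relink outputs out of the binary cache:
relinked = pkgs.runCommand "hello-relinked" {
  allowSubstitutes = false;  # never fetch this from a substituter; rebuild it
  preferLocalBuild = true;   # and don't bother shipping it to remote builders
} ''
  mkdir -p $out
  # ... patchelf rewriting as in the sketch above ...
'';
```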

This two-derivation-step build process would let us use prelink(8) to get faster startup times, like Red Hat and macOS do.

Also left to future work is dealing with the situation where a new version of a library adds symbols to the symbol table, but doesn’t change or delete any. Ideally we’d like to avoid rebuilding dependencies that don’t need the new symbols, but the mechanism for detecting whether or not those new symbols are needed and making note of that fact in nixpkgs both need to be developed. This is closely connected to the point that you raise:

I have a feeling that the two problems (deciding which versions to take header files from / avoiding rebuilds when unused symbols are added) are related, and will probably both be solved by the same mechanism at some point. But it’s just a hunch. llvm-ifs --output-ifs is probably useful here.

7 Likes

Parallel means analogous.

1 Like

It sounds like we have had very similar thoughts on this, which is encouraging! I didn’t think of just hacking it up with FODs, though.

I didn’t know about llvm-ifs until recently either! I suspect it may already have bit‐reproducibility as a goal, given the implications of “The linkable shared object stubs can be used to avoid unnecessary relinks when the ABI of shared libraries does not change”. But if not, it should be easy to detour via the text‐based format and do any necessary normalization there.

Re: prelink, @fzakaria’s recent blog posts on optimizing dynamic loading with Nix and “management time” feel very relevant here.

I am not sure the headers problem is quite so simple because, even ignoring the preprocessor or the fact that C++ headers regularly contain actual code, I think there are probably ways that downstream derivations could condition on the availability of an API that would be tricky to deal with. If nothing else, there’s nothing stopping a derivation from textually processing a header to react to pretty much any change to it.

My preferred solution that did not risk introducing what are effectively impurities was to write a libclang‐based tool to normalize header files to the greatest extent possible (formatting, comments, order of declarations where irrelevant, etc.), and then just eat the rebuilds when there’s any non‐trivial change to the headers. (Maybe precompiled headers could be useful here.) So far, that’s the best thing I’ve come up with that doesn’t risk jeopardizing the essential properties – i.e., that you should never find yourself in the traditional distro position of “gah, I should have rebuilt with the new library version!”.

But yes, this means mass rebuilds still happen when the API is extended (to some degree; CA derivations mean that we at least benefit from short‐circuiting). I am not sure that can be avoided in a way that doesn’t compromise the essential properties we want, but I would be interested in any attempt to avoid it!

Unfortunately there is also the matter of tests. Tests have to come after relinking executables with the actual libraries, for obvious reasons, and since the whole point is that we’re linking against a new library version that presumably has changed behaviour, you really do have to run all the tests again to stay honest. I suspect a mass rebuild of all the tests in Nixpkgs is not so much less painful than a mass rebuild of all packages, which would make the wins from this whole arrangement smaller than I’d like, but maybe something clever could be cooked up.

5 Likes