Geistesblitz: nixpkgs-update to spur flake conversion

blaggacao · April 23, 2021, 5:42pm

I’ll try to be short. Kindly let me know if too short.

Problem

A lot of packages are regarded by the community as unmaintained.

Solution

Let’s try to apply a systemic solution to a systemic problem.

1. E.g. nixpkgs-update starts to replace builtins.fetch* with its flake equivalent (where possible).
2. Over time, each package becomes its own little “mini-flake”.
3. After a while, we can write a bot that proposes (via PR) those flakes for upstream inclusion.
4. For those that get accepted, replace the in-tree implementation with a ref to the upstream flake.
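
To make the “mini-flake” idea concrete, here is a minimal sketch of what one package might look like after such a conversion. Everything specific in it is an assumption for illustration: the repository `example-org/hello-tool` is made up, the system is hard-coded, and the output name follows current flake conventions rather than anything nixpkgs-update does today.

```nix
{
  description = "Hypothetical mini-flake for a single leaf package";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    # Stands in for a fetchFromGitHub call with a hand-copied hash.
    # The revision and hash are pinned in flake.lock instead.
    # "example-org/hello-tool" is a made-up upstream repository.
    hello-tool-src = {
      url = "github:example-org/hello-tool";
      flake = false; # plain source tree, upstream has no flake.nix yet
    };
  };

  outputs = { self, nixpkgs, hello-tool-src }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      packages.${system}.default = pkgs.stdenv.mkDerivation {
        pname = "hello-tool";
        # Fall back to a fixed string if the input carries no revision.
        version = hello-tool-src.shortRev or "unstable";
        src = hello-tool-src;
      };
    };
}
```

With this shape, updating the source becomes a lock file update (`nix flake update`) rather than an edit to the package expression.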

Those that get accepted have a better (systemic) chance of staying maintained.

The underlying assumption, of course, is that we can convince upstream maintainers of the superiority of nix flakes as a component of their packaging efforts. That precondition might not be met (yet), but I think the nix community is on a very good path to fulfilling it. And we could still intensify our efforts. Note that it would already be a significant advancement of the “nix cause” if only a relatively small portion of upstreams bought in.

Let me know what you think! And please say so if I should elaborate more on any aspect of this geistesblitz.

Radvendii · April 23, 2021, 6:34pm

Is this another iteration of the monorepo debate?

One anti-monorepo problem that this does seem to address is that if upstream changes things that break dependent packages, we can just have nixpkgs depend on the older version of the upstream repo. I do think this would be harder to keep track of than the current system (you’d have to go looking in flake.lock files to figure out what package was actually being installed).

But other anti-monorepo problems seem to still apply: Monorepo means packages generally have 1 version in nixpkgs, meaning you don’t have to install a million versions of everything on your system, saving a lot of space. I’m not sure how your proposal addresses this.

colemickens · April 23, 2021, 7:03pm

I don’t understand at all. As a big fan of flakes, I have no idea what they have to do with this problem. And as someone who maintains an external flake of dozens of packages, I don’t think that “external flake is automatically likely to be better maintained” is at all a safe assumption (if I’m correct to read that implication).

blaggacao · April 23, 2021, 7:20pm

An outcome of it would be that for some (probably leaf) packages the ultimate source of truth lies outside of NixOS/nixpkgs. I doubt it shares significant parts of the motivation of the monorepo debate.

Akin to the monorepo debate, I think that this probably works best for some (to be precisely enough specified) notion of leaf packages.

You are right that an implicit assumption is that an upstream repo that has a flake.nix file at the root of its repo would tend to either maintain or delete it. Both are significant for our purpose: if “they” maintain it, we might have discovered a more scalable model of what people in this forum have coined as “eating up the world” (marginal package growth way beyond marginal maintainer growth / efficiency). If they delete it, that signal would surface in a timely manner within NixOS/nixpkgs. If they let flake.nix go stale, we do not lose anything and (pareto-efficiently) are stuck with the existing problem.

I want to note that between step 1 & 3 a considerable amount of time passes and that the underlying preconditions and context of this idea are likely to change.

blaggacao · April 23, 2021, 7:52pm

I forgot to answer this. Apologies!

My baseline assumption is that any external package is provided as an overlay. Which, of course, introduces the failure mode that nixpkgs can break an otherwise healthy upstream package. That, too, is a valuable signal for our purposes.
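
A minimal sketch of what “provided as an overlay” could mean for an upstream flake. The package name `hello-tool` and the file `./package.nix` are hypothetical, and the `overlays.default` output name follows current flake conventions, so treat this as an illustration rather than a spec.

```nix
{
  description = "Hypothetical upstream flake that exposes its package as an overlay";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
    in {
      # The package is built against whatever package set the consumer
      # applies this overlay to, not against a set fixed by upstream.
      overlays.default = final: prev: {
        hello-tool = final.callPackage ./package.nix { };
      };

      # Convenience output: the same package built against this flake's own pin.
      packages.${system}.default =
        (import nixpkgs {
          inherit system;
          overlays = [ self.overlays.default ];
        }).hello-tool;
    };
}
```

Because the overlay is applied on the consumer's side, a change in nixpkgs can break `hello-tool` without upstream touching anything, which is exactly the failure mode described above.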

tomberek · April 24, 2021, 3:08am

Random thought: adoption along these lines might be easier if a project maintainer got more than just Nix-related benefits from incorporating flakes. Something like a “nix-generators” where it was easy to get other kinds of build artifacts: tar, rpm, deb, docker, whl, snaps, appimage bundles, dmg, minimal qemu, etc.

So the Nix usage meant they got other things for free and frees them from the burden of learning about all those ecosystems.
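
Part of this is arguably within reach already, since nixpkgs ships `dockerTools`. A minimal sketch of a flake that exposes a container image next to the regular package, with `pkgs.hello` standing in for the upstream project (the `docker` output name is my own choice, not an established convention):

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      packages.${system} = {
        # The ordinary Nix package.
        default = pkgs.hello;

        # A layered Docker image built from the same package.
        docker = pkgs.dockerTools.buildLayeredImage {
          name = "hello";
          contents = [ pkgs.hello ];
          config = {
            Cmd = [ "${pkgs.hello}/bin/hello" ];
          };
        };
      };
    };
}
```

The result of `nix build .#docker` is an image tarball that can be passed to `docker load`. Other artifact formats (rpm, deb, AppImage and so on) would need their own builders.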

danieldk · April 24, 2021, 9:44am

Some remarks:

I don’t remember whether flake references are lazily evaluated or not. If not, this would mean that potentially thousands of git repositories have to be fetched. Even worse, if the flakes live with their software, you are not only downloading the Nix files, but the whole project. Even if flake refs are lazily evaluated, this would entail cloning all those repositories if you want to evaluate the package set.
You say that this reduces the package update workload. However, since flake inputs are pinned with a lock file, you are effectively replacing package management with pin management. I am not sure which is more fun.
You have to make very sure that you are overriding nixpkgs in every transitive input of third-party flakes. Otherwise, people will end up with a multitude of nixpkgs checkouts and with that many ‘duplicate’ output paths (e.g. many different versions of glibc).
Outside parties can break all of nixpkgs. Suppose that glibc was an external flake. Making some breaking change to the upstream glibc flake could break a lot of stuff in nixpkgs. The only thing we could do about it is pinning an older version, which might not have security updates, etc.

Nix is probably not even on a single digit percent of Linux machines. I don’t see why upstream maintainers would care. If they do care, they may be nice and add a Flake to their repository. But if they aren’t using Nix, it’s likely that there will be regressions over time (e.g. flakes are not adapted to API changes in nixpkgs). Currently, we can fix such problems ourselves, but then we are dependent on upstream maintainers who may or may not be very active in merging PRs.

I think we should remind ourselves that flakes aren’t even a stable feature yet. I assume that it will go through another RFC, since flakes are a major change and the previous RFC was closed. So, I think it is premature to make changes to nixpkgs that require a Nix version that isn’t stable or doesn’t have an accepted RFC.

Note: I am not against flakes. But I think that the idea of splitting up nixpkgs in thousands of flakes is a pipe dream.

danieldk · April 24, 2021, 9:50am

I am not sure why upstream maintainers would want to do this. This removes the whole reproducibility benefit and upstream maintainers will be bothered by all breakages.

blaggacao · April 24, 2021, 1:06pm

Reading the previous conversation with a little bit of goodwill, some of your remarks were already summarily addressed: I think pin management is the innate domain of the update bot, and upstream motivation has been summarily addressed as well (presuming the topic still needs considerable effort). But you raise a new point:

I would first turn to builtins.getFlake, which, being plain nix, I would assume to be lazily evaluated.

Remains the question about evaluating. In an end-user scenario, I don’t understand why one would want to evaluate the whole package set, while not also wanting to build it, so your argument must be motivated by some intermediate use case (like testing? / hydra?).

I think before trying to understand your argument further, I’d better ask you to clarify the underlying (specific) use case that you had in mind. Maybe that would be shifted, too, by this idea?

Re: unstable

If you’ve got a point with the above, we could almost say: thank God it’s still unstable.

danieldk · April 24, 2021, 1:58pm

I can think of a lot of reasons to evaluate at least the top-level attributes of all or a subset of derivations:

ofborg (to verify that nixpkgs still evaluates after a change).
nixpkgs-review (to verify which attributes have changed).
search package descriptions (requires evaluation of the derivation for meta.description).
shell completion

However, there are more fundamental problems. If you currently install, say Firefox, the derivation gets evaluated, Nix constructs the output path from the derivation hash, and downloads the output path from the binary cache.

Now, suppose that the upstream Firefox repo becomes a flake, and we would use that instead. Now, in order to get Firefox from the binary cache, Nix would first have to retrieve the full Firefox source repo to evaluate the Flake, only to find out that it is built already and get it from the binary cache.

These are problems that can be worked around. I think there is a plan to add cached evaluation sqlite databases to the binary cache as well.

I think my more fundamental question is: what does this solve? With the proposed change maintaining nixpkgs becomes maintaining pins. Which has all the problems I mentioned: you lose a lot of control. Upstreams can break nixpkgs, it becomes hard to maintain compatible sets of packages (e.g. Python packages require a lot of version coordination to make them work), how are you going to coordinate stable versions of NixOS, etc.?

blaggacao · April 24, 2021, 2:22pm

Thank you! That reply let me discover a wealth of context.

Yeah, nice! I also think the content-addressable store is here to help us out. In general, if the inputs of a flake don’t change (and other builtins.fetch* and similar impure stuff becomes deprecated), then we might have the preconditions for effective caching, even for remote repos (since the rev is part of the input spec).

Even worse, our current purpose-built community could lose a lot of its current ways and manners of doing things, since that could get shifted upstream.

Are we ready to give up (that part of) our community for the sake of permeating the nix philosophy across the open source world (before someone else does)?

I’m shamelessly overpainting the trade-off, here

On a more practical note, I think we would have to work further on the implicit and explicit preconditions of this idea.

only “leaf” packages (marginal cost of maintenance > marginal importance as a dependency)
we might not have a good answer for every package subsystem (e.g. python); that may be true for subsystems that don’t have reproducibility guarantees built in, i.e. that work with version bounds rather than dependency hashes.
we need to ensure an overlay-based approach (as you said). One way to ensure it would be to make it part of the flake spec; another way is via strong incentives like “only free build farm sponsored by the NixOS Foundation (erm, hydra) if an overlay is provided”.
we still need to maintain a “pinned” collection of nixpkgs/pkgs for the purpose of our (value-added) nixos linux distribution. Being pinned also means a “free NixOS Foundation sponsored” build farm + cache for upstream projects, so I hope we can generate an incentive here and make overall more efficient use of our existing infrastructure on a global scale. (ipfs! To save CDN costs!)
we need to maintain any package that does not satisfy any such set of minimum preconditions in-tree.

That might be a gradual process, but even if we achieve a 5% reduction of in-tree packages and at the same time a 2% increase in “maintained” ones, at our scale, that’s a pretty impressive baseline to make the foundations for a new trend. Don’t you think (high-level)?

Now I’d love to be able to prototype anything in this direction, but reality hits:

While I might have a slight competitive advantage in conceptualizing things on a high-level / broad scale, virtually everyone in this forum might have a competitive advantage over me at coding things. I’m not a good programmer. I’m no programmer, at all.

nrdxp · April 24, 2021, 2:48pm

Is this another iteration of the monorepo debate?

Not necessarily, since nixpkgs could easily contain subflakes. Though the UX of subflakes could probably do with some improvement if this were to move forward. For example, we could completely change their semantics (different canonical outputs for internal flakes), and possibly even naming, to function more like a proper module system.

In terms of how this can help solve update issues, I basically see one avenue where it definitely could.

And that’s because people are lazy, and flakes give us an official and simplified mechanism for dealing with hash updates, which could be a bigger reason why packages go unmaintained than we might want to admit (copy-pasting hashes after failed evaluations is boring and tedious).

I realize nixpkgs-update already does this, which is why I said could and not would, but having this feature integrated into the package manager itself instead of an external repo is also a plus for lazy folk like me.

blaggacao · April 24, 2021, 3:16pm

I like the “lazy” and the “benefits” arguments really a lot. Those are touching the nature of human beings at scale (aka. “economics”).

If only we could arrange them meaningfully, so that the lazy nature of some maximises the benefits of others.

So on the highest level, this avenue would have two fundamental requirements for success:

reduce maintenance overhead (for the nix ecosystem)
bring the USP (unique selling propositions) a la tomberek’s suggestion above & more to upstream projects

Come on folks, isn’t nix a really great idea?

What else do we have to offer (to the world)? And how could we pitch that? (call-to-pitch)

blaggacao · April 26, 2021, 4:22am

I just realized <nixpkgs/pkgs> and flake registries are actual equivalence substitutes. The latter without the pinning step.

We only need the pinning for NixOS; I don’t see how it would make sense for pkgs if there were no NixOS.

I think it is a generally safe assumption that a consumer of a package is capable of doing the pinning and does not need NixOS/nixpkgs to intermediate.

Except for one aspect: maintainers of NixOS/nixpkgs provide the added service of dependency tree harmonization across the package set. The flake equivalence substitute is setting the inputs.X.follows arguments.

In theory, nixpkgs/pkgs could become a collection of wisely chosen inputs.X.follows catering to the immediate needs of nixpkgs/nixos that are driven by the contention of closure size.

That could bring over the core value added of nixpkgs into a distributed flake model.
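
For readers who haven’t used it, a minimal sketch of the follows mechanism referred to above. The upstream flake `example-org/hello-tool` is hypothetical, and the sketch assumes it exposes a `packages.x86_64-linux.default` output.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    # "example-org/hello-tool" is a made-up upstream flake.
    hello-tool = {
      url = "github:example-org/hello-tool";
      # Make the upstream flake use our nixpkgs instead of its own pin,
      # so both sides share one glibc, one openssl, and so on.
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = { self, nixpkgs, hello-tool }: {
    # Re-export the upstream package, now built against our nixpkgs.
    packages.x86_64-linux.default = hello-tool.packages.x86_64-linux.default;
  };
}
```

Without the `follows` line, the lock file would carry a second nixpkgs revision for `hello-tool`, which is where the duplicate output paths mentioned earlier in the thread come from.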

danieldk · April 26, 2021, 7:13am

How do you propose to handle binary caches? I wouldn’t trust a cache of a random third party, on the other hand if the Nix foundation also caches for third parties, there is the issue of third parties playing fast and loose with licensing. Or worse, caching illegal material.

igel · April 26, 2021, 1:25pm

Isn’t the current situation “the same” with regards to licensing?
It is based on trust to the community/people who create a PR for a package.

there is no automatic check or proof (via blockchain) …
(and the package will get cached …)

There is/would be the need for a package attribute license_url and a bot to check/approximate whether at least the stated license is mentioned on that page?

blaggacao · April 26, 2021, 2:46pm

I wonder about the implications. But first let’s bisect the question:

How do you propose to handle binary caches?

My current hypothesis is that value added by nixpkgs (over alternative models) is one of dependency homogenization. So in that respect, a homogenized and tested dependency tree is capable of reducing the closure size on binary caches through standard mechanisms of deduplication.

In flakes, authorized caches are defined with the top-level nixConfig.substituters attribute; hence a specific homogenized dependency tree, targeted primarily at NixOS use and expressed via flakes, would define its authorized binary cache. It is my understanding that this continues to be one provided by the NixOS Foundation.

The mere policy decision whether to include a specific built binary blob in that cache infrastructure seems an unrelated problem.
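
As a rough sketch of where that setting lives (the flake itself is a placeholder with empty outputs; only the `nixConfig` block matters here):

```nix
{
  description = "Sketch: a flake that declares its binary cache";

  # Top-level nixConfig carries Nix settings that apply when this flake is used.
  nixConfig = {
    substituters = [ "https://cache.nixos.org" ];
  };

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }: { };
}
```

A third-party cache would additionally need its public key listed under `trusted-public-keys`, and the user still has to accept the flake's settings, so the trust decision stays on the consumer's side.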

The policy needs to be crafted in a way so that it does not violate applicable laws and is aligned with our core values and licensing requirements.

Anything against that policy cannot become a (flake) input to NixOS/pkgs and probably should eventually also be removed from the flake registry (on the basis of another policy decision).

But then there is trustix on the horizon which, maybe combined with IPFS, is capable of pushing the distributed trust model even further.

Until the point where it might (not will) become an option for NixOS to trust specific distributed upstream build efforts.

Re: license_url

Maybe SPDX license identifier specs have a solution in stock for us. SPDX seems to be in the process of becoming an ISO standard, and machine readability was a specific design goal.

blaggacao · April 26, 2021, 4:39pm

I hope it’s not only me who realizes that a distributed model of trust-worthy reproducible builds is the ultimate realization of what we can estimate as one of the core values of the nix ecosystem.

I think this is a pretty compelling outlook:

A flake registry that defers to a “fanned out” model of maintaining reproducible build instructions (“upstream.flake.nix”) and replaces the indexing function of NixOS/nixpkgs but without the associated pinning effort and a NixOS/nixpkgs/pkgs that focuses on dependency tree homogenization for the purpose of reducing the NixOS closure size to practical limits.

If you look at the clues, it almost seems that our BDFL has us signed up for a plan here.