Nixpkgs's current development workflow is not sustainable

Disowning and (rigidly) removing packages like this sounds like stalebot to me. We have to make sure automation doesn’t abuse or threaten humans (say, because you didn’t respond within n weeks while dealing with some unrelated crisis in your life).

On a human timescale, 2 weeks is nothing. Multiple months I could reason with more easily.

Yet, the show will go on. Every solution I’ve thought of has major problems, and I’m not involved enough to have a real say here.

Borrowing from https://hackerspace.design: something has to be done, and choosing anything is better than choosing nothing. Decisions can be monitored for their success, and later changed or improved upon.

13 Likes

A minor point re: pruning unmaintained packages (however that ends up being defined): it would be desirable for the pruned packaging code to remain easily findable without git archaeology, so people who later decide to package something that has been pruned don’t accidentally start from scratch. Maybe just mark such packages as unmaintained (or move them to some designated “attic”) and skip their evaluation, but don’t outright remove their code.
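A rough sketch of what that could look like, purely as an illustration: nixpkgs has no “attic” today, and the "unmaintained" flag below is an invented meta attribute, not an existing one.

{ lib, stdenv, fetchurl }:

stdenv.mkDerivation rec {
  pname = "somepkg";   # hypothetical package parked in the attic
  version = "1.2.3";

  src = fetchurl {
    url = "https://example.org/somepkg-${version}.tar.gz";
    hash = lib.fakeHash; # placeholder
  };

  meta = {
    description = "Example of a package kept in the tree but excluded from evaluation";
    # Invented attribute: CI and channel evaluation could skip anything
    # carrying it, while the expression itself stays greppable in the tree.
    unmaintained = true;
    maintainers = [ ];
  };
}

The point is only that the expression stays in-tree and searchable; how evaluation actually skips it (an attic directory, a meta flag, a separate attribute set) is an open design question.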

13 Likes

My feeling is that if it is an opt-in experiment you don’t need much permission beyond a couple of committers willing to merge the successful trains into master/staging.

3 Likes

Hm, I am not sure that all of the proposed solutions are relevant to the initial problem, and some are likely to make the mess worse. I mean, after-marking-broken sounds useful (and, well, objective, it is broken at the time of marking), but isn’t likely to reduce downstream breakage much.

Splitting Nixpkgs will only «help» in the sense of making these issues harder to observe/summarise. As for selecting a core set that has to build — well, there are channels that are blocked by packages not building, and if they get stuck, this is investigated… Or is the goal to make some packages officially «stable-only», with periodic fixing up before the next release?

I support expanding the practice of keeping the previous versions of some packages in the set and migrating the dependents over a longer time, while whatever plays nice gets to use the new version of the dependency quickly. We are not using it enough. (And there is of course the opinion that it is already used too much.)
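For reference, a minimal sketch of that pattern as an overlay, with made-up names (real examples in Nixpkgs are the suffixed attributes like postgresql_14 and postgresql_15):

final: prev: {
  # Old version kept around while dependents migrate.
  libfoo_1 = final.callPackage ./libfoo/1.x.nix { };
  # New version, used immediately by everything that plays nice with it.
  libfoo_2 = final.callPackage ./libfoo/2.x.nix { };

  # Default points at whichever version most of the tree can handle today.
  libfoo = final.libfoo_2;

  # A dependent that is not ready yet gets pinned to the old version.
  somedependent = prev.somedependent.override { libfoo = final.libfoo_1; };
}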

As for merge trains, I guess if someone has the build capacity for this, indeed you can just form a pool of committers known to merge relatively carelessly (like me…) to mention when the train needs a merge, in hopes that someone will notice relatively soon?

21 Likes

Strongly agree

There are 200k Python projects on PyPI, so 5k is not that big in comparison. 5k is big for us due to our internal architecture, but a good architecture should allow us to do the 200k with linear effort.

Strongly agree

I’ve been experimenting with the multi-version architecture here: GitHub - on-nix/python: Extensive collection of Python projects from PyPI, for Nix!, and I can confirm this is the path forward to scalability. There are a few ideas we can take from there and incorporate into Nixpkgs.

In some sense this just hides the problem: our architecture is not linearly scalable.

Ideally an architecture should allow a package to work forever once packaged; the only failure reasons should be external rather than internal, e.g. a URL that went 404. But currently our main failure reason is that touching a package has side effects on other packages, normally due to version constraints and compatibility.

An insert-only architecture like on-nix/python is linearly scalable; Nixpkgs, on the other hand, is modify-in-place, and thus unsustainable beyond some size.
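To make that concrete, here is a rough sketch of an insert-only, version-keyed layout; the attribute names and file layout are illustrative, not the actual on-nix/python structure:

{ callPackage }:

{
  requests = {
    "2.27.1" = callPackage ./requests/2.27.1.nix { };
    "2.28.1" = callPackage ./requests/2.28.1.nix { };
    latest = callPackage ./requests/2.28.1.nix { };
  };
  # Adding requests 2.29.0 later means adding one new file and one new
  # attribute; nothing existing is touched, so nothing existing can break.
}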

17 Likes

One benefit of the current process is that we serve as a crucible for python updates, and can alert respective upstreams about issues which they may not be aware of due to extreme pinning being the norm.

The ideal scenario would be something like this issue, where an upstream was alerted and eventually took care of updating to a maintained fork of a dependency.

However, many python library owners don’t care about how their package interacts with the rest of the ecosystem, so sometimes you get interactions like this.

But this process of alerting upstreams does create an “only distros pester me with addressing dependency technical debt, so obviously the issue is with distros” environment.

Personally, I’ve resigned my usage of Python to scripts which use only the standard library or well-maintained dependencies like requests. Anything more, and the Python ecosystem becomes borderline unmaintainable given enough time and CPython interpreter versions.

Beware: rant

What do python packages have to do with the maintainability of Nixpkgs?

A few hundred of them are used just in the build process of certain packages, which may even export python-related packages; so the dependencies need to be packaged in some manner.

And of the PRs in nixpkgs, 20k out of 143k have the topic: python label, so the burden of maintaining the package set (in its current state) is quite high. Additionally, the Python ecosystem doesn’t really let you do something like “pin docutils for the sphinx module, and everything should work fine”. No, some other package will bring in docutils; so if that other package and sphinx exist in the same environment and the unpinned docutils gets ordered before sphinx’s pinned version, then things will fail.
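In Nix terms the closest you can get is overriding docutils for the whole interpreter’s package set rather than for sphinx alone; a minimal sketch, with the version and hash as placeholders:

let
  pkgs = import <nixpkgs> { };

  python = pkgs.python3.override {
    packageOverrides = self: super: {
      # Whole-set pin: every module built for this interpreter now sees this
      # docutils, so every other consumer gets rebuilt against it and may
      # break in turn.
      docutils = super.docutils.overridePythonAttrs (old: rec {
        version = "0.16";
        src = super.fetchPypi {
          pname = "docutils";
          inherit version;
          hash = pkgs.lib.fakeHash; # placeholder
        };
      });
    };
  };
in
python.withPackages (ps: [ ps.sphinx ps.scrapy ])

Pinning only sphinx’s own docutils input instead would put two docutils versions into the same environment and collide, which is exactly the ordering problem described above.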

People usually rebut the pinning with venv-like solutions, but those only work on a per-application basis; venvs can’t freely compose with other environments that have mutually exclusive version bounds.

I personally would be fine with doing a mass pruning of the nixpkgs Python package set down to just what applications need, and moving most of the module-specific logic into something like poetry2nix or dream2nix, where there’s more freedom to determine the “python landscape” since it’s a per-project concern.

Also, the pip version resolver is just a bandaid. It works well for small to medium projects, but for large projects (such as home-assistant), it takes 12+ hours to resolve the dependencies.

26 Likes

IMHO a maintainer is someone who

  • responds to PRs and issues for their packages in a timely manner (e.g. within a week)
  • (optionally) updates their package as necessary

Oof, what an unbearable attitude! I’m sorry you had to deal with that. People like that give OSS a bad rap.

I notice that the sybil package has no maintainers in Nixpkgs. Out of curiosity, why not just remove it? Or mark it as broken?

I think this is a nice, pro-social thing to do within the OSS community – I do it as well – but I feel we must be honest with ourselves as to whether it brings us closer to achieving our goals in Nixpkgs. Maintaining python in Nixpkgs is hard work already, pitching in to the maintenance of all python packages in existence is nearly impossible.

Also, just to clarify: I’m not proposing that we stop communicating issues upstream; rather that before pushing a change to Nixpkgs, commit authors are forced to confront those exact issues instead of pushing the bugs onto users and other maintainers. Increased stability makes the experience less stressful for everyone involved.

5 Likes

Like most python packages, there are a few other modules which require it.

$ rg sybil -l | grep python-modules
pkgs/development/python-modules/flufl/lock.nix
pkgs/development/python-modules/testfixtures/default.nix
pkgs/development/python-modules/atpublic/default.nix
pkgs/development/python-modules/sybil/default.nix
pkgs/development/python-modules/scrapy/default.nix
2 Likes

Surely, one of those package maintainers would then be willing to step up as sybil maintainer if it is still important to them, no?

1 Like

I don’t know. Maybe someone would if they knew this was needed. Maybe we should mark the package as broken the next time it breaks and make having a new maintainer a requirement for marking it unbroken.

Lots of unknowns and we don’t really know what would be a good solution.

I also doubt most people will look through all the python packages they use and update them and add themselves as maintainers like I recently did.


My suggestion would be to start with a script which finds packages that have been failing to build for a long time and marks them broken, and then removes the ones which have been marked broken for a long time. Other than that, we can only encourage people to better maintain the things they care about and not rage when something is broken in master.

10 Likes

Half the python packages I “maintain” are usually from bumping an existing package, which added yet-another-dependency™.

6 Likes

I would be curious to know more details about the problems @samuela is facing. Maybe that would help us come up with more targeted solutions.

1 Like

Allowing for automated control of flake inputs via some mechanism is something I’ve been iterating on. Some ideas being tossed around are allowing some limited computation (fromTOML, fromJSON), programmatic update of the inputs (via some tricks involving self-reflection), and allowing full control over attributes via the url, then exposing that nicely at the use-sites. I’m not sure how that scales up to something as large as Nixpkgs though… it needs some testing and experimentation.

Once I’ve got a better understanding of what should work or not I was planning to RFC it.

6 Likes

Packages that I rely on, directly or indirectly, that have been indiscriminately broken in the last two weeks:

  • jaxlibWithCuda (essential for using JAX)
  • tensorflowWithCuda
  • pytorchWithCuda
  • tensorflow-bin
  • cudaPackages.nccl
  • magma
  • sentry-sdk
  • wandb
  • dm-haiku
  • tensorflow-datasets
  • flax
  • cupy
  • optax
  • augmax
  • aws-sdk-cpp
  • google-cloud-cpp

and those are just the ones that come off the top of my head!

I love nix but I just can’t live like this. What’s the point of contributing to nixpkgs if someone is just going to come along and break everything I’ve built anyways?

9 Likes

Can you explain to me what the merge train you envision would entail? I don’t believe it would be able to solve the problem you want it to. We simply lack the resources to build every commit of nixpkgs for rebuild-heavy changes, and even for smaller changes, checking all reverse dependencies on CI is not done atm (and would be difficult, I think). Simple changes can go to master and are easy to test, but hard to judge automatically due to the diversity of nixpkgs: What about this ofborg timeout? Is it supposed to fail (to evaluate) on this platform? …

One area it can help with is for the eval checks ofborg does which always need to succeed. They are quite an annoyance if you fix up something trivial and want to merge right after, as they take a good 20min to complete. A merge train could help us eliminate the need to check back after half an hour here. As far as I know, @domenkozar has applied for GitHub’s merge train beta feature with this use case in mind, but I don’t know what the state of the application is.

We do, via the channel-blocking jobs (e.g. nixos:trunk-combined:tested and nixpkgs:trunk:unstable). Usually merging staging-next is delayed at least until nixpkgs:trunk:unstable succeeds.

Of course this is ignored sometimes, since otherwise necessary or even security- or time-critical changes would be impossible to make. The “Always Green” mantra is a recurring topic in the community, with the inevitable conclusion being that it would paralyze nixpkgs development. Our workflow of testing (especially risky) changes to a reasonable extent and then resolving unforeseen regressions on a development branch (either staging-next or master (!)) is, I think, a necessary compromise. Our energy should probably be spent optimizing this process.

I think this is actually the key issue. Hydra used to send out emails to package maintainers for failing builds, but due to the number of jobsets this caused unnecessary noise. I think we need a proper replacement for this. For staging-next and haskell-updates, report-generating scripts have been invented as a remedy, but we are still lacking a universal solution.

What would be great would be some sort of maintainer dashboard showing the build status of maintained packages in the various jobsets on Hydra (maybe related to the corresponding branches of nixpkgs). This could be integrated into Hydra, which already has a customizable dashboard feature and all the necessary data, or implemented separately (although querying the Hydra API is always problematic).
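As a starting point, the list of packages a given person maintains is already computable from Nixpkgs itself; a rough (and slow) sketch, where the GitHub handle is just an example and the per-jobset build status would still have to come from Hydra on top of this:

let
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;

  maintains = handle: _name: value:
    let
      result = builtins.tryEval (
        lib.isDerivation value
        && builtins.elem handle
             (map (m: m.github or "") (value.meta.maintainers or [ ]))
      );
    in
    result.success && result.value;
in
# Attribute names of top-level packages listing the given maintainer.
builtins.attrNames (lib.filterAttrs (maintains "samuela") pkgs)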

After splitting up nixpkgs into multiple flakes (per package or package group) I can see two outcomes:

  1. If we force consistent flake versions across all inputs (i.e. overriding the input flakes’ lock files using follows; see the sketch after this list), it’ll become an unmaintainable mess where making any change may break the flakes provided by any number of different repositories. Making a nontrivial change then involves making the change itself, figuring out which repositories are affected, preparing PRs, waiting for them to be reviewed and merged, and eventually updating the nixpkgs lock file.
  2. If we accept all input flakes’ lock files, we would probably end up with as many versions of glibc and the whole userland as there are packages. This would mean an explosion in the size of any nixpkgs config or Nix user environment – and Nix is already space-hungry.
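A minimal sketch of what option 1 looks like at the flake level, with made-up input names; every consumer flake gets its nixpkgs overridden via follows:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    # Hypothetical per-package-group flakes.
    python-pkgs.url = "github:example/nix-python-pkgs";
    python-pkgs.inputs.nixpkgs.follows = "nixpkgs";

    cuda-pkgs.url = "github:example/nix-cuda-pkgs";
    cuda-pkgs.inputs.nixpkgs.follows = "nixpkgs";
  };

  # Any change to "nixpkgs" now potentially invalidates every input that
  # follows it, which is the cross-repository coordination problem described
  # in point 1; dropping the follows lines instead gives point 2.
  outputs = { self, nixpkgs, python-pkgs, cuda-pkgs }: { };
}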

So I don’t see how flakes could do anything for us other than make our predicament worse.

We are in no position to be able to do this. Marking packages as broken is alright, as it just saves users the time it takes to find this out for themselves. Removing packages confidently is not possible for us, in my opinion, as we have absolutely no clue about the downstream usage of packages – Nix has no telemetry feature that lets us get some idea about downstream usage.

My experience has been that GitHub activity is not necessarily reflective of actual usage of a package. I have become more cautious w.r.t. removing stuff from nixpkgs after inadvertently breaking someone’s workflow by removing something I was sure nobody would miss.

12 Likes

Then we should definitely mark more things as broken that are failing to build, have been doing so for a while, and are definitely not easy to fix. Fewer things to worry about would be great.

6 Likes

As far as I understand, the idea is to check that a batch of PRs taken together does not break too much. Might save on some rebuilds.

If a package has been marked as broken continuously for a significant time, then using it already required pretty careful overriding, and moving the package to a local codebase is not as big a change as it would be for packages simply imported from Nixpkgs.
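For instance, carrying a pruned or long-broken package in your own configuration is just an overlay with a copied expression; the paths and names below are illustrative:

final: prev: {
  # ./pkgs/somepkg/default.nix is the expression copied out of Nixpkgs (or
  # its hypothetical attic) before the package was removed.
  somepkg = final.callPackage ./pkgs/somepkg { };
}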

4 Likes

A lot of the packages you are listing are unfree or have unfree dependencies, which may explain why you are being hit by regressions excessively hard – we have difficulty providing CI for unfree packages.

10 Likes

I’d go even further than this. As a result of issues being harder to observe and summarise (and, I’d say, understand), integration between the components will become even more difficult, while the components themselves will be free to move faster, i.e. require even more integration work.

The end result will be nixpkgs and NixOS fracturing into irreconcilable shards where all further contributions are either even more work or even more risky than they are now.

To get a real benefit from splitting nixpkgs up (using flakes or any other mechanism) you need abstractions by which the resulting pieces can interact with each other. In other words, compartmentalization is not abstraction.

Rich Hickey’s “Simple Made Easy” (https://www.youtube.com/watch?v=SxdOUGdseq4) is a great talk on this subject. I don’t know if it offers any direct answers that help nixpkgs development but it has a lot of ideas that could perhaps be synthesized into a solution for nixpkgs development.

17 Likes

This sounds more like a licensing issue than a technical issue: if Hydra cannot build these packages, how can CI fail?

Nix likes source code (in fact it really likes it), so closed-source or pseudo-open-source code is a challenge!

I didn’t know what a merge train was, but I’ve looked it up; it seems like a lot of coordination and effort.

Maybe that effort could be better used to write strongly worded letters to companies providing pseudo-open-source projects and getting them to actually be open source… (fat chance).

@samuela, I can sympathize with your situation… but don’t give up; there must be a way around this, whether technical, social, political, or a combination of them all.

In the meantime, I’ll stick to the ‘soul train’. https://www.youtube.com/watch?v=lODBVM802H8

1 Like