Nixpkgs's current development workflow is not sustainable

samuela · April 20, 2022, 9:16pm

I maintain a number of packages in Nixpkgs, perhaps most notably the CUDA and JAX ecosystems. As much as I love Nix and Nixpkgs, I do not think that I can afford to continue maintaining packages within the current development model. Something needs to change.

The goal of this post is not to place blame on any individual or commit. Rather, I hope we can engage in a frank, honest discussion of shortcomings with our overall development workflow thus far.

In Nixpkgs, anyone at any time can break any of your packages. Recently, the latest staging, staging-next, and python-updates branches were merged into master resulting in thousands of new commits breaking about half of all packages that I require in my day-to-day job. And I have no recourse other than pleading with commit authors to fix the breakages they have caused. These changes are made without any substantive form of CI (nixpkgs-review, etc) and oftentimes without even a peer-reviewed PR. By the time these breaking changes reach master, they are bundled with hundreds of other commits (and breakages!) making bisections nearly impossible. How is a package maintainer expected to know when someone else breaks their build? And once they are made aware, how are they expected to sort through these hundreds of commits to debug?

The current system is simply not sustainable for downstream package maintainers, especially considering that these breaking changes come with no prior notice, no migration plan, and no alerting of failures. How is a package maintainer expected to reliably function in this environment?

I also sympathize with maintainers of widely-used packages, and those who work on massive, tree-wide changes. It’s incredibly stressful to understand all of the implications of your PRs. How are those maintainers expected to make changes with confidence?

We need a better system. IMHO a bare minimum is that we need a way to guarantee an “always green” status for at least some set of packages. I hope that this thread may kickstart a discussion.

I don’t claim to have any magical one-off solutions, but a few thoughts have come to mind:

Merge trains. Every large project beyond a certain size uses them. They are fundamental infra for Google, Facebook, the Rust project, and just about everyone else. See this video for an ok explainer. The current staging branch situation is basically like a poor man’s merge train anyhow: Everyone adds commits to a branch and then some human-run, error-prone process takes place to merge them all into master. Merge trains just automate this process and guarantee that master always stays green. What’s stopping us from running an opt-in merge train pilot program?
Is Nixpkgs too large? We have thousands upon thousands of packages, many of them of unclear maintenance status. As more and more packages are added and interconnected, the system becomes increasingly brittle and will eventually break down. Perhaps we need a federated model of some sort? Are flakes the answer?
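
To make the merge-train idea above concrete, here is a minimal sketch in Python. It assumes a hypothetical `builds_ok` callback standing in for whatever CI would decide whether a set of changes leaves the tree green; the PR names are invented. Real tools such as bors batch changes and bisect on failure, so treat this one-at-a-time loop as an illustration of the principle, not as how any existing tool works.

```python
from typing import Callable, Iterable, List, Set, Tuple


def run_merge_train(
    queue: Iterable[str],
    builds_ok: Callable[[Set[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Process queued changes in order and return (merged, rejected).

    A change is merged only if the tree stays green with everything merged
    so far plus that change, so the target branch never goes red.
    """
    merged: List[str] = []
    rejected: List[str] = []
    for change in queue:
        candidate = set(merged) | {change}
        if builds_ok(candidate):
            merged.append(change)
        else:
            # The failing change is kicked off the train; later ones continue.
            rejected.append(change)
    return merged, rejected


# Hypothetical CI stand-in: the tree is red whenever "pr-2" is included.
def fake_ci(changes: Set[str]) -> bool:
    return "pr-2" not in changes


merged, rejected = run_merge_train(["pr-1", "pr-2", "pr-3"], fake_ci)
print(merged)    # ['pr-1', 'pr-3']
print(rejected)  # ['pr-2']
```

The point of the design is that a breaking change is rejected on its own, instead of landing bundled with hundreds of unrelated commits.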

Curious to hear others’ thoughts. I’m exhausted.

jonringer · April 20, 2022, 10:19pm

The goal of https://github.com/NixOS/rfcs/pull/119 was to help capture what needs to be tested for certain packages. Generally there are only a few dozen packages that are likely to be sensitive to changes in a fundamental package; however, some of those immediate downstream packages can have 100’s to 1000’s of downstream packages of their own, which can cause failures.

Having a single highly-interconnected body of software is different than having 10’s of thousands of loosely coupled packages. Ideally there would be more maintainers who look to staging, as it’s easier to fix regressions during staging-next than on master.

I agree that there should be some pruning process. The difficulty is that there’s no way to really tell package interest other than how many maintainers a package has.

Related:
GitHub - jonringer/basinix: (WIP) Nixpkgs pull request review website was meant to aid in this a bit. Should make it really cheap to iterate on master targeting PRs, thus allowing maintainers more time to focus on staging-next.

samuela · April 20, 2022, 10:47pm

I think it’s a big ask that maintainers proactively pull staging, manually build each and every package they care about, and then report issues up the chain. And do that every day. Oh and do it for staging, staging-next, python-updates, haskell-updates, etc etc etc.

We need some automation here.

An idea that I’ve been bouncing around recently: a bot that periodically checks Hydra logs and checks how long each package has been failing to build. If they have been failing to build for over 2 weeks, send an automatic PR to mark them as broken. We can’t afford to have these unmaintained packages slowing us down…
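
A rough sketch of the core check such a bot would need, in Python. It assumes the build history has already been fetched into plain `(finished_at, succeeded)` records; how to get those out of Hydra is left open, and the package names below are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# One build result: (finished_at, succeeded). Assumed to be fetched elsewhere.
BuildRecord = Tuple[datetime, bool]


def failing_too_long(
    history: Dict[str, List[BuildRecord]],
    now: datetime,
    threshold: timedelta = timedelta(days=14),
) -> List[str]:
    """Return attributes whose current failure streak is older than threshold."""
    stale = []
    for attr, builds in history.items():
        if not builds:
            continue
        ordered = sorted(builds, key=lambda record: record[0])
        if ordered[-1][1]:
            continue  # latest build succeeded, nothing to do
        # Walk backwards to the first failure of the current streak.
        streak_start = ordered[-1][0]
        for finished_at, succeeded in reversed(ordered):
            if succeeded:
                break
            streak_start = finished_at
        if now - streak_start > threshold:
            stale.append(attr)
    return sorted(stale)


# Hypothetical data; these attribute names are not real packages.
now = datetime(2022, 4, 20)
history = {
    "python3Packages.foo": [
        (datetime(2022, 3, 1), True),
        (datetime(2022, 3, 20), False),
        (datetime(2022, 4, 18), False),
    ],
    "python3Packages.bar": [
        (datetime(2022, 4, 1), False),
        (datetime(2022, 4, 19), True),
    ],
    "baz": [(datetime(2022, 4, 15), False)],
}
print(failing_too_long(history, now))  # ['python3Packages.foo']
```

Each name in the result would be a candidate for an automatic "mark as broken" PR; the threshold is a parameter so the 2 weeks can be debated separately from the mechanism.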

I prob don’t have time to implement this in the coming months so I encourage someone to go steal this idea!

jonringer · April 20, 2022, 10:51pm

staging-next cadence is around once every 2 weeks. staging isn’t really meant to do a big review frequently because of all the rebuilds.

This seems like a logical approach

Sandro · April 20, 2022, 11:52pm

For python-unstable we used several hydra evaluations to make sure that the breakages are not too big. If just a few hundred packages break, that is deemed acceptable if we are building over 5000 in total.

Beyond the first major bump I reviewed almost all changes for the last python-updates run and pushed several fixup commits for things I immediately noticed.

Yes, especially python packages, which include lots of badly maintained packages. This can partly be blamed on how python packaging works, and partly on the package set being too big.

We need to get rid of old and broken packages and probably some which are hard to maintain and to debug. Flask and django plugins immediately come to mind, as they often hold back updates to flask, werkzeug, etc. How about we move them out of pythonPackages into their own little package set where we are more free to have multiple versions and do things differently?

And about the literal big elephant in the room: things related to data science and AI…
At some point I started to ignore everything tensorflow related when doing nixpkgs-review because the time it took to build tensorflow was higher than the other hundred packages and I frankly didn’t care because the package was already broken at the time.
That some of these packages use bazel is also not doing us any favors and definitely not making things easier. Maybe we should deactivate tests in the normal build procedure and move them to passthru? I think the build times of most packages are acceptable; it’s just the tests that take forever. Maybe the ones which are not dependencies of other packages should be moved out of nixpkgs? I don’t really know.

We have staging runs instead. Most of the PRs we merge are highly unrelated and wouldn’t benefit from this; it would just consume a giant amount of resources. Content addressability could help in this regard by reducing the number of rebuilds. Also, maybe the drv before running tests could be cached if there are just test failures.

That is not how things are supposed to run. staging is just a collection of things that cause giant rebuilds. At some point they land in staging-next. Then you wait a few days until compilers are cached and then you can build your packages and send PRs to fix breakages. If you are interested in python data science then python-updates is maybe something you could participate in, and haskell-updates is something that can usually be ignored.

Yes please. The next step would be to yank packages which were broken for too long.

nrdxp · April 21, 2022, 12:08am

Since flakes first came out I’ve been a bit frustrated with the fact that we seem to have come up with this great new scheme to reference a package from anywhere via the flake ref, but then we specify our inputs so verbosely.

If we really did want to move in the direction of splitting up nixpkgs, which may be inevitable as the size just continues to grow, it seems a simple list of flake refs would be a lot easier to maintain and quicker to iterate on than the highly verbose attribute set method.

Why not just inputs = [ "github:some/flake" "gitlab:some/other" ]? Then nixpkgs could just be a massive collection of flakes which are then all packaged together to save time and effort, and pushing and popping flakes could be trivially automated.

Anyway, just my two cents, since I see a huge package set like nixpkgs as nothing more than a giant list of packages in my head. Maybe we are just using the wrong data structure?

Or maybe I am just wrong, but the question is at least worth asking at this point.

Also, the idea of splitting up tests into their own derivation (if I understood Sandro correctly) is a great one. It’s very annoying to have to start a build from scratch. Also, with the new impure derivations, it might allow for more flexibility on tests that want network access without having to make the entire package impure.

I have been in favour of heavy automation of nixpkgs, from merge trains to automated package updates, but there seems to be heavy resistance for reasons I’m still not fully convinced are good enough, considering the sheer amount of code we have to sift through on a regular basis. I know people say there is no replacement for the human eye, but when the work is so overwhelmingly large, we tend to start overlooking things anyway, in an effort to get things done.

We definitely need more automation of some kind, but I guess which kind is still a hot topic for debate.

samuela · April 21, 2022, 3:12am

Agreed, all human processes are error-prone. That’s just the nature of being human. We should use automation to make our lives easier.

Who do we need to get approval from in order to pilot a small opt-in merge train?

samuela · April 21, 2022, 3:15am

The problem is what those 300 dependencies are and who depends on them. Some packages are 10x more important than others. Based on recent experiences, large, important failures are consistently slipping through the cracks. So it doesn’t seem controversial to me that there’s room for improvement.

Pacman99 · April 21, 2022, 3:39am

In regards to splitting up nixpkgs, I think flakes as it is already provides the feature set for such an endeavor. Not to say it wouldn’t be difficult to do.

There could be a flake for just nixpkgs lib with no dependencies. Then a flake for stdenv which just depends on the lib flake. A flake above those two with the core packages which the majority of packages will rely on, like archlinux’s base and base-devel; this could include linux packages or those could be separated into their own flake.

Each language ecosystem can get its own flake with dependencies on those first three flakes. Then nixpkgs itself can include the rest of the user facing applications and depend on any or all of the above flakes. There could be splits for any groups that make sense, like perhaps a flake for all matrix related packages or cuda packages.

NixOS itself can be its own flake like nix-darwin or home-manager. Many of the non-core modules could be moved into their own flakes.

And I’m sure there are many package groups I’m missing from nixpkgs, but I expect the idea will still hold.

This would mean that maintainers of any specific flake, for example cuda maintainers, have full control over when underlying dependencies get updated due to input locking. They could hold off on updating the python packages flake input until they fix any bugs. They can also determine their own testing and merging workflow.

Of course there are various drawbacks, like possible dependency cycles, more hydra jobsets to watch, split documentation, and worse discoverability of projects. And I might be envisioning a split that’s way more drastic than necessary. It could just be a matter of pulling out some things from nixpkgs. Either way I think it’s something to consider as a possible improvement with flakes in mind.

ryantm · April 21, 2022, 3:48am

I think that depends a lot on the implementation details.

samuela · April 21, 2022, 5:26am

I was thinking of a simple bors setup that lets people choose to join a merge train onto master, instead of merging to master directly.

jonringer · April 21, 2022, 5:45am

master PRs were never an issue, most of the PRs are feasible to do a nixpkgs-review on

samuela · April 21, 2022, 6:55am

Agreed, I proposed master since I thought it would be simple first demo of the system, not that it would immediately address the instability of staging. But we could alternatively apply bors to the staging branch?

ajs124 · April 21, 2022, 9:53am

IMO one of the issues related to that is that nixpkgs does not have a well-defined model of what a maintainer actually is or does. E.g. we have lots of packages that list someone as a maintainer who hasn’t touched that package in years, committed anything to nixpkgs, or been active on github at all.

We also have a lot of extremely important packages that don’t only lack an active maintainer, but lack anyone that claims to be the maintainer of said package, at all. One example that comes to mind is openssl, for which I’ve been thinking of stepping up, but even if I did that just leads into the next point.

What even are the rights and duties of a maintainer? E.g. even for packages I actively maintain and where I am listed as a maintainer, other committers sometimes just commit stuff without even waiting a day or two for my response, so why am I even listed there, if I don’t have a say in what changes about the package?
At the same time, I’m probably listed as a maintainer for packages that I neglect, because I stopped using them or don’t consider them as important. So am I not doing my duty as a maintainer and should remove myself from that package or be removed? Probably yes.
Both points go hand in hand, because maintainers neglecting packages leads to the expectation that they don’t need to be waited for, because they won’t respond anyways.

IMO if we had all of this figured out, which I think most other distributions do, this would make a lot of things much easier. We could for example just say “This package lacks a maintainer and has been broken for two months/two releases/four years, let’s remove it”. Or “r-ryantm performed three updates to your package that you were pinged on but did not respond to in two weeks, so you will be removed as a maintainer from said package”.

The specifics are obviously up for debate and maybe this is just a completely wrong observation, but it’s my theory of something that could help us solve this issue, as a community.

TL;DR: What even is a maintainer?

jtagcat · April 21, 2022, 12:19pm

Disowning and (rigidly) removing packages like this sounds like stalebot to me. We have to make sure automation doesn’t abuse or threaten humans (if you don’t respond in n time during your future unrelated crisis).

In the human condition, 2 weeks is nothing. Multiple months I could reason more with.

Yet, the show will go on. Every solution I’ve thought of has major problems, and I’m not involved enough to have a real say here.

From https://hackerspace.design, I’ll mention that something has to be done, choosing anything is better than nothing. Decisions can be monitored for their success, and later changed or improved upon.

cmm · April 21, 2022, 12:22pm

a minor point re: pruning unmaintained packages (however that ends up being defined): it would be desirable to have that pruned packaging code still be easily findable without git archaeology, so people who decide to package something that has been pruned don’t start from scratch by accident. maybe just mark such packages as unmaintained (or move them to some designated “attic”) and skip their evaluation, but don’t outright remove their code

ryantm · April 21, 2022, 1:16pm

My feeling is that if it is an opt-in experiment you don’t need much permission beyond a couple of committers willing to merge the successful trains into master/staging.

7c6f434c · April 21, 2022, 2:51pm

Hm, I am not sure that all of the proposed solutions are relevant to the initial problem, and some are likely to make the mess worse. I mean, after-marking-broken sounds useful (and, well, objective, it is broken at the time of marking), but isn’t likely to reduce downstream breakage much.

Splitting Nixpkgs will only «help» as in make these issues harder to observe/summarise. As for selecting a core set that has to build: well, there are channels that are blocked by packages not building, and if they get stuck, this is investigated… Or is the goal to make some packages officially «stable-only», with periodic fixing up before the next release?

I support expansion of the practice of keeping the previous versions of some packages in the set and migrating the dependents over a longer time while whatever plays nice gets to use the new version of the dependency quickly. We are not using it enough. (And there is of course the opinion it is already used too much)

As for merge trains, I guess if someone has the build capacity for this, indeed you can just form a pool of committers known to merge relatively carelessly (like me…) to mention when the train needs a merge, in hopes that someone will notice relatively soon?

kamadorueda · April 21, 2022, 6:52pm

Strongly agree

There are 200k python projects on PyPI, so 5k is not that big in comparison. 5k is big for us due to our internal architecture, but a good arch should allow us to do the 200k with linear effort.

Strongly agree

I’ve been experimenting with the multi-version architecture here: GitHub - on-nix/python: Extensive collection of Python projects from PyPI, for Nix! , and I confirm this is the path forward to scalability. There are a few ideas we can take from there and incorporate in Nixpkgs

In some sense this just hides the problem: our arch is not linearly scalable

Ideally an architecture should allow a package to work forever once packaged. The only reasons for failure should be external, not internal, that is: a URL that went 404, etc. But currently our main reason for failure is that touching a package has side effects on other packages, normally due to version constraints and compatibility

An insert-only arch like on-nix/python is linearly scalable, Nixpkgs on the other hand is modify-in-place, and thus unsustainable after some size
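
To illustrate what I mean by insert-only, here is a toy model in Python. It is not taken from on-nix/python; the class, the package names, and the version numbers are all made up for the example. The only property it demonstrates is that adding a new version never changes an entry that already exists.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (name, version)


class InsertOnlySet:
    """Toy package set: entries can be added but never modified in place."""

    def __init__(self) -> None:
        self._entries: Dict[Key, Dict[str, str]] = {}

    def add(self, name: str, version: str, deps: Dict[str, str]) -> None:
        key = (name, version)
        if key in self._entries:
            raise ValueError(f"{name} {version} already exists and cannot be modified")
        # Every dependency must point at an exact, already-present version.
        for dep_name, dep_version in deps.items():
            if (dep_name, dep_version) not in self._entries:
                raise KeyError(f"missing dependency {dep_name} {dep_version}")
        self._entries[key] = dict(deps)

    def deps_of(self, name: str, version: str) -> Dict[str, str]:
        return dict(self._entries[(name, version)])


pkgs = InsertOnlySet()
pkgs.add("werkzeug", "2.0", {})
pkgs.add("flask", "2.0", {"werkzeug": "2.0"})
pkgs.add("werkzeug", "2.1", {})  # a new version is a new entry
print(pkgs.deps_of("flask", "2.0"))  # {'werkzeug': '2.0'}
```

In a modify-in-place set, the third `add` would instead overwrite the single werkzeug entry, and flask would silently be rebuilt against it.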

jonringer · April 21, 2022, 8:50pm

One benefit of the current process is that we serve as a crucible for python updates, and can alert respective upstreams about issues which they may not be aware of due to extreme pinning being the norm.

Ideal scenario would be something like this issue where an upstream was alerted, and eventually took care of updating to a maintained fork of a dependency.

However, many python library owners don’t care about how their package interacts with the rest of the ecosystem, so sometimes you get interactions like this.

But, this process of alerting upstreams does create an “only distros pester me with addressing dependency technical debt, obviously the issue is with distros” environment.

Personally I’ve resigned myself to using python only for scripts which use just the standard libraries, or well maintained dependencies like requests. Anything more, and the python ecosystem becomes borderline unmaintainable given enough time and cpython interpreter versions.

Beware: rant

What do python packages have to do with the maintainability of Nixpkgs?

A few hundred of them are used in just the build process of certain packages, which may even export python related packages; so the dependencies need to be packaged in some manner.

And of the PRs in nixpkgs, 20k out of 143k of them have the topic: python label; so the burden of maintaining the package set (in its current state) is quite high. Additionally, the python ecosystem doesn’t really allow you to do something like, “pin docutils for the sphinx module, and everything should work fine”. No, some other package will bring in docutils; so if that other package and sphinx exist in the same environment and if the unpinned docutils gets ordered before sphinx’s pinned version then things will fail.
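
The ordering problem can be shown with a toy model in Python. It relies only on the fact that Python imports the first match on the search path; the version numbers and the `otherpkg` name are hypothetical, and `resolve_environment` is an illustration, not how nixpkgs assembles environments.

```python
from typing import Dict, List, Tuple


def resolve_environment(search_path: List[Tuple[str, str]]) -> Dict[str, str]:
    """First match wins: the earliest entry for a module shadows later ones."""
    resolved: Dict[str, str] = {}
    for name, version in search_path:
        resolved.setdefault(name, version)
    return resolved


# Hypothetical closures: sphinx pins an older docutils, otherpkg does not.
sphinx_closure = [("sphinx", "4.5"), ("docutils", "0.17")]
other_closure = [("otherpkg", "1.0"), ("docutils", "0.18")]

# Same two packages, different order, different docutils for everyone.
print(resolve_environment(sphinx_closure + other_closure)["docutils"])  # 0.17
print(resolve_environment(other_closure + sphinx_closure)["docutils"])  # 0.18
```

Only one docutils can be visible per environment, so the pin holds or fails depending purely on ordering.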

People usually rebut the pinning problem with venv-like solutions, but those only work on a per-application basis; venvs can’t freely compose with other environments that have mutually exclusive version bounds.

I personally would be fine doing a mass pruning of the nixpkgs python package set to just enable applications, and move most of the module specific logic into something like poetry2nix or dream2nix; where there’s more freedom to determine “python landscape” since it’s a per-project concern.

Also, the pip version resolver is just a bandaid. It works well for small to medium projects, but for large projects (such as home-assistant), it takes 12+ hours to resolve the dependencies.