Nixpkgs's current development workflow is not sustainable

For starters, all the checks that ofborg runs (including those triggered by PR reviewers).


Some people (including me) would actually love to see Hydra's bumping of nixos-unstable replaced by a bors-like system.

There is basically no drawback: no commit appears in unstable unless Hydra passes anyway (the biggest drawback I can see is that nixos-unstable-small would no longer exist). And it would mean that all PRs ready for merging actually land, even if another PR breaks Hydra, thanks to bors' bisection-based ("dichotomy") strategy for identifying the bad PR. This incidentally means the need for nixos-unstable-small would probably just go away anyway, as I expect such a Hydra to move much faster, since it would basically have automated rollback.

Adding other checks would probably make sense later on, but as a first step this seems like a net win to me, and it would prepare the infrastructure for when we actually want to implement more checks.

The one big issue is actually infrastructure: as I understand it, plugging into Hydra is not easy, and we probably don't want to rebuild all these things twice. But maybe building just enough to check that Hydra would be green is not actually that expensive, and it could reasonably be done outside Hydra?


For starters, all the checks that ofborg runs (including those triggered by PR reviewers).

Note that failing triggered builds are reported as gray/"neutral" in the checks.


This doesn’t solve the root problem for the distro as a whole, but as an individual, what I do for a few things is:

  1. Stay on NixOS YYMM for my packages
  2. Use overlays to selectively bump versions of packages I care about when I need something newer
  3. Go through the 2x a year batch fix process when bumping NixOS versions, which usually turns out to be less painful than expected, because the community does a great job coming together for Zero Hydra Failures.
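
Step 2 can be a small overlay like the following; this is only a sketch, and the package name, version, and hash are placeholders, not real values:

```nix
# overlays/bumps.nix — selectively bump packages on top of a stable NixOS release.
# Everything not listed here keeps coming from the stable channel.
final: prev: {
  somepkg = prev.somepkg.overrideAttrs (old: rec {
    version = "2.0.0";  # placeholder: the newer version you need
    src = prev.fetchurl {
      url = "https://example.org/somepkg-${version}.tar.gz";
      hash = "";  # fill in after the first (failing) build reports it
    };
  });
}
```

Only the overridden package (and its reverse dependencies) gets rebuilt; the rest of the system stays on the stable binary cache.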

That’s a little more manageable than bumping and potentially fixing every day. But a nice thing about nix is that you can try the bump for a pkg build, and if it fails, just keep using the old channel / nixpkgs commit that you know works.


This incidentally means the need for nixos-unstable-small would probably just go away anyway, as I expect such a Hydra to move much faster, since it would basically have automated rollback.

No, because whatever you do, a security stdenv rebuild takes far longer for the full nixos-unstable than for nixos-unstable-small.

(Some people also value nixpkgs-unstable as separate from nixos-unstable for yet other uses)


That’s a little more manageable than bumping and potentially fixing every day. But a nice thing about nix is that you can try the bump for a pkg build, and if it fails, just keep using the old channel / nixpkgs commit that you know works.

Worse, you can even have a single environment with packages imported from multiple different checkouts (requires care, there are some catches with plugins etc.)
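
A sketch of what mixing checkouts can look like; the revisions are placeholders, and the caveat above applies to anything that has to share a dependency graph (plugins, runtime-loaded libraries should come from the same checkout):

```nix
# shell.nix — one environment drawing packages from two pinned nixpkgs checkouts.
# Replace the placeholder revisions with real commits; add `sha256` for purity.
let
  stable = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<rev-that-works>.tar.gz";
  }) { };
  fresh = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<newer-rev>.tar.gz";
  }) { };
in
stable.mkShell {
  # git from the known-good checkout, one newer package from the fresh one
  packages = [ stable.git fresh.somepkg ];
}
```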


While I agree with you in principle, nixos-unstable-small has been red for days at a time over the last month, which is probably (Hydra doesn’t easily give the build time for an eval) worse than a situation where all nixos-unstable tests would need to pass. As for nixpkgs-unstable: honestly, seeing how it’s more often red than nixos-unstable even though it’s supposed to run fewer tests, I’m even less sure it’s worth keeping at all.

Anyway, there definitely will be use cases that would be less well-served after such a change (xkcd: Workflow comes to mind), but I don’t think security fixes would be one of them, because if master is red then security updates currently get delayed forever.

On the other hand, IIRC rust-lang’s bors has a “high-priority merge” command that allows attempting a PR merge straight from master by pausing all non-high-priority merges (so if one fails there’s less dichotomy to do), which would likely bring security updates faster than current nixos-unstable-small on average, and not slower.


My primary issue with breakages due to staging-next changes is that it doesn’t even feel like an effort is made to involve the maintainers currently. It’s very common as a package maintainer to discover a staging breakage only hours/days after it gets merged to master, not before. Why? In cases where this happened to my packages, Hydra was clearly showing the breakage before the merge, but no attempt was made to ping me and give me a heads-up. We just merge broken stuff and fix breakages one by one after the fact, when users notice. Of course, we likely can’t block on maintainers fixing new breakages while keeping the current velocity, but I suspect a 5-7 day heads-up before merge would significantly improve the state of master.

But I suspect that many of the problems mentioned here really aren’t about staging but more about the mass-python-module-upgrades that regularly happen there. I maintain a few Python modules (not as much as I used to…) and frankly I just don’t understand how the Python packages set is maintained in nixpkgs. Someone will randomly bump the version of my package as part of a bulk update that goes in staging, without going through the standard review process, without asking for my review as the maintainer of the package, and usually without even checking that it still works. It’s incredibly demotivating when you feel like you do a careful job as a Python module maintainer (making sure you don’t break your dependencies, etc.) but then someone just goes and breaks your work without even talking to you. I don’t understand why we bulk-edit anything that has a maintainer without giving them a few days/weeks to do the work themselves.

Finally, there’s also a vicious circle at play: staging is so regularly broken that I just hate having to do any kind of work there. If it was only having to do a lot of rebuilds, fine, whatever, I’ll waste a bit of compute power. But usually it’s 6h of rebuilds only to notice that something completely obvious has been broken for 3 days and wasn’t reverted, blocking you from doing any kind of work on staging. I suspect that this could be improved by more reverting of broken or semi-broken commits on staging, but I don’t have the experience there to judge solutions.


@delroth, I couldn’t have said it better myself. 100%.

To give just one example from earlier today:

  1. Person A submits pull request (targeting master!) affecting a package I maintain.
  2. I request changes on the content of the PR, and point out build failures.
  3. Person A responds (literally) “I am lazy and normally no one takes a look anyway.”
  4. Person A ignores my requested changes, merges their own PR on my package without my consent (or any other approval reviews!) and before CI has completed.

It’s enough to make anyone go crazy. Why be a maintainer if people are just going to bulldoze my work anyways?


@Sandro removed the changes which caused the most rebuilds from the PR before merging: Python fixes by SuperSandro2000 · Pull Request #169681 · NixOS/nixpkgs · GitHub

I wasn’t aware that changes had been removed before merging. Also, I was intending to not call out the exact PR/individual, but yes that was the PR.


I guess there is also a "a specific named thing is a problem, an aggregate count is just a statistic" effect at play. To get an answer on whether breakages of specific things were removed from a PR, it might help to ask about those specific things before citing (or declining to cite) aggregate counts.

I think that the community ought to consider packaging and pinning as separable tasks with separate workflows.

Correct me if I’m mistaken, but I feel that currently there’s an (implicit?) rule/guideline that there should be only one version of every package. However, this seems to lead to PRs that cause a lot of contention.

For example, take my recent PR bumping pytorch-bin from 1.10.0 to 1.11.0: pytorch-bin: 1.10.0 -> 1.11.0 by rehno-lindeque · Pull Request #164712 · NixOS/nixpkgs · GitHub

I’m an infrequent contributor to nixpkgs, and to be honest when I do contribute I’m kind of slow and can’t keep up with the rapid changes:

  • It’s pretty easy for me to contribute new/bump existing packages. E.g. adding pytorch-bin_1_11_0 when pytorch-bin = pytorch-bin_1_10_0; already exists
  • But you may not want me to do pinning because:
    A. I don’t really know much about some other packageset and how its build systems work. It takes a sizable effort to get up to speed.
    B. I’m just never really going to have a huge amount of motivation to dive into the downstream effects of my change that I’ll never experience first hand.
    C. Even when I do make the effort to make sure everything keeps building consistently I might not have enough experience to do a good job of testing and knowing what I’m missing.
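
A sketch of what that first bullet could look like in all-packages.nix; the attribute names follow the nixpkgs versioned-attribute convention, but the file paths and the 1.11 expression here are hypothetical:

```nix
# all-packages.nix (sketch): expose both versions as separate attributes,
# keeping the unversioned name pointed at the existing, known-good one.
pytorch-bin_1_10_0 = callPackage ../development/python-modules/torch/bin-1.10.0.nix { };
pytorch-bin_1_11_0 = callPackage ../development/python-modules/torch/bin-1.11.0.nix { };
pytorch-bin = pytorch-bin_1_10_0;  # flipping this alias is a separate, focused PR later
```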

The current situation seems to me to lead to huge PRs, which have the following problems:

  • long comment threads and lots of context that you have to read through
  • much of that context is already stale by the time you read it
  • there’s a cadence mismatch between authors, maintainers and reviewers - different people work at different times etc
  • large amounts of back-and-forth and hopping between tasks leads to mental fatigue
  • nixpkgs itself is a moving target which changes frequently

Instead, I’d much rather contribute separate granular PRs for pytorch-bin_1_11_0, torchvision-bin_1_11_0, torchaudio-bin_1_11_0, etc. Then, finally, when all that work is done, it’s possible to make a much more focused effort to roll forward the entire pytorch-based ecosystem, involving many more stakeholders.

(I know this “kicks the can down the road” so to speak, and I’m an infrequent contributor as I said. However, I personally think that you eat an elephant one bite at a time.)


Because there are so many people involved that you can’t even ping them in one GitHub comment.

Which only happens because people don’t bump their maintained modules, and way too many Python packages are not actively maintained by their listed maintainer.

First question: how long was the update out before it got bumped? If it was out for multiple weeks, then you’re not updating your packages very fast. Second: feel free to open PRs against python-updates and link them in the PR or on Matrix. Even a revert is fine if the update is still too breaking and the ecosystem needs to catch up. Third: to make sure the package still works, make sure you are executing tests and have a pythonImportsCheck. If the PyPI tarball ships no tests, just fetch straight from source. python-updates is built on Hydra, and regressions are checked and taken care of if they affect more than one or two packages.
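
For reference, the tests-plus-pythonImportsCheck combination described here looks roughly like this in a package expression; the package name, owner, and hash below are placeholders:

```nix
buildPythonPackage rec {
  pname = "somepkg";  # placeholder
  version = "1.2.3";

  # Fetch from the source repo rather than PyPI, so the test suite is included.
  src = fetchFromGitHub {
    owner = "someowner";  # placeholder
    repo = "somepkg";
    rev = "v${version}";
    hash = "";  # fill in from the first (failing) build
  };

  # Actually run the upstream test suite during the build.
  nativeCheckInputs = [ pytestCheckHook ];

  # Fails the build early if the module can't even be imported,
  # e.g. when the package would otherwise end up empty.
  pythonImportsCheck = [ "somepkg" ];
}
```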

There are some updates which just need to happen, and then you have to see what is broken and fix it afterwards: for example, when glibc, binutils or gcc is updated. You can’t revert those updates just because some library is broken.

Yes, like any PR should that is not rebuilding 500+ packages. I moved the pkgconfig adoption out of it because, based on a quick grep, I thought it would not rebuild that much, which turned out not to be the case.

of packages that were already broken on master and which I didn’t touch in the PR. We cannot expect people to fix pre-existing failures of reverse deps.

Yeah, I moved pkgconfig from buildInputs to nativeBuildInputs because I was in the process of adopting pkgconfig and enabling tests which had been disabled over a year ago with a comment to enable them on the next update. First I wanted to fix reverse deps, but then I went down a rabbit hole of not-very-well-maintained packages which were broken in multiple ways, and gave up.

Yes, because Python is limited in that way: having different versions of the same package in the same environment causes massive issues which are hard to detect. E.g. packages end up empty because Python thinks they are already installed, which technically they are.
Quick fix if you have an application without reverse deps: move it out of pythonPackages and use older versions there.
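
A sketch of that quick fix, assuming a hypothetical application someapp that needs an older somelib: give the application its own overridden Python package set instead of touching the shared pythonPackages scope. All names and versions here are illustrative:

```nix
{ python3 }:
let
  # Private package set for this application only; the global
  # python3Packages set is left untouched.
  python = python3.override {
    packageOverrides = self: super: {
      somelib = super.somelib.overridePythonAttrs (old: {
        version = "0.9.0";  # older version only this application needs
        # in a real expression, the matching older src would be fetched here
      });
    };
  };
in
python.pkgs.buildPythonApplication {
  pname = "someapp";
  version = "1.0";
  src = ./.;
  propagatedBuildInputs = [ python.pkgs.somelib ];
}
```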


I think this is exactly to @delroth and @rehno’s point though: We need more stability, and those are exactly the kinds of packages that we ought to just have multiple versions of. We (NixOS/cuda-maintainers) have multiple versions of cudaPackages and it works great.


I’ve been waiting over two years for this day. I can help solve the discoverability problem, and we can make it better than anything pre-flake packaging could’ve dreamed of.

I’ve opened up a separate discussion to talk about discoverability


That’s only working because those packages are used by a very small subset of packages compared to, for example, pytest. We had multiple versions of pytest for a while, and as soon as two different versions mixed, everything went downhill with weird, hard-to-debug errors. Or: for, I think, two weeks we had an empty python3Packages.libcst because I messed up the inputs and no one noticed. If you have a good candidate for a package where we need two versions, we can discuss it and see if we can find a solution.

One obvious candidate I can think of is numpy or tensorflow, but that would forbid having a package with any other version of the same package anywhere in its dependency tree.


If you have a good candidate for a package where we need two versions we can discuss that and see if we can find a solution.

I’d say that, in my case, as often as not the final expression I care about can’t be found in nixpkgs at all (because it’s work-related, etc.). So I can override e.g. pytorch-bin = pytorch-bin_1_11_0 in my private overlay (globally) without ever worrying about everything else that would break in nixpkgs.

(I reckon the majority of people using NixOS have an overlay with overrides like this, even at home… probably?)

If every user has to maintain their own overlay, that kinda defeats the point of nixpkgs… which is sad but also reflective of my user experience.


If every user has to maintain an overlay, that kinda defeats the point of nixpkgs… which is sad but also reflective of my user experience

Oh I don’t know about that, I’m very grateful for all the packages nixpkgs provides me that I don’t need to mess around with. Life was hard before nix!

But again, just to reiterate: for me it’s about more granular PRs, not the pinning process (which is still very valuable, to be clear). The overlay is just my bandaid for when I don’t have time to go fiddle around with unrelated stuff in nixpkgs.
