New Merge Policy for an always-green hydra

I don’t think merging the breaking change to master and waiting for all of the maintainers of dependent packages to fix it is the best approach here.

Ever tried to bump glibc, gcc or openssl? Getting every single leaf package to build (and also to work, which is a totally different story, by the way) is just not practical. A better solution would be to fix Hydra’s email notifications and re-enable them after that.

I agree with @jonringer here. We also have ZHF iterations for a reason IMHO.

It also seems as if Hydra-related impurities are being forgotten here: sometimes a package breaks due to too much load on a builder (I guess), or something fails with some kind of Illegal instruction error depending on the builder’s CPU. Another story is tests that time out, or the fact that we don’t have many aarch64 machines, so a lot of those builds occasionally get cancelled.

5 Likes

I still think a mixed-responsibility approach is best here:

  • The author of the change is responsible for notifying maintainers of broken reverse-dependencies before the change is merged and for giving them reasonable time to fix the breakage.
  • The maintainers are then responsible for implementing a fix.
  • The author is responsible for marking the remaining packages as broken once that reasonable time has passed.

That should scale and maintain the always-green property (assuming “mark as broken” counts as green).

Edit: Of course the first step could work together with some sort of staging system and (semi-)automatic notifications. The “before” is key here, though.

9 Likes

I agree. I think that with some decent tooling, for notifications at least and ideally also for marking packages as broken, this would be a good solution.

1 Like

Is there an issue or discussion for things like this? I have seen it brought up a couple of times in this discussion and it seems like this is a problem that shouldn’t be too hard to solve.

I would much rather have a master branch that is slightly broken than have many long-lived PRs that may overlap in effort. Nixpkgs is already notorious for PR staleness; I would rather have more ways to get stuff merged quickly than try to maintain a fragile house of cards.

Also, people usually discover issues because it’s in master, not because it’s in a PR. This might be remedied through notifications, but that’s still an “opt-in” subscriber model, whereas master is more likely to uncover broken packages.

Most of these situations come from PRs that are not in a state to be merged; some get forgotten, and occasionally one just becomes massive and takes more time, energy, or resources than a committer is willing or able to dedicate (some of the pytorch PRs, the acme restructure, and others).

This is a huge problem for a lot of leaf packages. For the python packages, I usually fix packages if I see that they are broken during another review, but in most cases the maintainer hasn’t been active for years.

At least for me, marking something broken means that it’s non-trivial to get the package back into a building state. Personally, I think this distinction is almost a “marked for removal”, unless someone wants to take up maintainership. For python packages, I’ve begun removing packages that have been marked broken for more than a release (e.g. python3Packages.zope_i18n: remove due to prolonged breakage by jonringer · Pull Request #95351 · NixOS/nixpkgs · GitHub).

I absolutely agree, and it’s been brought up many times. We should probably come up with one or two scenarios that people are likely to adopt, and implement them.

Previous discussions:

  • many others, just too lazy to find them all.
5 Likes

Yes, this. It’s hard to see right now what breakage there is in staging-next. We should have an aggregate job set for staging-next and staging, where the staging one is smaller in scope.

To be clear, in my scenario “reasonable time” would be something like one week. If it’s still broken after that, that’s fine; it will just be marked as such. That will not prevent all breakages, but it will prevent the easily preventable ones. The rest is just as it is now, only properly marked (which avoids wasting everybody’s resources).

For me, that is the most frustrating part about nixpkgs. As a maintainer it feels like an endless game of whack-a-mole. It’s stressful. I would much prefer to know of breakages before they happen and have some opportunity to fix them.

I am not sure that is a good general policy: if someone wants to re-add a package, it’s often much easier to start from a broken version than to start from scratch. There is not much cost in keeping those broken versions, and discovery is much easier when they are in the tree. That said, this is not the right place for this discussion and I don’t care that strongly.

Yes, I agree that is the main issue. There are some parts of the tooling that are likely uncontroversial, but someone would have to step up and implement them. The merge train sounds great, but would likely take quite a while to implement. So it would probably be better to start with lower-hanging fruit.

3 Likes

At which point you still shift work from the package maintainer of the broken package to the core package maintainers. If as a package maintainer you choose to rely on a third party (a dependency), you take a risk and should be able to put in the effort to handle changes. Core package maintainers do not choose to maintain reverse dependencies.

We are a community and in the end it is a matter of being social with each other. We should be careful with expecting too much from others. I think having guidelines on what can be expected when making large breaking changes in Nixpkgs is something that would be good to have. Perfect topic for an RFC!

8 Likes

Maybe I’m just jaded, but that’s just the current state of software packaging. And there is a spectrum here: on one end you have the python ecosystem, which is breaking all the time, and on the other you have rust, which can handle many conflicting versions of a dependency in a single build, so it almost never breaks.

For people using release channels, I don’t think this is much of an issue, although it still exists to some degree.

For people using unstable, you’re able to do the fix, test using whatever command failed, and then push a PR with the fixes. Usually small fix-up PRs get merged in a few hours as they are easy to review. Is this the best workflow? No. But it only happens to me maybe once a month. Then again, I don’t run much “exotic” software, so I’m not very likely to be hit by breakage.

I am commenting without reading the entire thread, sorry (nor do I know much about Hydra), but it seems to me that if you do that, there is an immediate low-hanging fruit: map from packages to tests so that this can be done in an automated fashion. This would be a stepping stone towards Kevin’s goal.

I agree; one is usually in the right mindset when making a change, and if a review or failure comes a week later it’s often not feasible to work on it.

We can already do this to a large extent with passthru.tests, which ofborg already builds:

https://github.com/NixOS/ofborg/pull/410
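
For illustration, here is a minimal sketch of what such a test wiring looks like; the package and test names below are hypothetical, but the passthru.tests mechanism itself is the existing Nixpkgs convention:

```nix
# Sketch: exposing a NixOS VM test via passthru.tests so that ofborg
# (and reviewers) can discover and build it whenever the package changes.
# "mypackage" and the "integration" attribute name are made up.
{ stdenv, nixosTests }:

stdenv.mkDerivation {
  pname = "mypackage";
  version = "1.0";

  # ... src, buildInputs and phases omitted for brevity ...

  passthru.tests = {
    # Reuse the corresponding NixOS test; the attribute name is free-form.
    integration = nixosTests.mypackage;
  };
}
```

The tests can then be built directly, e.g. nix-build -A mypackage.tests.integration, which is why the “map from packages to tests” idea above is largely a matter of filling in the missing passthru.tests attributes.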

It’s just missing from some derivations. Also, I don’t think it is supported/used by nixpkgs-review yet, though I may be mistaken:

https://github.com/Mic92/nixpkgs-review/issues/77

This is easily the most frustrating part about using nixpkgs for me. My configuration doesn’t build most of the time, and it’s always just one or two packages.

Sure I can merge a fix within the hour. But then it often takes days for the channel to update. And by then something else could be broken.

It would be great to know about the failures earlier so that I can fix them. Long ago, Hydra used to send out emails for failing packages (though only to maintainers), but this got scrapped, unfortunately.
Ideally, I could upload my NixOS configuration somewhere and get an email as soon as some dependency fails to build on Hydra.

10 Likes

I don’t see why it shifts responsibility to the core package maintainers. I agree with you entirely: package maintainers choose to rely on third parties and should be responsible for keeping their packages working. That is exactly what I propose. The only differences to the status quo are that they should be given an opportunity to do that before the package is “burning” on master (which is much less stressful), and that the package is marked as broken if the maintainer does not do that.

I agree that an RFC would be good, but I will not be the one to write it. Whoever wants to write it should probably also be willing to build some tooling to make the process viable (it does not need to be perfect, but it should at least be possible to automatically generate the list of maintainers to ping and to automatically mark the remaining packages as broken).
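
To make the “mark as broken automatically” part concrete: in Nixpkgs this is already just a one-attribute change per package, so the tooling mostly needs to generate edits like the following. The package below is hypothetical; meta.broken is the standard mechanism:

```nix
# Sketch: what an automated "mark as broken" change would boil down to.
# Setting meta.broken = true keeps the expression in-tree and discoverable,
# but stops it from being built by Hydra or blocking the always-green branch.
{ stdenv }:

stdenv.mkDerivation {
  pname = "somepackage";
  version = "2.3";

  # ... src and build phases omitted ...

  meta = {
    description = "Hypothetical package that no longer builds against the new dependency";
    broken = true;
  };
}
```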

6 Likes

I feel like this thread is undergoing scope creep.

The initial proposal only mentions having nixos-unstable be equal to nixpkgs-unstable. In other words, it only suggests that we should use something bors-like so that we only merge changes that would not stop nixos-unstable from moving forward.

Can we try to focus the discussion here on this, and fork the discussion on making master always have only green packages into another thread, so that this initial thread can make progress?

I think it is reasonable to expect core package maintainers to make sure that nixos is able to move forward after their changes — this is far from having to fix all leaf packages; it only requires fixing the leaf packages that would block the nixos branch from moving forward, i.e. the relatively few packages that are required for release-blocking tests.

Do other people have differing opinions on this, more restricted, question?

4 Likes

Right now the Hydra jobsets for nixpkgs-unstable and nixos-unstable are often stalled for weeks at a time because Hydra builds or tests are failing.

That happens about once a year, and it’s usually because tracking down the commit is hard, or because it was merged as part of a huge changeset.

This time you can see the timeline in nixos-unstable is blocked because of failing luks-format1 test · Issue #96479 · NixOS/nixpkgs · GitHub, which is mostly due to the extensive package set it has to build to test the changes.

The offending commit was pinpointed quite quickly but never reverted, since it fixes another nasty bug.

I think it’s going to be hard to automate this. We do batching via staging, and I wonder why this wasn’t caught there.

Are we sure this huge delay happened due to a lack of batching and not something else? To really improve our workflow we shouldn’t assume, but rather analyze what actually went wrong in the process.

10 Likes

Since we’re not running NixOS tests on staging, how would you have expected this to be caught there? It seems to me that the problem is as simple as that—batching doesn’t accomplish anything if we don’t test the batches.

9 Likes

Then it’s clear what needs to be done: breaking master with a mass-rebuild change means it’s going to take time to fix.

Given that I have seen the opposite opinion too (code looks good and passthru.tests builds? merge), maybe we will end up splitting the master branch, with its clear conflict of interest, into cutting-edge, cutting-edge-next, and rolling-release, with cutting-edge being everything-goes to reduce merge conflicts and cutting-edge-next serving to fix just the NixOS tests so that rolling-release stays always green.

Modifying Hydra to do backouts is not really practical, as discussed before. A glibc update will break many packages, but fixing the failures requires updating those packages, rather than backing out the glibc update.

I think automated, accurate regression testing (bisection) is what is really needed; this ability would benefit even the existing workflow. There is a paper from Google, “Who broke the build?”, on using a suspiciousness score to greatly speed up the search for regressions, so that many thousands of commits can be skipped. With reasonably fast regression testing in place (1-2 days to identify culprit commits), we could switch to treating master as another release branch:

  • All commits are first pushed to staging, after passing the current pre-submit level of testing.
  • Hydra builds staging every 1-3 days, then uses the fast search algorithm to look for breakages. Every commit has one of three statuses: fine, broke something, or still under investigation.
  • If a commit seems fine, or it is a security update, it is cherry-picked to master.
  • Staging gets merged into master every few weeks, or sooner if there is an acceptable level of broken packages.
  • Besides the staging merges, regressions in master should be relatively infrequent, since the commits will already have been vetted on staging.

This is similar to @7c6f434c’s proposal, with cutting-edge=staging, cutting-edge-next=master, and rolling-release=nixos-unstable.

Some form of notification is necessary if a commit breaks something; the current system seems to be “browse the Hydra website and check all the packages you care about”, which is OK, I guess. One idea might be to use GitHub issues, so that each failing commit gets its own issue, which pings the author of the commit and the maintainers of the relevant failing package(s).

1 Like

I’d just like to add that the most frustrating part of nixpkgs for me so far, as a user and (noob) contributor, has been seeing updates that I care about stuck on a backlogged release because dumbPackageIDontCareAbout has broken the build.

To get a sense of scale, what is the CPU-hours cost of running Hydra, say, for a small change to a “leaf” package vs. a glibc upgrade? How much of a speedup should we expect from, e.g., Towards a content-addressed model for Nix?

I do think that as Nixpkgs grows, it will be inevitable that some packages will break and go unmaintained. It seems reasonable to me, though, that maintainers be given some lead time to update their packages before a breaking change is merged, and if they fail to update in time, the package should just be labeled broken/unmaintained so that it no longer blocks releases. If the entire package ecosystem is required to move in lockstep, it seems inevitable that Nixpkgs will splinter under the weight of its own success.

As a quick aside, other package managers – npm, cargo, etc. – have another solution to this dilemma: they simply version everything. Over time, package versions come and go, and older versions are marked deprecated. So my glibc upgrade doesn’t immediately break your foobar build; you can just stay on the same version until you’re ready to upgrade.
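
The closest analogue in Nixpkgs today is keeping parallel version attributes and letting consumers opt into them via an overlay. A rough sketch, where “foobar” is hypothetical (and assumed to take openssl as an input), while parallel attributes like openssl_1_1 alongside openssl have existed in nixpkgs for exactly this purpose:

```nix
# Sketch: an overlay that keeps one consumer on the older dependency
# instead of forcing it to upgrade in lockstep with the default.
self: super: {
  foobar = super.foobar.override {
    # Only works if foobar's expression accepts openssl as an argument.
    openssl = super.openssl_1_1;
  };
}
```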

2 Likes