New Merge Policy for an always-green hydra

I agree. I think that with some decent tooling, at least for notifications and ideally also for marking packages as broken, this would be a good solution.

1 Like

Is there an issue or discussion for things like this? I have seen it brought up a couple of times in this discussion and it seems like this is a problem that shouldn’t be too hard to solve.

I would much rather have a master branch that is slightly broken than have many long-lived PRs that may overlap in effort. Nixpkgs is already notorious for PR staleness; I would rather have more ways to get stuff merged quickly than try to maintain a fragile house of cards.

Also, people usually discover issues because something is in master, not because it’s in a PR. This might be remedied through notifications, but that’s still an “opt-in” subscriber model, whereas master is more likely to uncover broken packages.

Most of these situations come from PRs that aren’t in a state to be merged; some get forgotten, and occasionally some just become massive and take more time, energy, or resources than a committer is willing or able to dedicate (some of the pytorch PRs, the acme restructure, and others).

This is a huge problem for a lot of leaf packages. For Python packages, I will usually fix ones I notice are broken during another review, but in most cases the maintainer hasn’t been active for years.

At least for me, marking something broken means it’s non-trivial to get the package back into a building state. Personally, I think this distinction is almost a “marked for removal”, unless someone wants to take up maintainership. For Python packages, I’ve begun removing packages that have been marked broken for more than a release (e.g. python3Packages.zope_i18n: remove due to prolonged breakage by jonringer · Pull Request #95351 · NixOS/nixpkgs · GitHub).
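For context, marking something broken is just a one-line metadata change. A minimal sketch (the package name and URL are made up, only the meta.broken attribute matters here):

```nix
# Minimal sketch; package name and URL are made up, only meta.broken matters.
{ lib, stdenv, fetchurl }:

stdenv.mkDerivation rec {
  pname = "example-package";
  version = "1.2.3";

  src = fetchurl {
    url = "https://example.org/${pname}-${version}.tar.gz";
    sha256 = lib.fakeSha256;
  };

  meta = {
    description = "Placeholder used to illustrate meta.broken";
    # Keeps the expression in the tree, but Hydra stops building it and users
    # can only install it with allowBroken.
    broken = true;
  };
}
```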

I absolutely agree, and it’s been brought up many times. We should probably come up with 1 or 2 scenarios that people are likely to adopt, and implement it.

Previous discussions:

  • many others, just too lazy to find them all.
5 Likes

Yes, this. It’s hard to see right now what breakage there is in staging-next. We should have an aggregate jobset for staging-next and staging, where the staging one is smaller in scope.

To be clear, in my scenario “reasonable time” would be something like one week. If it’s still broken after that, that’s fine; it will just be marked as such. That will not prevent all breakages, but it will prevent the easily preventable ones. The rest is just as it is now, only properly marked (which avoids wasting everybody’s resources).

For me, that is the most frustrating part about nixpkgs. As a maintainer it feels like an endless game of whack-a-mole. It’s stressful. I would much prefer to know about breakages before they happen and have some opportunity to fix them.

I am not sure that is a good general policy: if someone wants to re-add a package, it’s often much easier to start from a broken version than to start from scratch. There is not much cost in keeping those broken versions, and discovery is much easier when they are in the tree. That said, this is not the right place for this discussion and I don’t care that strongly.

Yes, I agree that is the main issue. There are some parts of the tooling that are likely uncontroversial, but someone would have to step up and implement them. The merge train sounds great, but would likely take quite a while to implement. So it would probably be better to start with lower-hanging fruit.

3 Likes

At which point you still shift work from the package maintainer of the broken package to the core package maintainers. If as a package maintainer you choose to rely on a third party (a dependency), you take a risk and should be able to put in the effort to handle changes. Core package maintainers do not choose to maintain reverse dependencies.

We are a community and in the end it is a matter of being social with each other. We should be careful with expecting too much from others. I think having guidelines on what can be expected when making large breaking changes in Nixpkgs is something that would be good to have. Perfect topic for an RFC!

8 Likes

Maybe I’m just jaded, but that’s just the current state of software packaging. And there’s a spectrum here: on one end you have the Python ecosystem, which is breaking all the time; on the other you have Rust, which can handle many conflicting versions of a dependency in a single build, so it almost never breaks.

For people using release channels, I don’t think this is much of an issue, although it still exists to some extent.

For people using unstable, you can make the fix yourself, test it with whatever command failed, and then push a PR with the fix. Small fix-up PRs usually get merged within a few hours, as they are easy to review. Is this the best workflow? No. But it only happens to me maybe once a month. Then again, I don’t run much “exotic” software, so I’m not very likely to hit breakage.

Sorry, I am commenting without reading the entire thread (nor do I know much about Hydra), but it seems to me that if you do that, there’s an immediate piece of low-hanging fruit: map from packages to tests so that this can be done in an automated fashion. That would be a stepping stone towards kevin’s goal.

I agree; one’s usually in the right mindset when making a change, and if a review or failure comes a week later it’s often not feasible to work on it.

We can already do this to a large extent with passthru.tests, which ofborg already builds:

https://github.com/NixOS/ofborg/pull/410
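As a hedged illustration (the package and the nixosTests.myservice attribute are made-up names), hooking a NixOS test into a derivation this way looks roughly like:

```nix
# Rough sketch; "myservice" and nixosTests.myservice are hypothetical names.
{ stdenv, nixosTests }:

stdenv.mkDerivation {
  pname = "myservice";
  version = "1.0";

  # ... src, build inputs, phases ...

  passthru.tests = {
    # ofborg builds passthru.tests of changed packages, so this VM test is
    # exercised whenever the derivation changes.
    nixos = nixosTests.myservice;
  };
}
```

The nice property is that the test travels with the package, so any tool that knows about passthru.tests can pick it up.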

It’s just missing from some derivations. Also, I don’t think it is supported/used by nixpkgs-review yet, though I may be mistaken:

https://github.com/Mic92/nixpkgs-review/issues/77

This is easily the most frustrating part about using nixpkgs for me. My configuration doesn’t build most of the time, and it’s always just one or two packages.

Sure I can merge a fix within the hour. But then it often takes days for the channel to update. And by then something else could be broken.

It would be great to know about failures earlier so that I can fix them. Long ago, Hydra used to send out emails for failing packages (though only to maintainers), but unfortunately that got scrapped.
Ideally, I could upload my NixOS configuration somewhere and get an email as soon as some dependency fails to build on Hydra.
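A rough sketch of one building block for this (the configuration path is illustrative, and environment.systemPackages is only a proxy for the full closure), extracting a list of package names to watch:

```nix
# Hedged sketch: list the package names a configuration pulls in via
# environment.systemPackages, which a notification service could match
# against Hydra job results. Evaluate with `nix-instantiate --eval --strict`.
let
  nixos = import <nixpkgs/nixos> { configuration = ./configuration.nix; };
in
map (pkg: pkg.name) nixos.config.environment.systemPackages
```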

10 Likes

I don’t see why it shifts responsibility to the core package maintainers. I agree with you entirely: package maintainers choose to rely on third parties and should be responsible for keeping their packages working. That is exactly what I propose. The only differences from the status quo are that they would be given an opportunity to do that before the package is “burning” on master (which is much less stressful), and that the package is marked as broken if the maintainer does not do it.

I agree that an RFC would be good, but I will not be the one to write it. Whoever wants to write it should probably also be willing to build some tooling to make the process viable (it does not need to be perfect, but it should at least be possible to automatically generate a list of maintainers to ping and to automatically mark the remaining packages as broken).
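For the maintainer-pinging half, a very rough sketch (the breakage list is hard-coded here; real tooling would derive it from Hydra evaluation results):

```nix
# Rough sketch; `newlyBroken` is a placeholder that real tooling would fill in
# from Hydra results. Run with `nix-instantiate --eval --strict`.
{ pkgs ? import <nixpkgs> { } }:

let
  inherit (pkgs) lib;
  newlyBroken = [ pkgs.hello pkgs.cowsay ];  # hypothetical breakage list
in
lib.unique (lib.concatMap
  (pkg: map (m: m.github or m.name) (pkg.meta.maintainers or [ ]))
  newlyBroken)
```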

6 Likes

I feel like this thread is undergoing scope creep.

The initial proposal only mentions having nixos-unstable be equal to nixpkgs-unstable. In other words, it only suggests that we should use something bors-like so that we only merge things that would not stop nixos-unstable from moving forward.

Can we try to focus the discussion here on that, and fork the discussion on making master always have only green packages into another thread, so that this initial thread can make progress?

I think it is reasonable to expect core package maintainers to make sure that NixOS is able to move forward after their changes. This is far from having to fix all leaf packages; it only requires fixing the leaf packages that would block the nixos branch from moving forward, i.e. the relatively few packages required for release-blocking tests.

Do other people have differing opinions on this more restricted question?

4 Likes

Right now, Hydra for nixpkgs-unstable and nixos-unstable is often stalled for weeks at a time because Hydra builds or tests are failing.

That happens about once a year, and it’s usually because tracking down the commit is hard, or because it was merged as part of a huge changeset.

This time you can see the timeline in nixos-unstable is blocked because of failing luks-format1 test · Issue #96479 · NixOS/nixpkgs · GitHub, which is mostly due to the extensive package set that has to be built to test the changes.

The offending commit was pinpointed quite quickly but never reverted, since it fixes another nasty bug.

I think it’s going to be hard to automate this. We do batching via staging, and I wonder why that wasn’t caught there.

Are we sure this huge delay happened due to a lack of batching rather than something else? To really improve our workflow, we shouldn’t assume but rather analyze what actually went wrong in the process.

10 Likes

Since we’re not running NixOS tests on staging, how would you have expected this to be caught there? It seems to me that the problem is as simple as that: batching doesn’t accomplish anything if we don’t test the batches.

9 Likes

Then it’s clear what needs to be done: breaking master with a mass-rebuild change means it’s going to take time to fix.

Given that I have seen the opposite opinion too (code looks good and passthru.tests builds? merge), maybe we will end up splitting the master branch, with its clear conflict of interest, into cutting-edge, cutting-edge-next, and rolling-release: cutting-edge being anything-goes to reduce merge conflicts, and cutting-edge-next serving to fix just the NixOS tests so that rolling-release is always green.

Modifying Hydra to do backouts is not really practical, as discussed before. A glibc update will break many packages, but fixing the failures requires updating those packages, rather than backing out the glibc update.

I think automated, accurate regression testing (bisection) is what is really needed; this ability will benefit even the existing workflow. There is a paper from Google, “Who broke the build?”, on using a suspiciousness score to greatly speed up searching for regressions, so that many thousands of commits can be skipped. With reasonably fast regression testing in place (1-2 days to identify culprit commits), we can switch to treating master as another release branch:

  • All commits are first pushed to staging, after passing the current pre-submit level of testing.
  • Hydra builds staging every 1-3 days, then uses the fast search algorithm to look for breakages. Every commit has one of three statuses: fine, broke something, or still under investigation.
  • If the commit seems fine or it is a security update then it is cherry-picked to master.
  • Staging gets merged to master every few weeks, or sooner if there are acceptable levels of broken packages.
  • Aside from staging merges, regressions in master should be relatively infrequent, since the commits will already have been vetted on staging.

This is similar to @7c6f434c’s proposal, cutting-edge=staging, cutting-edge-next=master, and rolling-release=nixos-unstable.

Some form of notification is necessary if a commit breaks something; the current system seems to be “browse the Hydra website and check all the packages you care about”, which is OK I guess. One idea might be to use GitHub issues, so each failing commit is its own issue, and then it pings the author of the commit and the maintainers of the relevant failing package(s).

1 Like

I’d just like to add that the most frustrating part of nixpkgs for me so far as a user and (noob) contributor has been seeing updates that I care about stuck on a backlogged release because dumbPackageIDontCareAbout has broken the build.

To get a sense of scale, what’s the CPU-hours cost of running Hydra? Say, for a small change to a “leaf” package vs. a glibc upgrade? How much of a speedup should we expect from, e.g., Towards a content-addressed model for Nix?

I do think that as Nixpkgs grows it will be inevitable that some packages will break and be unmaintained. It seems reasonable to me though that maintainers be given some lead time to update their packages before a breaking change is merged, and if they fail to update in time the package should just be labeled broken/unmaintained so that it no longer blocks releases. If the entire package ecosystem is required to move in lockstep, it seems inevitable that Nixpkgs will splinter under the weight of its own success.

As a quick aside, other package managers – npm, cargo, etc – have another solution to this dilemma: they simply version everything. Over time, package versions come and go and older versions are marked deprecated. So my glibc upgrade doesn’t immediately break your foobar build; you can just stay on the same version until you’re ready to upgrade.

2 Likes

I think we need to challenge this assumption. Blocking security updates for a long time doesn’t seem like the right solution for merging in big changes. I think the approach we should take is marking the failed packages as broken; this way the channel is not blocked. Of course, I don’t think we want to mark half of the tree broken, so we will need to continue to do something like staging for large changes, then merge (marking packages broken) once we think it is at an “acceptable level of breakage”.

There are definitely blockers to this. For example, right now even marking packages as broken is a lot of work; AFAIK there is no automated way to do it. I think making this more automated is probably one of the first dependencies of implementing this project.
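As a hedged sketch of what such automation could emit (the attribute names are placeholders), an overlay that marks a list of failing attributes as broken might look like:

```nix
# Rough sketch; `failing` is a placeholder list of attribute names that
# tooling would generate from Hydra build results.
final: prev:
let
  failing = [ "somePackageA" "somePackageB" ];
  markBroken = pkg:
    pkg.overrideAttrs (old: {
      meta = (old.meta or { }) // { broken = true; };
    });
in
builtins.listToAttrs (map
  (name: { inherit name; value = markBroken prev.${name}; })
  failing)
```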

This sounds very similar to my proposal! The major difference that I see is that you are proposing cherry-picking, whereas I was only considering fast-forward style (so you only get merged if everyone in front of you also passes). However, I don’t think this is a major difference: if your commit is green, then the commits it sits on top of are probably fine to merge (maybe not independently, but apparently there are fixes somewhere in the chain, so we can still merge).

To me, “acceptable levels of broken” has to at least include “Hydra will publish the channel”. Otherwise this still ends up blocking security fixes (as the cherry-picked fix won’t actually be published).


I hope to find some time to write up some points here. It is clear that there will be a number of dependencies to get to a merge-queue-style setup; however, they don’t seem insurmountable to me. I think the first step will be to identify the blocking issues; then we can start discussing possible solutions for each of them independently. I think that even if this whole project never completes, the dependencies can be useful in themselves.

3 Likes

There are a few benefits, starting from the lowest-hanging fruit and going higher:

The first is simple: fewer mass rebuilds if we change one core package and the output is the same.

The second is more subtle: if we do something like a stdenv change that shouldn’t change most things, we can speculatively do an all-parallel mass rebuild where we substitute the old versions of dependencies. If nothing in fact changed, we’re done!

Thirdly, we should have a mechanism like “runtime-only deps” so that packages only see headers, and not shared-object contents, at build time (e.g. via a TAPI file, map file, linker script, etc.). Then any update that doesn’t change headers should also be fine.

And from there, there is a long tail of other tricks we can do, with increasing difficulty, to make things more incremental.
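On the first point, a hedged sketch of what opting a package into content-addressed builds might look like (this relies on the experimental ca-derivations feature, so the details may differ between Nix versions):

```nix
# Hedged sketch: requires the experimental "ca-derivations" feature; the
# package itself is a placeholder.
{ stdenv }:

stdenv.mkDerivation {
  pname = "example";
  version = "1.0";

  # ... src, build phases ...

  # Make the store path depend on the output contents instead of the inputs,
  # so a rebuild that produces an identical output does not force rebuilds of
  # reverse dependencies (the "early cutoff" described above).
  __contentAddressed = true;
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
}
```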


I do hope that with this stuff we can get to a point where CI automatically bisects, or even builds every commit, “not rocket science”-style. It will save soooo many human hours.

5 Likes