PLEASE fix ALL packages which were broken by the Qt-wrapping-changes

I actually was one of the first people who noticed these issues, and I had a similar sentiment that we should fix all of them semi-automatically before we released. However, the qt wrappers were not a breaking change in the first place. If you read the threads it is because a huge majority of packages were written incorrectly, often not using the qt specific deriver and callPackage function. Because of this it would be entirely unreasonable to ask one single person to fix every single expression for the maintainer who didn’t package it correctly in the first place.
I talked with numerous people on IRC in #nixos-dev and Thomas Tuegel as well, and we got a change that would abort builds that don’t use the supported deriver https://github.com/NixOS/nixpkgs/pull/70691, making all applications that fail at runtime to fail at build time. Then you could easily run a jobset and do a sed operation on all the newly broken packages. The issue for me is I didn’t really have the time to really support this, so I’ve done my very best to leave a very good trail so people can help. In the meantime, on a case by case basis I’ve reviewed probably 100+ submissions to fix expressions and made sure they were backported to stable releases always.

We are going to actually mark every package that doesn’t build as broken at the time of 20.03 release. But things not working at runtime is something we don’t have the infrastructure to determine automatically.

6 Likes

I think this is a key thing and should be done (and puts little additional load on the person executing the top-level change).

Many nixpkgs maintainers are very eager to fix things; if they are notified of the problems ahead of time, they can help fix things before the next release.

7 Likes

Sorry I understand your frustration but please remember that Nix is a volunteer project and we use it for free. Sometimes we may send a few PRs but that’s voluntary too and nothing compared to what we get back. I’d urge you to kindly reflect upon your actions and not demand much of the team, most of whom are some of the most incredibly helpful people I’ve met. We don’t want another “actix” like situation.

Also I’ve often hit the issue and worked around by pinning versions. Maybe you could try that if it works for your situation

1 Like

IMHO the real problem is that many, way too many, packages are not really maintained. Maybe the maintainer i still around but stopped using the package, stopped using NixOS altogether or is on NixOS stable and isn’t aware the package is broken on unstable for some time.

A breaking change on unstable wouldn’t be a problem at all if every package were maintained. Maintainers would fix their small (hopefully) set of packages and the problem would be quickly gone.

Many nixpkgs maintainers are very eager to fix things; if they are notified of the problems ahead of time, they can help fix things before the next release.

A first step in this direction would be to re-enable emails from hydra to the maintainer of a package when compilation used to work but now fails.

Yes, please, reenable the notification! I don’t mind if it spam my inbox: I want to know if my packages are broken. Maybe send a notification only once a week but send it.

6 Likes

I feel “stateful” notifications could be a mild improvement: a single email per “failure”. If a package fails due to the same reason afterwards, no need to resend an email.

Or better yet, create a single github issue and “@” maintainers. If there exists one, don’t create another…

2 Likes

I agree with @rkoe that there is some amount of responsibility attached to doing structural changes to nixpkgs.

A good example of somebody that took this at heart was @bhipple who changed how Rust crates are being fetched. First he created a WIP RP to discuss the change, then wrote a plan on how to migrate all of nixpkgs to the new fetcher, then updated all the rust packages in a multi-step plan. He also kept an eye on failing packages in the meantime while nixpkgs kept moving forward. This to me is the ideal amount of care that structural changes should get.

I don’t know the details about the Qt-wrapping changes and I don’t think that assigning blame is especially useful. Sometimes it is not possible to fix everything by yourself as well but at least all the maintainers of the affected packages should be @ mentioned on the PR.

9 Likes

I believe it was handled the best it could be. Shortly after we realized things weren’t packaged correctly I asked that https://github.com/NixOS/nixpkgs/issues/65399 could be opened. All documentation was updated and the new section was even linked there. There were several plans on how things could have been fixed in a semi-automatic way, but I’m really under the impression that @ttuegel is time limited when it comes to maintenance tasks. So for that I’m pretty thankful in the first place that we have Qt wrappers at all. Hopefully we can get all these packages fixed in 20.09 by finishing https://github.com/NixOS/nixpkgs/pull/70691, and actually deprecating the qt specific deriver.

And on this note, it would be great if we could get more people to help with the QT framework in nixpkgs https://github.com/NixOS/nixpkgs/issues/74355.

5 Likes

I think this is the issue here. The ticket that you mention basically says that if one didn’t use this non-standard Qt5 version of mkDerivation that used to be a no-op, but now had become very-very important, then the fault is their. It’s no surprise that it rubs people wrong way when they are told that they didn’t read the 70 thousand word manual well enough. Meanwhile we can’t merge a sorting for our maintainer list without implementing a CI first https://github.com/NixOS/nixpkgs/pull/77784.

Having said that, I do appreciate the great work that was put in by @ttuegel and others to fix the original issue with Qt in nixpkgs.

Yeah, that does come off very RTFM, but *shrug it is one of the first things in the qt section on how to package a qt app. It probably could have been populated with the relevant/helpful info.

I’m fairly new here, so forgive me if I’m off-topic and barging in an already-much-discussed topic, but personally I would much prefer “including some stale/broken packages” to “reducing the size of nixpkgs”. A stale derivation is a starting point for contributing a fix, and I’d rather look into that than having to hunt for other sources of derivations.

There is of course gray area here: while you can’t expect the core package maintainer to fix all downstream derivations, there’s some responsibility there to think about the downstream impact and not just randomly break things. I didn’t follow this particular change, but my impression is that that is exactly what happened. Of course it is always ‘better’ to ‘go even further’, but you have to stop somewhere.

I’m not sure more ‘formal’ quality control would have helped here, as the change appears to have been made pretty deliberately, understanding that it would break a certain class of downstream packages. Perhaps the impact was underestimated, but I don’t have the impression that it was made ‘thoughtlessly and ignorantly’.

Perhaps you could look into running https://github.com/NixOS/nixpkgs/pull/70691 and taking inventory of what’s still broken?

On a more positive note, I’ve been tracking unstable since October, and only once encountered a package that did not work for which a fix wasn’t already underway. There even already was a github issue about it. From my experience NixOS is doing pretty well here.

3 Likes

Since my previous shorter post was well received, I’d like to expand on my view of the “ideal process”. My goal is to clearly define the expectations for unstable (or master) as follows:

  • failing builds are not acceptable
  • but things being marked broken is (though it is avoided when possible)
  • and runtime issues might happen

You could call this “perpetual ZHF”. I think that is desirable, and we could achieve it with the following process:

New Changes

  1. Test your change with nixpkgs-review and any manual tests that you deem necessary. In any case, nixpkgs-bisect should be the bare minimum requirement.

  2. If nixpkgs-bisect reports any breakage, notify the maintainers of the broken packages (this happens before merge). Optionally you can fix it yourself, but there is no expectation to do so. For this step it is essential that master is already in a “ZHF” state.

  3. From the time of notification, give people at least a week to fix the build of their packages. After that, mark the still-failing build as broken. A week might be a bit short here, but the goal is to give maintainers opportunity to fix things while also not delaying changes unnecessarily. The “broken” status is not permanent, maintainers can still come back after two weeks and fix their package.

  4. Merge.

Breakage Detected

Mistakes happen. Things might break at runtime.

  1. Identify a breakage, git-bisect if necessary.

  2. Revert the commit that caused it first. I think we need to normalize reverts more. If anything breaks anything, we should get master back to a working state ASAP and figure out a better solution afterwards. It should not be taken as any sort of offense or accusation of the original commit author, just part of the normal process. Mistakes happen.

  3. Create an issue, pinging the original commit author.

  4. Restart at step 1 of “New Changes”. The responsibility for the process goes back to the original commit author, not the one doing the revert.

Big Changes

The process would need to be amended a bit for large rebuilds, i.e. everything that goes through staging. Here we should test reasonably-many reverse dependencies, depending on the complexity of a change. If we’re adding a comment to the build script of gcc, not much testing is needed. If there’s any breakage after all. If any unexpected breakage is caught later in the process, follow “Breakage Detected”. On high-complexity, high-rebuild changes someone should create a dedicated hydra jobset. Some community effort might be needed to avoid too much strain on the current staging maintainers, but that’s a separate topic.

Qt-Example

I’m not familiar with the details here. But let’s assume the Qt example falls into the category where no breakage was expected, but breakage was discovered at runtime. Following my suggestion, we would have immediately reverted the change. In the newly created issue, we might have come up with https://github.com/NixOS/nixpkgs/pull/70691. That would have enabled us to identify the breaking packages, notify maintainers and continue with the “New Changes” process.

Let me clarify again that I’m not blaming anybody for acting differently here. We don’t have a consensus currently. But I think we’d be better of if we did.

4 Likes

Saying that we need to run nixpkgs-review all the time is, in my
opinion, a mistake. We have to either make that automatic, or not
require it, or it will be forgotten. (Also, we should provide an
interactive script somewhere that just does all the things that the PR
template requires.)

Making it automatic on the developer’s laptop is not reasonable, due to
the mass-rebuilds that can occur, and for which another process would be
required.

But we already have a set of machines that does these mass-rebuilds, and
it has a feature to automatically ping the maintainers of packages that
are newly broken: hydra.

Does anyone remember why the automatic emails have been disabled, and
why they haven’t been reenabled since? There has been one spam incident
once where hydra started to send thousands of emails to everyone, but as
far as I know that was the exception and not the rule, and it brought
more advantages than disadvantages overall – I’m pretty sure just
reenabling it would help a lot to keep nixpkgs not-broken.

5 Likes

Reverting is fine when there is large amount of breakage but in reality it is often not so black and white. Large fraction of breakages I encounter is breakage of a few leaf packages – it is sad that it broke some packages but reverting the breaking change would be actually a change for worse.

The Qt breakage is actually one such example. Before the introduction of wrappers, every Qt package was broken. If people were lucky and had homogenous environment, it worked for them, but I marked several issues as duplicate every week:

The solution Thomas chose was to make the breakage explicit and give maintainers a way to finally fix the long standing issue. We cannot blame him for not having enough time to also fix the world.

12 Likes

I believe this would be a good process if NixOS had a hybrid structure, where nixos-unstable became an actual rolling distro. Which I feel might be strenuous on the structure of a mono-repo (how should that even look like exactly? more branches can turn into more work). I’m not sure how I feel about keeping nixos-unstable in such a primed state and calling it “unstable”.

I do agree that the “one defined moment” of zero hydra failure initiative isn’t working as good, but I believe that is because nixos-unstable moves so fast and your package failing to build has no notification. The people that are aware of these things quickly are using nixos-unstable and tracking development delicately.

1 Like

And in the case of the Qt applications, the build would still be successful, it’s only when running the application would you see the error.

I disagree. Automating it would be much nicer, but I don’t think its a requirement. There’s a checkbox. There’s not really an excuse to forget it and accidentally check the box. And if it does happen every once in a while, there’s the revert option. It doesn’t have to work perfectly to be an improvement.

Yes, in that case there’s staging.

I agree that we should have hydra notifications (https://github.com/NixOS/nixos-org-configurations/issues/51). But those should be a last line of defense. Fixing my packages after they were broken on master is like putting out a fire. It’s stressful for me and annoying for the users. Fixing them with a week’s notice and additional context is much nicer.

I’m not saying we should block indefinitely because of leaf packages, just let it sit for a week. The status quo has been working until know, it will work for a couple more days. Of course there will be special cases, and maybe the QT thing was such a special case (I don’t have much context), but this still applies to most cases.

My suggestion would not require him to fix the world. It clearly defines who has which responsibilities, and where those responsibilities end. The author of a PR has to

  • make the change
  • evaluate its impact
  • notify impacted maintainers

He does not have to fix everything.

I already use nixos-unstable as a rolling distro and I think a lot of people do. I’m not sure what you mean by more strenuous or why more branches would be needed. I don’t think the name is much of an argument :wink:

Yes, in that case someone would notice the runtime breakage. They’d then revert bisect the issue, revert the change and ping the author. They can then come up with a build-time test (as was done for the QT change) and ping maintainers.

2 Likes

OT: It’s so nice that one grumpy message from @rkoe, after some rocky start, turned into a real discussion. In a lot of other communities this thread could have become a flame war :slight_smile:

10 Likes

It’s not too late. :slight_smile:

5 Likes

Tried to semi-automatically fix some low-hanging fruit, please take a look if you have spare capacity for code review:

4 Likes

First of all: Sorry for the wording – I was quite frustrated, then. I do not want to insult, annoy or displease anyone, and I do not want to devalue others work. Sorry for that.
I really love the technical concepts of NixOS and I’m thankful for the work many people are putting into it. Thanks for that.

The QA-problems nevertheless still exist (and I’ve experienced it again with the recent 20.03 upgrade). I’m really interested in a solution here and am willing to help.
Unfortunately, I did not have time recently to think more about this, but now I’ll soon try to offer a concept for discussion how to solve these issues.

2 Likes