Since my previous shorter post was well received, I’d like to expand on my view of the “ideal process”. My goal is to clearly define the expectations for master as follows:
- failing builds are not acceptable
- but things being marked broken is (though it is avoided when possible)
- and runtime issues might happen
You could call this “perpetual ZHF”. I think that is desirable, and we could achieve it with the following process:
New Changes:
1. Test your change with nixpkgs-review and any manual tests that you deem necessary. In any case, nixpkgs-review should be the bare minimum requirement.
2. If nixpkgs-review reports any breakage, notify the maintainers of the broken packages (this happens before merge). Optionally you can fix it yourself, but there is no expectation to do so. For this step it is essential that master is already in a “ZHF” state, since otherwise new failures cannot be distinguished from pre-existing ones.
3. From the time of notification, give people at least a week to fix the build of their packages. After that, mark the still-failing builds as broken. A week might be a bit short here, but the goal is to give maintainers an opportunity to fix things while also not delaying changes unnecessarily. The “broken” status is not permanent: maintainers can still come back later and fix their package.
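The waiting rule from the notification step can be written down as a small decision function. This is only a sketch of the policy as I read it: the function name, the returned labels and the exact seven-day constant are assumptions made for illustration and do not correspond to any existing tool.

```python
from datetime import date, timedelta

# Assumption: "at least a week" is taken literally as seven days.
GRACE_PERIOD = timedelta(days=7)


def next_action(notified_on: date, today: date, build_fixed: bool) -> str:
    """Decide what to do with a package whose build a pending change breaks.

    The returned strings are hypothetical labels used only in this sketch.
    """
    if build_fixed:
        # The maintainer (or the change author) fixed the build in time.
        return "nothing to do"
    if today - notified_on < GRACE_PERIOD:
        # Maintainers still have time left to react.
        return "wait"
    # Grace period is over and the build still fails.
    return "mark broken"


print(next_action(date(2020, 1, 1), date(2020, 1, 5), build_fixed=False))  # wait
print(next_action(date(2020, 1, 1), date(2020, 1, 8), build_fixed=False))  # mark broken
```

Marking a package as broken stays reversible in this model: once the build is fixed, the function reports nothing left to do, whenever that happens.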
Breakage Detected:
Mistakes happen. Things might break at runtime.
1. Identify the breakage, using git-bisect if necessary.
2. Revert the commit that caused it first. I think we need to normalize reverts more. If a change breaks anything, we should get master back to a working state ASAP and figure out a better solution afterwards. A revert should not be taken as any sort of offense or accusation against the original commit author; it is just part of the normal process. Mistakes happen.
3. Create an issue, pinging the original commit author.
4. Restart at step 1 of “New Changes”. The responsibility for the process goes back to the original commit author, not the one doing the revert.
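For readers who have not used git-bisect: the idea behind it is a binary search over the history between a known-good and a known-bad commit. The sketch below illustrates that idea only. It assumes a linear history and a hypothetical `is_bad` check (in practice, something like "build the package at this commit and run it"); the real git-bisect also handles non-linear histories, which this does not.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes `commits` is ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # breakage was introduced at or before mid
        else:
            good = mid  # breakage was introduced after mid
    return commits[bad]


# Hypothetical history: the breakage was introduced in "d4".
history = ["a1", "b2", "c3", "d4", "e5", "f6"]
broken_since = {"d4", "e5", "f6"}
print(first_bad_commit(history, lambda c: c in broken_since))  # d4
```

The commit found this way is the one to revert in step 2.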
The process would need to be amended a bit for large rebuilds, i.e. everything that goes through staging. Here we should test a reasonable number of reverse dependencies, depending on the complexity of the change. If we’re adding a comment to the build script of gcc, not much testing is needed. If there is any breakage after all, or if any unexpected breakage is caught later in the process, follow “Breakage Detected”. For high-complexity, high-rebuild changes, someone should create a dedicated hydra jobset. Some community effort might be needed to avoid too much strain on the current staging maintainers, but that’s a separate topic.
I’m not familiar with the details here. But let’s assume the Qt example falls into the category where no breakage was expected, but breakage was discovered at runtime. Following my suggestion, we would have immediately reverted the change. In the newly created issue, we might have come up with https://github.com/NixOS/nixpkgs/pull/70691. That would have enabled us to identify the breaking packages, notify maintainers and continue with the “New Changes” process.
Let me clarify again that I’m not blaming anybody for acting differently here. We don’t have a consensus currently. But I think we’d be better off if we did.