New Merge Policy for an always-green hydra

I think we need to challenge this assumption. Blocking security updates for a long time doesn’t seem like an acceptable cost of merging big changes. I think the approach we should take is marking the failed packages as broken; this way the channel is not blocked. Of course, I don’t think we want to mark half of the tree broken, so we will still need to do something like staging for large changes, then merge (marking packages broken) once we think it is at an “acceptable level of breakage”.

There are definitely blockers to this. For example, right now even marking packages as broken is a lot of work; AFAIK there is no automated way to do it. I think making this more automated is probably one of the first dependencies of implementing this project.
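For reference, here is what marking a package broken looks like today: a manual edit to the package's meta attributes. A minimal sketch (the `meta.broken` attribute is real nixpkgs convention; the package itself is made up):

```nix
# Hypothetical leaf package; only `meta.broken = true` is the relevant part.
stdenv.mkDerivation {
  pname = "example-package";
  version = "1.0";
  # ... src, buildInputs, etc. ...
  meta = {
    # Hydra will skip this package and nix-env will warn about it.
    broken = true;
  };
}
```

An automated tool would essentially generate commits containing edits like this in bulk.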

This sounds very similar to my proposal! The major difference that I see is that you are proposing cherry-picking whereas I was only considering fast-forward style (so you only get merged if everyone in front of you also passes). However, I don’t think this is a major difference: if your commit is green, then your dependencies are probably fine to merge (maybe not independently, but apparently there are some fixes somewhere in the chain, so we can still merge).

To me, “acceptable levels of broken” has to at least include “Hydra will publish the channel”. Otherwise this still ends up blocking security fixes (as the cherry-pick won’t be able to publish).


I hope to find some time to write up some points here. It is clear that there will be a number of dependencies to get to a merge-queue-style setup; however, they don’t seem insurmountable to me. I think the first step will be to identify the blocking issues, then we can start discussing possible solutions for each of those independently. I think even if this whole project doesn’t complete, the dependencies can be useful in themselves.

3 Likes

There are a few benefits, starting from the lowest-hanging fruit and going higher:

The first is simple: fewer mass rebuilds if we change one core package and the output is the same.

The second is more subtle: if we make, say, a stdenv change that shouldn’t affect most things, we can speculatively do an all-parallel mass rebuild where we substitute the old versions of dependencies. If nothing in fact changed, we’re done!
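The “if nothing in fact changed, we’re done” idea is essentially early cutoff. A toy shell sketch of the principle, not Hydra code (file names and the comment-stripping “build step” are made up): a downstream rebuild is skipped whenever the hash of the intermediate output is unchanged.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# A "source" and a derived "intermediate output" (comments stripped).
printf '%s\n' '# a comment' 'echo hello' > source.sh
grep -v '^#' source.sh > intermediate.sh
old_hash=$(sha256sum intermediate.sh | cut -d' ' -f1)

# A change that only touches a comment...
printf '%s\n' '# a DIFFERENT comment' 'echo hello' > source.sh
grep -v '^#' source.sh > intermediate.sh
new_hash=$(sha256sum intermediate.sh | cut -d' ' -f1)

# ...leaves the intermediate output identical, so everything downstream
# of it can be substituted from the old build instead of rebuilt.
if [ "$old_hash" = "$new_hash" ]; then
  echo "unchanged: skip downstream rebuilds"
else
  echo "changed: rebuild dependents"
fi
```

In the real system, the store-path or output hash of the rebuilt dependency would play the role of `sha256sum` here.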

Thirdly, we should have a mechanism like “runtime-only deps” so that packages just see headers and not shared-object contents at build time (e.g. with a TAPI file, map file, linker script, etc.). Then any update which doesn’t change headers should also be OK.

And from there, there is a long tail of other tricks we can apply, with increasing difficulty, to make things more incremental.


I do hope we can get to a point where CI automatically bisects, or even builds every commit “not rocket science”-style, with this stuff. It will save soooo many human hours.

5 Likes

That would be incredibly useful.

However, people will need to be disciplined about their git history; otherwise there will be false positives.

Oh, I meant every commit ought to pass. We can use Git’s --first-parent (or a slightly fancier variation that understands special branches like staging-next) to skip the noise from individual PRs.
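A minimal illustration of the --first-parent idea (the repository, branch name, and commit messages are made up): on a mainline built from merge commits, --first-parent hides the individual PR commits and walks only the mainline.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

# One mainline commit.
git commit -q --allow-empty -m "initial"

# A "PR" branch with two commits, merged back with a merge commit.
git checkout -q -b pr-1
git commit -q --allow-empty -m "pr-1: commit a"
git commit -q --allow-empty -m "pr-1: commit b"
git checkout -q -            # back to the default branch
git merge -q --no-ff -m "merge pr-1" pr-1

# Full history reaches 4 commits; --first-parent sees only the
# 2 mainline ones (initial + the merge).
git rev-list HEAD | wc -l
git rev-list --first-parent HEAD | wc -l
```

A CI that bisects along --first-parent would test merge commits only, which is exactly “every PR ought to pass” rather than “every intermediate commit ought to pass”.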

1 Like

Hi everyone, I’m sorry for the delay but after the discussion settled I have finally found time to write up a requirements document. The idea here is to gather all of the concerns raised here into an easy-to-read list so that further RFCs can reference them. I have read through the comments here a couple of times but please leave a comment on the document if I missed something or you have thought of something else.

If you are interested in participating send me an email at kevincox@kevincox.ca (Warning, I will be sending a message to everyone who responds so the email you send from will be shared. Do we have a better place for organizing these things?) The next step will be attempting to break down the requirements into a set of smaller projects that can be run through the Nix RFC process and it would be helpful if we could share the load of writing those plans.

1 Like

One more difficulty, assuming something like the “make the change, then ping maintainers to fix the resulting breakage” approach: many packages have no formal maintainer. We need to decide who is responsible. One approach would be to go down the graph to the nearest maintained dependency (or dependencies).

If you want further input, there is a lot of interesting discussion (including this point) in PLEASE fix ALL packages which were broken by the Qt-wrapping-changes. It’s a long thread though.

4 Likes

This is a good point to bring up. I think we should go in the other direction, though: if a package is broken, we should try contacting the maintainers of its dependents next, because their packages will be transitively broken if the issue isn’t resolved. Pushing packages upstream is a recipe for the maintainers of “core” libraries and tools slowly having more and more packages pushed onto them until they are overloaded. I don’t want the maintainers of our most critical derivations to become a dumping ground for unmaintained packages.

This does raise the question about leaf packages. We can consider doing the following things:

  • Don’t let them block and other changes, they quickly get marked as broken.
  • Notify anyone who has changed the file.
    • We might want some systematic way to filter out tree-wide changes.
  • Notify the maintainers of dependencies or other related packages and ask if they are interested in maintaining the package.

But at the end of the day, if the maintainer of a leaf package leaves, I think we just need to let it wither (marked as broken), and we should probably periodically sweep broken packages from the repo.

This also doesn’t specify how we determine if a maintainer has abandoned a package. It would be nice to have some sort of automation around this as well. However if we see they have been inactive for an extended period of time (1 month?) we can remove them from packages. Worst case they are on an extended offline period and can revert the PR to claim their packages back.

I think PLEASE fix ALL packages which were broken by the Qt-wrapping-changes - #20 by timokau is the right call. We need to define some policy around this (submitting an RFC is high on my dependency list for the merge queue). Marking things as broken makes sense, and I would want to define some sort of standard workflow around it. For example, take the following as a base example for changes that break more dependents than you can, or wish to, fix yourself.

  1. The breaking PR is reviewed, but not merged. (The merge queue bot wouldn’t allow it anyway.)
    • The PR branch is now “the staging branch” for this breaking change.
  2. Once it is approved maintainers of any broken dependents are notified.
    • They are expected to respond to the breakage. They should create a fix and merge it to master if it is backwards compatible, or to the staging branch otherwise.
  3. After at least one week, any unfixed packages are marked as broken.
  4. At this point the staging branch should be “green” and is merged.

To me this sounds like acceptable overhead for the “parent maintainer” assuming that we can mostly (or completely) automate sending notifications and marking packages broken.

3 Likes

Yes, that’s what I had in mind (though not what I said :laughing:). Its still not entirely trivial, since you would have to decide which “reverse maintainers” are responsible (which might be a lot). This is not a blocker though, we would just need to decide on something.

I think this is the way to go. If somebody cares about the package, they can then adopt it after they notice it has been marked as broken and doesn’t have a maintainer.

We need more defined guidelines for package inclusion might interest you.

1 Like

I’m glad we are in agreement and it would be great to have somebody actually work on this. I would recommend to decide on a minimal, useful subset of the process first and ideally also develop some tooling for it. Then we could trial it on a volunteer basis and set it in stone afterwards. I imagine the success probability of such an incremental approach is a bit higher than an all-out RFC. Your call of course, if you want to work on this.

1 Like

It’s me again.

I have broken down a list of policies that we need to agree upon and changes that need to be made in order for us to get to the state where a merge queue system would be feasible. Right now I have basically made a list of documents and put in rough summaries of what I think needs to be done/resolved by each document. I hope to eventually complete the write-ups and submit them through the Nix RFC process.

If you are interested in helping you can take up any of the following:

  1. Review started documents: While the biggest review will come from the Nix RFC process, early feedback would be very helpful. Simply leave comments or send me an email. If you log into Dropbox you should be able to subscribe to changes in the Overview document. This will let you know when new documents are ready for review.
  2. Ownership of an RFC: If there is a topic that you are passionate about I would be happy if someone else could take ownership. There is a lot of work required to move us towards a merge queue and I don’t have loads of time so all help is appreciated.
  3. Suggest missing RFCs: Do you think that the proposed list of RFCs is insufficient to resolve the requirements? Let me know so that we can figure out how to address it.
1 Like

This is a very interesting post; I wish I had been aware of it when it was happening. There are some points that I wish I could have made, but they are a bit off topic for this thread. On topic, however: they say that it would be useful to identify idle maintainers. I strongly agree, partially because robots enforcing policy often cause less offense than humans doing the same, but mostly because it can be hard for any one person to notice that a maintainer is idle. I think we should have a policy like “a package marked as broken or insecure for 90 days will be deleted”. This is something that I want to address with nixpkgs Merge Queue - Unmaintained Package Removal.

2 Likes

Just in case, I support permanently marking ghostscript as insecure and oppose its deletion.

(It is clearly unlikely to be safe for malicious files in the foreseeable future, and it is also quite a useful tool for processing files of known-benign origin.)
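For completeness, users who care about such a package can already opt in to it after it is marked insecure. A sketch of what that looks like in one’s nixpkgs config (`permittedInsecurePackages` is the real option; the exact name-version string here is made up):

```nix
{
  # Allow evaluating a package flagged via meta.knownVulnerabilities;
  # the string must match the flagged package's name-version exactly.
  permittedInsecurePackages = [
    "ghostscript-9.50"  # hypothetical version
  ];
}
```

So permanently marking ghostscript insecure would inconvenience, but not block, people who use it on trusted inputs.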

Removing the source code for packages is a whole different can of worms that has also been discussed on Discourse a couple of times already. Unfortunately I can’t think of a specific thread right now. Personally I don’t see much gain in it, but some loss (keeping the source gives better discoverability of prior work, and it is easier to fix up a broken package than to create a new one). In any case, I would tackle one policy change at a time and discuss that separately later :slight_smile:

I don’t think you need RFCs for tools. There isn’t really anything to decide/agree/disagree there. If we agree on the policy, then somebody just has to implement the tools to make it feasible. The tools could even be developed without any RFC, as long as their use is optional.

As I said, I’d postpone the “removal” bit for now, since that seems unrelated to the general goal. The “arch specific” and “flaky” policies seem to be necessary (if I’m not missing something?) to come to any conclusion on the general “breaking change policy”, so those should probably be merged.

1 Like

timokau (September 25) wrote:

I don’t think you need RFCs for tools. There isn’t really anything to decide/agree/disagree there. If we agree on the policy, then somebody just has to implement the tools to make it feasible. The tools could even be developed without any RFC, as long as their use is optional.

The thing about these tools is that they will have a lot of policy encoded in them, such as how to notify maintainers. This is why I think it makes sense to discuss them beforehand. Maybe they won’t actually go through the nixos/rfcs repo, but I figured they would require some discussion. I also think the most value will come if the tools are not optional (run automatically), so that is another reason we would want some sort of RFC.

As I said, I’d postpone the “removal” bit for now, since that seems unrelated to the general goal.

Fair enough. I don’t think it is actually blocking (once marked broken they aren’t a problem for the merge queue), so we can postpone that discussion to another time.

The “arch specific” and “flaky” policies seem to be necessary (if I’m not missing something?) to come to any conclusion on the general “breaking change policy”, so those should probably be merged.

Do you mean merged into one document? The reason I put them separately is that I think the arch-specific problem will require a technical solution, while the “flaky” one is more likely a social problem. If we are able to solve them with the same solution, all the better, and we can merge them.

I think the policy (how to notify, etc.) belongs in the policy RFC. People will be hesitant to accept an under-specified RFC without those policy details anyway. I agree that those tools would work best if mandatory, but they could still be trialled and proven in an optional manner. If you’re willing to do the work, it’s your call how to go about it, of course :slight_smile:

If I understand it correctly, the breaking change policy requires an always-green master as a prerequisite. If that is true, it’s not really possible to proceed without covering the flaky/arch-specific problems at the same time.

1 Like

Regarding the problem of bisecting which commit caused a failure, wouldn’t something like this

cd nixpkgs
git checkout $knownFailing
# Record the derivation path of the (now failing) package.
newHash=$(nix-instantiate . -A $package)
# Bisect between the known-bad and known-good commits.
git bisect start $knownFailing $knownWorking
# Exit 0 ("good") while the derivation still differs from the failing one;
# exit 1 ("bad") once a commit produces the failing derivation.
git bisect run sh -c "nix-instantiate . -A $package | grep -v $newHash"

that simply checks which commit modified the package (or its dependencies) at all, cover 99% of all cases? How often are there many changes to a package in a single batch?

On the other hand, if that weren’t the case, I guess the batching wouldn’t gain much?

(Sorry for the slight necro)

1 Like

Yeah, I think something like that is what we would want to do. We could probably even just check which changes affect the failed derivations. Then we can treat the commits where they changed as “bisection” points.

I was hoping that we could use Bors, but it doesn’t appear to support the logic to do this. We may need to add the feature or write our own tooling: Optimized bisection - Support - Bors Forum
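The “treat the commits where the package changed as bisection points” idea can be sketched with plain git (the paths and toy history below are made up): only commits that touch the package’s directory are candidates, so the bisection search space shrinks to those.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

# Toy history: two packages, three commits, only two touching pkgs/foo.
mkdir -p pkgs/foo pkgs/bar
echo v1 > pkgs/foo/default.nix
git add -A && git commit -q -m "foo: init"
echo v1 > pkgs/bar/default.nix
git add -A && git commit -q -m "bar: init"
echo v2 > pkgs/foo/default.nix
git add -A && git commit -q -m "foo: 1 -> 2"

# Only the commits that touched pkgs/foo are bisection candidates.
candidates=$(git rev-list HEAD -- pkgs/foo | wc -l)
echo "$candidates candidate commits for pkgs/foo"
```

For correctness one would also want commits touching the package’s dependencies, which the nix-instantiate comparison above handles; the path filter is just a fast first cut.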

Sorry for the delay, everyone, but life got in the way. I would love to pick this back up by submitting the first document to the Nix RFC process. In order to do that, I need a co-author to help with the RFC. I would appreciate a volunteer.

2 Likes

You do not need a co-author to submit an RFC.

1 Like