Operations: It’s possible to deploy our own instance fairly easily, or use the SaaS version. The tool is overall well maintained and relatively hackable. If we go the self-hosted route, most of the work is related to maintaining the database.
The best part about the tool is how it uses a “staging” branch to push the bundle, and then relies on the GitHub CI status to determine its success. It is quite elegant how this decouples the CI from the merge bot.
Overall, I can see this transforming how we interact with the repo. A tool like ofborg can check the PR maintainers and give them merge access through Bors. This would let maintainers work in a self-service mode and be more independent, and it would also encourage more people to take on maintainership of packages.
It seems to me that bundling + bisection is critical if we want complete testing for nixpkgs. As I understand it, we only have enough resources for one complete test per day. Is your concern that the output Bors provides is noisy and confusing, or are you opposed to the batching feature in general? And if you are opposed to batching, how do you think it could be made to work for nixpkgs?
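A rough back-of-the-envelope version of this argument (my own simplification, assuming every batch passes, so it is a best case): with c complete test runs per day and batches of B PRs,

```latex
\text{straight queue: } \le c \ \text{PRs/day}, \qquad \text{batched: } \le c \cdot B \ \text{PRs/day}
```

With c = 1, a straight queue merges at most one PR per day, so any higher throughput has to come from batching; every failing batch then costs extra bisection runs on top of this.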
My other question is about bisection: right now Bors-NG doesn’t support any sort of “clever” bisection. It simply runs the full CI process from scratch each time. Since resources are limited and valuable in our case, we would probably want something smarter, such as first attempting to build the packages that failed. Or even some sort of “splitter” that performs evaluations to determine which commits affected the failed derivations. (But maybe a “prebuild” of just the affected derivations, combined with negative result caching, would be enough.)
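A sketch of the “prebuild” idea, assuming we already have the list of derivations affected by the batch and the set that failed in the previous attempt (both are hypothetical inputs here; computing them would be the evaluation step). Derivations that failed before are retried first, so a batch that still contains the breaking change fails after one build instead of a full CI run. The build callback is a hypothetical stand-in for whatever actually builds a derivation.

```python
from typing import Callable, Iterable, List, Set, Tuple


def prebuild_order(affected: Iterable[str], previously_failed: Set[str]) -> List[str]:
    """Order affected derivations so that ones which failed before come first.

    Relative order within each group is preserved.
    """
    items = list(affected)
    retry_first = [drv for drv in items if drv in previously_failed]
    the_rest = [drv for drv in items if drv not in previously_failed]
    return retry_first + the_rest


def prebuild(
    affected: Iterable[str],
    previously_failed: Set[str],
    build: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Build affected derivations in priority order, stopping at the first failure.

    `build` is a hypothetical callback (nix-build, a Hydra job, ...).
    Returns (all_ok, attempted).
    """
    attempted: List[str] = []
    for drv in prebuild_order(affected, previously_failed):
        attempted.append(drv)
        if not build(drv):
            # Fail fast: no need to spend a full CI run on this batch.
            return False, attempted
    return True, attempted
```

For example, with affected derivations ["hello", "firefox", "curl"] where firefox failed last time, firefox is built first, and if it fails again the prebuild stops after that single build.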
I hadn’t heard of Homu until today, but it seems to severely lack user documentation. The Bors-NG community seems much more active.
I am not sure where we want to run the builds. If the build coordinator is separate from Hydra, we could try to allow some negative caching (remembering failed builds as failed), perhaps with manual requests to invalidate the cache when something that is actually transient (but looks permanent to Nix) fails.
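A minimal in-memory negative cache with manual invalidation, as a sketch of what a coordinator separate from Hydra could keep. Keying by .drv store path is my assumption: the same .drv path means the same build recipe, so a cached failure stays valid until the recipe changes, which is exactly why a transient failure needs a manual way out. NegativeCache and build_with_cache are hypothetical names.

```python
from typing import Callable, Dict, Optional


class NegativeCache:
    """Remembers failed builds so they are not retried until invalidated.

    Keys are assumed to be .drv store paths (hypothetical choice).
    """

    def __init__(self) -> None:
        self._failures: Dict[str, str] = {}  # drv path -> failure reason

    def record_failure(self, drv: str, reason: str) -> None:
        self._failures[drv] = reason

    def known_failure(self, drv: str) -> Optional[str]:
        return self._failures.get(drv)

    def invalidate(self, drv: str) -> bool:
        """Manual request to forget a failure (e.g. it was a network hiccup).

        Returns True if an entry was actually removed.
        """
        return self._failures.pop(drv, None) is not None


def build_with_cache(drv: str, cache: NegativeCache, build: Callable[[str], bool]) -> bool:
    """Skip the build entirely if it is already known to fail."""
    if cache.known_failure(drv) is not None:
        return False
    ok = build(drv)
    if not ok:
        cache.record_failure(drv, "build failed")
    return ok
```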
Indeed, Bors-whichever also sounds nice for automerge.
I expect that at least some of the questions I listed about Mergify will eventually be asked before we can get write access for a merge bot, but I hope they will be easier to answer for Bors-NG!
(Note that my medium-term expectations of avoiding a branch with churn at the level of our master are very low, and I have no aspirations of it being truly always green; but I would also support a cleaner workflow for tool-assisted, fast-track reverts of changes identified as too breaking to keep.)
I think we would want to use Hydra for the builds. It is very good at building Nix things and already holds the resources. Basically this would be shifting the daily nixpkgs and nixos builds to daily (or as quickly as possible) “merge queue” builds.
I haven’t looked closely at what that integration would take, though. I think the basics would just work, because Hydra can build a branch and Bors will just point its staging branch at whatever it wants built. However, I don’t know what we would need to do to get smart bisection.
I wonder if we could benefit here from having a way to document «this auto-test is more than what manual testing typically includes, even when it is done». On the other hand, if the merges are done by maintainers, hopefully maintainers already know what a working build looks like and can judge test usefulness on the spot.
It’s worth noting that I believe we can still use Bors for automerge with the GitHub checks we have today. (We just need to ensure that they support running on branches as well as on pull requests.) We can add the full-evaluation tests later.
However, it is unclear to me whether Bors supports some of the maintainer-policy features we wanted, such as allowing maintainers to approve updates that affect only their packages. Is this something we need from Bors, or should we just add a bot on top? I see Bors has bors delegate+, but this can’t be revoked (and even if it could, revocation would probably be racy). However, we could perhaps have another bot that looks for an “automerge” label and adds the bors merge command if it determines that the requirements have been met (for example, that the PR is by the author).
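A sketch of that extra bot’s decision step, under loud assumptions: the “automerge” label comes from the text above, while the maintainers mapping (package name to maintainer handles) and the PullRequest fields are hypothetical stand-ins for data we would have to extract from nixpkgs and the GitHub API. The bot only posts the Bors command; Bors still does the actual testing and merging. Because the requirements are re-checked on every event rather than granted once via delegate+, removing the label is enough to stop future merges, though this sketch does nothing about the race mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class PullRequest:
    # Hypothetical, minimal view of a PR for this sketch.
    author: str
    labels: Set[str]
    touched_packages: Set[str]


def automerge_comment(pr: PullRequest, maintainers: Dict[str, Set[str]]) -> Optional[str]:
    """Return the Bors command to post, or None if the policy is not met.

    Policy (assumed): the PR carries the "automerge" label, touches at least
    one package, and its author maintains every package it touches.
    """
    if "automerge" not in pr.labels:
        return None
    if not pr.touched_packages:
        return None
    for pkg in pr.touched_packages:
        if pr.author not in maintainers.get(pkg, set()):
            return None
    return "bors merge"
```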
I am not entirely convinced that the complexity of bundling outweighs the benefits. But it’s not a blocker for using the tool. Bundling is quite fundamental in the tool’s design and not likely to go away. I just wanted to point that out.
Also, bundling is not always beneficial. When only one PR is in the queue, it adds no benefit. And I think that triage in a failure case is not strictly bisection and can lead to N+(N-1) attempts, versus a guaranteed N in a straight queue.
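To make the N+(N-1) figure concrete, here is a simplified model of batch-then-halve triage (my own sketch, not Bors-NG’s actual code; it assumes each PR passes or fails independently of what it is batched with). Every time a batch fails it is split in two and both halves are retried; a single failing PR is rejected. If every PR in a batch of N fails, the splits form a binary tree with N leaves and N-1 internal nodes, i.e. 2N-1 = N+(N-1) runs, versus exactly N in a straight queue.

```python
from typing import Callable, List, Sequence, Set, Tuple


def bisect_batch(
    prs: Sequence[str],
    passes: Callable[[Sequence[str]], bool],
) -> Tuple[List[str], List[str], int]:
    """Test a batch; on failure split it in half and recurse.

    Returns (merged, rejected, runs), where runs counts CI runs spent.
    """
    runs = 1  # this batch's own CI run
    if passes(prs):
        return list(prs), [], runs
    if len(prs) == 1:
        return [], list(prs), runs  # a lone failing PR is rejected
    mid = len(prs) // 2
    merged_l, rejected_l, runs_l = bisect_batch(prs[:mid], passes)
    merged_r, rejected_r, runs_r = bisect_batch(prs[mid:], passes)
    return merged_l + merged_r, rejected_l + rejected_r, runs + runs_l + runs_r


def passes_unless_contains(bad: Set[str]) -> Callable[[Sequence[str]], bool]:
    """Build a `passes` predicate: a batch passes iff it contains no bad PR."""
    return lambda batch: not any(pr in bad for pr in batch)
```

With a single bad PR among eight, the same model needs 7 runs instead of 8, which is the case where batching pays off; with eight bad PRs it needs 15.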
One source of confusion related to this is that Bors posts a message whenever a build fails. When it is bisecting a bundle, this can lead to multiple messages for a single PR. As a user, it is also a bit harder to understand what is happening with a given PR. With a straight queue, the PR would simply sit at some position in that queue; with bundling, the ordering can change.
Anyway, I don’t mean to paint it in a bad light. My main motivation is to make the drawbacks apparent in the discussion.
The game is to add enough checks at the PR level to make it likely that the PR will get merged, and then to tune the amount of computation needed on the “staging” branch.
One important thing I didn’t mention is that Bors will time out if no CI checks have been reported on the “staging” branch. When the timeout happens, it rejects the PR. The default is fairly low and will likely have to be bumped for our use case.
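A simplified model of the staging-branch decision (my own sketch, not Bors-NG’s code): the bot only looks at the CI statuses reported on the staging commit, which is what decouples it from the CI itself, and gives up when nothing conclusive arrives before the timeout. The context name “hydra/nixpkgs” in the example is hypothetical. As far as I know, the Bors-NG limit is the timeout_sec setting in bors.toml, which is what we would bump for nixpkgs-sized builds.

```python
from typing import Iterable, Mapping

FAILED_STATES = {"failure", "error"}


def staging_outcome(
    statuses: Mapping[str, str],
    required: Iterable[str],
    elapsed_s: float,
    timeout_s: float,
) -> str:
    """Decide what a merge bot should do with the current staging build.

    `statuses` maps a CI context name to its latest state; a missing entry
    means that check has not reported yet.
    """
    names = list(required)
    if any(statuses.get(name) in FAILED_STATES for name in names):
        return "failure"
    if all(statuses.get(name) == "success" for name in names):
        return "success"
    if elapsed_s >= timeout_s:
        # No verdict in time: the batch is rejected, just like a failure.
        return "timeout"
    return "pending"
```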
It’s been a long time. I remember doing a few UI fixes, and that was fairly easy. The bisection algorithm is deeper in the code base and likely harder to change.
I tend to conclude, though, that this particular steam engine’s endeavour needs to be decoupled from such more involved discussions about the trust model. I even suggested that interested parties form a SIG Trust model to develop a general and consistent perspective on the topic, which SIG Workflow automation could then happily take as an input.
Yes, a single-comment command provided to committers, «do something that will lead to an eventual automatic merge if CI, builds and whatever else pass, and the top commit is still [manually specified, or whatever was going through the checks in the last minute]», would already be better than what we have now, and would also be a foundation for building the non-committer maintainer functionality.
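A minimal sketch of that single-comment flow, assuming the command pins the commit SHA at the time of the comment (either given explicitly or taken from the head being checked). Approval, merge_decision and the returned strings are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Approval:
    approver: str
    sha: str  # commit that was approved, pinned when the comment was posted


def merge_decision(
    approval: Optional[Approval],
    head_sha: str,
    checks_passed: Optional[bool],  # None means checks are still running
) -> str:
    if approval is None:
        return "wait"
    if approval.sha != head_sha:
        # Someone pushed after the approval: it no longer covers the head.
        return "cancel"
    if checks_passed is None:
        return "wait"
    return "merge" if checks_passed else "reject"
```

Cancelling on a moved head is the important part: it keeps a later push from riding on an earlier approval.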
I am not entirely convinced that the complexity of bundling outweighs the benefits
I would love to believe that we can run ~100 full builds a day. I don’t think that is the case though.
Bundling is quite fundamental in the tool’s design and not likely to go away
It seems to me that this is the main point of the tool. The review delegation, by contrast, seems to be a small feature, but maybe I’m wrong.
Also, bundling is not always beneficial
Yes, batching can be an issue if you have a high failure rate. If the second commit of each batch is “bad”, then I think your math is right. Of course, if the bisection runs are faster than regular runs, that also mitigates this.
That does sound like a questionable design choice. Maybe we can raise an issue with the developers? It seems like unactionable noise.
For sure. Failures in the merge queue are expected to be rare. We should have relatively cheap pre-merge checks that catch most issues. Right now I think our pre-merge CI is fairly good, and for large changes Bors has a try command to run the full CI before merging.
@zimbatm Would you be willing to present this idea in its own right, as an “I have an idea” post, to SIG Workflow automation? (The Geistesblitz: prefix is part of an intended meta-branding of structured discussion.)