Steam Engine: Bors for Merge Train

Work Package

Over the coming one-two weeks, I’m going to investigate the bors configuration and render a GAP analysis with regard to @kevincox 's draft RFC effort for an always green hydra. I plan to resume my conclusions in a draft bors configuration PR to Nixos/nixpkgs which I will link here.

Further Context (optional)

  • RFC79 about stricter protected branch policy gathered momentum for implementing bors-like
  • @zimbatm generously gave me a mentoring session and helped me articulate a path to getting things done (and not let die off the momentum)
  • @ryantm and @kevincox together with me have tossed around the idea of staging a merge bot test in the wild in combination with r-ryantm update bot
  • @7c6f434c has promised his support for it (I can’t find the comment right now, but I take this as a seriously encouraging signal :wink: )


8 Likes

As the commenter who brought up Bors on the RFC 79 ticket, I feel I should clarify, for those (likely many) who don’t know, that there are actually two or three merge-bots named “Bors”:

  1. There’s the Bors currently used by Rust, which I think is currently an instance of a self-hosted Python program named Homu. This seems to be actively maintained.

  2. There’s the Bors-NG to which you linked, which is a SaaS reimplementation in Elixir. (Their bot has the same GitHub username as the Rust Bors but a different avatar.) This seems to be maintained.

  3. There’s Rust’s original Bors, which is of mainly historical interest nowadays. This has seen two commits since 2014, of which one was a merge commit.

I don’t know much about how these bots work, but I felt I should “head off” any confusion between them.

3 Likes

Here is my experience with Bors-NG (2).

Operations: It’s possible to deploy our own instance fairly easily, or use the SaaS version. The tool is overall well maintained and relatively hackable. If we go the self-hosted route, most of the work is related to maintaining the database.

Usage: One of the biggest outstanding issues is that it only supports merge commits. No squashes of fast-forward merges. In terms of user experience, the most confusing part is the merge train algorithm. It tries to bundle PRs together in order to improve throughput, but that leads to confusion on the state of an individual PR. If the bundle fails, it then goes on to bisect the bundle which leads to more messages being posted. If I would be writing my own bot, I would probably remove that feature.

The best part about the tool is how it uses a “staging” branch to push the bundle, and then relies on the GitHub CI status to determine its success. This is quite elegant how it decouples the CI and mergebot.

Overall I can see this transform how we interact with the repo. A tool like ofborg can check the PR maintainers and give them merge access through Bors. This would allow maintainers into a self-service mode and be more independent. And also encourage more people to take maintainership of packages.

2 Likes

It seems to me that bundling + bisection is critical if we want complete testing for nixpkgs. As I understand we only have enough resources to do 1 complete test per day. Is your concern the output that Bors provides is noisy and confusing or are you opposed to the batching feature in general? And if you are opposed to batching how do you think it could be made to work for nixpkgs?

My other question is that right now Bors-NG doesn’t support any sort of “clever” bisection. It simply runs the full CI process from scratch each time. Since resources are limited in valuable in our case we would probably want something smarter like ensuring that it first attempts to build the failing packages. Or even some sort of “splitter” that performs evaluations to determine what commits affected the failed derivations. (But maybe a “prebuild” of just the affected derivations combined with negative result caching would be enough).

I haven’t heard of Homu until today, but it seems to have a severe lack of user documentation. It seems like the Bors-NG community is much more alive.

1 Like

I also meant to ask if you had any idea how easy it would be to contribute smarter bisection to Bors, since you seem to have at least looked at the code a bit.

I am not sure where we want to run the builds. If the build coordinator is separate from Hydra, we could try to allow some negative caching (remember failing builds as failing) with maybe manual requests to invalidate cache if something actually transient (which looks permanent to Nix) fails.

Indeed, Bors-whichever also sounds nice for automerge.

I expect that at least some of the questions I listed about Mergify will eventually be asked before we can get wrte access for a merge bot, but I hope they should be easier to answer for Bors-NG!

(note that I have very low medium-term expectations about not having a branch with churn on the level of our master and no aspirations of truly always green, but will also support a cleaner workflow for tool-assisted fast-track reverts for changes identified as too breaking to keep)

I think we would want to use Hydra for the builds. It is very good at building Nix things and already holds the resources. Basically this would be shifting the daily nixpkgs and nixos builds to daily (or as quickly as possible) “merge queue” builds.

I haven’t looked far into what that integration would take though. I think the basics would just work because hydra can build a branch and Bors will just point its staging branch at whatever it wants built. However I don’t know what we need to do to get smart bisection.

I wonder if we could benefit here from having a way to document «this auto-test is more than what manual testing typically includes even when it is done». On the other hand, if the merges are done by maintainers, hopefully maintainers already have know what a working build looks like and can judge test usefulness on the spot.

It’s worth noting that I believe that we can still use Bors for automerge with the github checks that we have today. (We just need to ensure that they support running on branches as well as pull requests). We can add the full-evaluation tests later.

However it is unclear to me if Bors support some of the maintainer-policy features we wanted. Such as allowing maintainers to approve updates that affect only their packages. Is this something that we need from Bors, or should we just add a bot on top? I see Bors has bors delegate+ but this can’t be revoked (and even if it would would probably be racy). However we can maybe have another bot that looks for an “automerge” label and adds the bors merge command if it determines that the requirements have been met. (For example it is by the author)

Note that you could end up deploying a different controller machine but share builders and store with Hydra.

Here is an example initial bors configuration:

  • Current committers are allowed to instruct bors to merge. (Maintain status quo)
  • We add a bot that can also instruct bors to merge.
    • Similar to https://github.com/palantir/policy-bot
    • It will determine which derivations have changed and apply a PR status based on the maintainers of the affected packages and the current approvals.
    • Once the policy is reached and the maintainer has approved on github it comments bors merge to queue the PR to the merge train.

(The status dance avoids a race where the PR is modified as the bors merge comment is being posted)

I am not entirely convinced that the complexity of bundling outweighs the benefits. But it’s not a blocker for using the tool. Bundling is quite fundamental in the tool’s design and not likely to go away. I just wanted to point that out.

Also, bundling is not always beneficial. When only 1 PR is in the queue it adds no benefit. And I think that the triaging in a failure case is not strictly bisecting and can lead to N+(N-1) attempts. Vs guarantees N in a straight queue.

One of the confusion related to that is that Bors will post a message whenever a build fails. When it is bisecting a bundle it can lead to multiple messages for a single PR. As a user it also makes it a bit harder to understand what is happening with a given PR. If it was just a straight queue then it would just be somewhere in the index of that queue. With bundling the ordering can change.

Anyways, I don’t mean to paint a bad light. My main motivation it to make the drawbacks apparent in the discussion.

The game is to add enough checks on the PR level, to make it more likely that the PR will get merged. And then tune the amount of computation needed on the “staging” branch.

One important thing I didn’t mention is that Bors will timeout if no CI checks have been returned on the “staging” branch. When the timeout happens, it rejects the PR. The default is fairly low and will likely have to be bumped for our use-case.

It’s been a long time. I remember doing a few UI fixes and that was fairly easy to do. The bisection algorithm is deeper in the code base and more likely to be harder to change.

Thank you all for the feedback!!! :heart:

This is indeed a problem that also @zimbatm mentioned when we spoke. bors delegate= is sticky. So a malicious actor could hijack an initially inocuous PR easily.

Bors-NG has a vibrant RFC process for exactly this.

I tend to conclude, though that this particular steam engine’s endeavour need be decoupled from such more involved discussions about the trust model. I even suggested to interested parties to constitute a SIG Trust model to develop a general and overall consistent perspective on the topic which SIG Workflow automation happily could process as an input.

Yes, single-comment «do something that will lead to eventual automatic merge if CI and builds and whatever pass, and the top commit is still [manually specified or what was going though the checks for the last minute]» provided to committers would already be better than what we have now, and would also be a foundation to build the non-committer maintainer functionality.

1 Like

I imagine it’s used mainly by the Rust infrastructure team and the documentation may be mainly in their heads.

I am not entirely convinced that the complexity of bundling outweighs the benefits

I would love to believe that we can run ~100 full builds a day. I don’t think that is the case though.

Bundling is quite fundamental in the tool’s design and not likely to go away

It seems to me that this is the main point of the tool. The other review delegation just seems to be a small feature, but maybe I’m wrong.

Also, bundling is not always beneficial

Yes, batching can be an issue if you have a high failure rate. If the second commit of each batch is “bad” then I think your math is right. Of course if the bisection tries are faster than regular runs it also mitigates this.

That does sound like a questionable design choice. Maybe we can raise an issue with the developers? It seems like this is unactionable noise.

For sure. Failures in the merge queue are expected to be rare. We should have relatively cheep pre-merge checks that catch most issues. Right now I think our pre-merge CI is fairly good and for large changes bors has a try command to run the full CI before merging.

Yeah, definitely will :sweat_smile:

1 Like

It would be good to assemble all the use-cases so that we have answers for them, for the different stages of the deployment process.

How I see it, the first stage would be:

  • add bors-ng to the repo
  • and some simplistic CI check on the “bors/staging” branch.
  • add the bors.toml config file with some sensible defaults.

At this point it’s very much an alpha test and only a select few users would be added with capability to merge using bors. Then augment.

In no particular order:

  • extend ofborg to delegate PRs to maintainers.
  • or extend ryantm-updater to delegate PRs to maintainers.
  • move some of the ofborg capability to GitHub Actions if possible.
  • add more CI checks to the “bors/staging” branch.
  • add a bors-staging jobset to Hydra that gets triggered and reports back build statuses. Ideally a “success” status would be something like “builds at least the same number of packages as before”.
  • add tooling to automate marking packages as broken.
2 Likes

That sounds good to me.

This will probably get some drawback. I think this is where the trust model discussion has to happen.

If you mean bors delegate I’m not sure we can do that because we can’t trust that they don’t add more to the PR that is malicious/affects other changes.

It seems that we will need a status-based policy check anyways and at that point we can in theory open up to a wide audience. (Although that depends on how much we trust bors).

What is the goal here?

2 Likes

@zimbatm Would you be willing to present this idea of it’s own right as a “I have an idea” to the SIG Workflow automation? (the Geistesblitz: prefix is part of an intended meta-branding of structured discussion)