Steam Engine: Bors for Merge Train

I wonder if we could benefit here from having a way to document «this auto-test is more than what manual testing typically includes even when it is done». On the other hand, if the merges are done by maintainers, hopefully maintainers already have know what a working build looks like and can judge test usefulness on the spot.

It’s worth noting that I believe that we can still use Bors for automerge with the github checks that we have today. (We just need to ensure that they support running on branches as well as pull requests). We can add the full-evaluation tests later.

However it is unclear to me if Bors support some of the maintainer-policy features we wanted. Such as allowing maintainers to approve updates that affect only their packages. Is this something that we need from Bors, or should we just add a bot on top? I see Bors has bors delegate+ but this can’t be revoked (and even if it would would probably be racy). However we can maybe have another bot that looks for an “automerge” label and adds the bors merge command if it determines that the requirements have been met. (For example it is by the author)

Note that you could end up deploying a different controller machine but share builders and store with Hydra.

Here is an example initial bors configuration:

  • Current committers are allowed to instruct bors to merge. (Maintain status quo)
  • We add a bot that can also instruct bors to merge.

(The status dance avoids a race where the PR is modified as the bors merge comment is being posted)

I am not entirely convinced that the complexity of bundling outweighs the benefits. But it’s not a blocker for using the tool. Bundling is quite fundamental in the tool’s design and not likely to go away. I just wanted to point that out.

Also, bundling is not always beneficial. When only 1 PR is in the queue it adds no benefit. And I think that the triaging in a failure case is not strictly bisecting and can lead to N+(N-1) attempts. Vs guarantees N in a straight queue.

One of the confusion related to that is that Bors will post a message whenever a build fails. When it is bisecting a bundle it can lead to multiple messages for a single PR. As a user it also makes it a bit harder to understand what is happening with a given PR. If it was just a straight queue then it would just be somewhere in the index of that queue. With bundling the ordering can change.

Anyways, I don’t mean to paint a bad light. My main motivation it to make the drawbacks apparent in the discussion.

The game is to add enough checks on the PR level, to make it more likely that the PR will get merged. And then tune the amount of computation needed on the “staging” branch.

One important thing I didn’t mention is that Bors will timeout if no CI checks have been returned on the “staging” branch. When the timeout happens, it rejects the PR. The default is fairly low and will likely have to be bumped for our use-case.

It’s been a long time. I remember doing a few UI fixes and that was fairly easy to do. The bisection algorithm is deeper in the code base and more likely to be harder to change.

Thank you all for the feedback!!! :heart:

This is indeed a problem that also @zimbatm mentioned when we spoke. bors delegate= is sticky. So a malicious actor could hijack an initially inocuous PR easily.

Bors-NG has a vibrant RFC process for exactly this.

I tend to conclude, though that this particular steam engine’s endeavour need be decoupled from such more involved discussions about the trust model. I even suggested to interested parties to constitute a SIG Trust model to develop a general and overall consistent perspective on the topic which SIG Workflow automation happily could process as an input.

Yes, single-comment «do something that will lead to eventual automatic merge if CI and builds and whatever pass, and the top commit is still [manually specified or what was going though the checks for the last minute]» provided to committers would already be better than what we have now, and would also be a foundation to build the non-committer maintainer functionality.

1 Like

I imagine it’s used mainly by the Rust infrastructure team and the documentation may be mainly in their heads.

I am not entirely convinced that the complexity of bundling outweighs the benefits

I would love to believe that we can run ~100 full builds a day. I don’t think that is the case though.

Bundling is quite fundamental in the tool’s design and not likely to go away

It seems to me that this is the main point of the tool. The other review delegation just seems to be a small feature, but maybe I’m wrong.

Also, bundling is not always beneficial

Yes, batching can be an issue if you have a high failure rate. If the second commit of each batch is “bad” then I think your math is right. Of course if the bisection tries are faster than regular runs it also mitigates this.

That does sound like a questionable design choice. Maybe we can raise an issue with the developers? It seems like this is unactionable noise.

For sure. Failures in the merge queue are expected to be rare. We should have relatively cheep pre-merge checks that catch most issues. Right now I think our pre-merge CI is fairly good and for large changes bors has a try command to run the full CI before merging.

Yeah, definitely will :sweat_smile:

1 Like

It would be good to assemble all the use-cases so that we have answers for them, for the different stages of the deployment process.

How I see it, the first stage would be:

  • add bors-ng to the repo
  • and some simplistic CI check on the “bors/staging” branch.
  • add the bors.toml config file with some sensible defaults.

At this point it’s very much an alpha test and only a select few users would be added with capability to merge using bors. Then augment.

In no particular order:

  • extend ofborg to delegate PRs to maintainers.
  • or extend ryantm-updater to delegate PRs to maintainers.
  • move some of the ofborg capability to GitHub Actions if possible.
  • add more CI checks to the “bors/staging” branch.
  • add a bors-staging jobset to Hydra that gets triggered and reports back build statuses. Ideally a “success” status would be something like “builds at least the same number of packages as before”.
  • add tooling to automate marking packages as broken.
2 Likes

That sounds good to me.

This will probably get some drawback. I think this is where the trust model discussion has to happen.

If you mean bors delegate I’m not sure we can do that because we can’t trust that they don’t add more to the PR that is malicious/affects other changes.

It seems that we will need a status-based policy check anyways and at that point we can in theory open up to a wide audience. (Although that depends on how much we trust bors).

What is the goal here?

2 Likes

@zimbatm Would you be willing to present this idea of it’s own right as a “I have an idea” to the SIG Workflow automation? (the Geistesblitz: prefix is part of an intended meta-branding of structured discussion)

Let’s shelve that discussion, for now, it’s a bit out of topic.

2 Likes

Not if its restricted to r-ryantm’s PRs. I like that idea a lot, since it can improve throughput without a lot of hairy trust discussion. Its a nice way to dip our toes in the water.

4 Likes

Ah I see. That makes sense! I was thinking of marvin for some reason :see_no_evil:

As secretary of the SIG, I’m factoring out some of the interesting discussion stumbs from this “Steam Engine” as so called “Geistesblitze”:

Please feel free delivering your own “Geistesblitz” through the templates provided in the footer of the SIG Workflow automation description page. Keeping the discussion going is an important intrinsic motivator for all of us. Thank you all for your contributions! :heart:

Public note to self:

Owing to those damn megalomaniac tendencies of mines, across investigating bors & friends, I was constantly looking into ways to tackle this whole topic from a different, better (a.k.a more megalomaniac) angle, that also happens to be fundamentally in line with other (ad)ventures I’m currently working on.

I stumbled over zeebe.io (no personal stake in it, just a nastily great tool for SIG Workflow automation). For which I recently started working on a <wip> golang DDD code generator </wip> that is/will be able to bootstrap an event driven golang backend based on clean architecture and domain driven design with a very crude OPA policy interface.

After having given away myself by those links as a very terrible programmer (I’m not actually one!), there shall be merits to this thinking. In my chasing mind, those concepts amalgamate:

… amalgamate into a clean ecosystem workflow automation platform that is superior to other solutions by means of leveraging two-birds-one-shot principles wherever possible, such as BPMN2.0 workflow graphics being dual use (doc and implementation) or having policy decided and implemented upon the same PR (OPA’s rego policy language).

Let me know what you think. Isn’t that too megalomaniac, insane & whatever, but actually a nice idea?

I admit that I’m not a NixOS dev, so I don’t truly have a stake in this, but can I give some unwarranted advice?

  1. Trying to make your workflow agnostic to your issue tracker and code review platform is a fool’s errand. I’ve tried. You either use something other than GitHub (like GitLab, Jira, Jenkins, Gerrit, etc), and couple your workflow to that instead, or GitHub-isms leak into it anyway, and you’ve wasted a bunch of time on “platform agnosticism” that doesn’t actually allow you to switch platform easily.

  2. Think of it as if you had a “weirdness budget”, which is really just a proxy for how much time you expect contributors to spend learning your idiosyncratic tooling. They already have to learn Nix. Do you really want to force them to learn your code review system, too, or your very weird way of using them?

Oh thanks for your feedback. I think you’re on the page on advocating for the best possible on-boarding experience. I’d say on-boarding UX continues to be a construction site. I agree with you, that we should use as much github (that is “well known”) for front-end as possible.

As far as the workflow back-end is concerned, an emerging nix security and trust model might require more policy expressiveness than off the shelve tooling might allow us to.