A CI merge strategy

Yesterday, I had an issue where pango failed to build on a couple-day-old checkout of master because a sha256 hadn’t been updated, so I needed to pull. Luckily, the problem had already been fixed.

But it got me thinking about how CI worked for the original Rails app at Groupon, which had over 300 committers. I know, that’s small beans compared to this, but hear me out…

Basically, if you want code merged to master, you would push it to a branch named merge/master/my-branch or rebase/master/my-branch. CI would pick it up, run tests locally on the branch, then fail the branch if the tests failed. So far, this is pretty normal…

But then, it would merge or rebase to a speculative master, and rerun the tests. If the tests failed, it would leave master untouched; otherwise, it would make the new master available. The developers technically had the ability to push directly to master, but that was not a thing normally done.
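In rough pseudo-Python, the flow for a single candidate branch might look something like this (all of the `run_tests`/`git_*`/`mark_failed` helpers are hypothetical, just to illustrate the gating, not any real CI’s API):

```python
# Hypothetical sketch of the gating flow described above; run_tests,
# git_merge, git_push and mark_failed are placeholder helpers.

def gate(branch):
    # 1. Test the candidate branch on its own.
    if not run_tests(branch):
        mark_failed(branch)
        return

    # 2. Merge (or rebase) it onto a speculative master and retest.
    candidate_master = git_merge("master", branch)
    if not run_tests(candidate_master):
        mark_failed(branch)          # master is left untouched
        return

    # 3. Both runs passed: publish the new master.
    git_push("master", candidate_master)
```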

I’ve been thinking that the obvious drawback of this is that merges to master are serialized by the long-running tests, but I just realized that this isn’t necessarily so…

Since we are only testing branches which have already passed tests, there’s a low probability of failure. This means it is possible to run a “speculative master”: we stack merges on the speculative master sequentially and test each merge commit in parallel. As the earlier ones pass, we push those commits to the real master. If one breaks, that invalidates the tests of any commits stacked on top of it (even if they passed), so the offending merge is removed and we restart on the speculative master with the remaining jobs.
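A minimal sketch of that scheme, with the same caveat that the queue and the git/test helpers are made up for illustration:

```python
# Hypothetical speculative merge queue: stack pending merges, test the merge
# commits in parallel, publish the passing prefix in order, and if one breaks
# drop it and restart with the survivors. Helpers are placeholders.

from concurrent.futures import ThreadPoolExecutor

def drain_queue(master, queue):
    while queue:
        # Stack every pending merge on top of the current speculative tip.
        stacked = []
        tip = master
        for branch in queue:
            tip = git_merge(tip, branch)          # speculative merge commit
            stacked.append((branch, tip))

        # Test all the merge commits concurrently.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda item: run_tests(item[1]), stacked))

        # Publish the passing prefix in order; the first failure invalidates
        # everything stacked on top of it, so drop it and loop again.
        for (branch, commit), passed in zip(stacked, results):
            if not passed:
                mark_failed(branch)
                queue.remove(branch)
                break
            git_push("master", commit)            # advance the real master
            master = commit
            queue.remove(branch)
```

The key property is that a commit only ever lands on the real master after its exact merge commit has passed, so master itself never sees an untested state.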

Thoughts?


I was talking to someone at NixCon who raised a similar point; I can’t remember who it was, though…

I guess it means integrating Zuul with Hydra? Or possibly with Hercules CI? That would be tremendous ;-). I have no idea how hard this would be, but we could get insights from the OpenStack infrastructure if need be.

In the Rust community, this is known as the “not rocket science rule”: graydon2 | technicalities: "not rocket science" (the story of monotone and bors). A possible implementation is https://bors.tech/.


This was raised during the maintainer-bot RFC and some other discussions. There was concern that someone could just edit the tests along with a popular package in order to upload malicious packages.

There could probably be an intersection here, though: a bot could analyze your expression, and if it’s a simple version bump and sha change, it should be able to build the package (and maybe all of its dependents, like nix-review does) and then merge when successful.
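As a hedged sketch of what the “simple version bump” check could look like (this isn’t any existing bot, just an illustration against a unified diff of a .nix file):

```python
# Hypothetical check: a diff counts as a "simple bump" if every changed line
# only touches a version or sha256/hash attribute of the expression.

import re

BUMP_LINE = re.compile(r"^\s*(version|sha256|hash)\s*=")

def is_simple_bump(unified_diff: str) -> bool:
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---", "@@", "diff ", "index ")):
            continue                   # diff metadata, not content
        if line.startswith(("+", "-")):
            if not BUMP_LINE.match(line[1:]):
                return False           # something other than version/hash changed
    return True
```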

I think this problem is just harder in Nix because you can have one package that affects thousands of others, and in most cases you want someone reviewing the changes before they get committed.

I think a “master that never breaks” is orthogonal to the review process. With something like what the OP suggested, we could update the unstable channel more frequently without blocking on some failing tests.