Yesterday, I had an issue where pango failed to build on a couple-day-old checkout of master because a sha256 hadn’t been updated, so I needed to pull. Luckily, the problem had been fixed.
But it got me thinking about how CI worked for the original Rails app at Groupon, which had over 300 committers. I know, that’s small beans to this, but hear me out…
Basically, if you wanted code merged to master, you would push it to a branch named merge/master/my-branch or rebase/master/my-branch. CI would pick it up, run the tests on the branch by itself, and fail the branch if the tests failed. So far, this is pretty normal…
But then, it would merge or rebase to a speculative master, and rerun the tests. If the tests failed, it would leave master untouched; otherwise, it would make the new master available. The developers technically had the ability to push directly to master, but that was not a thing normally done.
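To make the flow concrete, here is a rough Python sketch of that gate. It is a toy model and not Groupon's actual CI: history is just a list of change names, and parse_request, gate, and the tests_pass callback are all names I made up for illustration.

```python
from typing import Callable, List, NamedTuple, Optional


class MergeRequest(NamedTuple):
    strategy: str  # "merge" or "rebase"
    target: str    # e.g. "master"
    branch: str    # e.g. "my-branch"


def parse_request(ref: str) -> Optional[MergeRequest]:
    """Split 'merge/master/my-branch' into its parts; None if it is not a request."""
    parts = ref.split("/", 2)  # the branch name itself may contain slashes
    if len(parts) != 3 or parts[0] not in ("merge", "rebase") or not all(parts):
        return None
    return MergeRequest(*parts)


def gate(master: List[str], branch: List[str],
         tests_pass: Callable[[List[str]], bool]) -> List[str]:
    """Return the new master if both test runs pass, else the old master untouched."""
    if not tests_pass(branch):        # first run: the branch by itself
        return master
    candidate = master + branch       # speculative master (hypothetical list model)
    return candidate if tests_pass(candidate) else master
```

For example, parse_request("rebase/master/my-branch") yields a request with strategy "rebase", target "master", and branch "my-branch", while a ref that does not follow the naming scheme yields None.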
I’ve been thinking that the obvious drawback of this is that merges to master are serialized based on long-running tests, but I just realized that this isn’t necessarily so…
Since we are only testing branches which have already passed their own tests, there’s a low probability of failure. This means the speculative master can run ahead of the real one. We stack merges on the speculative master sequentially and test each merge commit in parallel. As the earlier ones pass, we push those commits to the real master. If one breaks, the offending merge is removed, and the test results of every commit stacked on top of it are invalidated (even ones that passed, since they were tested with the broken merge underneath them). We then rebuild the speculative master from the remaining jobs and start again.
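Here is a minimal Python simulation of that idea, just to check the logic holds together. Everything in it is an assumption for illustration: history is a list of change names, run_queue and tests_pass are hypothetical names, and a real system would be creating merge commits and running CI jobs rather than calling a function in a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_queue(master: List[str], queue: List[str],
              tests_pass: Callable[[List[str]], bool],
              workers: int = 4) -> Tuple[List[str], List[str]]:
    """Land queued changes via a speculative master; return (new master, rejected)."""
    master, pending, rejected = list(master), list(queue), []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Stack every pending merge: candidate i is master + pending[:i + 1].
            candidates = [master + pending[:i + 1] for i in range(len(pending))]
            results = list(pool.map(tests_pass, candidates))  # tested in parallel
            # Index of the first failing candidate, or len(pending) if all passed.
            bad = results.index(False) if False in results else len(pending)
            master = master + pending[:bad]   # everything before the failure lands
            if bad < len(pending):
                rejected.append(pending[bad])  # drop the offending merge
            # Results for anything stacked above the failure are discarded;
            # those changes get restacked and retested on the next pass.
            pending = pending[bad + 1:]
    return master, rejected
```

With a queue of a, bad, c on top of master m, the first pass lands a and rejects bad, and the second pass retests c on top of m, a. Each pass either lands everything or removes at least one change, so the loop always terminates. The cost is that one failure throws away the work done on everything stacked above it, which is only cheap if failures really are rare.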
Thoughts?