Well, this merge bot can have a component that collects this data. I don’t think building a merge bot and having the data flow inside it are mutually exclusive, even if the bot ends up only ever running in dry-run mode. We want a tool that can be improved continuously, and it makes sense to me to colocate data collection (toward better merging decisions) with the thing that will actually make those merging decisions.
I have spent some time studying the problem and I think there are some interesting sub-problems. Namely, the GitHub API with its current rate limits is not suited to collecting this type of data in reasonable timeframes, and can cause issues if the collection is not coordinated with other people. Also, it’s not only about collecting data on “PRs reviewed by maintainers”: there is a big chunk of work on how we can clone and mirror the GitHub data into another data warehouse to make use of it, e.g. for the mergebot but also for other things.
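To make the rate-limit point concrete, here is a minimal sketch of the kind of budgeting any collector would need. The `X-RateLimit-Remaining` / `X-RateLimit-Reset` headers are real GitHub REST API headers; the helper itself and how it would be wired into a bot are assumptions on my part, not anything the mergebot actually implements.

```python
import time

def backoff_seconds(headers, now=None):
    """Return how long to wait before the next GitHub API call,
    based on GitHub's rate-limit response headers.

    headers: dict of response headers (only the two rate-limit
    headers are consulted; everything else is ignored).
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset = int(headers.get("X-RateLimit-Reset", str(int(now))))
    if remaining > 0:
        return 0.0  # budget left in this window: no need to wait
    # Out of budget: sleep until the reset timestamp, plus a small margin.
    return max(0.0, reset - now) + 1.0
```

With the default 5,000 requests/hour for an authenticated token, backfilling years of PR and review history this way takes a very long time, which is exactly why a shared, coordinated warehouse is more attractive than everyone scraping independently.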
Therefore, I am in favor of a dry-run mode based on this work, even though I agree there were serious issues that you raised and that are not OK.
I know and I agree with you that getting faster merges would require solving: faster CI, better CI, more tests, proper automation, etc. What I don’t agree with is when you say that “the current situation shows that PRs are not that reviewed by maintainers themselves for ryantm updates”. I think that’s a complicated argument: how do we know the situation is not caused by the fact that we do not have “merge button pushers” available in a reasonable timeframe (e.g. to avoid merge conflicts), and that maintainers who have been here for the last N years (but are not committers) have grown tired of the situation and given up on giving those courtesy (or not) approvals? Therefore, we cannot conclude that the lack of maintainer approvals is the first thing we need to go after.
Maybe we need to talk with the maintainer userbase and understand what would make their lives easier, so that they start approving those PRs, or any PRs related to their packages.
Here’s a suggestion that I think would be a useful dependency for the mergebot:
Reviewer recommendation and management, i.e. have the bot randomly select reviewers relevant to the PR and ask them to review within a reasonable timeframe. If nothing happens, select new reviewers, with some algorithm to escalate until it reaches a very active contributor who will answer and say “OK, this cannot be reviewed, let’s close this” or “here’s what you can do to drive your change to a place where the normal review process can happen”. This doesn’t happen today, I believe: we can see on Discourse many instances of people saying they feel confused and abandoned in the whole Nixpkgs contribution process when they are only maintainer-level, partly because we have so many informal ways to bump the priority of a review, bump the priority of a merge, etc.
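The escalation loop above could be as simple as the following sketch. The tier structure, the parameter names, and the pool contents are all hypothetical, purely to illustrate the “widen the pool until someone very active is reached” idea:

```python
import random

def pick_reviewers(tiers, round_no, k=2, rng=None):
    """Return up to k review candidates for the current escalation round.

    tiers: list of reviewer pools, ordered from 'maintainers relevant
    to this PR' up to 'very active contributors' (the final escalation
    target who is expected to answer one way or another).
    round_no: how many review rounds have already timed out without
    any reviewer response.
    """
    rng = rng or random.Random()
    # Each timed-out round moves one tier up; clamp at the last tier.
    pool = tiers[min(round_no, len(tiers) - 1)]
    return rng.sample(pool, min(k, len(pool)))
```

The interesting (and hard) part is not this loop but the policy around it: what counts as “relevant”, how long a round lasts, and how to avoid spamming the same few people, which is exactly the kind of thing dry-run data could inform.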
Of course, that does not solve “increased reliability”, more test coverage, and all that stuff.
But by running it in dry-run mode, maybe posting the information on some other website so we don’t pollute GitHub, we can already learn things like “oh OK, this PR could have benefited from X or Y”, and better analyze the merge checks we want and the automation we want for reviewer selection, action-item recommendations for the user, etc.
In the end, I am really sorry for the people who are getting more and more annoyed by this whole debacle. I truly think there was no bad intention, just an unfortunate excess of eagerness. I would like to understand whether there is a way to move this work forward that is interesting for both parties involved and respectful of both. I would also like to ask @delroth, and anyone who disagrees with the dry-run idea, whether there is anything we could provide in a reasonable timeframe that would make enabling the bot in dry-run mode okay.
In dry-run mode, I’d like to explore:
- correlation between mergebot checks (OK, this could be merged by user request) and actual quality of the PR
- extracting those cases as test cases or models we can use to develop the mergebot further
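For the correlation item, the dry-run mode would only need to persist a small record per PR of what the bot *would* have done, so we can later join it against the PR’s actual outcome. The field names and check names below are illustrative, not a real schema:

```python
import json

def dry_run_record(pr_number, checks, would_merge):
    """One dry-run observation: the bot's checks and hypothetical decision.

    checks: dict of check name -> bool, e.g. {"eval_ok": True}.
    would_merge: whether the bot would have merged on user request.
    """
    return {
        "pr": pr_number,
        "checks": checks,
        "would_merge": would_merge,
    }

def to_jsonl(records):
    """Serialize records as JSON Lines, convenient for offline analysis."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

A flat append-only log like this is enough to later compute, say, how often “would_merge = true” PRs were subsequently reverted or patched, without the bot ever touching GitHub.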
And behind those two items, there is already a lot of work involved, tied to actual infrastructure constraints. GitHub won’t give you an easy way to download the ~100K PRs we have. Having a bot out there that starts collecting this data for new activity, and maybe having the same people behind the mergebot write something that collects the historical data and makes it available to everyone outside GitHub’s walled garden, would be, IMHO, freaking awesome.
So personally, I would like to find a way to unblock/enable them to serve the community in this role. I still very much agree that any production mode and operations have to go through community discussion and gather a serious amount of consensus, and that we have to ensure that even the dry-run mode has the properties we want and is not disruptive to anything.