Announcing nixpkgs-merge-bot

I’d like to see this go live as soon as possible but without giving the bot actual merge rights yet. Instead, continue relying on existing committers to do the actual merge.

While this won’t reduce the workload at the moment, it gives us a safe way to explore different rule sets and gather experience.

2 Likes

Could you list a few examples of what you’d hope to explore and learn from by running this bot in dry-run mode, and how running the bot in dry-run mode helps in said exploration/learning?

Let’s exclude things related to the operation and implementation of the bot itself from that list, since that’s kind of obvious (any new infrastructure needs some mileage to figure out edge cases) but also a circular problem (we don’t need the merge bot to be flawless and bug free if we don’t have a use case for it).

There’s already a ton of data to look at re: PRs reviewed by maintainers, and I haven’t seen any evidence so far that anyone has really studied it. Do we expect running the bot to collect different data that can’t already be easily derived from nixpkgs history?

1 Like

To be fair, the argument that this wouldn’t be very useful because not many maintainers currently write approving reviews for r-ryantm PRs doesn’t entirely hold, because such a bot changes the dynamics:

Maintainers would be incentivised to make sure the updateScript exists and works, and to review the PRs from r-ryantm, because this is how you can get your package updates fast and without needing any extra person.

Avoiding the extra step of requiring a committer to hit the merge button might seem small, but it can change the incentives considerably. So I’m generally in favor of having such an r-ryantm-limited merge bot.

15 Likes

Given that this is already true except for the “without needing any extra person” part (which to me seems like an implementation detail of “fast”, not an actual goal?) - can we better advertise this?

@ryantm wdyt about adding some text to nixpkgs-update PRs explicitly reminding maintainers to use the GH approval feature, saying that “this will lead to your PR merge getting prioritized”?

(Using this footer to note once more that I’m not opposed to a bot that merges r-ryantm PRs approved by the appropriate maintainer.)

2 Likes

Well, this merge bot can have a component to collect this data. I don’t think building a merge bot and having the data flow inside it are mutually exclusive, even if the bot ends up only running in dry-run mode. We want a tool that can be improved continuously, and it makes sense to me to colocate data collection (towards better merging decisions) with the thing that will make those merging decisions.

I have spent some time studying the problem, and I think there are some interesting challenges, namely that the GitHub API with its current rate limits is not suited to collecting this type of data (in reasonable timeframes) and can cause issues if it’s not coordinated with other people. Also, beyond collecting data on “PRs reviewed by maintainers”, there is a big chunk of work on how we can clone and copy the GitHub data into another data warehouse to make use of it, e.g. for the mergebot but also for other things.

Therefore, I am in favor of a dry-run mode based on this work, even though I agree that the serious issues you raised are not OK.

I know, and I agree with you, that getting faster merges would require solving faster CI, better CI, more tests, proper automation, etc. What I don’t agree with is the claim that “the current situation shows that PRs for r-ryantm updates are not reviewed much by maintainers themselves”. I think it’s a more complicated argument: how do we know the situation isn’t caused by the fact that we don’t have “merge button pushers” available in a reasonable timeframe (e.g. to avoid merge conflicts, etc.), and that maintainers (but not committers) who have been here for the last N years have grown tired of the situation and given up on giving those courtesy (or not) approvals? Therefore, we cannot conclude that the lack of maintainer approvals is indeed the first thing we need to go after.

Maybe we need to talk with the maintainer userbase and understand what would make their lives easier, so they start approving those PRs, or any PRs related to their packages.

Here’s a suggestion that I think would be a useful dependency for the mergebot:

Reviewer recommendation and management, i.e. have the bot randomly select reviewers relevant to the PR and ask them to review within a reasonable timeframe. If nothing happens, select new reviewers, with some algorithm to escalate until it reaches a very active contributor who will answer and say “OK, this cannot be reviewed, let’s close it” or “here’s what you can do to drive your change to a place where the normal review process can happen”. This doesn’t happen today: I believe we can see on Discourse a lot of instances of people saying they feel confused and abandoned in the whole Nixpkgs contribution process when they are only maintainer-level, partly because we have so many informal ways to bump the priority of a review, of a merge, etc.
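To make the idea concrete, here is a hypothetical sketch of such an escalation loop. Every name, structure, and threshold here is made up for illustration; no existing bot works this way:

```python
import random

def pick_reviewers(pool, already_asked, k=2):
    """Randomly pick up to k relevant reviewers who haven't been asked yet."""
    candidates = [p for p in pool if p not in already_asked]
    return random.sample(candidates, min(k, len(candidates)))

def escalate(tiers):
    """Walk through escalation tiers until one produces a response.

    `tiers` is a list of (reviewer_pool, responded) pairs, ordered from
    "relevant maintainers" up to "very active contributors"; `responded`
    stands in for "someone reviewed within the timeframe". Returns the
    index of the tier that answered, or None if escalation was exhausted
    (at which point a human decides to close or redirect the PR).
    """
    asked = set()
    for idx, (pool, responded) in enumerate(tiers):
        asked.update(pick_reviewers(pool, asked))
        if responded:
            return idx
    return None
```

The interesting design questions (how pools are built, what the timeframe is, when to give up) are exactly what dry-run data could inform.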

Of course, that does not solve “increased reliability” and more tests coverage and all that stuff.

But by running it in dry-run mode, maybe posting the information on some other website so we don’t pollute GitHub, we can already learn things like “oh, OK, this PR could have benefited from X or Y”, and better analyze the merge checks we want and the automation we want for reviewer selection, action-item recommendations for the user, etc.

In the end, I am really sorry for the people who are getting more and more annoyed by this whole debacle. I truly think there was no bad intention, just an unfortunate excess of eagerness. I would like to understand whether there is a way to move this work forward that is interesting for, and respectful of, both parties involved. I would also like to understand from @delroth, and anyone else who disagrees with the dry-run idea, whether there is anything we could provide in a reasonable timeframe that would make it okay to enable the bot in dry-run mode.

In dry-run mode, I’d like to explore:

  • the correlation between mergebot checks (“OK, this could be merged by user request”) and the actual quality of the PR
  • extracting these cases as test cases or models we can use to develop the mergebot further

And behind those two items there is already a lot of work tied to actual infrastructure constraints: GitHub won’t give you a way to download the ~100K PRs we have had. Having a bot out there that can start collecting this data for new PRs, and maybe having the same people behind the mergebot write something that collects the old data and makes it available to everyone outside GitHub’s walled garden, would be, IMHO, freaking awesome.

So personally, I would like to find a way to unblock/enable them to serve the community via this role. I still very much agree that any production mode and its operation have to go through community discussion and gather a serious amount of consensus, and we have to ensure that even the dry-run mode has the properties we want and is not disruptive to anything.

2 Likes

Not a great idea. Homebrew scrapped their automated system completely after it was compromised.

5 Likes

To be completely honest: because at this point I’ve lost trust in any process that would stop this from going from dry-run-mode to active without consensus. The team behind this change seems happy to bypass all the mechanisms that exist currently in our community to build consensus and collect feedback, in favor of just going with their preferred solution. I’m personally not willing to make any more concessions than “auto-merge the r-ryantm PRs approved by maintainer” without seeing an actual plan.

Of course, I have as much authority as any other merger there, which is basically “none”. So you can do with this as much as you want to :slight_smile:

How is that not something that can already be done by looking at, for example, a random sample of 100 PRs from Pull requests · NixOS/nixpkgs · GitHub? Do you expect this would lead to significantly different results? If so, why?

(I think there is an argument to be made that the results could be different - maintainers might put more effort in their reviews with awareness that an approval means a merge. But 1. I’m not convinced that we’ve established a way to build that awareness; 2. I’m not convinced this would make the results significantly different. I suspect looking at the data would help confirm/reject this?)

Running the bot in dry-run mode won’t give you data from 100K historical PRs either, so this point is entirely moot. Any kind of GH rate limiting is significantly faster than waiting for new data to trickle in live via your new bot.

1 Like

Well, for starters, this is biased towards “package maintainer”: it doesn’t take into account things like (a) NixOS module maintenance and (b) other non-package maintenance, which are also very much things we would like to have stories for.

In addition, I think GH queries are not exact (as in: GitHub has been losing our data and sometimes hides it), so it’s unclear to me that the random sampling is representative of some statistical reality without someone going and verifying we can trust it.

Also, I believe there are many “categories” of PRs, and you can classify them into multiple scenarios, e.g. the PR that is blocked because it introduces a subtle change but no test for it, etc.

Based on my very quick dabbling in GitHub - RaitoBezarius/understanding-nixpkgs: Python notebooks on nixpkgs statistics a long time ago (the repo does not reflect everything I wrote; some of it was never pushed because people were absolutely uninterested when I started working on it), I believe we need to work on classifying “why is this PR blocked?”. I would not focus only on the “approved by package maintainer” lens: I don’t believe package maintainers alone are enough to build the confidence we want for a mergebot, and I think a lot of folks have valuable input while not being package maintainers, e.g. people doing a drive-by change on a PR and showing up in the git blame / git log.

Getting this data (including the Git repository data itself) to analyze whether we could have had a git-log/git-blame person, rather than a package maintainer, approve the PR would be interesting to me.

I think you are also bringing up the interesting topic of what it means for a person to be a package maintainer and what our expectations are for them.

I think I would rather do two things:

  • figure out an extension of the maintainer model towards a “trusted set of maintainers” who will get authority and responsibility for packages we deem critical, and for whom we will have different expectations
  • figure out a way to deal with packages maintained without that level of expectation: a path for them to be reviewed, polished, and brought to the high level of quality we want, plus communicating that being a simple package maintainer gives you no responsibility (or very, very low ones) but also no fast track to merge things; it takes time and you have to accept that

This way we can create incentives for package maintainers to step up and be more professional, to a certain extent.

Of course, all of this has to be discussed with the community, and we have to see if this is the model we want to go for.

No no, but this is the right place to collect and distribute the data, IMHO, while we wait for something better.
I am a big believer that for such complicated problems we need all the help we can get to take baby steps, and to find a way to keep it playful for people; otherwise everything becomes a chore, and then we end up in a completely different model.

I think the team behind the change thought it was an uncontroversial change; it is now controversial and requires consensus on many things, including the dry-run mode.

This discussion is what stopped it, so IMHO, the process worked.

And you have my word that, as long as I am here with them IRL, there won’t be any dry-run-mode → active transition without consensus from you folks. I don’t think I need to say that they were scolded and are now in a position of “I don’t know how to get from this situation to somewhere where we can restore the lost trust”. I am writing this because, fortunately, no harm was done, and I would beg anyone involved to give the idea a second chance, so we can work on getting this to a place that makes us all happy, not just the people who want the green button pushed.

I think “automerge the r-ryantm PRs approved by a maintainer” is already accepted by the team; they just want to understand if they can already enable dry-run mode in practice. I believe we will figure out the plan for moving to the “r-ryantm PRs approved by a maintainer” policy (if people agree), and then we can also put forward a plan for extending it. This will happen slowly over the next weeks rather than in a rush now.

3 Likes

Short reply, not point by point, because I have to be gone in a few minutes. Let me try to set up some boundaries that I would be comfortable with re: efforts in this area:

  • I’m still fairly convinced that useful analysis can be done offline, and that we have the data, just not the tools and analyst time to look at it. But if you want to use an online / “live” tool to collect data, I’d want it to be done by something that preferably doesn’t have access to a token with write permissions to the nixpkgs repo, and definitely not merge permissions. I don’t understand why it should cohabit in a codebase that contains a “merge PR feature” (aka. the merge bot), but I don’t strongly care.

  • If we do have a bot that has merge rights (e.g. for r-ryantm PRs), I’d strongly prefer if its codebase was not an experiment grounds. Otherwise we can’t easily review changes that go to production, and we increase the risk of introducing bugs and vulnerabilities into something sensitive. So if this shares codebase with whatever you want to run an analysis, the analysis part would be done in a fork, not in the main nixpkgs/ repo.

Are these controversial boundaries to set? IMHO the rationale behind them is fairly simple to understand, and I don’t think they set significant blocking constraints either.

What these boundaries do make clear though is my main source of confusion: why do we suddenly want to put data analysis feature in the same codebase as the same merge bot that would be merging r-ryantm PRs? This seems unnecessary and dangerous to me, especially when I don’t even think that a new collection mechanism is necessarily required to get the data we want (rationale: the conclusions we want to establish on each PR - why it was blocked - have to be mainly established by a human, since the whole point is validating whether the policy we want to enforce via code makes sense, so the bottleneck is humans looking at PRs, and that doesn’t seem like something that’s bottlenecked by tooling or rate limits or whatever else, fwiw).


ACK re: the point on this now being controversial and being paused. Thank you for clarifying this.

However, requiring contributors to drop everything they’re doing to have an emergency debate on merge bots for several hours of their time is not the process “working”. Or at least I hope it isn’t considered to be, because I’ve been spending the last day and a half significantly stressed about this and I’ve had to drop some IRL commitments yesterday evening to handle an emergency that didn’t need to be one.

9 Likes

Through private feedback I received, apparently the GraphQL GitHub API has much higher rate limits and can download everything. I haven’t tried it yet, but I will explore this now. I stand corrected; thanks for making me aware of this possibility :slight_smile:!
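For reference, a minimal sketch of what paginating over nixpkgs PRs via the GraphQL API could look like (the field selection is illustrative; you feed the previous page’s `endCursor` back in as `$cursor`):

```graphql
# Fetch nixpkgs PRs 100 at a time, with their reviews.
query($cursor: String) {
  repository(owner: "NixOS", name: "nixpkgs") {
    pullRequests(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        title
        state
        createdAt
        mergedAt
        reviews(first: 10) {
          nodes { state author { login } }
        }
      }
    }
  }
}
```

One GraphQL request here replaces what would be many REST calls (one per PR for reviews), which is where the rate-limit win comes from.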

This might be cool, but there’s obviously a security implication, and it might mean people will be less willing to let others co-maintain their packages. I certainly would not be as open to letting others arbitrarily self-merge things. I don’t think it’s wrong to widen the net of trust, but I fear this will just add more friction down the line. Merging a new package would no longer just mean “do we trust that this PR is safe/correct/up to standard”; it would also mean “do we trust this person not to push potentially malicious changes”.

There is an implicit trust that nixpkgs is somewhat verified, and that would be lost if this were implemented. I’d prefer if this self-merge privilege were given out with a little more screening, and not just to all maintainers.

2 Likes

Done in Data dump of all nixpkgs pull requests :slight_smile: !

4 Likes

My proposal would be:

Start by defining some rules, tested on offline data.

Then run the bot in dry-run mode, commenting on matching PRs, so that a merger can simply :-1: if they think the bot’s decision was bad:

  • PRs merged without any interaction other than the merge count as a win.
  • A merger can :-1: and still merge the PR, if they think the rules themselves were not enough and another element was required.
  • Every contributor could add a note with something like “@bot note that blabla”.

If a rule has more than 3 “losses”, it auto-deactivates → it must be improved.
If a rule has more than maybe 100-200 wins (or maybe more? depends on the target package set, I guess) and no loss: it’s a success!

If there are 5-10 successes, make a RFC.
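The win/loss bookkeeping above is simple enough to sketch out. A minimal illustration, with the thresholds taken from the proposal (both are obviously up for debate), and all names being placeholders:

```python
from dataclasses import dataclass

# Thresholds from the proposal above; both are up for debate.
LOSSES_TO_DEACTIVATE = 3
WINS_FOR_SUCCESS = 100

@dataclass
class Rule:
    name: str
    wins: int = 0
    losses: int = 0
    active: bool = True

    def record_win(self):
        """A matched PR was merged with no interaction other than the merge."""
        self.wins += 1

    def record_loss(self):
        """A merger :-1:'d the bot's decision on a matched PR."""
        self.losses += 1
        if self.losses > LOSSES_TO_DEACTIVATE:
            self.active = False  # rule must be improved before re-enabling

    @property
    def is_success(self):
        """Success requires a clean record: many wins and zero losses."""
        return self.losses == 0 and self.wins >= WINS_FOR_SUCCESS
```

Keeping this state per rule (rather than per bot) is what lets individual rules be deactivated and improved without taking the whole experiment offline.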

I think this would allow us to start with simple rules and collectively improve on them over time.
Mergers can simply opt to ignore them.
It gives contributors a very low-barrier way to give insights.
It (hopefully) builds trust with users.
When the RFC happens, people will have seen and hopefully interacted with the bot.
People would have already experienced the possible benefits (or losses) and can form a better opinion on the RFC. (If they’d only been :-1:'ing, that’s a good reason to reject the RFC!)

IMO dry-run mode makes perfect sense for these reasons.


On another note, some kind of endorsement system, where a committer can e.g. add a line like “endorse @xx” in a comment, might help here.

(warning: I have no clue about such “trust systems”, my hope is that people with more experience will pick this up :slight_smile: )

this can then be used by the above rules, e.g.:

  • is a leaf package
  • has more changes than just version/hash, but only changes to the package itself
  • no maintainer changes
  • is approved by a maintainer with X endorsements
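As an illustration, rules like these could be expressed as simple predicates over PR metadata. A sketch, where every field name is hypothetical (there is no actual bot API with this shape):

```python
def eligible_for_automerge(pr, endorsements, min_endorsement=5):
    """Example rule set: leaf package, package-local changes only,
    no maintainer changes, approved by an endorsed maintainer.

    `pr` is a plain dict of hypothetical metadata; `endorsements`
    maps a login to an endorsement score. All names are made up.
    """
    return (
        pr["is_leaf_package"]
        # only changes to the package itself
        and all(f.startswith(pr["package_dir"]) for f in pr["changed_files"])
        # no maintainer changes
        and not pr["changes_maintainers"]
        # approved by a maintainer with enough endorsements
        and any(
            endorsements.get(login, 0) >= min_endorsement
            for login in pr["approved_by_maintainers"]
        )
    )
```

Encoding rules as standalone predicates would also make it easy to replay them against historical PR data offline, which ties back to the data-collection discussion above.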

Endorsement sounds kinda “weak”, but if done properly (maybe similar to an Elo system; I still have no clue :smile:) it would probably be roughly as good as the current (AFAIU not really defined) system for getting the commit bit. And in comparison, it’s completely transparent and traceable (it could be tracked in a public git repo that is updated once a day or something like that…)

(obviously, rules including endorsement points would have to be monitored at least long enough for the endorsement system to establish itself)

This also brings the bittersweet side effect that I now need to choose between auto-updates being auto-merged and creating PRs manually, where I might slightly improve the packaging.
I don’t have an idea how to avoid this; maybe it’s the acceptable outcome of this experiment and we agree that this is alright. I just wanted to share the thought.

IMO security would greatly benefit if we had a small group of people per package area who could approve and merge PRs without being allowed to merge everything. Then run-of-the-mill package additions to e.g. pythonPackages could easily be approved by a group of people hopefully bigger than the current one.

5 Likes

Did anything ever come of this? I was looking forward to being able to merge updates for packages I maintain.

1 Like

It’s in service: Pull requests · NixOS/nixpkgs · GitHub

5 Likes

Oh nice, I missed that somehow, thanks!