Most nixpkgs contributors are probably well aware of the huge number of open PRs. I’ve decided to gather some statistics to determine how bad the situation really is.
Here you can see all PRs grouped by state over time:
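(For reference, a snapshot of the current per-state totals can be pulled with gh. This can’t reproduce the time series above, which needs historical data, but it’s enough to sanity-check the current numbers; the search qualifiers are standard GitHub search syntax.)

```bash
# Point-in-time counts of nixpkgs PRs per state (open, closed without
# merging, merged). This is only a snapshot, not the time series above.
for state in "is:open" "is:closed is:unmerged" "is:merged"; do
  printf '%s: ' "$state"
  gh api -X GET search/issues -f q="repo:NixOS/nixpkgs is:pr $state" \
    --jq '.total_count'
done
```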
what’s the net rate of change in open PRs? it seems that for a long time the PR count has never really decreased, and I feel like having some concrete numbers would help confirm or deny that. doing so for issues as well would be doubly cool, though please don’t feel obliged to do either, just something I’m curious about.
That depends on how you define “really”: we have reduced the number of open PRs from 6700 to 6200 in the past few weeks, and from 6200 to 5700 a few months ago. Of course, how many committers are active varies over time; coverage was relatively complete in the last two weeks.
ah, I wasn’t aware of those ongoing efforts. to be clear, I didn’t mean there was no effort, just that from my own recollection I couldn’t remember ever seeing a substantial reduction in open PRs. I’m very glad to have been proven wrong though; thanks to everyone on the committers team for their hard work!
I think people place way too much importance on this metric. I don’t find it useful for anything other than perhaps as a measure of growth.
Let’s take a simplified example to show why I think that way: suppose exactly one PR was opened every hour, and each PR was resolved (closed or merged) exactly 24 hours after it was opened.

In this scenario, you’d expect to have 24 open PRs at any given moment, because one PR would be opened and another resolved every hour. (After an initialisation period of one day, of course.)

If the rate of PRs opened rose to ten times as much, but all of them were still processed, each in the same amount of time, you would naturally expect to see ten times as many open PRs at any given time too.
IOW the number of open PRs at any given moment is directly proportional to the rate of PRs processed if the time it takes to process any given PR remains constant.
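(For the formula-minded: this is just Little’s Law from queueing theory, with the toy numbers above plugged in.)

```latex
% Little's Law: the average number of items in a system (L) equals the
% arrival rate (\lambda) times the average time each item spends in it (W).
L = \lambda W
% With the toy numbers above:
%   \lambda = 1 PR/hour,  W = 24 h  =>  L = 24 open PRs
%   \lambda = 10 PR/hour, W = 24 h  =>  L = 240 open PRs
```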
It’s much like real estate vacancy in the real world: the more real estate there is, the more of it will be vacant at any given time, simply because some of it is always in transition between residents.
Just like it’s to be expected for a building to be vacant for a little while after its previous resident has moved out, it’s also to be expected for a PR to take a little while to resolve.
We should therefore not measure our success at processing PRs by how many PRs are open at any given time but rather by how long it takes to process any given PR.
If we are able to keep up, you’d expect the time to remain constant. If we are no longer able to keep up, you’d expect that time to rise, eventually to infinity.
I’d consider it a great success if we managed to keep the time to process each PR constant. It’d be even better to slightly improve it but I’d consider that a “stretch goal” and I’d expect diminishing returns to come into play here quickly.
What I also think is important to consider here is that not all PRs are created equal. Getting a refactor merged quickly is not nearly as important as getting an important security or other bug fix merged quickly.
I’m no statistician, so I can’t be of much help with defining these “time to process a PR” metrics in detail or deriving the historical data. I am not ignorant of the fact that it likely isn’t trivial, though.
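That said, a very rough first stab could look something like the following; this is a sketch rather than a proper metric, the sample size is arbitrary, and it only looks at PRs that did get merged:

```bash
# Sketch: median hours from creation to merge over a recent sample of
# merged nixpkgs PRs. The sample size (200) is arbitrary and bot PRs
# (e.g. r-ryantm) are not filtered out.
gh pr list -R NixOS/nixpkgs --state merged --limit 200 \
  --json createdAt,mergedAt \
  | jq '[ .[]
          | ((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600
        ] | sort | .[(length / 2 | floor)]'
```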
Setting in motion an effort to rigorously categorise PR complexity and priority using issue/PR labels is something I could help with; it has been on my TODO list for a while, but I haven’t had the spoons yet. This would be a rather cheap way to help us prioritise PRs, get the important ones landed quickly, and measure our effectiveness in doing so.
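Once such labels exist, measuring how we do on the important PRs would mostly be a matter of adding a label filter to a query like the sketch above, e.g.:

```bash
# Same idea, restricted to a hypothetical priority label (the label name
# "priority: critical" is made up and does not necessarily exist in nixpkgs).
gh pr list -R NixOS/nixpkgs --state merged --limit 200 \
  --label "priority: critical" --json createdAt,mergedAt
```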
A few months ago (~late July) someone posted relevant stats indicating that the average PR was merged in about a day. The average PR also came from r-ryantm, which means it’s mostly trivial stuff getting merged; there wasn’t any info on how the situation looked when excluding bot PRs. I searched but couldn’t find that thread now, though I think @Aleksanaa also commented on it?
(made with `for page in {1..5}; do gh api 'https://api.github.com/orgs/nixos/teams/nixpkgs-committers/members?per_page=100&page='"$page" | jq -r '.[].login'; done | xargs printf " -involves:%s"`)