Statistics on nixpkgs PRs over time

Most contributors to nixpkgs are probably well-aware of the huge open PR count on nixpkgs. I’ve decided to gather some statistics to determine how bad the situation really is.
Here you can see all PRs grouped by state over time:

PRs created each month grouped by (current) state, relative to total:

About 90% of PRs are resolved in less than six weeks.

You can inspect the plots yourself at Github Pull Request Statistics. I’ll publish the source code for the analysis tool soon.

21 Likes

What dataset did you use/how did you acquire this data?

1 Like

I acquired the data from the official Github API. You can check out the full program here: GitHub - FliegendeWurst/pr-stats

5 Likes

what’s the net rate of open PRs? it seems that for a long time, PR count has never really decreased, and I feel like having some concrete numbers would help confirm or deny that. doing so for issues as well would be doubly cool, though please don’t feel obliged to do ither, just something I’m curious about.

Depends on how you define “really”, we have reduced the number of PRs from 6700 to 6200 in the past few weeks, and from 6200 to 5700 a few months ago. Of course, the degree of completeness of the online committers varies at different times. It was relatively complete in the last two weeks.

5 Likes

ah, I wasn’t aware of those ongoing effots. to be clear, I didn’t mean there was no effort, just from my own recollection, I hadn’t remembered ever seeing a substantial reduction in open PRs. I’m very glad to have been proven wrong though, thanks to everyone on the committers team for their hard work!

2 Likes

I think people place way too much importance on this metric. I don’t find it to be of any usefulness other than perhaps a measure of growth.

Let’s take a simplified example to show why I think that way: Let’s say there was exactly one PR opened every hour and each opened PR was resolved (closed or merged) exactly 24h after it was opened.
In this scenario, you’d expect to have 24 open PRs at any given moment because there’d be one PR opened and another one closed in every hour. (After an initialisation period of one day of course.)
If the rate of PRs opened and resolved were to rise to 10 times as much but all of them were still processed and each still processed in the same amount of time, you would naturally expect to see 10 times as many open PRs at any given time too.

IOW the number of open PRs at any given moment is directly proportional to the rate of PRs processed if the time it takes to process any given PR remains constant.

It’s much like real estate vacancy in the real world: The more real estate there is, the more of it will be vacant just by the nature of being in the state transfer between residents.

Just like it’s to be expected for a building to be vacant for a little while after its previous resident has moved out, it’s also to be expected for a PR to take a little while to resolve.

We should therefore not measure our success at processing PRs by how many PRs are open at any given time but rather by how long it takes to process any given PR.
If we are able to keep up, you’d expect the time to remain constant. If we are no longer able to keep up, you’d expect that time to rise, eventually to infinity.

I’d consider it a great success if we managed to keep the time to process each PR constant. It’d be even better to slightly improve it but I’d consider that a “stretch goal” and I’d expect diminishing returns to come into play here quickly.

What I also think is important to consider here is that not all PRs are created equal. Getting a refactor merged quickly is not nearly as important as getting an important security or other bug fix merged quickly.

I’m no statistician, so I can’t be of any help at defining these “time to process a PR” metrics in detail or deduce historical data. I am not ignorant of the fact it that it likely isn’t trivial though.

Setting in motion an effort to rigorously categorise PR complexity and priority using issue/PR labels would be something I could help with and it has been on my TODO list for a while but I haven’t had the spoons yet. This would be a rather cheap way to help us prioritise PRs to get the important ones landed quickly and measure our effectiveness in doing it.

6 Likes

A few months ago (~late July) someone had posted relevant stats, indicating that the average PR was merged in about a day. The average PR also came from r-ryantm, which means it’s mostly trivial stuff getting merged, there wasn’t any info on how the situation was when excluding bot PRs. I searched and couldn’t find that thread now, though I think @Aleksanaa also commented on that thread?

Here is the number of days required to merge X% of PRs created each year (only counting PRs ever merged, including bot PRs).

Year 25% 50% 75% 90% 95% 98%
2012 0 0 4 13 22 34
2013 0 0 2 9 19 44
2014 0 0 2 12 30 52
2015 0 0 2 10 25 71
2016 0 0 2 8 18 48
2017 0 0 2 11 28 81
2018 0 0 3 11 26 65
2019 0 1 5 21 47 124
2020 0 0 5 23 58 138
2021 0 0 3 13 31 78
2022 0 0 4 17 40 100
2023 0 0 2 13 31 86
2024 0 0 4 13 27 55
8 Likes

One interesting statistics is the rate of PRs being essentially ignored by committers.

https://github.com/NixOS/nixpkgs/pulls?q=is%3Apr+is%3Aopen++-involves%3Azimbatm+-involves%3Athoughtpolice+-involves%3Aryantm+-involves%3Amarcusramberg+-involves%3Ajwiegley+-involves%3Akirillrdy+-involves%3Arickynils+-involves%3Amatthewbauer+-involves%3Agarbas+-involves%3Amkaito+-involves%3Abbigras+-involves%3Apeti+-involves%3Atfc+-involves%3Ajoachifm+-involves%3Aprusnak+-involves%3Athefloweringash+-involves%3Aphiliptaron+-involves%3Aedef1c+-involves%3Aedwtjo+-involves%3Awmertens+-involves%3ANobbZ+-involves%3Abaloo+-involves%3Atomfitzhenry+-involves%3Aadisbladis+-involves%3Atalyz+-involves%3Acdepillabout+-involves%3Aviric+-involves%3Astigtsp+-involves%3Agrahamc+-involves%3ApSub+-involves%3Akalbasit+-involves%3Adavidak+-involves%3AMic92+-involves%3Aasymmetric+-involves%3Ammahut+-involves%3Amadjar+-involves%3Azakame+-involves%3Afpletz+-involves%3Afabaff+-involves%3Aandresilva+-involves%3Aacowley+-involves%3Adomenkozar+-involves%3Amweinelt+-involves%3Araboof+-involves%3Abjornfor+-involves%3Apbsds+-involves%3Ar-vdp+-involves%3ADanielSidhion+-involves%3Awizeman+-involves%3Atomberek+-involves%3Aflokli+-involves%3Aviraptor+-involves%3Aaszlig+-involves%3Abendlas+-involves%3ADamienCassou+-involves%3Asrhb+-involves%3Aanthonyroussel+-involves%3Asamuela+-involves%3Asmancill+-involves%3Acollares+-involves%3Aveprbl+-involves%3Alukegb+-involves%3Adrupol+-involves%3Alovek323+-involves%3Alilyinstarlight+-involves%3Aturion+-involves%3Afabianhjr+-involves%3Agebner+-involves%3ARaitoBezarius+-involves%3Acolemickens+-involves%3Abachp+-involves%3Ajbedo+-involves%3Akira-bruneau+-involves%3AK900+-involves%3Adywedir+-involves%3Anh2+-involves%3AEkleog+-involves%3Amatthiasbeyer+-involves%3ANickHu+-involves%3Aetu+-involves%3Aimincik+-involves%3Ashlevy+-involves%3Akevincox+-involves%3Aroberth+-involves%3Aehmry+-involves%3Attuegel+-involves%3Aofflinehacker+-involves%3Aaforemny+-involves%3ALassulus+-involves%3ANeQuissimus+-involves%3Aarianvp+-involves%3Alayus+-involves%3Aandir+-involves%3AyorickvP+-involves%3Ajnsgruk+-involves%3Asikmir+-involves%3Ajtojnar+-involves%3Apeterhoeg+-involves%3Alsix+-involves%3ALeSuisse+-involves%3Aonny+-involves%3Arycee+-involves%3Arisicle+-involves%3AdtzWill+-involves%3Athiagokokada+-involves%3Aexpipiplus1+-involves%3Ataku0+-involves%3Ateto+-involves%3Akubukoz+-involves%3ARaghavSood+-involves%3AEricson2314+-involves%3AZimmi48+-involves%3Aedolstra+-involves%3Asvanderburg+-involves%3Anbraud+-involves%3Aabbradar+-involves%3Adguibert+-involves%3Aminijackson+-involves%3Aromildo+-involves%3Apicnoir+-involves%3Aajs124+-involves%3Ajokogr+-involves%3AGabriella439+-involves%3AStunkymonkey+-involves%3AElvishJerricco+-involves%3Arasendubi+-involves%3Abenley+-involves%3Ahrdinka+-involves%3Aglobin+-involves%3Ackauhaus+-involves%3Atoonn+-involves%3Arhendric+-involves%3Aulrikstrid+-involves%3Alukateras+-involves%3Amaralorn+-involves%3Akhaneliman+-involves%3Avcunat+-involves%3AJulienMalka+-involves%3Ayurrriq+-involves%3Agador+-involves%3A7c6f434c+-involves%3Aadamcstephens+-involves%3Abhipple+-involves%3Amattpolzin+-involves%3AFRidh+-involves%3AturboMaCk+-involves%3Ajagajaga+-involves%3Aalexfmpe+-involves%3Adonovanglover+-involves%3Avbgl+-involves%3Aalyssais+-involves%3Arnhmjoj+-involves%3Acorngood+-involves%3AProfpatsch+-involves%3Asternenseemann+-involves%3Abennofs+-involves%3Afelschr+-involves%3Anlewo+-involves%3Atimokau+-involves%3Atjni+-involves%3AConnorBaker+-involves%3Ancfavier+-involves%3AdasJ+-involves%3Awillcohen+-involves%3AJohnAZoidberg+-involves%3Ahappysalada+-involves%3AMa27+-involves%3Acab404+-involves%3Africklerhandwerk+-involves%3Alf-+-involves%3Adotlambda+-involves%3ASynthetica9+-involves%3Athufschmitt+-involves%3Alovesegfault+-involves%3ASuperSandro2000+-involves%3Areckenrode+-involves%3Aprimeos+-involves%3Amkg20001+-involves%3Aaanderse+-involves%3Afgaz+-involves%3Asiraben+-involves%3A0x4A6F+-involves%3Awegank+-involves%3ASomeoneSerge+-involves%3AMindavi+-involves%3Adoronbehar+-involves%3Aleona-ya+-involves%3Aerictapen+-involves%3AScrumplex+-involves%3Aambroisie+-involves%3Asymphorien+-involves%3ABr1ght0ne+-involves%3Aemilytrau+-involves%3Apiegamesde+-involves%3Atie+-involves%3Ah7x4+-involves%3ANickCao+-involves%3ANetaliDev+-involves%3ASebTM+-involves%3Abalsoft+-involves%3Aemilazy+-involves%3AAtemu+-involves%3ARossComputerGuy+-involves%3Abobby285271+-involves%3AWilliButz+-involves%3Ainfinisil+-involves%3Aazahi+-involves%3Awineee+-involves%3AOPNA2608+-involves%3Ac0bw3b+-involves%3Anatsukium+-involves%3Ack3d+-involves%3AGrahamcOfBorg+-involves%3AMatthewCroughan+-involves%3Amarkuskowa+-involves%3Acole-h+-involves%3Alegendofmiracles+-involves%3AGaetanLepage+-involves%3Auri-canva+-involves%3Ajopejoe1+-involves%3Afigsoda+-involves%3Aorivej-nixos+-involves%3AIvarWithoutBones+-involves%3AAleksanaa+-involves%3Akatexochen+-involves%3AJohnRTitor+-involves%3Ar-burns+-involves%3Aemilylange+-involves%3AKranzes+-involves%3AArtturin+-involves%3Azowoq+-involves%3Anikstur+-involves%3ATredwellGit+-involves%3ATomaSajt+-involves%3Atomodachi94+-involves%3Ayayayayaka+-involves%3Ajian-lin+-involves%3Aabysssol+-involves%3Awinterqt+-involves%3AJanik-Haag+-involves%3Apennae+-involves%3Ayu-re-ka+-involves%3Acafkafk+-involves%3Auninsane++-involves%3Ar-ryantm+-label%3A"2.status%3A+merge+conflict"+-label%3A"2.status%3A+stale"

(made with for page in {0..4}; do gh api 'https://api.github.com/orgs/nixos/teams/nixpkgs-committers/members?per_page=100&page='"$page" | jq .[].login -r ; done | xargs printf " -involves:%s")

2 Likes

Also interesting could be how effective asking for review really is

https://github.com/NixOS/nixpkgs/pulls?q=is%3Apr+is%3Aopen+"https%3A%2F%2Fdiscourse.nixos.org%2Ft%2Fprs-ready-for-review%2F"

or how effective the checkboxes in the PR template are

https://github.com/NixOS/nixpkgs/pulls?q=is%3Apr+is%3Aopen+"-+[x]+Tested+basic+functionality+of+all+binary+files"

3 Likes

https://ossinsight.io/analyze/NixOS/nixpkgs#pull-requests

3 Likes

To me, these numbers don’t show anything bad really, at least not by themselves.

I’d like to see more clearly articulated norms around time-to-review and time-to-merge.

For instance, @hexa chastises me in this PR for failing to give time for code owners to sign off. (I think this is fair; I’m using it for illustration.)

The core problem is the coordination and expectations.

  1. Who’s “supposed” to review? In what time period?
  2. Who’s “supposed” to merge? In what time period?
  3. When is self-merging your own PR rude or unacceptable? When is it acceptable? (@K900 does this often, and I think nixpkgs is better for it.)
  4. When is merging without a GitHub PR :white_check_mark: acceptance acceptable?
  5. When is merging with a GitHub PR :x: request changes acceptable? How long to wait for the original reviewer to dismiss their own review?

Most of these things have to do with timing. “Indefinitely” is a terrible answer for project cohesion and fun. No work gets done in the Sejm if there’s a Liberum Veto. But “immediately” is also a bad answer, because it means the first person to say yes (who likely has not thought that much about it!) determines the outcome for the project.

What I hope we can agree on and write down somewhere is norms for Nixpkgs in this area. Here are a few I would like to see.

  1. PR authors stating who they expect to review their patches. “Do not merge unless X approves.”

  2. PR reviewers stating time-to-merge expectations. “I will merge this sometime today unless someone objects strongly.” Doubly so for anything that’s being self-merged.

  3. PR reviewers using the Request Changes status to note what’s blocking. “I request changes for reason 1, 2, and 3; every other comment is not blocking.”

  4. If a reasonable person would believe that the requested changes have been implemented, other PR reviewers should feel free to dismiss their review status.

What do others think?

4 Likes

I don’t think self-merges are particularly necessary, if the PR was that urgent, then someone else would’ve noticed and merged. If it languishes, so be it… Self-merging with no approvals are the worst, it means the author got no feedback and is certain to break something that won’t get caught for (possibly) 4+ days.

Though I’ve tended to dimiss request-change nits as soon as I see them, as they clearly detract from contribution. Same goes for concerns that were addressed.

I think that is a pretty strong word for essentially asking for more time for codeowners in the future.

That being said, I’m all for self-merges for people who’ve done their due dilligence.

4 Likes

more time for codeowners in the future.

I think that’s the rub. For me, it’s not clear what “more time” actually means, and it’s also not clear who exactly is the codeowner from your perspective. According to ci/OWNERS, I’m an owner of some of the code in the PR! Yet it’s clear you didn’t see me as such. (Which is fine!) My commitment to nixpkgs wanes and waxes based on other time commitments, so there are months where I’m less responsive.

That mismatch in expectations drives my desire for clearer norms.

What is “more time”? When has it elapsed?
Who is the code owner? When does the timer on their attention run out?
Who should have merged the PR, from your perspective?

Again, I think your “Hey, please adjust your behavior” message is both appropriate, kind, and relevant. No worries on “strong words” from my side.

1 Like

The substantial code in that PR was python related, so the relevant codeowner I was in fact waiting on was natsukium.

That is a good question. Certainly not before 24 hours, rather some time after 48 hours.

Ultimately I think of “code ownership” as a tool to make sure that maintainers get a chance to opine on the things that they spend a lot of time on.

3 Likes