Stats/trends on github issues&prs

Hi,

@musicmatze mentioned on Mastodon that we recently hit 4000 open nixpkgs issues, which made me curious about the trends in those numbers. I wrote a quick script to get them from the API and break them down per month, giving:

Issues

The below graph shows cumulative open issues (blue, positive) and closed issues (purple, negative) over time.

So last month was really good (a spike in opened issues, but an even larger spike in closed ones), but overall both seem to grow pretty linearly.

(columns are: opened in that month, closed in that month, difference, cumulative open, cumulative closed)

https://arnout.engelen.eu/issues.gnumeric
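For anyone who wants to reproduce the columns above, here is a rough sketch of the per-month breakdown. It is not the original script. It assumes the issues have already been fetched and that each one is a dict with `created_at` and `closed_at` timestamps in the ISO 8601 form the GitHub API returns. It also assumes that “cumulative open” means the running sum of the monthly difference, which is my reading of the column, and months without any activity are simply skipped.

```python
from collections import Counter
from datetime import datetime


def month_key(timestamp):
    """Turn a timestamp such as '2019-03-05T10:00:00Z' into '2019-03'."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m")


def monthly_breakdown(issues):
    """Return rows of (month, opened, closed, difference,
    cumulative open, cumulative closed), sorted by month."""
    opened = Counter(month_key(i["created_at"]) for i in issues)
    closed = Counter(
        month_key(i["closed_at"]) for i in issues if i.get("closed_at")
    )
    rows = []
    cumulative_open = 0    # assumption: running sum of (opened - closed)
    cumulative_closed = 0  # running total of closed issues
    for month in sorted(set(opened) | set(closed)):
        difference = opened[month] - closed[month]
        cumulative_open += difference
        cumulative_closed += closed[month]
        rows.append((month, opened[month], closed[month],
                     difference, cumulative_open, cumulative_closed))
    return rows
```

The same function works for PRs if they are fed in with the same two fields.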

PRs

Actually I think PRs are a more interesting metric. It’s kinda fine when obscure issues hang around for a long time, while a contribution going unmerged is more of a problem.

Since nixpkgs accepts so many PR’s I cut the graph in 2 so we can see trends more clearly, but keep in mind the scales are different:

https://arnout.engelen.eu/prs.gnumeric

The number of open PRs is growing at what seems like a slightly faster than linear rate, and likewise for the number of closed PRs.

That is somewhat problematic. I guess the solution space is mostly:

  • make it easier for contributors without merge rights to participate in reviewing
  • lower the risk of merging something that shouldn’t have been merged

More automated checks and tests (as people have been discussing and implementing) might help for both of these. What do you think beyond this?

8 Likes

An interesting metric would be to also match PRs with requested changes, PRs which need changes due to merge conflicts, and PRs that are WIP/Drafts.

There is a bunch of PRs that are open-but-technically-unactionable, which pad the numbers a bit.

I’m not saying they, by themselves, account for the upward trend, but they are a factor.
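To make that concrete, here is a small sketch of how the open PRs could be split into actionable and unactionable ones. The record fields (`draft`, `has_conflicts`, `changes_requested`) are hypothetical names for this example, not the GitHub API’s own, and the “WIP” title check is just a guess at how such PRs are usually marked.

```python
def is_actionable(pr):
    """A PR counts as actionable when a reviewer could merge it as it stands."""
    title = pr["title"].lower()
    # Hypothetical convention: drafts, or titles marked as work in progress.
    is_wip = pr["draft"] or title.startswith("wip") or "[wip]" in title
    return not (is_wip or pr["has_conflicts"] or pr["changes_requested"])


def count_open_prs(prs):
    """Split the open PR count into actionable and unactionable."""
    actionable = sum(1 for pr in prs if is_actionable(pr))
    return {"actionable": actionable, "unactionable": len(prs) - actionable}
```

Plotting only the actionable count over time would show how much of the trend the padding accounts for.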

2 Likes

I think the major thing that nixpkgs is missing is quick triaging. There’s a lot of PRs that could be merged if they followed CONTRIBUTING.md and the nixpkgs manual more closely. It’s very time consuming to review ~20 PRs on small items, and then to have them and 30 other PRs get pushed onto my “notifications” backlog.

I tried to start with some “educational” videos on nixpkgs on the PR process:

Another issue is that there’s no consensus as to what is “good enough” for nixpkgs. So I’ve also seen many cases where an author gets pulled in different directions by 3 different reviewers.

Ultimately, we just need more people reviewing PRs (not necessarily committers, but it can be frustrating to have a PR polished and no committer to push the button). But the work is largely unrewarding (unless it affects you), and can be very time-intensive.

12 Likes

There is also RFC30, which aimed to make the status of PRs clearer.
https://github.com/NixOS/rfcs/pull/30

1 Like

And there’s @timokau, who is working on a bot that should help us manage PR state more verbosely:

1 Like

Another problem is that many maintainers seem to be inactive. Since they have added themselves to a package, they are probably interested in the software and possibly stakeholders, so they are the ideal reviewers. But my feeling is that the majority of PRs (I may be wrong here, it would be nice to get some statistics) are never reviewed by the maintainers that ‘own’ the package.
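A sketch of how that statistic could be computed, in case someone wants to check my feeling against data. It assumes each PR has already been mapped to a single package and a list of reviewer handles, and that maintainers are available as a package-to-handles mapping. Both shapes are made up for this example, and the mapping from PR to package is the hard part that is left out.

```python
def maintainer_review_share(prs, maintainers_by_package):
    """Return the fraction of PRs that got at least one review
    from a maintainer listed for the package they touch."""
    if not prs:
        return 0.0
    reviewed = 0
    for pr in prs:
        owners = set(maintainers_by_package.get(pr["package"], []))
        # A PR counts once, however many maintainers reviewed it.
        if owners & set(pr["reviewers"]):
            reviewed += 1
    return reviewed / len(prs)
```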

I often feel a bit reluctant to review updates to such packages, since I don’t want to get in the maintainer’s way. I now usually look when they last committed and if it’s a longer time ago I’ll review the PR anyway.

I am not sure what the solution to this is, but it would be nice if we could convince maintainers to stay active over longer periods. It would be nice to have an idea why they became inactive. Do they not use nixpkgs/NixOS anymore? Did they lose interest in the packages and forget to remove themselves? Do they find getting their reviewed PRs merged too frustrating?

1 Like

It would be great if it was possible to “subscribe” to a package more dynamically. Interest doesn’t map 1:1 with the git history. Sometimes I package a thing because I want to go in a direction, then lose interest.

If there was a tool that would go over all the packages that I use and publish those somewhere, I would be more than fine with it. It could then be used for all sorts of things; package popularity, notify of security issues, and let me know when a package that I use has an issue.

4 Likes

I wonder if we could extend Hydra to make this happen: you could register to follow failures of certain package names, and get emails if there’s a failure on a jobset/branch of your choice.

The other issue/question becomes: how is this different from the current meta.maintainers, and why would we have two similar but different abstractions for package maintenance?

1 Like

Once upon a time (until ~March 2018), this new feature you want was actually the definition of meta.maintainers. It has been disabled for reasons I am certainly not going to be the best at explaining but, I think, boil down to “too many false positives and potentially buggy implementation.”

One idea that has been spitballed is to make one RSS feed per package so that people would be able to subscribe to the ones of interest to them. That would have less catastrophic failure modes than emails (I personally received 194 emails on 2018-03-11, the day it was disabled, and AFAIR it stopped at 194 only because someone manually killed postfix)… but it needs someone to actually implement it.

1 Like

It’s been brought up by multiple people that they weren’t aware that a package they maintained was broken, and that they would have liked to be notified.

I think there is a need for maintainers to be aware of breakages. How it’s implemented is, I think, the hardest question. I’m not a big fan of using email either, as my personal email is listed under my maintainer entry, and the first thing I would probably do is make a rule to put all the failure notifications in a directory I’ll rarely or never touch.

Edit:
However, if I were able to visit nixpkgs:trunk on Hydra, and it told me which of the “still failing packages” were mine, I think that would be useful.

4 Likes

How about a single GitHub issue per (package, current version, problem) tuple, automatically created by the builder?

  1. Check if the issue exists. If it does, whether open or closed (closed implying a false alarm), do nothing.

  2. If it doesn’t exist, create it and add the maintainers as notifyees.

  3. Remove maintainers as watchers immediately.

Step 3 is important because otherwise it’s too much noise (which you can expect to see on the issue from casual observers looking to get their problem solved).

The tuple could even be just (package, current version).

The goal is to make only so much noise that a maintainer notices. Given the deluge of emails, it might be better to not notify the maintainers than notify too much.
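The three steps above could be sketched roughly like this. The tracker here is just an in-memory dict standing in for the issue tracker, and all the names are made up for illustration; a real version would have to talk to the GitHub API instead.

```python
def report_failure(tracker, package, version, problem, maintainers):
    """Record a build failure at most once per (package, version, problem).

    Returns True if a new issue was created, False if one already existed.
    """
    key = (package, version, problem)
    # Step 1: an existing issue, open or closed, means we stay silent.
    if key in tracker:
        return False
    tracker[key] = {
        "state": "open",
        # Step 2: maintainers are notified once, on creation.
        "notified": sorted(maintainers),
        # Step 3: nobody is left subscribed, so later comments make no noise.
        "watchers": [],
    }
    return True
```

Dropping `problem` from the key gives the simpler (package, current version) variant.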

I’m not the biggest fan of leveraging GitHub issues. My GitHub notifications are already increasing at an unmanageable rate since I’m a Python codeowner; any more and I’m likely to backburner even more PRs.

Not to mention that nixpkgs issues is already at 4000+ open issues

4 Likes

i’m a newcomer to nix, maintainer of three packages today, but for a long time maintainer of just one package. when, some months ago, i first received an email requesting a review of a pr updating that package, i was unsure whether i should really post a review or whether that request was an automatic message not really addressed to me.
since i’m new to this community, with few contributions and no built-up trust, i doubted that my review would be either useful or expected. i had believed that only the reviews of committers or experienced users would be considered. only when i read some posts and documentation asking even new users to review was i confident enough to write one.
i cannot speak on behalf of other maintainers, but at least for me, if i had received some message sooner, or read somewhere that my reviews would be useful even as a newcomer, i would have started posting reviews for my packages earlier.
Maybe we could make this information more explicit to newcomers.

6 Likes

Thanks for the work. People raising the question of the high count on different channels are almost always met with the “it’s because the repo is too active, X issues were closed just last month” argument. Looks like the trend reflects the feeling.

At the time of writing around 12.5% of all issues were automatically generated by @ckauhaus, and about 7.5% of all PR’s were generated by r-ryantm. Another 6% of issues are packaging requests.

The Grafana offset diffs are useful to get a sense of net flow:

https://status.nixos.org/grafana/d/nUq1ufyZz/github-issues-and-prs-offset-diff?orgId=1&refresh=30s&from=now-6M&to=now

as well as the totals:

https://status.nixos.org/grafana/d/v-86aB-Zz/github-issues-and-prs?orgId=1&refresh=30s&from=now-6M&to=now

2 Likes

I wonder if it makes sense to give maintainer reviews more prominence by adding an “accepted-by-maintainer” label once one of the maintainers has submitted an approving review. First of all, such a label would signal that reviews by maintainers are appreciated. Secondly, such a label may result in quicker merges, since it indicates that a PR has already been validated by a domain expert of the package.
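A bot could apply the rule along these lines. This is only a sketch: reviews are simplified to dicts with a `user` and a `state` (GitHub reports approving reviews with the state `APPROVED`), and the function and its inputs are hypothetical.

```python
LABEL = "accepted-by-maintainer"


def labels_after_review(current_labels, reviews, maintainers):
    """Return the PR's labels, adding LABEL once a maintainer has approved."""
    approved_by_maintainer = any(
        review["state"] == "APPROVED" and review["user"] in maintainers
        for review in reviews
    )
    labels = set(current_labels)
    if approved_by_maintainer:
        labels.add(LABEL)
    return sorted(labels)
```

This version never removes the label again, for instance when new commits are pushed after the approval; whether it should is an open question.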

9 Likes

Maybe it is just time to acknowledge that github does not scale for nixpkgs anymore.

1 Like

Or maybe we could flip this around and say:

Maybe the mostly-organic volunteer-based nixpkgs process doesn’t scale with the ease of github contributions.

3 Likes

Not sure if this is directly relevant but there is also a factor of “cleanliness begets cleanliness” I think?

I wonder if it would help if, say, only actual (or potential) bugs existed on GitHub, and questions and feature/packaging requests were suggested to be posted here or programmatically migrated here?

Does GitHub have a way for admins to set a “default view” for issues and PRs? It might also help to only have bugs show in the default view?

Or perhaps close the issue tracker to everyone but actual committers (+ experts) and redirect people to Discourse for everything else… Verified problems can be graduated to actual bugs on GitHub by people with permissions. (GitHub doesn’t have this feature though, or a way to limit issue opening at all. I will research whether any other bug tracker has such restrictions, although here the relative lack of usage of other platforms might act in favor of the concept.)