How many people are paid to work on Nix/Nixpkgs?

kvtb · October 7, 2020, 9:41pm

This is probably the main reason that I’m still a ‘user’ and not a ‘contributor’.
Other than the hurdle of having to create a GitHub account to contribute, just seeing 2.3k open pull requests feels like a ‘red flag’.

ryantm · October 7, 2020, 10:49pm

I think people who consider these numbers big are mistaken! We are about to cross 100k issues + PRs. So another way to look at it is that 4.4k + 2.3k outstanding out of 100k equals only 6.7% outstanding issues and PRs. Another way to look at it is that we merged 1,711 PRs in the last month, so the PR backlog is only about 1 month big.
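The arithmetic behind those two figures can be checked with a short sketch. It uses only the numbers quoted in the post; treating the backlog as open PRs divided by the monthly merge count is an assumption about how the "1 month" estimate was reached, and it presumes the merge rate stays constant.

```python
# Figures quoted in the post (October 2020).
open_issues = 4_400
open_prs = 2_300
total_issues_and_prs = 100_000  # "about to cross 100k"
merged_prs_last_month = 1_711

# Share of all issues and PRs ever filed that are still open.
outstanding_share = (open_issues + open_prs) / total_issues_and_prs

# Assumption: backlog length = open PRs / PRs merged per month.
backlog_months = open_prs / merged_prs_last_month

print(f"outstanding: {outstanding_share:.1%}")
print(f"PR backlog: {backlog_months:.2f} months")
```

Under these assumptions the backlog comes out somewhat above one month, which is consistent with "about 1 month" as a rough estimate.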

Also, don’t forget that NixOS is a complete Linux distribution.

For comparison, here’s a screenshot from Ubuntu’s bug tracker.

This is exactly the kind of sentiment we need to figure out how to dispel so we can encourage more helpers! Any ideas what would convince you, @kvtb?

blaggacao · October 7, 2020, 11:23pm

I think prominent and massively useful label automation (not necessarily merge automation), combined with a dashboard, would help. And I think you are right, the answer is to be sought in the eye of the spectator.

fricklerhandwerk · October 8, 2020, 5:47am

I had the exact same impression about the number of PRs as @kvtb, but was simply not aware of comparable numbers or the global view. The latter certainly helped dispel the impression, but I suspect its main origin is how GitHub presents those numbers in the first place. I’m not sure though if issue or PR count (or how many per month are closed/merged) is very informative about the momentum of nixpkgs. My gut feeling is that much of it is automated bumps and thus noise. It just reflects how up to date the given package collection is. So I concur with @blaggacao that some kind of evaluation of the labeling would better reflect what is actually going on: new packages, bugs fixed, refactorings, architectural changes, …

danieldk · October 8, 2020, 7:44am

I think another nice way (though an abstraction) of seeing how we are doing is the Repology freshness graph:

https://repology.org/graph/map_repo_size_fresh.svg

Current screenshot below. We have almost the largest package set (AUR has a bit more) and, by a wide margin, the most fresh (up-to-date) packages. Though it would be nice to have a ranking of freshness per package. PRs do linger sometimes, but as the graph shows, we are doing really well in adopting packages and maintaining them.

Anyway, we can debate things until the end, but the primary way to do even better is to go out and actually review PRs. We could probably avoid most of the backlog if we had a dozen more people reviewing 5-10 PRs per day. Automation can solve a lot of our problems, but not every problem.

Also, I spent some time a while ago going through the long tail of old PRs, and many of them are not simple version bumps (though those slip through the cracks as well), but have substantial changes that require someone familiar with the software/domain to review.

jonringer · October 8, 2020, 8:12am

If you exclude “unique” packages, nixpkgs has significantly more packages: 45,000 vs. 33,000 (Arch + AUR).

Although even this isn’t fair, as there are still packages available on the NUR which aren’t accounted for in this.

j-k · October 8, 2020, 9:21am

Another way to look at it is we merged 1,711 PRs in the last month, so the backlog is only about 1 month big.

Can a graph/badge in the README show how long the backlog is? This is an interesting stat

nixinator · October 8, 2020, 11:07am

nice graph, is it about how many…

is it about longevity of successful compilation and software reliability without resorting to prebuilt binary containers.

is it about ‘the number of packages available’ or the actual number of packages that can be installed from source on a single operating system without using containers?

competing with metrics of other distributions doesn’t really show how nix/nixos is different… so nixos is an orange, and comparing it to an apple might not be exactly what it needs.

raboof · October 8, 2020, 1:46pm

This is definitely true, and I don’t think we should point at that graph when trying to ‘sell’ Nix/NixOS to them.

However, here it was used as an (IMHO convincing) argument that even though there are ‘many’ open PR’s, this definitely does not mean we are ‘lagging behind’: we’re just “active”.

doronbehar · October 8, 2020, 3:14pm

I think that the graph is an excellent selling point of NixOS. Perhaps it should even be on the website. That was the main argument that led me to use NixOS: seeing that so many packages are available, that GitHub is in use (no mailing lists!) and that there’s a guy who runs a bot that updates them all (@ryantm).

jonringer · October 8, 2020, 7:32pm

Agreed,

But you could also say that Nix allows us to do more with fewer resources. The declarative style is a “force multiplier” in terms of man hours.

Still blows my mind that I’m able to use the power of git to review changes, which are usually identical to the changes intended by the author.

blaggacao · October 9, 2020, 4:49am

The specifics of the PR template (and review process) do sometimes mismatch with the specifics of a particular change. I can remember a case where this introduced “bureaucratic overhead” in the form of somebody manually checking a macOS build for some Go tooling (or something of that kind). If I’m not mistaken, a variation in macOS build compatibility does not depend on variation of the proposed changes to nixpkgs as far as Go is concerned.

I think nixpkgs-review is a promising project (and a low hanging fruit!) to lower the friction.

nixpkgs-review should be set up so that the proponents can handle all non-domain-specific aspects of a review themselves. For the domain-specific aspects, the proponents are mostly the best available experts on any specific tool anyway.

Positively nixpkgs-review’d changes should enjoy a trust bonus, e.g. via a label. → Arrange for nixpkgs-reviewed label with ofborg · Issue #137 · Mic92/nixpkgs-review · GitHub

I also think a stalebot should up the pressure and triage serious commitments from less serious ones by introducing the notion of a time constraint, thereby revealing contributors’ true priorities. (If finding a reviewer is part of the game, so be it!) That would also put soft constraints on reviewers’ deliberation so that it does not exceed reasonable concern.

blaggacao · October 9, 2020, 5:44am

Addendum: maintainers should be allowed to become teams, akin to what github’s research department figured in CODEOWNERS — together with ofborg requesting reviews from those people.

For this to become anywhere near effective, some semantic reorganization of the repo that roughly fits interest-aligned review teams would be necessary, together with folder-level overrides for the default reviewers of each folder.
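To make the folder-level override idea concrete, here is a minimal sketch of how reviewers could be resolved per changed file. The team names and the table itself are hypothetical and do not come from nixpkgs; matching by plain path prefix is a simplification of the pattern syntax that real CODEOWNERS files use, though it keeps the same "last matching entry wins" rule.

```python
# Hypothetical folder-to-team table (illustrative only, not from nixpkgs).
# Later entries override earlier ones, mirroring the "last matching
# pattern wins" rule of CODEOWNERS files.
FOLDER_REVIEWERS = [
    ("pkgs/", ["@example/default-reviewers"]),
    ("pkgs/development/go-modules/", ["@example/go-team"]),
    ("pkgs/development/python-modules/", ["@example/python-team"]),
]


def reviewers_for(path):
    """Return the reviewers of the last table entry whose folder prefixes path."""
    matched = []
    for folder, reviewers in FOLDER_REVIEWERS:
        if path.startswith(folder):
            matched = reviewers
    return matched


def reviewers_for_change(changed_paths):
    """Collect the distinct reviewers for every file touched by a change."""
    result = []
    for path in changed_paths:
        for reviewer in reviewers_for(path):
            if reviewer not in result:
                result.append(reviewer)
    return result


print(reviewers_for_change([
    "pkgs/development/go-modules/generic/default.nix",
    "pkgs/tools/misc/hello/default.nix",
]))
```

In this sketch, joining or leaving a review team would amount to a small PR against the table, which is the kind of discoverable, well-known place the proposal has in mind.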

That would make it so easy for people to jump on and off the train as a reviewer by simply PRing discoverable and well-known places, and it would also encourage narrowly scoped entry-level reviewer “positions” on topics of their interest and expertise. Divide and conquer!

@timokau When combined with github’s own notification system, wouldn’t that lift the need for (maintaining and promoting) marvin?

Is this an RFC-worthy proposal?

raboof · October 9, 2020, 7:13am

Wouldn’t that encourage contributors to ‘pressure’ maintainers for reviews? I’m not sure that’s healthy. I have some pretty old PR’s that I’m definitely still serious about, but on the other hand I don’t want to burn out the maintainers by nagging them…

I like that idea - we can get partly there today by simply encouraging non-maintainers to test and review PR’s in fields they are interested in (even though they won’t be able to merge).

timokau · October 9, 2020, 7:58am

Also not a fan of this idea. Open source work should not be made any more stressful than it already is, for both parties. People are volunteering their time, and we should try to be respectful of that. The proposal doesn’t sound very welcoming to new contributors. I’m not sure I would ever have gotten involved in the project if that had been the process at the time.

I don’t think it does. It does not distinguish between PRs that are blocked by the author vs PRs that are blocked by review. It does not check up on PRs gone stale, finding a new reviewer. It doesn’t give the reviewer control over the volume of PRs they will have to review.

I’d happily be proven wrong though, if your proposal works better I certainly wouldn’t mind giving up the additional responsibility

blaggacao · October 9, 2020, 2:34pm

Well, (temporarily) stale PRs, even if closed, are still PRs and one can happily be serious about them. But at the very least they do not add marginal complexity to the set of non-stale “LastMile/LastMeter” PRs.

The dynamic usually is as follows:

As an author, I am the defendant of my PR and it is presumed I want it to get merged.
GitHub’s built-in notification system gives me all the tools I need to effectively pursue that goal.
There is no need for a reminder: I can simply star a GitHub notification to that end (copy it to taskwarrior or whatever).
I can re-request reviews from any reviewer if I feel the need for it (needs-reviewer). Reviewers get notified accordingly and in a semantically rich way directly through GitHub.

Argument 1 alleviated.

If there is exactly one team “responsible” for reviewing a particular subfolder, they are all pinged by GitHub. Anyone can approve or request changes. There can be a policy for a merge (e.g. two approvals). This policy can even be enforced directly through GitHub on protected branches. If none of the “responsible” reviewers gets to review an assigned PR, that’s an inherent problem with this particular team that no bot can ever solve. A stalebot would just put in place the right incentives for the contributor to escalate the situation, for example here in this forum. (Stalebots can be made very friendly and can always be blocked.)

Argument 2 alleviated

I’m not sure if this function will actually be properly working. A lot of people will have time one week but no time for three weeks in a row, and it should be pretty unpredictable how much time each one can afford to review. With sufficient division of labor (in this scenario: folders!), Github would suggest some PRs for review which are guaranteed to be in your area of interest and where you can pick as you like.

Argument 3 also alleviated?

I strongly feel this would, if not make marvin obsolete, largely simplify it and make it a friendly nudger (as a stalebot substitute), if that promotes effectiveness.

blaggacao · October 9, 2020, 2:41pm

Unfortunately, I think it is not very realistic for this UX to have any effect at scale.

It is similar to what @timokau suggests with delegation in marvin’s documentation. But delegation can never unfold the same levels of engagement as opt-in/opt-out.

blaggacao · October 9, 2020, 2:43pm

I was the stalebot here:

https://github.com/NixOS/nixpkgs/pull/83630#issuecomment-705759095

In a stalebot scenario I would have written:

/stalebot unstale re-open
... explanation ...

timokau · October 9, 2020, 3:53pm

I don’t have time/energy to argue this in detail right now (and it’s a bit off-topic for this thread anyway), but we mainly seem to differ on how much work/expectations we want to put on PR authors. I don’t think it’s reasonable to expect them to move their PR forward on their own, especially if it’s one that is not “sexy” to review. It can be draining, and it can feel wrong to ping someone over and over.

But again, I don’t mean to discourage you. Might be that you are right, and I would gladly be proven wrong if you want to implement such a system.

kvtb · October 9, 2020, 8:34pm

Your explanation (in terms of backlog size in months) is already sufficient, thanks for that.

Another threshold to contributing (at least for me) is that I’d like to contribute ‘small’ stuff without becoming a true ‘developer’. Sometimes I create a simple default.nix to package something that is not yet in nixpkgs, and I would like to ‘give back’ to the community by sharing it, but without creating a PR (with a GitHub account I don’t have because I’m not a dev) and becoming an official maintainer.

Sometimes I post the code on reddit (in the public domain) and hopefully someone else can use it.

It could be that the Nix community does not favor ‘contributor light’ style contributions (because I don’t commit to creating a PR every time there is a new upstream version). That’s also fine, then I’ll just continue posting code on reddit.