Nixpkgs has been the largest repository for months

Turns out that a few thousand packages hidden in subsets were simply not included in the JSON file we export and Repology imports.

Now that the pull request “tarball: backport nixos-search's subsets for JSON generation” by Atemu (Pull Request #105857, NixOS/nixpkgs on GitHub) is merged and the channel has advanced past it, nixpkgs-unstable has surpassed the AUR:

You all rock.

17 Likes

We are off the charts! :wink:

10 Likes

Strangely enough, I made a Gource simulation of the growth of nixpkgs. Near the end it almost breaks my machine :-).

Check it out (with -recursive)

It isn’t working on AMD-based graphics cards right now due to some issue with NixOS that I’m trying to get to the bottom of.

The Repology dataset is mostly made of “distribution-like” generalist package repositories, but it also includes a few application-specific package managers / repositories: CPAN / CRAN / crates.io / Hackage / MELPA / PyPI / RubyGems. However, it doesn’t include the biggest player in the field: npm. According to this blog post, the package count on npm is over 1.3 million. Also, some other repositories listed on Repology might show truncated numbers. This page claims that there are 164,202 Ruby gems but Repology only counts 3526. That being said, these numbers are not comparable because package authors upload directly to npm or RubyGems and use them as a primary distribution source. So the “work” is much more distributed and unmaintained packages are not generally removed. On the contrary, nixpkgs (like most “distribution-like” package repositories) is curated and maintained by volunteer maintainers that are mostly independent of package authors.

A good source of data on application-specific package managers is https://libraries.io/. According to their stats, there are 7 registries / repositories above 100K packages and 10 above the 59K bar set by nixpkgs unstable.

6 Likes

I don’t think it’s a goal of nixpkgs to compare ourselves against npm. I think it is insightful to compare ourselves to other distributions which have a high degree of package overlap. Nix is a truly unique way to approach software package management, and being able to gauge ourselves against other established repositories (even those which have funded dedicated teams) gives a good metric of our viability as a repository.

Also, including 1.3 million node packages, and 164k ruby gems would increase the size of the nixpkgs repo greatly. The nixpkgs repo is already at 206M unpacked in its current state.

8 Likes

Statistics can always give you what you want.

You could take a closer look at quality (not quantity) and
count the broken packages as well,
e.g. R:
pkgs = import (fetchTarball "https://github.com/nixos/nixpkgs/archive/nixos-20.09.tar.gz") {}
total: 18476
broken: 18472
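
A total/broken count like the one above could be computed from exported package metadata. The following is only a sketch under stated assumptions: it assumes a JSON export shaped like `{"packages": {attr: {"meta": {"broken": ...}}}}`, and the function name `count_broken`, the `rPackages.` prefix and the sample entries are hypothetical, not taken from the post.

```python
import json


def count_broken(packages_json, prefix=""):
    """Return (total, broken) for attributes whose name starts with `prefix`.

    Assumes each entry may carry a boolean `meta.broken` flag; a missing
    flag is treated as "not broken".
    """
    packages = json.loads(packages_json)["packages"]
    total = 0
    broken = 0
    for attr, info in packages.items():
        if not attr.startswith(prefix):
            continue
        total += 1
        if info.get("meta", {}).get("broken", False):
            broken += 1
    return total, broken


# Made-up sample data, only to show the shape of the input.
sample = json.dumps({
    "packages": {
        "rPackages.foo": {"meta": {"broken": True}},
        "rPackages.bar": {"meta": {}},
        "hello": {"meta": {"broken": False}},
    }
})

print(count_broken(sample, "rPackages."))
```

Restricting the count to one prefix makes it possible to look at a single ecosystem rather than the whole repository.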


on day two you could ask about semantics/variants/long term strategy?
like python38 vs python38Full

what about the sub ecosystems depending on nix/nixpkgs?


on day three you could ask about process assurance

  • how many packages are created automatically, and how many have to be maintained by human effort

I’m not sure what point you’re trying to make. I’ll agree that some ecosystems in nixpkgs have less support, but nix is usually a solid foundation on which to build abstractions. And some of those abstractions (e.g. pythonXFull) have been around for a long time: Nixpkgs will be celebrating its 18th birthday in Mar 2021.

Scenarios that core contributors really care about (zfs, wayland, system configuration, reproducibility, etc.) probably have some of the best user experience across any Linux distribution, while many less used technologies and tools may have poor to non-existent support in nixpkgs.

For a package repository, we just need to package the software as close as possible to how upstream intended it to be used. On Discord, the majority of package issues that people have come from the upstream source, and not so much from nixpkgs’ packaging. But maybe I have some perception bias in this regard.

7 Likes

I think a more useful metric would be “What is the chance that a package a user wants to use is available and working in nixpkgs?”

This would take into account how popular a package is, since more users will want to use it. So a distro packaging only all 1.3 million node packages wouldn’t have a higher score than most others.
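
One way to read this metric is as a popularity-weighted coverage score. The sketch below is an assumption about how it could be formalised, not something defined in the thread: `weighted_availability`, the popularity weights and the package sets are all made up for illustration.

```python
def weighted_availability(popularity, working):
    """Share of total user demand covered by packages that are available and working.

    `popularity` maps package name -> number of users wanting it (hypothetical data);
    `working` is the set of names a repository ships in working state.
    """
    total = sum(popularity.values())
    if total == 0:
        return 0.0
    covered = sum(weight for name, weight in popularity.items() if name in working)
    return covered / total


# Made-up demand figures: two widely wanted tools and one niche node package.
popularity = {"firefox": 900, "ripgrep": 90, "left-pad": 10}

curated_repo = {"firefox", "ripgrep"}   # few packages, but the popular ones
node_only_repo = {"left-pad"}           # many packages in reality, low demand here

print(weighted_availability(popularity, curated_repo))
print(weighted_availability(popularity, node_only_repo))
```

With weights like these, a repository holding only the rarely wanted package scores far lower than one holding the two popular ones, regardless of raw package counts.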

13 Likes

I don’t want to MAKE a certain point
→ it is more important to see a total picture than a stats figure
think for yourself about what your needs are and what an adequate solution could be :slight_smile:

Out of curiosity, I’ve just taken the top 5000 packages of “Statistics by source packages (max)” sorted by “inst” from https://popcon.debian.org/ ; and grepped the nixpkgs source code for these package names. I then eliminated packages that contained python or node, as we probably have them and the name would never match. This is quite obviously a bad evaluation (eg. just in the first page of the csv slang2 is not found as “in nixpkgs” because nixpkgs calls it slang), but whatever, it may still be interesting to have a look at.

It left me with 4518 packages, of which 1748 are not in nixpkgs. Obviously there are still a lot of false positives (eg. ripgrep is rust-ripgrep for debian…), and I’m not totally sure what this list precisely is about, but I feel like it gives an overall idea: we probably have around 500-1500 packages missing to have the debian-popcon top 5000.

Note that this comparison is quite dumb though, as popcon is voluntary, my testing is really, really simple, and I also count debian-specific packages like debconf, etc. But I think it does show that we still have a way to go to have all useful packages.
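
The procedure described above (filter out python/node names, then grep for the rest) can be sketched roughly as follows. This is not the script actually used; `missing_from_nixpkgs` and the tiny inputs are hypothetical, and the plain substring match stands in for grepping the nixpkgs source.

```python
def missing_from_nixpkgs(popcon_names, nixpkgs_text, skip=("python", "node")):
    """Return (candidates, missing) using a naive substring search.

    Names containing any word in `skip` are dropped first, since their
    nixpkgs names would rarely match the Debian ones.
    """
    candidates = [n for n in popcon_names if not any(s in n for s in skip)]
    missing = [n for n in candidates if n not in nixpkgs_text]
    return candidates, missing


# Made-up stand-ins for the popcon list and the nixpkgs source text.
popcon_names = ["bash", "slang2", "rust-ripgrep", "python-apt", "debconf"]
nixpkgs_text = "bash slang ripgrep"

candidates, missing = missing_from_nixpkgs(popcon_names, nixpkgs_text)
print(len(candidates), missing)
```

The sample shows the same weaknesses mentioned in the post: slang2 and rust-ripgrep are reported as missing only because the names differ, and debconf is counted even though it is Debian-specific.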

(FWIW, I’ve opened Introduce some kind of popularity contest/popcon data for packages? · Issue #159 · repology/repology-webapp · GitHub to see whether repology might be interested in introducing something like that directly on there. As it has all the normalization logic already, they probably could do much better than I could by blindly grepping through nixpkgs.)

2 Likes

If anyone wants to try making sense of that list, I guess I would recommend, for each missing package, seeing what executables it provides, looking them up in nix-index, and reporting how many of the packages provide executables that we do not have. I do not have it in me to do this right now, though. If someone wants to go fancy, they could take the source packages whose binary packages ship executables we do not have, and rank them by the popularity of their most popular binary package.

I wonder how many packages are actually debian-specific configuration tools, but that is probably harder to check automatically.

1 Like

I think the AUR is at a slight disadvantage here in that it isn’t meant to be a complete repository containing every package, but to complement Arch Linux’s main repository with additional packages. While there is some overlap, e.g. where the AUR has a different version or flavour of a package which exists in the main repository, it would probably make more sense to consider Arch + AUR together, since that represents the total range of packages available to an Arch Linux user. If we do a quick and dirty calculation ignoring any overlap, that would give 58222 + 9370 = 67592 total packages packaged for Arch Linux, which is quite a bit more than nixpkgs unstable’s 58617 packages.

(That wasn’t meant to be a reply to @Ekleog, but now I can’t seem to change it and if I delete it and try to re post it as reply to the thread it complains that the body is too similar to what was recently posted :unamused:)

5 Likes

Also remember that the AUR is inflated with “variants”. Most of the really popular packages will have a <pkg>-<vcs tag> variant (e.g. ripgrep-git). When comparing different repositories, I think the non-unique value is a better reflection of package availability (as a package with that same name needs to appear in at least one other repository). The same applies to nixpkgs. This would be 48154 for nixpkgs unstable, and (27117 + 8788) = 35905 for Arch+AUR.

Not to mention that the NUR is also not counted in this. It may be an order of magnitude or two less popular than the AUR, but there are still a significant number of packages in it.
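
To make the difference between raw totals and the “non-unique” view concrete, here is a small sketch with sets. It follows the description in this post (a name counts as non-unique if at least one other repository also has it) and is an assumption about the idea, not Repology's actual implementation; the function name and all package sets are made up.

```python
def non_unique(repo, others):
    """Names in `repo` that also appear in at least one of the `others`."""
    seen_elsewhere = set().union(*others)
    return repo & seen_elsewhere


# Made-up package name sets.
nixpkgs = {"ripgrep", "hello", "nixos-only-tool"}
arch = {"ripgrep", "hello"}
aur = {"ripgrep-git", "hello", "aur-only-tool"}
debian = {"ripgrep", "hello"}

# Adding the two sizes double-counts names present in both Arch and the AUR;
# a set union does not.
naive_sum = len(arch) + len(aur)
arch_plus_aur = arch | aur

print(naive_sum, len(arch_plus_aur))
print(sorted(non_unique(nixpkgs, [arch_plus_aur, debian])))
print(sorted(non_unique(arch_plus_aur, [nixpkgs, debian])))
```

In this toy data the VCS variant ripgrep-git and the repository-specific tools drop out of the non-unique view, which is the effect described above.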

7 Likes

Repology accounts for such variants in their ‘Projects’ metric.

2 Likes

https://github.com/NixOS/nixpkgs/issues/110348

We are back on top after the emacs issue was resolved.

Screenshot from 2021-08-22 10-40-48

Source: Repository statistics - Repology

5 Likes