Why isn’t Nixpkgs’ freshness closer to its maximum?

Nixpkgs is impressive. But how is that gap accounted for, anyway?

(Graphs - Repology)


I’d be skeptical of any answers that aren’t backed up by an analysis of what’s behind in nixpkgs, and whether the set of laggards is fairly stable or rapidly changing.

With that big fat asterisk, a few thoughts:

  • AFAIK, repology learns about a new package version when it first discovers that at least one repo has updated to it. Many of the repos that are the least out of date, per repology, are smaller language/ecosystem package repos that are more or less the canonical/reference repo for their corner of the packaging universe. I haven’t analyzed this, but I assume updates to these packages almost always appear in these repos first (perhaps published by the developer directly, or updated automatically?)

    A general package repository is going to have a hard time keeping up with repos that are canonical or near-canonical for most of the packages they contain. We won’t ever be the first to know about these unless we set up infrastructure to independently monitor source repositories for all of their packages.

  • I’m not sure it would be meaningful to try to beat or tie those repos on individual updates, because I think it’s normal (or normal-ish) for us to batch-update these on some interval (as in rPackages: CRAN and BioC update · NixOS/nixpkgs@7ed992a · GitHub). For example, this search shows that the CRAN update appears to be included in a broader R package update every 6 weeks: Search · repo:NixOS/nixpkgs "rPackages: CRAN and BioC update" · GitHub.

    I’m not intimately involved in any of these, but I imagine they happen this way due to limitations on human time/energy.

  • AFAIK, something like r-ryantm is driven by what repology knows is the latest update. For the packages the bot is able to update, there’ll be some amount of lag time between when repology notices, r-ryantm notices, the PR can make it through review, the package update can propagate back to unstable, and repology can in turn notice that we’re up to date.

  • Even if r-ryantm knows about an update via repology, it isn’t going to open a PR if the package build fails for some reason. If the maintainer or someone else in the community doesn’t notice and raise a flag or try to fix it, it’ll go untouched until someone does.
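To put a rough number on the batching point, here is a toy model (an assumption, not a measurement of nixpkgs): a bulk update runs every `interval_days`, and an upstream release is equally likely on any day of the cycle. Under that model a release waits (N - 1)/2 days on average for the next bulk update, so a ~6-week cycle alone accounts for roughly three weeks of lag before any of the bot/review/channel delays are added. The function name is hypothetical.

```python
def mean_batch_lag(interval_days: int) -> float:
    """Average number of days a release waits for the next bulk update.

    Toy model: bulk updates happen on days 0, interval_days, 2 * interval_days, ...
    and a release is equally likely on each day of the cycle. A release that
    lands exactly on an update day is assumed to be picked up that same day.
    """
    # (-day) % interval_days = days remaining until the next bulk update.
    lags = [(-release_day) % interval_days for release_day in range(interval_days)]
    return sum(lags) / len(lags)


if __name__ == "__main__":
    for interval in (42, 21, 7):  # ~6 weeks, that interval halved, weekly
        print(f"every {interval:>2} days -> mean lag {mean_batch_lag(interval):.1f} days")
```

Running it prints a mean lag of 20.5 days for a 42-day cycle, 10.0 for 21 days and 3.0 for 7 days, which is also a crude way to ballpark what halving a bulk-update interval would buy.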

If you really want to make the line go up, I’d guess the three most meaningful places to look are:

  1. figuring out whether additional resources can safely shrink the intervals at which parts of nixpkgs are bulk-updated from some other source, without creating a bottleneck elsewhere (reviewing, ofborg, hydra, etc.)

  2. developing automated tooling to surface packages that repology knows we’re out of date on but that r-ryantm (or any other update bot) can’t attempt to update, and fixing any that are easy

  3. developing automated tooling to surface packages that r-ryantm is failing to update, and seeing whether nudging the maintainer or surfacing that information to users is enough to get them fixed

    (I’m not sure it’d be a good use of time to send people who are unfamiliar with a given package out into the world to try and fix all of these just to make the line go up. People being interested enough in the package to ask/report/PR seems like a good filter…)
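The core of the tooling in points 2 and 3 is a simple set partition. A minimal sketch, assuming we already have three hypothetical inputs: the set of packages Repology reports as outdated, the set an update bot has attempted, and the subset of attempts that failed. How those sets would actually be collected from Repology or r-ryantm’s logs is left open here, and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Triage:
    # Outdated per Repology, but no bot has attempted an update (point 2).
    not_attempted: Set[str] = field(default_factory=set)
    # Outdated per Repology, and the bot's update attempt failed (point 3).
    bot_failing: Set[str] = field(default_factory=set)
    # Outdated per Repology, with a bot attempt that has not failed (e.g. PR in review).
    in_progress: Set[str] = field(default_factory=set)


def triage(outdated: Set[str], attempted: Set[str], failed: Set[str]) -> Triage:
    """Split outdated packages into buckets that need different kinds of attention."""
    result = Triage()
    for pkg in outdated:
        if pkg not in attempted:
            result.not_attempted.add(pkg)
        elif pkg in failed:
            result.bot_failing.add(pkg)
        else:
            result.in_progress.add(pkg)
    return result
```

For example, `triage({"foo", "bar", "baz"}, attempted={"bar", "baz"}, failed={"baz"})` puts `foo` in `not_attempted`, `baz` in `bot_failing` and `bar` in `in_progress`.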


Probably because this way of “measuring” freshness assigns equal weight to binutils and left-pad.
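To make the binutils/left-pad point concrete: the freshness in the graph is an unweighted share, so one stale core package and one fresh leaf package average out to 50%. A sketch of a weighted alternative follows; the weights (for instance a reverse-dependency count) are a hypothetical choice, not something Repology computes.

```python
from typing import Dict, Optional


def freshness(is_fresh: Dict[str, bool], weights: Optional[Dict[str, float]] = None) -> float:
    """Fraction of packages that are fresh, optionally weighted per package.

    With no weights every package counts once (the unweighted share).
    Packages missing from `weights` default to a weight of 1.0.
    """
    if weights is None:
        weights = {}
    total = sum(weights.get(name, 1.0) for name in is_fresh)
    fresh = sum(weights.get(name, 1.0) for name, ok in is_fresh.items() if ok)
    return fresh / total


packages = {"binutils": False, "left-pad": True}
print(freshness(packages))                     # unweighted: 0.5
print(freshness(packages, {"binutils": 9.0}))  # binutils weighted 9x: 0.1
```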


The graph plots each repo’s number of fresh packages against its total number of packages. The angle of the ray from the origin to a repo’s point therefore encodes the percentage of fresh packages in that repo. Hypothetically, if a repo lies on the diagonal line (fresh = total), all of its packages are fresh, and if it lies on the x axis, none of them are.
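The geometry can be checked with a couple of lines, assuming (as the description implies) total packages on the x axis and fresh packages on the y axis. The fresh fraction is the slope fresh/total, i.e. the tangent of the angle, so the angle runs from 0 degrees (nothing fresh) to 45 degrees (on the diagonal). Because the mapping is non-linear, an 89%-fresh repo sits only about 3 degrees below the diagonal. The function names and the 1000/890 figures are illustrative.

```python
import math


def fresh_fraction(total: int, fresh: int) -> float:
    """Share of a repo's packages that are fresh: the slope of its point."""
    return fresh / total


def angle_degrees(total: int, fresh: int) -> float:
    """Angle between the x axis and the ray from the origin to (total, fresh)."""
    return math.degrees(math.atan2(fresh, total))


print(angle_degrees(100, 100))   # on the diagonal, 45 degrees
print(angle_degrees(100, 0))     # on the x axis, 0 degrees
print(angle_degrees(1000, 890))  # 89% fresh, roughly 41.7 degrees
```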

So the question posed above is: Why are less than 100% of packages in nixpkgs fresh?

There are some other interesting things to note about this chart:

  • Repos that are their own source of truth, e.g. CRAN, Hackage, CPAN, RubyGems, crates.io, etc., will always lie on the line, because a version counts as fresh precisely by being published in that repo.
  • The absolute number of fresh packages in nixpkgs is still 2–3x that of most other repos.

I agree with abathur that this really needs a detailed analysis of which packages are behind before making any conclusions, including the conclusion that this is something that needs to be ‘fixed’.


It looks like repology exposes a week of database dumps at https://dumps.repology.org.

Given that they list when each new version appears in a package’s history tab and when each repo caught up, I assume that the database contains sufficient information to help answer some basic questions…

  • It may not have a sense of what repo is ~canonical, but I imagine it’s possible to cluster packages that always appear first in a given repository
  • get a sense of the typical update interval for every package in unstable
  • ballpark what impact halving the interval of a given bulk update would’ve had on freshness over the last year
  • figure out what was out of date on every day of the year and whether it’s a constantly-churning pot of 2-week update cycles or whether there’s a big core of packages that are almost always out of date
  • maybe try to tease out sets that are out of date because they release extremely often (much effort to keep up) vs those that went 4 years without a release and suddenly sprung to life (anyone who cares may no longer be watching carefully for updates)?
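As a sketch of the “constantly churning vs. stable core” question: suppose the dumps have already been reduced to one set of outdated package names per day. That reduction is assumed here, and nothing about the actual dump schema is. Counting how often each package appears then separates chronic laggards from short-lived ones. The 0.9 threshold and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def outdated_share(snapshots: List[Set[str]]) -> Dict[str, float]:
    """Fraction of daily snapshots in which each package was outdated."""
    if not snapshots:
        raise ValueError("need at least one daily snapshot")
    counts: Counter = Counter()
    for outdated_today in snapshots:
        counts.update(outdated_today)  # each package counted once per day
    return {pkg: n / len(snapshots) for pkg, n in counts.items()}


def split_laggards(snapshots: List[Set[str]], chronic_threshold: float = 0.9) -> Tuple[Set[str], Set[str]]:
    """Return (chronic, churning): packages outdated on at least / fewer than
    `chronic_threshold` of the days they were observed over."""
    share = outdated_share(snapshots)
    chronic = {pkg for pkg, s in share.items() if s >= chronic_threshold}
    churning = set(share) - chronic
    return chronic, churning


# Ten hypothetical days: "foo" is always outdated, "bar" only for two days.
days = [{"foo", "bar"}] * 2 + [{"foo"}] * 8
print(split_laggards(days))
```

A large, stable `chronic` set would point towards packages needing manual attention, while a large `churning` set would point towards update latency.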

It would be cool if we could get nixpkgs master onto Repology. I’m not sure if they’d be amenable to it since it isn’t really a recommended way to consume packages.


Channels are usually only 2–3 days behind master, so I don’t expect it would make much difference to the numbers.


Repology doesn’t use Nixpkgs’ Nix expressions directly; it uses the packages.json we generate for the channels.

Master wouldn’t really work out.


It doesn’t necessarily, but humans can be impatient, and a distribution like Arch (with its pacman db) often strives for day-one updates, which our current infra more or less rules out by default.

Of course, rationally I could say this doesn’t really matter, since it’s typically only a few days behind anyway; but humans aren’t always reasonable either 😅

That’s not to say there aren’t real ways we could improve this so-called “freshness” metric. Systems like Debian & Arch, which I’m most familiar with from previous experience, typically have a clean split between what are deemed “officially supported” packages and those that are community maintained. We somewhat have this split in nixpkgs too: there’s the nixpkgs repo itself, and there are plenty of other repositories on GitHub containing Nix code, such as NUR or the collection of flakes out in the wild.

Some of those packages even have their own repo-specific packaging alongside a more stable version in nixpkgs. But I digress; the point is that our version of this official/unofficial split is more ad hoc. A lot of packages that aren’t (and maybe never have been) well maintained end up in nixpkgs anyway, and conversely, there are some good Nix packages out in the wild that don’t exist in nixpkgs. I believe that, to some extent, the clear boundary in other projects makes it easier to focus on freshness where it really counts, i.e. “officially supported” packaging.

Of course, there is also the simple fact that packages more people find useful will naturally receive more attention at the packaging layer, so it’s the packages without that natural advantage that could likely use the most help, at least if they are to be officially supported in nixpkgs.

I’m not saying I know exactly what we should do about it at this point, but we could maybe start by trying to make that official/unofficial distinction clearer than it currently may be, even if it means removing some poorly maintained packages for now.

I wondered the other day how many outdated packages my system has compared to the proverbial 89% freshness, and wrote a hack, nix-olde, that measures just that. For the packages on my system it reports 325 of 1425 (22.81%) installed packages are outdated according to https://repology.org. That’s a lot worse than 89% fresh.
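The kind of count behind such a report can be sketched as follows. This is not nix-olde’s actual code: the two input maps (installed versions, and the newest version Repology knows about) are hypothetical, packages Repology doesn’t know are skipped, and a plain string mismatch is treated as “outdated”, which ignores real version ordering.

```python
from typing import Dict, Tuple


def count_outdated(installed: Dict[str, str], newest: Dict[str, str]) -> Tuple[int, int]:
    """Return (outdated, known): how many installed packages Repology knows
    about, and how many of those differ from the newest known version."""
    known = [name for name in installed if name in newest]
    outdated = [name for name in known if installed[name] != newest[name]]
    return len(outdated), len(known)


def format_report(outdated: int, total: int) -> str:
    """Render the counts in the same shape as the report quoted above."""
    return f"{outdated} of {total} ({100 * outdated / total:.2f}%) installed packages are outdated"


print(count_outdated({"foo": "1.0", "bar": "2.0", "baz": "0.1"}, {"foo": "1.1", "bar": "2.0"}))
print(format_report(325, 1425))  # the staging figures: 22.81%
print(format_report(372, 1432))  # the master figures: 25.98%
```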

A bulk of these stale packages are:

  • xorg.*, which does not have an auto-updater
  • python:* packages (trailing something?)
  • perl:* packages (trailing something?)
  • haskell:* packages (trailing Stackage? expected to lag)

Random factoid:

  • when run against master, the tool reports: 372 of 1432 (25.98%) installed packages are outdated according to https://repology.org.
  • when run against staging, the tool reports: 325 of 1425 (22.81%) installed packages are outdated according to https://repology.org.

staging → master → channel doesn’t make much of a difference w.r.t. outdated packages, at least on my system. Many packages are just stuck on old versions and need manual intervention.


Nah. Tracking master isn’t generally recommended. The unstable channels provide a rolling-release NixOS well enough.


For me, a bigger concern is that some bugs block certain package updates for quite a long time. In other words, a small group of packages is really outdated.

I think the community should schedule a NixOS release where the focus is mostly on removing cruft, refactoring and cleaning up. Apple did this at some stage (Snow Leopard) and it was very successful. Intel is also refactoring architectures every other release cycle. It’s good to keep technical debt under control.

However, in general, I find channels to be fresh and Nix quite pleasant to use.
