Automated package updates

jakegillberg · May 5, 2021, 6:33pm

I see a few processes around here that grab packages from another package manager and “nixify” them. See https://github.com/NixOS/nixpkgs/blob/2149ad5bdfa57b29135c1bf4ffa54d4eaa629730/doc/languages-frameworks/r.section.md#updating-the-package-set and the thread “I'm retiring from Haskell maintenance in Nixpkgs”.

Is there a way to automate these sorts of pull requests that consist of running a script to update a package set?

My process today was: notice an R package is broken and needs updating, then realize that the “easiest” way to update an R package is to bump all packages pulled from CRAN. Then I realized that this process hasn’t run since January, and that in the meantime a patch had been introduced as a “quick fix” for one package without updating the whole set, which broke the tool that updates the whole set.

I’m worried that this sort of thing would turn away users who aren’t already 100% sold on the nix mission. I ended up “undoing” the quick fix and updating the whole package set (resulting in the “bump R packages” pull request, NixOS/nixpkgs #121819), but this was really more time than I was expecting to spend on nix today, and I still don’t have the correct dependency included in my project, because overriding in this scenario seems a bit complicated.

It sounds like @peti had some personal automation going on already. Would it make sense to extend this strategy and automate some of these processes “officially”? What would that take? I’m willing to help, but don’t know enough about the current processes of nix release / maintenance to know where to start.

jonringer · May 5, 2021, 7:13pm

There is some automation through nixpkgs-update. However, there are a few criteria that need to be met before a PR is opened:

nixpkgs-update needs to be aware of the package, so if it doesn’t show up in nix-env -qa, then nixpkgs-update won’t attempt an update
The file needs to be written in a way where the hashes and version number can be updated in a non-ambiguous way. This is usually only an issue where a file contains many derivations.
The package needs to build successfully after the update.
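To illustrate the second criterion, here is a minimal sketch of what "non-ambiguous" can mean in practice. It is not how nixpkgs-update is implemented; `bump_version` is a hypothetical helper that refuses to rewrite a file unless the old version string occurs exactly once, which is the situation that breaks down when one file contains many derivations.

```python
import re


def bump_version(nix_text: str, old: str, new: str) -> str:
    """Replace `version = "<old>"` with the new version, but only if it is unambiguous.

    Hypothetical helper for illustration; not part of nixpkgs-update.
    """
    pattern = re.compile(r'version\s*=\s*"' + re.escape(old) + r'"')
    matches = pattern.findall(nix_text)
    if len(matches) != 1:
        # Zero matches: nothing to update. Several matches: we cannot tell
        # which derivation in the file the new version belongs to.
        raise ValueError(
            f"expected exactly one occurrence of version {old!r}, found {len(matches)}"
        )
    # A function is used as the replacement so that `new` is inserted literally.
    return pattern.sub(lambda _match: f'version = "{new}"', nix_text)
```

With a file holding a single derivation the call succeeds; with two derivations that share the same version string it raises instead of guessing.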

That would probably have to have some manual steps (like an individual creating a PR). Generally, for large ecosystem updates (such as python or haskell), someone runs the automated part and then deals with breakages in a PR. This still requires someone to dedicate a significant amount of time.

Yeah, the haskell update process consisted of running a tool which would update everything to the latest version. He would push it to a branch which had a hydra job, and then his streams were mostly him resolving build issues that were identified in the hydra jobset.

Only if someone is there to deal with the fallout of breakages. This was peti for haskell, but now it looks like a team of people has stepped up. I don’t think R has anything similar.

raboof · May 5, 2021, 9:45pm

For me personally, I would really like it if those tools allowed more ‘surgical’ updates rather than just mass-updating everything to the latest. I know I have been in the situation in the past where I wanted to contribute an addition/upgrade, but decided not to when it turned out it’d require also updating all kinds of things I wouldn’t be comfortable testing/troubleshooting. How realistic that is of course depends on the ecosystem, and even in ecosystems that make this relatively easy, someone still has to step up and actually build it.
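A rough sketch of what a ‘surgical’ update could look like, under assumptions that are not from this thread: the package set is modelled as a plain dictionary of `name -> {"version", "depends"}`, and both function names are hypothetical. The first function takes only the requested packages from upstream; the second lists what would still need rebuilding and testing afterwards.

```python
from typing import Dict, List, Set

# Assumed shape for illustration: name -> {"version": str, "depends": [names]}
PackageSet = Dict[str, dict]


def surgical_update(current: PackageSet, upstream: PackageSet, wanted: Set[str]) -> PackageSet:
    """Return a copy of `current` where only the packages in `wanted` come from `upstream`."""
    updated = {name: dict(spec) for name, spec in current.items()}
    for name in wanted:
        updated[name] = dict(upstream[name])
    return updated


def reverse_dependencies(packages: PackageSet, changed: Set[str]) -> List[str]:
    """List packages that depend, directly or transitively, on anything in `changed`."""
    affected: Set[str] = set()
    frontier = set(changed)
    while frontier:
        # Next layer: packages not seen yet that depend on the current layer.
        frontier = {
            name
            for name, spec in packages.items()
            if name not in affected
            and name not in changed
            and frontier & set(spec.get("depends", []))
        }
        affected |= frontier
    return sorted(affected)
```

The second function is the part that shows why this depends on the ecosystem: if the list of affected packages is long, a single-package bump still carries most of the testing burden of a full update.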

jbedo · May 7, 2021, 3:57am

I’d be keen on an automated update process, or at least hydra building the packages, and would be willing to help maintain the tree and deal with breakages. Having up to date CRAN and bioconductor trees is crucial for science and data mining areas, and Nix brings a lot of reproducibility to the table.

jbedo · May 7, 2021, 3:59am

For CRAN/BIOC I don’t think individual updates are helpful; we really need to be in sync with upstream. This is especially true for BIOC, which has a release cycle and whose packages should all come from the same release.

DavHau · May 7, 2021, 5:13am

To me, it sounds like the nixpkgs R architecture should be improved. If most of the update work is done via an automated tool, then the automatically generated expressions should be separated from the manual changes. All manual changes could be done via overrides, so the auto-generated code is never touched by anything other than the tool.

To save some space in nixpkgs, it would probably be enough if the automation tool just dumped some json data into nixpkgs, which is then interpreted by a nix function that generates derivations from it. I think these massive amounts of auto-generated nix expressions have low entropy and just bloat the total size of nixpkgs. (BTW, I did a benchmark a while ago and nix seemed to be 3 times faster at parsing massive amounts of json than at parsing the same data in a nix expression.)
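A minimal sketch of the tool side of that idea, with assumed names and an assumed record layout (`version`, `sha256`): the generator writes plain JSON rather than nix code. Sorting the keys makes the output deterministic, so re-running the tool on unchanged upstream data produces an identical file. On the nix side such a file could be read with `builtins.fromJSON (builtins.readFile ...)`.

```python
import json


def dump_package_data(packages: dict) -> str:
    """Serialize package metadata as compact, deterministic JSON.

    Hypothetical helper; the record layout is an assumption for illustration.
    """
    # sort_keys gives stable output regardless of insertion order;
    # the tight separators drop optional whitespace to keep the file small.
    return json.dumps(packages, sort_keys=True, separators=(",", ":")) + "\n"
```

Compact single-line output minimizes size but gives poor diffs; passing `indent=1` instead of the tight separators would trade some space for readable reviews.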

Ultimately, in the long run, I think the best solution might be to never have any autogenerated code in nixpkgs if the resource that is used to generate the code can be fetched reproducibly. That resource could just as well be fetched by nix itself and interpreted on the fly. Of course, this would require IFD to be enabled, which would in turn require proper evaluation caching and a community-shared evaluation cache to work efficiently, I guess. Therefore, I’m not sure if this is ever going to happen.

jbedo · May 7, 2021, 5:29am

This is roughly how it works currently: an automated tool generates nix expressions for the packages in cran and friends, and then specific manual overrides are made in the top-level default.nix to fix things. The auto generated code is never touched.
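The split described here can be sketched as follows. This is a Python analogy rather than the actual nixpkgs code: the generated data is treated as read-only, and manual fixes are small functions applied on top of it. All names and the record layout are assumptions for illustration.

```python
import copy
from typing import Callable, Dict

Spec = dict
Override = Callable[[Spec], Spec]


def apply_overrides(generated: Dict[str, Spec], overrides: Dict[str, Override]) -> Dict[str, Spec]:
    """Build the final package set from generated data plus manual overrides.

    The generated input is never modified; each override receives a deep copy.
    """
    unknown = set(overrides) - set(generated)
    if unknown:
        # An override for a package that no longer exists is probably stale.
        raise KeyError(f"overrides for unknown packages: {sorted(unknown)}")
    result = {}
    for name, spec in generated.items():
        fresh = copy.deepcopy(spec)
        override = overrides.get(name)
        result[name] = override(fresh) if override else fresh
    return result
```

Because the overrides live apart from the generated data, re-running the generator cannot clobber a manual fix, and a fix cannot break the generator.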

What’s missing currently is to automate running the update script, and to build the packages as part of the CI so we know what’s broken and needs attention. Currently the update script gets run (manually) when packages get old enough for somebody to notice as it impacts their work.

asymmetric · May 7, 2021, 1:10pm

Could you clarify what you mean here?

alexv · May 7, 2021, 5:48pm

There is currently some WIP on improving the infrastructure for R packages in nixpkgs (e.g. the pull request “r-modules: add GitHub only r packages”, NixOS/nixpkgs #93883). I have my own versions of the scripts because I need to support other types of repositories, like an internal mirror of MRAN, proprietary packages, etc. I have also added a way to pull certain packages from different MRAN snapshot dates, due to a few cases of buggy CRAN releases or archived CRAN packages. It looks like Microsoft’s interest in R is waning, so I am planning to add support for the new MRAN-like repository hosted by RStudio.
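As an illustration of per-package snapshot dates (not alexv's actual scripts, which are not shown in this thread), a resolver could pick a default snapshot date for the whole set and let individual packages pin a different one. The base URL, the `/src/contrib` path layout, and all names below are assumptions.

```python
from datetime import date
from typing import Dict


def snapshot_url(
    base_url: str,
    package: str,
    default_date: str,
    pinned_dates: Dict[str, str],
) -> str:
    """Return the snapshot repository URL to use for `package`.

    Packages listed in `pinned_dates` use their own date; all others use
    `default_date`. Dates are ISO strings such as "2021-05-01".
    """
    chosen = pinned_dates.get(package, default_date)
    # Validate early so a typo fails here rather than as a confusing fetch error.
    date.fromisoformat(chosen)
    return f"{base_url.rstrip('/')}/{chosen}/src/contrib"
```

For example, with a placeholder base of `https://snapshots.example.org`, a package pinned to an earlier date resolves to that date's directory while every other package follows the default.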

I can share my code if anybody is interested.

DavHau · May 8, 2021, 2:35am

I mean, if there is a parser that allows us to process some data and generate nix expressions from it, then why don’t we just implement that parser in the nix language? A nix function can directly generate derivations on the fly without needing to generate code as an intermediate step.

Example: there is a third-party repo that provides data about its packages in json format. Instead of running the data through a tool that generates tons of nix code, we could just fetch that json via a fixed-output derivation and have a function that knows how to generate derivations from it on the fly. That would be easy to update and there would be no boilerplate code. It is just that the evaluation costs are currently too high to do that because of IFD.

But dumping the data in compact json format in nixpkgs and having an on-the-fly parser would already be possible and probably cheaper in some ways I think.
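To make the “on-the-fly parser” idea concrete, here is a sketch in Python rather than nix, under assumed names and an assumed JSON layout (`version`, `sha256`, optional `depends`). The function plays the role a nix function would play: it turns raw JSON into structured package records directly, with no generated code in between.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PackageSpec:
    """Derivation-like record built from one JSON entry (hypothetical layout)."""
    name: str
    version: str
    sha256: str
    depends: Tuple[str, ...]


def load_package_set(json_text: str) -> Dict[str, PackageSpec]:
    """Interpret a JSON package index as a set of package records."""
    raw = json.loads(json_text)
    specs = {
        name: PackageSpec(
            name=name,
            version=entry["version"],
            sha256=entry["sha256"],
            depends=tuple(entry.get("depends", [])),
        )
        for name, entry in raw.items()
    }
    # Fail early if an entry refers to a package the index does not contain.
    for spec in specs.values():
        missing = [dep for dep in spec.depends if dep not in specs]
        if missing:
            raise KeyError(f"{spec.name} depends on unknown packages: {missing}")
    return specs
```

Updating the package set then means replacing one data file; the interpreting function stays the same.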

toraritte · October 17, 2021, 12:29pm

There is an ongoing effort to standardize these approaches in the thread “Status of lang2nix approaches” (linking it here so that it is also reflected there).