How to deal with waiting for the channel in production environments?

SynQ · September 23, 2023, 12:48pm

I run a few Mastodon servers on NixOS, and for this I use the ‘nixos-23.05’ channel (it seems the most versatile and stable channel to me). In general the experience is pretty great: deploying a new server takes a matter of minutes, upgrading is just ‘nixos-rebuild switch --upgrade’, and everything magically works (no need to manually run Ruby upgrade scripts or restarts).

The current version available in that channel is 4.1.7 for which an update came out (4.1.8) on Tuesday with important security fixes. It got PR-ed in nixpkgs within an hour or so (or at least the same day).
I would expect that to become available in 23.05 the day after, but it didn’t, since there are ‘build problems’ in Hydra (as can be seen at https://status.nixos.org).

The result is that now (about 4 days after a critical patch came out) my NixOS based production Mastodon server is still running 4.1.7 which has a serious problem in it. This makes using NixOS for this purpose (to put it mildly) a less suitable solution.

How should I handle such a situation?

Is running NixOS the ‘stable’ way (i.e. no flakes and use nixpkgs) secure enough to run services like Mastodon?

Is there any ‘this is the way to handle this’ documentation (deemed stable or not, I just need practical advice) that I have missed?

mightyiam · September 23, 2023, 1:07pm

Just a noob mentioning what may be the obvious first comment. You could build yourself… don’t have to wait for the channel. Perhaps an overlay that includes the packages you need a newer version of.
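
A minimal sketch of what such an overlay could look like in configuration.nix. It assumes you are willing to take Mastodon from the tip of the release-23.05 branch, which is what the nixos-23.05 channel lags behind; the tarball URL and the name `newer` are illustrative choices, not something from this thread. If Hydra has not built the package yet, it gets compiled locally.

```nix
{ config, pkgs, ... }:
let
  # Assumption: the tip of the release branch, fetched as a tarball.
  # For reproducible deployments, pin a commit and add a sha256 instead.
  newer = import (builtins.fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/release-23.05.tar.gz") { };
in
{
  nixpkgs.overlays = [
    # Replace only mastodon; every other package still comes from the channel.
    (final: prev: { mastodon = newer.mastodon; })
  ];
}
```

Since services.mastodon.package defaults to pkgs.mastodon, the service should pick up the overlaid package without further changes. Remember to drop the overlay once the channel has caught up, otherwise it becomes one more thing to keep track of.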

SynQ · September 23, 2023, 1:14pm

I understand the reaction and I actually thought a bit about that, but it seems counter-productive.

Let me explain why:
If I have to keep my eyes on which packages are outdated myself that makes NixOS an unsuitable system to run publicly internet-facing software on top of.

Other distributions (like Debian) have a specific channel or other mechanism for security updates, so that they do not get held up by the ‘normal’ update process.

Perhaps there is some room for a specific channel that is maintained by a dedicated group of people to run ‘Fediverse Applications’. Or perhaps this is better handled by creating something that resembles such a channel using flakes.

I am trying to find a solution here, because I want the advantages of NixOS and a way to deal with the disadvantages that makes sense.

xfix · September 23, 2023, 1:30pm

In server environments, it may make sense to use the -small channel instead. It gets updated faster, but the tradeoff is that it may not have prebuilt binaries for every application, so some packages have to be compiled locally (that said, some packages are guaranteed to be available without compiling - see https://github.com/NixOS/nixpkgs/blob/2af64a3d1d65c28d8760c0b2e46ec3324b14344c/nixos/release-small.nix for the list).

However, even if a package needs to be compiled because Hydra didn’t build it, it usually doesn’t take long - most server applications are not Chromium or Firefox or LibreOffice or something that takes a long time to build.

SynQ · September 23, 2023, 1:45pm

Thanks @xfix, that is the kind of practical information that makes sense. So yes, I will change those servers’ channels to -small.

But for now it did actually not solve my problem, since the nixos-23.05-small channel also contains just 4.1.7 (and not 4.1.8 or 4.1.9) of the mastodon package.

It also does not fully answer the question, of course, since the -small channel has the same issue: sometimes you just cannot wait for Hydra to run through all the checks (especially if it takes 4+ days for your package to come through).

xfix · September 23, 2023, 2:02pm

Currently nixos-23.05 has Mastodon 4.1.8, while nixos-23.05-small has Mastodon 4.1.9. Note that you need to update your channels to see the newest versions, either with nix-channel --update (updates the channels only) or with nixos-rebuild switch --upgrade (updates the channels and upgrades the operating system), executed as root.

Also, nix-channel executed as a regular user affects the channels used by that user, but not those used by nixos-rebuild. Each user has their own Nix channel list.

vcunat · September 23, 2023, 2:08pm

For reference, the big channels usually lag behind the top of their branch by two or three days. It’s really rare to get significantly more delay. I’m watching that.

Infinisil · September 23, 2023, 2:11pm

I’m at least partially responsible for the channel being blocked this time, sorry for that! The cause was this PR, which introduced a non-deterministic test failure (which is why I didn’t find out about it right away), which I only just fixed today.

Thanks to @vcunat and @hexa for watching the Hydra jobs and restarting them a couple of times until the non-deterministic test happened to succeed. Because of that, the 23.05 channel got unblocked just a couple of minutes ago!

vcunat · September 23, 2023, 2:25pm

Wait, mastodon-4.1.8 is already on the latest nixos-23.05 channel. Moreover, if I read the timestamp right, it was so yesterday already.

EDIT: fixed link to the timestamp.

SynQ · September 23, 2023, 5:54pm

I found my current problem!

This was in my configuration.nix (someone helped me on this earlier, probably to mitigate the same problem I am posing here):

services = {
  mastodon = {
    package = unstable.mastodon;
    enable = true;
    configureNginx = true;
  };
};
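
The snippet above does not show where `unstable` comes from. A common pattern, and only a guess at what this configuration used, is a let-binding that imports a second channel. The sketch assumes a channel named nixos-unstable has been added for root.

```nix
{ config, pkgs, ... }:
let
  # Assumption: root has a channel called "nixos-unstable", so that
  # <nixos-unstable> resolves. This is a separate package set from pkgs.
  unstable = import <nixos-unstable> { };
in
{
  services.mastodon = {
    enable = true;
    configureNginx = true;
    # This pin takes mastodon from nixos-unstable, whatever nixos-23.05 ships.
    package = unstable.mastodon;
  };
}
```

With a pin like this, the Mastodon version follows the nixos-unstable channel, so updates that land in nixos-23.05 or nixos-23.05-small have no effect on it.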

Once I removed the ‘unstable.mastodon’ line I got what was in ‘stable’.

My question remains: what would be the best way to deal with packages lagging in the build process that cannot wait because of security updates?

vcunat · September 23, 2023, 5:59pm

It’s not individual packages. The whole channel may be lagging. But as I wrote, normally just for a few days. The -small channels were added primarily for “security updates”, as mentioned above.

Infinisil · September 23, 2023, 6:25pm

I can imagine a future with paid teams maintaining specific security-critical applications, with a separate Hydra channel for the fastest updates. This team would be tasked with packaging updates and fixing breakages.

SynQ · September 23, 2023, 6:30pm

I took this to the Nix/NixOS matrix channel and I got the following advice:

1. The best thing that could happen is if the Nix project would have more dedicated people to fix the build pipeline when (if) it breaks. That is expensive and it takes time to do.
2. The thing you can do for yourself (or for a group of people that need the same stuff) is create your ‘own’ nixpkgs that follows the stable channel but updates the packages you need faster.

Answer 1 needs dedication from organizational (procedural) people, plus money (or other incentives) to get the right people working towards the right goals.

Answer 2 can be achieved by picking the PRs you want to have in the channel ‘now’.
Links on how to go about this (with examples):
https://www.ertt.ca/nix/patch-nixpkgs/
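
As a rough sketch of answer 2, a pending PR can be applied on top of the nixpkgs you already follow. The file name patched-nixpkgs.nix and the names `bootstrap` and `patchedSrc` are illustrative, and the PR number is deliberately left as a placeholder; the linked article may go about it differently.

```nix
# patched-nixpkgs.nix (hypothetical file name)
let
  # The channel you already follow, used for its sources and patching tools.
  bootstrap = import <nixpkgs> { };

  # A copy of the nixpkgs sources with the selected PR applied on top.
  patchedSrc = bootstrap.applyPatches {
    name = "nixpkgs-patched";
    src = bootstrap.path;
    patches = [
      (bootstrap.fetchpatch {
        # Placeholder: fill in the number of the PR you want to pull in.
        url = "https://github.com/NixOS/nixpkgs/pull/<PR-NUMBER>.patch";
        # fakeHash makes the first build fail and print the real hash,
        # which you then paste here.
        hash = bootstrap.lib.fakeHash;
      })
    ];
  };
in
import patchedSrc { }
```

In configuration.nix this could then be used as `services.mastodon.package = (import ./patched-nixpkgs.nix).mastodon;`. The patch stops applying cleanly once the channel contains the change, which is a natural reminder to remove it.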

To react to @Infinisil: what you are suggesting is the combination of 1 and 2.
A team that maintains a ‘security’ channel whose focus is to ‘patch’ security holes by pulling security-related PRs into a channel more quickly. This will probably also involve communication with the package maintainers to keep subsequent PRs working.

vcunat · September 23, 2023, 7:25pm

Also, sometimes the big channels can take a day or two just because of the current infra, whatever the humans do.

vcunat · September 23, 2023, 7:27pm

And I’m not even mentioning mass-rebuild fixes. Those can get delayed by a couple weeks, though we’ve proven that critical ones are usually doable within a few days if we want.

wamserma · September 24, 2023, 7:18pm

If there is a reason you cannot wait for the channel to update, and you know that the update of package XY will not break any downstream packages you rely on, you can use system.replaceRuntimeDependencies as detailed here with a security hotfix for OpenSSL: OpenSSL 3.0.7 update (2022-11-01) FAQ
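
A minimal sketch of that approach, assuming the option as it exists in NixOS 23.05 (a list of `original`/`replacement` pairs). The patch file ./security-fix.patch and the name `fixedOpenssl` are hypothetical; the FAQ mentioned above has the real-world example.

```nix
{ pkgs, ... }:
let
  # Hypothetical hotfix: the channel's openssl plus one extra local patch.
  # The version string is untouched, so the package name keeps its length,
  # which the option requires.
  fixedOpenssl = pkgs.openssl.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./security-fix.patch ];
  });
in
{
  system.replaceRuntimeDependencies = [
    # Rewrite references to the original store path in the system closure
    # so they point at the replacement, without rebuilding dependents.
    { original = pkgs.openssl; replacement = fixedOpenssl; }
  ];
}
```

Because dependents are not rebuilt, this only makes sense for fixes that keep the library compatible with what those dependents were built against, which matches the caveat about downstream packages above.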