Aggressive Kernel removal on EOL in NixOS

Well, the Linux kernel listing itself speaks of “Longterm maintenance”, so it is reasonable to use “LTS” as a universally understood shorthand.


Just opened “nixos/manual: document kernel backporting policy and implications of it” (Pull Request #204780, NixOS/nixpkgs, by Ma27) to document the current state.

While we have different opinions on that, I guess we can all agree that it’s good to have the status quo documented (and to change that in case we agree on a different policy).


FWIW, the new zfs 2.1.7 release (made 5 days ago, available in nixos-unstable now), says:

  • Linux : compatible with 3.10 - 6.0 kernels

This does not directly solve the core issue here, which will presumably come up again … eventually¹.

However, it does give us two things to work with:

  • less pressure in conversations (like this one) about how to solve the issue and the urgency of doing so / difficulty of the immediate situation
  • some practical idea of the kind of grace period that might be useful: this thread was opened 28 days ago.

In particular, let’s look at the coincidence of timing for this particular cycle, because it’s perhaps a worst-case example:

  • 2 Oct Linux 6.0 released
  • 4 Oct ZFS 2.1.6 released. Clearly the zfs release was already underway and in testing when the kernel release happened, just about as awkward a coincidence as possible.
  • 24 Oct Linux 5.19.17 released, 5.19 EoL
  • 27 Oct OpenZFS updates metadata to make Linux 6.0 compat official (in their next release).
  • 1 Nov NixOS drops 5.19 and the latest... package alias reverts to 5.15
  • 9 Nov This thread opened, after above merged through branches and impact felt.
  • 2 Dec ZFS 2.1.7 released with support for Linux 6.0
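The gaps in this timeline can be checked with a little date arithmetic. This is only a sketch: the list gives day and month, so the year 2022 is an assumption, and the event labels are made up for illustration.

```python
from datetime import date

# Assumption: every event is in 2022; the list above gives only day and month.
YEAR = 2022

# Hypothetical labels for the events in the timeline above.
events = {
    "linux_6_0_released": date(YEAR, 10, 2),
    "zfs_2_1_6_released": date(YEAR, 10, 4),
    "linux_5_19_eol": date(YEAR, 10, 24),
    "openzfs_meta_6_0": date(YEAR, 10, 27),
    "nixos_drops_5_19": date(YEAR, 11, 1),
    "thread_opened": date(YEAR, 11, 9),
    "zfs_2_1_7_released": date(YEAR, 12, 2),
}


def days_between(start: str, end: str) -> int:
    """Number of days from one labelled event to another."""
    return (events[end] - events[start]).days


print(days_between("linux_6_0_released", "openzfs_meta_6_0"))    # 25
print(days_between("linux_5_19_eol", "nixos_drops_5_19"))        # 8
print(days_between("nixos_drops_5_19", "zfs_2_1_7_released"))    # 31
print(days_between("linux_6_0_released", "zfs_2_1_7_released"))  # 61
```

Under that assumption, the window in which NixOS had no non-LTS kernel usable with a released zfs was 31 days, from the 5.19 drop to the 2.1.7 release.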

That’s not entirely fair either. It’s also not the zfs project’s problem if today’s latest upstream kernel was still in rc at the time of their last release. They are not going to make a release with official support in anticipation of an unreleased kernel².

It’s the same kind of issue as NixOS not offering official support for a recently-EoL kernel: there may be no issue running the combination today, but that could change tomorrow if there’s a last-minute change before kernel release / a new vulnerability found in the now-unmaintained kernel.


[1]: The core issue (as I’m choosing to frame it for the purposes of this point) is not the requirement to use a newer kernel for new hardware, nor even zfs support for that same kernel, but the coincidence or beat frequency between the cadence of kernel and zfs releases.

[2]: As noted, even if the kernel was technically released a day or two before. It’s hard to imagine a worst-case coincidence of timing much worse than this. With a few extra days (or weeks) shift in either release, it might not have been an issue at all this time around. As I noted previously, the META update was about 3 weeks after kernel release. There weren’t many other changes needed for actual support; that had already been done based on rc’s, so this likely mostly reflects shake-out time for test suites to run, and perhaps some lower imperative without release timeline pressure in this case.


Still, it seems to me that the release candidates provide OpenZFS with plenty of opportunity to be ready for the 6.0 release, so they could make a compatible ZFS release shortly thereafter. The fact that it took them two months is rather strange to me. AFAIK OpenZFS doesn’t use any kind of release schedule.

While it may look like an unlikely coincidence, this unfortunately isn’t the first time it has happened and I don’t think it’ll be the last. As mentioned previously, the same thing happened in the 5.10 era, with (IIRC) 5.11 and 5.12 playing roles similar to those of 5.19 and 6.0 this year. 6.1 will be out next week.

Thank you for laying out the timeline so clearly though.

I don’t expect them to either. The point was that the 6.0 ABI they need to target had already been clear for multiple months when 5.19 was dropped (6.0-rc1 was cut on 14 Aug).

If the ZFS project can’t manage to produce a bugfix release in which they fix ABI compat for that kernel in that time frame (even if that’s all it contains) then yes, that’s on them and IMO a clear sign that using anything but LTS kernels isn’t really a supported use-case of ZFS in practice.


The intent of laying out the timeline was also to show that it’s essentially inevitable, even where no additional changes are needed / have already been made based on RC’s, though maybe for shorter windows of inconvenience in most cases. It can happen on any kernel release other than the one after a new LTS is chosen, and even if it’s avoided with a timely zfs release following, that kernel will in turn probably be EoL before the next zfs release. They also have FreeBSD release cycles to work with.

Advocating for different release policies from either the kernel or zfs projects might, or might not, be worthwhile — but it’s not going to be so here.

Here, we need to understand the external reality and decide how to handle it in NixOS. There are good concrete suggestions in the thread, around improving documentation, and facilitating ways of keeping kernels from going backwards (directly or indirectly, at user risk).


I don’t think it’s possible for latestCompatibleLinuxPackages to be both 1) useful and 2) unable to cause the kernel to go backwards. If it only ever points to the latest LTS, then it’s not useful because that’s already the default in NixOS. If it points to a non-LTS at any point, then it will go backwards if that kernel is EOL before ZFS supports the next one.
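To make that argument concrete, here is a toy model of the selection logic. It is a sketch, not the nixpkgs implementation: the function name, the version tuples, and the assumption that zfs 2.1.6 declared compatibility up to 5.19 are all illustrative, with the kernel sets taken loosely from the timeline earlier in the thread.

```python
# Hypothetical model, NOT the nixpkgs implementation: pick the newest kernel
# that is both still packaged and within the range zfs declares compatible.
Version = tuple[int, int]


def latest_compatible(packaged: set[Version], zfs_max: Version) -> Version:
    candidates = [k for k in packaged if k <= zfs_max]
    if not candidates:
        raise ValueError("no packaged kernel is supported by this zfs release")
    return max(candidates)


# Assumed states (simplified: older LTS kernels are left out).
before_drop = latest_compatible({(5, 15), (5, 19), (6, 0)}, zfs_max=(5, 19))
after_drop = latest_compatible({(5, 15), (6, 0)}, zfs_max=(5, 19))
after_zfs_2_1_7 = latest_compatible({(5, 15), (6, 0)}, zfs_max=(6, 0))

print(before_drop, after_drop, after_zfs_2_1_7)  # (5, 19) (5, 15) (6, 0)
```

The middle result is the backwards step: once the non-LTS kernel leaves the packaged set before zfs supports its successor, the only remaining candidate is the older LTS.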


Since there has been no answer that I consider valid in the Matrix chat so far, let me ask again here:

Why do we have a grace period of many years for Python 2.7, while kernels get removed on the day they go EOL?

I’m not asking for keeping a kernel alive for another 3 years, I’m asking for another month, marked as insecure or similar, with a warning.

Make people aware that there might be a problem hitting them soon, rather than just pulling the rug out from under their feet… When the kernel gets downgraded without a word of warning, affected users will possibly not even realize that they have a problem before the next reboot!
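As a rough illustration of what such a grace period could look like, here is a sketch. The function, the status names, and the 30-day default are hypothetical and do not correspond to any existing nixpkgs mechanism; the dates assume the 5.19 EoL of 24 Oct fell in 2022.

```python
from datetime import date, timedelta


def kernel_status(today: date, eol: date, grace_days: int = 30) -> str:
    """Hypothetical policy: keep an EoL kernel available, but flagged, for a grace period."""
    if today <= eol:
        return "supported"
    if today <= eol + timedelta(days=grace_days):
        # Still installable, but users would see a warning.
        return "insecure"
    return "removed"


eol_5_19 = date(2022, 10, 24)  # assumed year
print(kernel_status(date(2022, 10, 20), eol_5_19))  # supported
print(kernel_status(date(2022, 11, 1), eol_5_19))   # insecure
print(kernel_status(date(2022, 12, 2), eol_5_19))   # removed
```

With these assumed numbers, the date on which 5.19 was actually dropped would fall inside the warning window instead of being a silent downgrade.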

Py2 is held on to tightly, even though a majority of python2Packages is not buildable anyway, because its removal might affect some IoT companies that I had never heard about before, in either an IoT or a nixpkgs context. But kernels are removed right away, affecting real users, who then ask how to fix their computer not booting anymore.

I might sound a bit like a child that doesn’t get their candy, but I am just trying to understand why such different policies are applied.


I think you got a pretty good answer from @K900 on Matrix:

The reasoning is really that different packages have different maintainers and there’s no standardized policy and getting people to agree on a standardized policy is how you get depression

If you ask me, python 2 probably should be removed from nixpkgs. But I’m sure there are other nixpkgs contributors who would disagree.

I don’t really care whether it’s removed or not. But if some team argues against its removal because of potential use in industry, we can argue the same for the kernel, though there it’s not potential use, it’s actual use, and it actually blocks people when the rug gets pulled out from under their feet.

I’m fine with either policy. I just want to understand.

And it would be fine if there were some documentation, rather than tribal knowledge that differs per ecosystem.


As linked above, there’s now (per-ecosystem) documentation since “nixos/manual: document kernel backporting policy and implications of it” (Pull Request #204780, NixOS/nixpkgs).

because there is no entire ecosystem built on top of kernel 3.13 that completely breaks with anything newer.

that’s mostly because the python team does not care too much about python2, and more and more unused python2Packages things got removed over time.

or GIMP. Also one of the node gyp tools refused to switch to python3 for a long time.


I’m sure that’s just a throwaway number, but it’s not 3.13 (which is >8 years old), it’s 5.19 (a couple of months ago) and maybe 6.0 very soon (unless it gets picked as new LTS, I’m not sure what the plans are there yet).