Aggressive Kernel removal on EOL in NixOS

NobbZ · November 9, 2022, 9:29am

Currently EOL’d kernels get removed quickly from nixpkgs and users that rely on them get confronted with an error on updates.

The recent removal of 5.19 now leaves nvidia and ZFS users behind with 2 options:

Not upgrading for who knows how long
Downgrading to 5.15

The latter might not be an option for users with sufficiently new nvidia cards.

Can we agree as a community on a longer grace period for EOL’d kernels? Or at least on a way to consciously opt in to an EOL kernel via a NixOS option? What steps would be necessary to apply this to the nixpkgs repository?

Solene · November 9, 2022, 10:05am

Hello

What are the reasons for being so fast at removing kernels? Are there any issues keeping them for a while?

Can we programmatically check if newer kernels are usable for nvidia or ZFS users?

uep · November 9, 2022, 10:10am

There’s a mechanism for this in the zfs infrastructure already, but it only moves when there’s a new zfs release with announced support for newer kernels: Releases · openzfs/zfs · GitHub
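For readers who have not used that mechanism, here is a minimal sketch of how it was typically consumed from a NixOS configuration around the time of this thread. The attribute name `latestCompatibleLinuxPackages` is an assumption based on nixpkgs of that era; it may be named differently or be absent in other releases, so check it against your own nixpkgs revision.

```nix
{ config, ... }:
{
  boot.supportedFilesystems = [ "zfs" ];

  # Follow the newest kernel that the packaged ZFS release declares support for.
  # Assumption: this attribute exists on the ZFS package in your nixpkgs revision.
  boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;
}
```

This is only a fragment; a real ZFS setup needs further settings (for example `networking.hostId`).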

hexa · November 9, 2022, 10:26am

The kernel people are well-known for not flagging potential security issues and these stable kernel releases quickly don’t receive updates anymore.

A new NVIDIA card and ZFS (and its latest-compatible attribute) are just an unfortunate match.

Ideally ZFS would release more often to match up with stable kernels, but here we are.

NobbZ · November 9, 2022, 11:17am

That’s fine and understandable. Still, allow the affected users to remain on the EOL kernel for a while.

Make them set something like nixpkgs.allowEOLLinuxKernel = "5.19" or make the option even more inconvenient to use.

But give them the opportunity to remain on a recentish kernel.

A viable policy could be to keep a kernel in the EOL state for at least a full release cycle. Meaning, on each branch off, remove all the kernels that have been EOL on the previous branch off.

Really, I am aware of the potential issues EOL kernels can cause (that’s why I am also fine with arbitrarily complex options having to be set), but giving users 2 options that are both incompatible with their hardware is worse than that…
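To make the proposal concrete, here is a rough sketch of how such an opt-in could be declared as a NixOS module. The option `nixpkgs.allowEOLLinuxKernel` is hypothetical (it is only the name floated above and does not exist in nixpkgs), and the sketch covers only the declaration and a warning, not how the kernel packages would honor it.

```nix
{ config, lib, ... }:
{
  # Hypothetical option; not part of nixpkgs.
  options.nixpkgs.allowEOLLinuxKernel = lib.mkOption {
    type = lib.types.nullOr lib.types.str;
    default = null;
    example = "5.19";
    description = ''
      Kernel series that the user explicitly accepts running after its
      upstream end of life. No support is provided for such kernels.
    '';
  };

  # Remind the user of the trade-off on every evaluation.
  config.warnings = lib.optional (config.nixpkgs.allowEOLLinuxKernel != null)
    "Linux ${config.nixpkgs.allowEOLLinuxKernel} is EOL upstream and receives no security fixes.";
}
```

Tying the value to a specific series means the opt-in lapses automatically once the user moves to a different kernel.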

Sandro · November 9, 2022, 12:15pm

I don’t see a reason to make this option so granular. EOL kernel yay or nay is enough.

I would like to add that EOL kernels do not get any patches backported or fixes for kernel modules, meaning they are essentially frozen in state. I do not want us to get into a state like other distros where they carry plenty of patches.

NobbZ · November 9, 2022, 12:20pm

That’s why I explicitly mentioned the granular option and repeatedly agreed to make it inconvenient.

People need to be aware that they are on their own and that there won’t be any support. But this gives them some grace period until the next kernel does support their system.

raphi · November 9, 2022, 12:35pm

For me (nvidia user until I do my next hardware upgrade) the most convenient option would be to have something like linuxKernel.packages.linux_nvidia. linux_nvidia would then be an alias to the latest kernel that is both supported by the kernel devs and by the nvidia driver.

This would imply a kernel downgrade if 5.19 gets removed.

Not sure how implementable this is.
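As a local approximation of that idea, an overlay can define such an alias by hand. This is only a sketch: the name `linuxPackages_nvidia` is hypothetical, and pointing it at 5.15 is an assumption based on the downgrade discussed in this thread; nothing here checks nvidia compatibility automatically.

```nix
{ pkgs, ... }:
{
  nixpkgs.overlays = [
    (final: prev: {
      # Hypothetical alias, maintained by hand: the newest kernel package set
      # believed to be supported both upstream and by the nvidia driver.
      linuxPackages_nvidia = final.linuxPackages_5_15;
    })
  ];

  boot.kernelPackages = pkgs.linuxPackages_nvidia;
}
```

An alias shipped in nixpkgs itself would need someone to update it on every kernel or driver release, which is the open implementability question.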

Shawn8901 · November 9, 2022, 12:38pm

@raphi That’s what was done for ZFS, see Merge pull request #199754 from Ma27/backport-5.19-removal · NixOS/nixpkgs@df2bcbb · GitHub

vcunat · November 9, 2022, 1:38pm

We could use e.g.

  meta.knownVulnerabilities = [ "unsupported upstream since YYYY/MM" ];

Then you have various options to bypass this disablement: Nixpkgs 23.11 manual | Nix & NixOS
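On the user side, a bypass could look roughly like the following sketch, which uses the existing `permittedInsecurePackages` setting. The exact string `"linux-5.19.17"` is an assumption; it has to match the name and version of the kernel package as it appears in your nixpkgs revision.

```nix
{
  # Explicitly accept one package that is marked with meta.knownVulnerabilities.
  # Assumption: the marked kernel is named "linux-5.19.17" in your nixpkgs.
  nixpkgs.config.permittedInsecurePackages = [
    "linux-5.19.17"
  ];
}
```

Because every version has to be listed by name, the opt-in stays deliberate and visible in the configuration.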

adamcstephens · November 9, 2022, 1:39pm

I understand the security concerns around an EOL kernel and potential vulnerabilities. At the same time, breaking hardware compatibility for a user that prevents them from upgrading can also expose them to security vulnerabilities.

It may not be ideal to have users running on EOL kernels, but I agree with Nobbz that having an option to let users proceed with updating the rest of the system is a better solution.

uep · November 9, 2022, 4:06pm

There is another option; the zfs mechanism also offers a setting to allow using a newer kernel than the officially supported one.

In the zfs repo, the META file was updated to include 6.0 kernels 14 days ago. From a (very) quick look, there don’t seem to be any major commits fixing 6.0 compat issues between the previous release and that update. It would probably work fine.

Now, to be clear, I’m absolutely not making any recommendations or suggestions here. I really only did a quick search for commits, am no expert, haven’t tested anything, etc.

Instead the point I’m making is that there are risk trade-offs here. An unstable ZFS+kernel pairing might eat all your data. An EoL kernel might be missing a critical fix. That risk assessment might change over time (say as new vulns are released).

Unfortunately, the current policy makes it strictly worse: by removing the kernel entirely, it means one side of the trade-off is instead an outdated entire system, with possible userland vulnerabilities as well, because the channel needs to be held back.

The first question is around policy: should we remove these kernels completely, or just mask them until a path forward is available, regardless of the mechanism and syntax. We don’t have to do that for every kernel; as noted the combination of nvidia and zfs schedules is currently awkward, and it mostly only happens for the versions just prior to the next LTS until everyone (zfs in this case) catches up.

As for mechanism, I like @vcunat’s suggestion. I also assume, without having tried, that there’s a way to reinstate the removed kernel with an overlay, or a local nixpkgs fork. Again, the policy question is whether we should push users to have to resort to those kinds of mechanisms.

7c6f434c · November 9, 2022, 4:50pm

Well, technically one can already reinstate the old kernel by importing from a pinned nixpkgs tarball from a specific commit. Not much difference with a local overlay, sure
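A sketch of that approach, assuming a nixpkgs commit that still contains the removed kernel. The `<commit-hash>` in the URL is a placeholder that you must fill in yourself, and `linuxPackages_5_19` is assumed to be the attribute name at that commit.

```nix
{ config, pkgs, ... }:
let
  # Placeholder: replace <commit-hash> with a nixpkgs commit that still ships 5.19.
  pinnedNixpkgs = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit-hash>.tar.gz";
  };

  # Evaluate the pinned tree for the same platform and nixpkgs config as the system.
  oldPkgs = import pinnedNixpkgs {
    inherit (pkgs.stdenv.hostPlatform) system;
    config = config.nixpkgs.config;
  };
in
{
  # Kernel and out-of-tree modules come from the pinned tree;
  # the rest of the system keeps following the current channel.
  boot.kernelPackages = oldPkgs.linuxPackages_5_19;
}
```

Adding a `sha256` to the `fetchTarball` call makes the pin reproducible and avoids re-downloading.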

clhodapp · November 12, 2022, 9:35pm

Perhaps NixOS should continue to carry an EOL kernel as long as it’s listed on the main page of kernel.org. If I understand the situation correctly, the current policy of dropping kernels as soon as they go EOL actually results in the final release in any kernel series essentially being dead-on-arrival for NixOS, as it tends to be the case that the kernel folks drop the final release and declare the series EOL at the same time.

Atemu · November 13, 2022, 7:32am

Why would an old kernel be an issue with new Nvidia cards? That only matters for nouveau and you’re obviously not using that if you’re trying to install the proprietary drivers.

Users of out-of-tree modules should use the most recent LTS kernel at the newest.

A longer grace period I’m okay with, but certainly not an indefinite one, and maybe not much longer than a week or two.

We don’t want even more dead code in Nixpkgs.

I think we should keep an EOL kernel up until the next wave of stable kernel releases comes out in which said kernel no longer receives an update.

Week n - 2: Regular supported release
Week n - 1: Regular supported release
Week n: Last supported release
Week n + 1: Removal

NobbZ · November 13, 2022, 9:43am

I was told that some newer nvidia cards did not work with the stable release due to the older kernel. As I am not using nvidia myself, I cannot verify this and have to trust what they say.

If this is the suggestion, why does zfs.latestSupportedKernel point to a non-LTS kernel? Why do all the sources suggest using exactly this kernel package?

This doesn’t seem to be a viable grace period. Users complain longer than that for their respective dependency to catch up.

Atemu · November 13, 2022, 3:30pm

That was probably Nouveau then.

I cannot say why other sources suggest using that kernel, but it wouldn’t make sense for that attribute to point to the latest LTS kernel: firstly because of its name, and secondly because ZFS has always supported the latest LTS kernel, so you might as well just use linuxPackages.zfs; you don’t need a special pointer for that.

I’m not quite sure I understand what you wrote here. Could you rephrase that?

NobbZ · November 13, 2022, 3:44pm

After kernel X has been removed, it takes longer than 2 weeks for X+1 to work with the user’s setup.

And X shouldn’t be removed as long as X+n does not have a sufficient replacement for X.

Also I have probably misunderstood Nvidia stuff, though at least one user in the discord has problems with their setup under 5.15.

We asked them to provide a more detailed report here, or to describe their problem better in the discord. Sadly the user has not yet reported back here or there, but I really hope that we can get some first-hand experience here.

TLATER · November 13, 2022, 6:25pm

As someone using nvidia and having had similar problems before, the problem is usually that hardware besides the nvidia GPU needs a more recent kernel version to function. Around 2020 I had a fairly new nvidia GPU and a very new motherboard; this required the latest linux kernel, which was incompatible with the nvidia driver on stable at the time.

For as long as those were incompatible there was no option besides running the latest kernel and trying to get nixpkgs to update the nvidia driver.

For nvidia specifically, part of the problem is that newer, compatible driver versions are almost always already available, but the policy on whether they can be backported isn’t clear. Maybe we need an nvidia-latest that ignores rules around breaking changes as well? I could see myself using that even without using the latest linux kernel.

toastal · November 14, 2022, 7:08am

Not yet mentioned, but you can also move to zfsUnstable @ 2.1.7, which is compatible with Linux stable @ 6.0.x (provided you’re using NixOS Unstable). I got the 6.0.x compatibility merged before deprecation just so folks like you and me can stay current (my new laptop is not compatible with the LTS and wasn’t very stable until 5.19.x). When this happens, you can open a merge request like my latest unreviewed one to keep the ball rolling so that zfsUnstable is ready for the latest kernel. Their staging branch is tested on the latest kernel.
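In configuration terms, that switch could look roughly like the sketch below. Both `boot.zfs.enableUnstable` and `pkgs.linuxPackages_6_0` are assumptions based on NixOS Unstable at the time of this thread; either may have been renamed or removed in later releases.

```nix
{ pkgs, ... }:
{
  boot.supportedFilesystems = [ "zfs" ];

  # Assumption: this option selects the zfsUnstable package in your nixpkgs revision.
  boot.zfs.enableUnstable = true;

  # Assumption: the 6.0 kernel package set is still present in your nixpkgs revision.
  boot.kernelPackages = pkgs.linuxPackages_6_0;
}
```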

It’s a bit of a shame (but understandable) that the ZFS team is just a little behind the curve for the Linux kernel (and you can see a lot of issues raised by even more aggressive distros like Fedora, which deprecates kernels almost immediately for security), but ultimately I agree with the NixOS stance of EOLing EOL’d kernels and actually keeping to what ZFS says is the maximum supported version. Eventually my hardware needs will be met by the LTS kernel, and when that day comes it won’t be as big of a deal to be on LTS vs. stable (my GPU is AMD so I have fewer issues with video).