Aggressive Kernel removal on EOL in NixOS

Doing bugfixes downstream is completely different work, really. It’s not a Nixpkgs maintainer workload but a kernel developer workload. To do it properly, I believe you also need to know kernel code internals, etc. It’s not really an option discussed in this topic.

What some propose is to just keep the versions as they were, with all the bugs, e.g. hidden behind some flag. And that would be technically relatively easy.

4 Likes

I agree, of course.

This is exactly what I am proposing as well.

2 Likes

Very little. The common kernel config already has conditions for that kernel if it was ever in Nixpkgs. New config flags are rare and need to be guarded behind a version anyway, since we have older LTS kernels.

If the latest LTS kernel (the newest kernel you should use with ZFS) doesn’t support the rest of your hardware then yes, you should not be using ZFS.

I don’t know why you’d remove ZFS from Nixpkgs but that ain’t happening.

I don’t get your twisted logic here. Why should we remove the kernel modules from all of our kernels just because they might not support the latest kernel?

They simply get marked as broken on the newest kernel and that’s that.

No. The point of that option is to be unstable and always point at a different version; the latest available supported version. If you wanted a stable kernel version, you should reference it explicitly (i.e. linuxPackages_x_y).

Um, no? That’s an attribute of zfs, it has absolutely nothing to do with the Nvidia driver.

The problem is that if every tutorial you can find tells you to enable experimental features, those features can’t be that experimental, right?

Also, you probably don’t remember you set that experimental setting or even know which parts are experimental if you’ve set that option from the get-go like a guide told you to.

The problem is that we don’t want users to unknowingly have insecure setups. Removing EOL’d kernels entirely is the best way to ensure that.

Users can still knowingly have insecure setups by simply pinning an older version of Nixpkgs for their kernel. That’s entirely up to them and needs no involvement from upstream Nixpkgs.
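A minimal sketch of that kind of pinning, assuming a NixOS module and a Nixpkgs revision that still contains the kernel in question. The `<rev>` and `<sha256>` placeholders are deliberately left unfilled, and `linuxPackages_5_19` is only the example version discussed in this thread:

```nix
{ pkgs, ... }:
let
  # Hypothetical pin: fill in a Nixpkgs revision that still ships the
  # kernel you need, plus the matching tarball hash.
  oldPkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  }) {
    # Build for the same platform as the rest of the system.
    system = pkgs.stdenv.hostPlatform.system;
  };
in
{
  # Only the kernel (and its out-of-tree modules) comes from the old pin;
  # everything else keeps following the current channel.
  boot.kernelPackages = oldPkgs.linuxPackages_5_19;
}
```

Note that an import like this does not inherit the system’s `nixpkgs.config`, so settings such as `allowUnfree` would have to be passed to it separately.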

4 Likes

Agree with most of what you said, but:

I dunno why you think you should only use LTS kernels with ZFS. Granted, sometimes latestCompatibleLinuxPackages points to an LTS kernel, but ZFS often does support newer kernels than that available in nixpkgs, and I don’t see a reason not to use that.

1 Like

Honestly, this thread, and the fact that the version can go backwards, is a reason.

Edit: to clarify, … a reason for a user to choose to stay on LTS, not a reason for the system to force that as the only available choice.

1 Like

Because that’s the only stable version. With latestCompatibleLinuxPackages, half of the time you’re going to jump back to that version anyways because the newest isn’t supported by ZFS and the second newest is EOL. Might as well stay on that version the entire time.

When I still used ZFS, there was a situation where I effectively skipped an entire kernel version because ZFS was lagging so far behind.

Using _latest with OOT modules makes no sense to me.


Another avenue I’d love to see explored in this topic is whether we could port distro kernels. For example, Ubuntu 22.10 will support 5.19 for quite a while and we could simply package their sources with their continued backports and have a “secure” 5.19 kernel.

1 Like

I am not sure we want to import their LTS patchery, which is likely wrapped in some Debian packaging. We would then also need to deal with problems that are not problems on Ubuntu because of outdated libraries, or with patches that work on Ubuntu but break for us.

Hello,
first of all a big thanks to all NixOS maintainers and contributors!

I ended up here because I got stuck. Kernel v6.0 causes crashes and v5.15 does not work at all. At the moment only v5.19 is working for me.

I understand that there are security concerns when old kernels are not removed, but in my case the security risk is now even worse because not only is the kernel outdated, but so is all the other software.
Wouldn’t it be better to just keep the kernel out of date and not the whole channel? Or do I see this wrong?

2 Likes

The security concern of keeping EOL kernels doesn’t seem very severe to me. Other kernel variants we ship, including Zen, Xanmod, and linux-hardened, are often behind the official upstream kernel. We already have the meta.knownVulnerabilities attribute, which seems to be used for both specific CVEs and EOL status. Providing a nixpkgs flag to allow/block these seems useful in any case.
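For reference, this is roughly what the existing opt-in for packages with a non-empty `meta.knownVulnerabilities` looks like from the user’s side. It is only a sketch: the package name `linux-5.19.17` is a hypothetical example, and it assumes the EOL kernel were kept in Nixpkgs and marked this way, which is the proposal rather than the current state.

```nix
{
  # Packages whose meta.knownVulnerabilities is non-empty refuse to evaluate
  # unless they are explicitly permitted by name and version.
  nixpkgs.config.permittedInsecurePackages = [
    "linux-5.19.17" # hypothetical name of a kept-but-EOL kernel
  ];
}
```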

However, the maintenance overhead and the responsibility for addressing possible CVEs is the obvious problem. I think the problem is a bit too niche for any Nix(OS)-specific project to solve.

I had a similar thought. The kernel should have a stable enough interface that I wouldn’t expect any inter-distro issues, but I don’t know for sure. IMO a better solution would be a distro-agnostic, community-maintained kernel with backports to provide ~6 months of support while OOT kernel projects catch up.

I think a distro-agnostic project that provided a Nix overlay, Arch Linux PKGBUILDs, a Gentoo overlay, etc. could attract enough interest to solve this problem. The main OOT projects I can think of are ZFS, AWS ENA drivers, and Nvidia drivers. Knowing how many users use each of these projects and what combinations are common would be a good place to start.

I’d really appreciate if you could take a moment to think about people in situations similar to mine, and suggest a proper way forward, as from your post it sounds like you’re implying that one should either

I’m sorry to say, but the unfortunate reality is that, from what I’ve read in this thread, there appears to be no way officially supported by the upstream projects to use a Linux kernel together with both ZFS and Nvidia.

From my simplistic point of view, the principled (*) solution is to either completely eliminate ZFS and nVidia support from Nixpkgs entirely or

Why though? Both are working perfectly fine; the combination is the problem. I mean, there is a lot of software packaged on NixOS that cannot be operated at the same time on the same machine (the first example that comes to my mind is running both libvirt and VirtualBox machines).

Every user is making a choice with how they’re managing their system. If something was marked as “experimental”, “insecure” or “broken”, the users have willingly taken the risk and have no one to blame but themselves if something goes wrong.

First of all, let me give you some context to why I brought this up: what other people will hear is “$person was using NixOS and now they’re p0wned” and you can’t really blame them for that. NixOS has a very powerful extension mechanism that most other distros don’t have, so to them it’ll sound like their distro provided them with insecure software.

Regarding your point in general: of course, it’s your system and I’m not preventing you from doing whatever you want with it. You’re free to create an overlay and use 5.19. In fact, that’s what I’d probably do as well. However, using EOLed kernels should not be advertised by NixOS in any way IMHO.

I mean, most software has safeguards that technically restrict your freedom, but still, people prefer Rust over C and don’t use root for everything, and that is happening for very good reasons!

Don’t get me wrong, it’s not that I don’t care. I’m also using ZFS pretty heavily on my own and also for my employer’s infrastructure, so I know the struggle to a certain degree. But as long as we don’t have the resources to maintain something on our own without assistance from the software upstreams, I don’t think we can do much more than providing the software that’s actually supported and nothing more.

6 Likes

Part of the reason why the situation sucks is that latestCompatibleLinuxPackages constantly goes up and down as new ZFS versions get released and as kernel versions go EOL.

Where is this specifically documented? One may infer that by looking at the git history, but I certainly don’t see it mentioned anywhere officially, e.g. if you follow the OpenZFS/NixOS tutorial you may come to the conclusion that this is the correct way to have a stable system. “Latest compatible” certainly implies stability to me. An option named something like bleedingEdgeUnstableMayBeDowngradedCompatibleLinuxPackages would be more in line with the way you describe the stability guarantees of this option.

Then why is latestCompatibleLinuxPackages changed to point to anything but an LTS kernel in the first place? If this was done only for zfsUnstable or the nixos-unstable branch (and not backported to the current stable one), I wouldn’t see any trouble with that. But when this is done on the current stable nixos branch, it looks like setting a ticking bomb for users of new hardware (as at some point their non-LTS kernel will go EOL and the kernel will be downgraded to a version that likely doesn’t support their hardware).

There is a difference between enabling a safe looking option like latestCompatibleLinuxPackages, that is recommended by both:

and the likes of “insecure”, “broken”, “unfree”.

I don’t know if that’s the case, but use of “insecure” or “broken” packages can be made to always print warnings at nix eval time, to reduce the likelihood that someone unknowingly leaves them enabled for a long time. So in general, one is far more likely to become a victim of a denial of service (when the NixOS stable branch downgrades the kernel) than of an actual security breach caused by a recently EOL’d kernel that was marked as insecure in nixpkgs but still kept there.
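To illustrate what such an always-on warning could look like, here is a rough sketch as a user-side NixOS module. It is not something Nixpkgs is known to do today; it assumes the EOL kernel stays in Nixpkgs with a non-empty `meta.knownVulnerabilities` and has been explicitly permitted by the user.

```nix
{ config, lib, ... }:
let
  kernel = config.boot.kernelPackages.kernel;
  # Empty list when the attribute is absent.
  vulns = kernel.meta.knownVulnerabilities or [ ];
in
{
  # `warnings` is printed on every evaluation, e.g. on each nixos-rebuild.
  warnings = lib.optional (vulns != [ ])
    "Kernel ${kernel.version} is marked insecure: ${lib.concatStringsSep "; " vulns}";
}
```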

2 Likes

Because latest is literally in the name. If you want to use an LTS then use an LTS. latestCompatibleLinuxPackages is extremely self-explanatory.

As it is not always “latest”, but jumps back in time, it is at least not what one would expect from this “self-explanation”. latestNotEOLCompatibleLinuxPackages might suit better if the name is supposed to be self-explanatory.

1 Like

NotEOL implies the existence of EOL kernels in nixpkgs, which of course is not the case. I think the word Compatible does about as much heavy lifting as can reasonably be put into a name for this.

There’s nothing we can do about that. If ZFS still doesn’t support the latest upstream kernel when rc1 has been out for months, that’s their problem.

That line should be removed from the guide. It’s not something that should be recommended by default.

It’s a convenience option for advanced users who somehow benefit from it, not something to set and forget.

Also, “latest” never implies stability.

That’s an enterprise-ready name if I’ve ever seen one.

We don’t do that here.

Already answered:

If you want the latest LTS kernel (which ZFS is pretty much always compatible with), use linuxPackages which, incidentally, is the default.

If you require a specific kernel version, reference it explicitly, e.g. linuxPackages_5_19.

When upstream drops support for that kernel, there’s nothing we can do about that. We’ll remove it, you get an eval error, and you need to figure out yourself how you want to handle that (e.g. get the kernel from an older Nixpkgs or downgrade).

Yikes. Yeah, that needs to be changed. That sets the user up for a potentially terrible experience. There’s a reason the default linuxPackages is the latest LTS kernel rather than the latest stable kernel.

I’d be fine with that. A user will see those warnings on rebuild and any sane auto-build setup will error out when any warning is thrown during eval.

If that were the case, I wouldn’t mind keeping EOL kernels around.

We could even revive all the other EOL kernels; someone might have a use-case for one of them and it doesn’t really cost us anything. The files are tiny and the names won’t conflict.

If you want latest latest which always points at the latest kernel and doesn’t downgrade, you should use linuxPackages_latest.

latestCompatibleLinuxPackages are the latest linuxPackages that are compatible with ZFS. That may or may not be the latest linuxPackages.
If it always were the latest linuxPackages, there’d be no need for that attribute since you could just use linuxPackages_latest instead.
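Side by side, the choices described above would look roughly like this in a configuration. This is only a sketch: `linuxPackages_5_19` is the example version from this thread, and the `latestCompatibleLinuxPackages` line is the one quoted from the 21.11 release notes.

```nix
{ config, pkgs, ... }:
{
  # Default: the kernel NixOS uses out of the box (the latest LTS, per this thread).
  boot.kernelPackages = pkgs.linuxPackages;

  # Alternatives, pick exactly one instead of the line above:
  # boot.kernelPackages = pkgs.linuxPackages_5_19;   # explicit version; eval error once it is removed
  # boot.kernelPackages = pkgs.linuxPackages_latest; # always the newest kernel in Nixpkgs
  # boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages; # newest kernel ZFS supports; may move backwards
}
```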

2 Likes

There’s plenty we can do. latestCompatibleLinuxPackages is defined by NixOS (not OpenZFS), so we have the option of making it useful for users instead of leaving it as a honey trap for the unsuspecting. Here are a few simple ideas off the top of my head that, even if they sound stupid, would be an improvement over the status quo:

  • Make it point to the latest LTS kernel that is compatible with the latest (stable) ZFS.
  • If the kernel version this option points to goes EOL, simply keep the kernel version, instead of downgrading to the last LTS version.
  • If the kernel version it points to goes EOL, remove the version, but don’t downgrade to an older one. This has the advantage that it alerts users of the problem at nix eval time, instead of delaying that until they reboot.
  • Add a new option, e.g. latestCompatibleLTSLinuxPackages, and rename this option to e.g. latestCompatibleLinuxPackagesInclEOL, latestCompatibleLinuxPackagesOrLastLTS or something similar that is more explicit.

You keep repeating that, but I haven’t yet seen any official documentation backing your point of view. As far as I could find, the only mention in the official docs regarding this option is in the 21.11 release notes, which say:

Zfs: latestCompatibleLinuxPackages is now exported on the zfs package. One can use boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages; to always track the latest compatible kernel with a given version of zfs.

The wording certainly makes it seem like an excellent, safe default choice for most NixOS + ZFS users.

If it wasn’t clear, I was joking. On a serious note, I’ve noticed that when one needs to use such an “enterprise-ready” name, that’s usually a sign that there’s something wrong with the design. (On the contrary: good designs are often easy to explain.)

No, I don’t see how this answers my question. Please note that I didn’t ask for a workaround, but I asked because I am bothered by this huge contradiction in the logic:

  • We do not want to keep EOL’d kernels in Nixpkgs, as this could potentially make us look bad if someone gets “p0wned”
  • We claim to support ZFS out of the box in NixOS
  • Most (all?) (semi-)official ZFS on NixOS documentation invites the user to use the latestCompatibleLinuxPackages with no warnings that this option is:
    • “unstable”
    • “not something that should be recommended by default”
    • “a convenience option for advanced users”
    • “not something to set and forget”
    • “sets the user up for a potentially terrible experience”
  • Hard breaking changes are made to the values of this option on the current nixos stable branch that affect real users, specifically making new machines unusable after nixos-rebuild + reboot (good thing we have generations and easy rollbacks)

So, I’ll repeat my question:
If we claim to deeply care about the security and stability of NixOS users’ machines, then why does latestCompatibleLinuxPackages not point to something sane like the latest LTS kernel that ZFS is compatible with (assuming, for the sake of the argument, that all other options I proposed above are bad in some way)?

In practice, what ends up happening for many users is that they postpone nixos-rebuild switch for some undetermined amount of time, until they can upgrade to a newer version of the nixos-<current-stable-release> branch where latestCompatibleLinuxPackages supports their hardware. This leaves their system more vulnerable, as not only do they have an outdated kernel, but also every other part that normally receives updates during a regular nixos-rebuild switch.

As it stands, one can infer that the answer is: “because we also secretly want to make users miserable” :smiling_imp:

IIRC (though I may be wrong), the latestCompatibleLinuxPackages option was introduced specifically because ZFS was not compatible with the default linuxPackages.

IMO, hardcoding a specific version is “not something to set and forget”, so I don’t think one should use this approach, unless they have no other choice. I think that most users want to be either on the latest LTS, or on the latest stable kernel (for hardware enablement reasons).

If only that was guaranteed to be compatible with ZFS…

I’d really appreciate it if we kept away from this kind of rhetoric. Joking or not, it’s harmful to suggest anyone’s being malicious.

Because it’s practically not documented at all. Let’s be clear: it’s not reasonable to assume that an undocumented feature is in any way stable or recommended.

We really truly cannot do anything about it if OpenZFS does not support kernels that are compatible with your hardware. latestCompatibleLinuxPackages is there in case you really don’t care what the latest compatible version actually is. If you do care what it is, it’s on you to select a kernel version that makes sense for you. When OpenZFS doesn’t support that version, well that’s on them. If you just want a default stable LTS, then just use the default kernel packages.

Is it not? ZFS is included in the NixOS installer ISO. That wouldn’t be possible if ZFS wasn’t compatible with the default linuxPackages.

3 Likes

Anyways… I have opened a PR that backports ZFS 2.1.7 to the nixos-22.11 branch:
https://github.com/NixOS/nixpkgs/pull/204659

Can someone review it, or point me to someone I can ask for a review? (I’m new to contributing to Nixpkgs.)

With this PR, ZFS users that want to stay on the current stable NixOS branch have an upgrade path, after latestCompatibleLinuxPackages was downgraded from 5.19 to 5.15 on nixos-22.05, as after this PR latestCompatibleLinuxPackages becomes 6.0.

Sorry for that, I was joking, as I don’t believe any maintainer/contributor was/is being malicious. That said, downgrading the kernel (or any other software) under one’s feet can be harmful, even if it’s unintentional.

If you’re talking about NixOS Search, I assumed that this option can’t be documented there due to technical limitations (as far as I know, only module options can be documented, while attributes of derivation functions can’t (including passthrough ones?)), while, as far as I can see, neither the NixOS, nor Nixpkgs manuals contain any documentation about ZFS, so latestCompatibleLinuxPackages did not look left out on purpose. (If ZFS is not documented in the manuals does it mean that it is not supported?)

I don’t think, nor do I see how I’m implying, that we (even the broader Nix community) can do anything about that. The one thing that we can do something about, though, is our Nix code and specifically latestCompatibleLinuxPackages.

In general, does the average user really need to care what the version is, as long as:

  • they know that the version they started with supports their system
  • the version is monotonically increasing

?
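A user can approximate those two conditions today with an evaluation-time guard. The following is a minimal sketch, not an existing NixOS feature: `minKernel` is a hypothetical floor chosen by the user, and it assumes `latestCompatibleLinuxPackages` is available as described in the 21.11 release notes quoted earlier.

```nix
{ config, lib, ... }:
let
  # Hypothetical floor: the oldest kernel known to work on this machine.
  minKernel = "5.19";
  kernel = config.boot.kernelPackages.kernel;
in
{
  boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

  # Fail at evaluation time instead of booting into a kernel that is too old.
  assertions = [
    {
      assertion = lib.versionAtLeast kernel.version minKernel;
      message = "Selected kernel ${kernel.version} is older than ${minKernel}.";
    }
  ];
}
```

With a guard like this, a downgrade of the option surfaces as a failed rebuild rather than as an unbootable generation, at the cost of having to bump `minKernel` by hand.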

The thing is that even if I wasn’t using ZFS, I would still think that using latestCompatibleLinuxPackages is a good idea because:

  • I want to use the latest Linux kernel for hardware enablement purposes
  • I assume this option only points to officially released versions (not e.g. RCs)
  • In general, OpenZFS always adds support for new kernels with some noticeable delay after they are released, which may be slightly disadvantageous for hw support, but on the other hand it likely also means that the latest supported kernel x.y has already received some bug fixes since its initial release (x.y.0 → x.y.z, where z>0)

Hmm, that’s a good point… I have to check how the installer ISO derivation is defined.

Then you want… the default kernel branch. That switches to the latest longterm* branch once the NixOS community considers it good enough. If you go beyond that, you may not be able to ensure the conditions, getting into a “dead end” situation.

* “LTS” is a different term, presumably from Ubuntu