So, infamously, using an nvidia GPU on NixOS is fraught with pitfalls. I’d guesstimate a good 10% of posts here are from someone with an nvidia-related problem.
Part of this is simply because the nvidia driver is the nvidia driver, but we’re not making the situation much better. In typical NixOS fashion, there are lots of options, which is great (arguably…), except that a lot of them are poorly maintained, or at least misleading. Some of my favorite pet peeves:
- `hardware.nvidia.prime`: This entire submodule is practically nonfunctional for DE users as of 25.11; it (silently) does not work on any wayland compositor, which at this point means both of the major DEs. Whether or not it is needed on them is a different matter, since in theory wayland compositors have auto-detection and native multi-gpu support (though it’s unclear to me how well this is exposed to users, given that all documentation of this is buried deep in the source code, and no software seems to support an `nvidia-offload`-like feature), but that does not change the fact that most of the module doesn’t actually work in practice, making users think they have something configured that they do not.
- `hardware.nvidia.videoAcceleration`: Defaults to `true`, even though this is entirely third-party to the nvidia driver and requires extensive additional configuration to actually work, so by default we’re just installing additional third-party software that isn’t even functional.
- `hardware.nvidia.powerManagement.finegrained`: Is for some reason (a historic misunderstanding of the docs?) a sub-option of `hardware.nvidia.powerManagement`, even though the two options have nothing whatsoever to do with one another (apart from the similar name): one of the two is experimental, and the other is (normally) enabled by default by the driver. As a result, the module by default overrides the sensible default set by the driver, and fails to install the required udev rules, even though those rules are inert if the driver feature is flipped off.
- `boot.kernelPackages`: Not an nvidia option, but it seems like 90% of newbies stumble upon it, think “oh, hey, cool, let me set this to `linuxPackages_latest`”, and then get confused when two months down the line their update breaks with a build error involving nvidia. I appreciate the difficulty of supporting third-party kernel modules, but we should be clearly signaling to users that they’re on their own if they do this (and that their computer may spontaneously burst into flame; building a third-party kernel module against a kernel source it’s not designed for strikes me as quite risky).
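To make the prime point concrete, here is roughly the render-offload configuration the manual and wiki present (the bus IDs below are placeholders that differ per machine); my understanding is that on a wayland compositor this currently just does nothing, with no warning:

```nix
# Typical PRIME render offload configuration (bus IDs are placeholders;
# query your own with `lspci`). On X11 this sets up render offloading
# and the `nvidia-offload` wrapper command; on wayland compositors it
# is silently inert.
hardware.nvidia.prime = {
  offload.enable = true;
  offload.enableOffloadCmd = true;
  intelBusId = "PCI:0:2:0";
  nvidiaBusId = "PCI:1:0:0";
};
```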
This isn’t to put the spotlight on anyone in particular who may or may not have contributed to the module; the driver is an absolute PITA, and without testing and very thorough reading of the docs of both nvidia and the various compositors (as well as source code here and there) it’s bloody hard to figure this stuff out; it also changes quite significantly over time.
But it is problematic that what NixOS sets up by default is quite frankly broken. It paints a picture of very spotty maintenance to me that prime still doesn’t work on wayland in 25.11. This is probably accurate; I don’t think many people have experience with all the use cases for nvidia GPUs, let alone all the different pieces of hardware to test those use cases with. Stepping up to continuously maintain the module isn’t a light responsibility (more on this later…).
It also definitely isn’t only NixOS’ fault; different nvidia hardware setups require very different configuration. Which leads to the other side of the configuration-overload coin: people can’t just cargo-cult a config and get it right. Different GPU architectures require different options, what works for one GPU will not work for another, and there is no way to see at a glance which GPU a specific configuration applies to. Prime setups confuse things even more.
Still, nvidia has by a wide margin the largest market share in GPUs (~90% today, and they have dominated for the past 10-20 years), and hence we have many users with nvidia GPUs in their computers. The relative and absolute numbers are growing, too. I doubt the influx of nvidia problems will slow for at least another 3-5 years, and they’re bound to get quite bad for a while in the near future with the EOL of pre-Turing GPUs.
So, what can we do about this?
For one, we obviously always need better maintenance for everything. I’d like to step up, but the situation isn’t great: I can’t step up for the use cases I have experience with (post-Turing desktop use) and try to improve the situation without potentially stepping on CUDA/datacenter users’ toes. I have no way to verify that any changes I’d make to the module don’t break those use cases.
Furthermore, tweaking defaults is unlikely to actually fix anything for the large majority of users who’ve copied a version of the nvidia config from a wiki or reddit post from 4 years ago; and even if it did, it’s likely to break the configuration of people who did not, but who use a pre-Turing (and sometimes pre-Ampere) nvidia GPU.
So what I’d really like is to just… throw out the entire thing and start from scratch. In my mind, a good solution would be to introduce a new `hardware.nvidia.architecture` option, set to an enum of nvidia architectures. This would condense all the mess into a simple abstraction that users can actually grok without reading the whole nvidia readme, allow us to handle maintenance in a sane way that actually matches upstream recommendations, and simplify the whole legacy-driver problem. It would necessitate deprecating all the existing nvidia options, though, since those will remain wrong for a good chunk of users simply because they are already set.
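To be clear about what I mean, a rough sketch of such an option could look like the following; everything here is hypothetical (the option, the enum values, and the description are mine, nothing like it exists in nixpkgs today):

```nix
# Hypothetical sketch of the proposed option - this does NOT exist in
# NixOS today; the enum values and wording are illustrative only.
options.hardware.nvidia.architecture = lib.mkOption {
  type = lib.types.enum [ "maxwell" "pascal" "turing" "ampere" "ada" ];
  example = "ampere";
  description = ''
    Architecture of the installed nvidia GPU. Used by the module to
    pick the matching driver branch and sane feature defaults.
  '';
};
```

Users would then only set e.g. `hardware.nvidia.architecture = "turing";`, and the module would derive the driver branch, open-kernel-module eligibility, power-management defaults, and so on from that single value.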
This comes from a position of ignorance around all the non-desktop use cases though. It’s also the type of thing that feels like it should be in the domain of nixos-hardware or similar. It feels like it’d be far too bold of me to sit down and write all of that, then just drop a PR that throws out something used - statistically - by 90% of NixOS users.
So I’m writing this post. What do other people think? Should the `hardware.nvidia` module just be jettisoned from NixOS entirely, since it’s obviously broken, we can’t really fix the historic issues due to cargo culting, and arguably the module oversteps the domain of NixOS anyway? Deprecated and replaced in a version or two, like what was done e.g. with nixos-rebuild-ng? Left as-is, to generate support requests forever?
Also, is anyone actually actively maintaining the module?