I’m not really expecting much help necessarily by posting this. It’s more of a “state of the union” style post to document my own experiences with the AMD 9070 XT GPU as I haven’t seen a whole lot of NixOS 9070 users thus far.
The TL;DR up top is “expect instability”. I’ll skip the really long rant about having been an Nvidia user on Linux since the TNT2 days, and about the Nvidia driver stack always having been far more stable than either the former fglrx or the current amdgpu stack.
However, given the general insanity of the 12VHPWR connectors on the 50 series GeForce cards, I simply can’t justify putting something like that in my house. Having said that, part of the reason I’m posting this now is that I ran into a similarly concerning issue with the 9070 XT.
For the NixOS specific side of things, I’m not doing anything odd or crazy on this system:
boot = {
  initrd.kernelModules = [ "amdgpu" ];
  kernelPackages = pkgs.linuxPackages_6_14;
  kernelParams = [
    "amdgpu.ppfeaturemask=0xfffd3fff"
    "split_lock_detect=off"
  ];
};
hardware.graphics = {
  enable = true;
};
I think that’s about it, other than using i3 via services.displayManager.defaultSession = "xsession" and having Steam installed via programs.steam.enable = true.
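Put together, the fragments above correspond to something like the following sketch. The `services.xserver` lines and the placement of `graphics` under `hardware` are assumptions about how the pieces fit, not copied from the actual configuration. The comments on the kernel parameters follow the bit names in the kernel's `amd_shared.h` as best understood, and are not confirmed for this particular card or kernel version.

```nix
{ pkgs, ... }:
{
  boot = {
    # Load amdgpu from the initrd so the driver is available early in boot.
    initrd.kernelModules = [ "amdgpu" ];
    kernelPackages = pkgs.linuxPackages_6_14;
    kernelParams = [
      # Power-play feature mask with three bits cleared (assumed meanings,
      # per amd_shared.h): 0x4000 overdrive, 0x8000 GFXOFF, 0x20000 stutter mode.
      "amdgpu.ppfeaturemask=0xfffd3fff"
      # Disable x86 split-lock detection.
      "split_lock_detect=off"
    ];
  };

  hardware.graphics.enable = true;

  # Assumption: X11 with i3 enabled as the window manager.
  services.xserver = {
    enable = true;
    windowManager.i3.enable = true;
  };
  # Session name exactly as given in the post.
  services.displayManager.defaultSession = "xsession";

  programs.steam.enable = true;
}
```

Nothing here goes beyond stock NixOS options, which is the point: the instability described below shows up without any unusual configuration.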
The good news is that things are great, at least for a day or so. Games in Steam run beautifully and all my various hardware accelerated video encoding/decoding needs work great. I initially was running this 9070 XT in a system built around an AMD 5800X, but in part due to the instability, I moved that system back to my older 3070 Ti and built a dedicated 9950X3D box. So even without the Jellyfin transcoding load from that older system (my NAS/Jellyfin box), the experience with the newer system is the same. Everything works for about a day, and then eventually, the system crashes.
The crash always seems to start with my mouse input stuttering for a few seconds like the system is under extreme load, followed eventually by a full graphical lockup, or even the screen going blank and the video card dropping the DisplayPort connection to the monitor entirely (the monitor ends up powering off). This is partly why I’ve settled on the kernel command-line parameters above, as without those the system seems to crash even faster.
Additionally, I’ve had to disable the hardware video acceleration features in Steam, otherwise the UI likes to freeze and stop updating itself when I flip away to another i3 desktop and back to Steam. I can force the UI to start updating again by opening another window on the same desktop, which causes the Steam window to resize. But this is required every single time I flip away and back. Admittedly, this might be specific to i3 (I am using picom, for whatever that’s worth).
I should also mention I’m using ZFS everywhere for everything. This has caused me zero problems with my Nvidia cards in the past. But I obviously can’t assume that necessarily for the AMD driver stack.
Now, to be fair, I saw very similar types of problems with a 6800 XT previously which ended up going to my partner’s Windows machine as a result. And even my Lenovo laptop with a 6900 HS seems to like to lock up within roughly the same time frame usually. So it seems like the amdgpu driver stack is questionable all around, regardless of hardware generation to some degree. About the only amdgpu based system I don’t have this particular experience with is my Steam Deck, although it’s usually not running for more than a few hours any given day before I shut it down again.
The concerning part in all of this was that yesterday this happened on my 9950X3D system and after the display blanked and powered down, some of my fans spun up to ludicrous levels in my system like something was trying to massively overheat. I ended up holding down the power button to kill power entirely as there was so much noise and vibration, it sounded like something was about to damage itself. Since I didn’t feel like I had time to pull the case apart, I’m not positive it was the GPU fans trying to rapidly disassemble themselves. It could have just as easily been the CPU or case fans if the CPU itself was the one trying to eat itself alive.
Anyway, I know 6.15 has even more amdgpu fixes, which I will try just as soon as ZFS officially supports it. But on mesa-25.1.1 and kernel 6.14.9 (what I was running yesterday when this happened; I’m on 6.14.10 now), this card isn’t what I’d call stable by any means. It will probably be a frustrating experience for folks expecting things to “just work”.
I’d love to hear from other NixOS users, especially if you’re having a nearly 100% stable experience with this GPU to see how things differ potentially. But I’m also assuming this is all just par for the course for the amdgpu driver stack to some extent since I’ve never seen it be a paragon of stability in any of its various incarnations. I am hoping it at least stabilizes to the point where I don’t have to assume my system will crash about once a day or so if I avoid shutting it down.