MS-A1/AMD 8700G iGPU - AMDGPU page fault errors & temp monitoring

Hey folks, I have recently built a system on Minisforum’s new MS-A1 chassis, with an 8700G as the APU. I installed NixOS 24.05 and most stuff is working, but I have a few problems that have popped up that I would like some help with. I was hoping to use this machine as a workstation, plus playing a few light games on it, so I installed Steam and a couple of games to try things out, which is where things started to be less than ideal. So far I have 3 main problems that I’d like to see if anyone can help me out with:

  1. Some games (MTG:A) work fine, but when playing Path of Achra from Steam using Proton 9 or Proton Experimental, the game (actually everything but the cursor) will eventually hang and then recover. When that happens, journalctl -f shows amdgpu page fault errors that look like this:
amdgpu: [gfxhub] page fault 
amdgpu:   in page starting at address 0x000000003f800000 from client 10
amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501431
amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
amdgpu:          MORE_FAULTS: 0x1
amdgpu:          WALKER_ERROR: 0x0
amdgpu:          PERMISSION_FAULTS: 0x3
amdgpu:          MAPPING_ERROR: 0x0
amdgpu:          RW: 0x0
  2. I also noticed that when running picom -b --backend glx I will get continual “present flip failed” messages - many per second, in sets of 3:
Aug 21 14:46:20 msa1 xserver-wrapper[2297]: (WW) AMDGPU(0): flip queue failed: Invalid argument
Aug 21 14:46:20 msa1 xserver-wrapper[2297]: (WW) AMDGPU(0): Page flip failed: Invalid argument
Aug 21 14:46:20 msa1 xserver-wrapper[2297]: (EE) AMDGPU(0): present flip failed
  3. Finally, when investigating this I found that lm-sensors doesn’t seem to detect the CPU temperature sensors for this machine, even after running sensors-detect. I am using the nixos-hardware flake with common-cpu-amd, common-cpu-amd-pstate and common-cpu-amd-zenpower and I’m not sure what other stuff I should do to be able to see the sensor data from the CPU. I do see GPU “edge” temperature readings, though.
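
For issue #1, it may help to know whether the faults always hit the same address and client or are spread around. Here is a minimal Python sketch that tallies them. It assumes the log lines look like the excerpt above; the function name `tally_faults` is made up for illustration, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Matches the "in page starting at address 0x... from client N" line,
# using the wording seen in the journal excerpt above (an assumption:
# other kernels may format this message differently).
FAULT_RE = re.compile(
    r"in page starting at address (0x[0-9a-fA-F]+) from client (\d+)"
)

def tally_faults(lines):
    """Count page faults per (address, client) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            client = int(match.group(2))
            counts[(address, client)] += 1
    return counts

# Example run on two lines copied from the excerpt above.
sample = [
    "amdgpu: [gfxhub] page fault",
    "amdgpu:   in page starting at address 0x000000003f800000 from client 10",
]
for (address, client), n in tally_faults(sample).most_common():
    print(f"{n} fault(s) at {address:#x} from client {client}")
```

To use it on real data, save the kernel log for the current boot with `journalctl -k -b > faults.log` and pass `open("faults.log")` to `tally_faults` instead of `sample`. If one address dominates, that points more toward a specific buffer in the game or driver than toward random instability.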

So, I have a few questions here. In doing research about issue #1, I found several people on Arch or other forums who said that simply switching to the amdgpu-pro drivers fixed similar problems for them, but it seems like trying to do that in NixOS 24.05 pushes my kernel version way back - to some 5.x kernel. Is there a good way to use the amdgpu-pro driver (or is there some other driver I should use instead) that will let me continue to use a modern kernel? Or should I just bite the bullet and let my yearning for modernity go?

Next, does anyone know what I need to do to get temperature sensors working for the 8700G? I think that this issue might be temperature related, but I really can’t tell since I can’t get a temperature reading out of the CPU. That might help me figure out if I have a defective cooling unit in my MS-A1 or something.
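
One way to see what the kernel actually exposes, independent of lm-sensors, is to read the hwmon sysfs tree directly. This is a minimal Python sketch, assuming the standard `/sys/class/hwmon` layout (a `name` file plus `temp*_input` files in millidegrees Celsius); the function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path

def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip_name": {temp_file: degrees_C}} for each hwmon device."""
    readings = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue
        name = name_file.read_text().strip()
        temps = {}
        for sensor in sorted(dev.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                temps[sensor.name] = int(sensor.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # some sensors fail when read; skip them
        readings[f"{dev.name}:{name}"] = temps
    return readings

# List every chip the kernel has registered, with or without temperatures.
for chip, temps in read_hwmon_temps().items():
    print(chip, temps or "(no temperature inputs)")
```

If the output lists `amdgpu` but neither `k10temp` (the in-kernel AMD CPU temperature driver) nor `zenpower`, then no CPU sensor driver has bound to this chip, and the problem is at the kernel module level rather than in lm-sensors itself.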

Finally, I’m not sure what to do about issue #2. It’s not the most important thing, as picom works fine and doesn’t produce errors when run without --backend glx - but xsecurelock has issues instead, which is what led me to use that setting. I guess I could just try a different screen locker, but xsecurelock is my preferred option. Does anyone have any alternatives to picom I could try? compton is old, which is why picom exists, and everything else seems either not intended for real use or is just a Wayland compositor. I use qtile, so I guess I could try to figure out how to run qtile under Wayland, but my initial attempt at that didn’t go very well, so I’m not inclined to dig into that unless I have to.

Thanks in advance for any advice you may have for me!

Welcome to NixOS. I think it would be more practical and pleasant if you posted three separate threads for your issues.

Having said that, is this with nixpkgs-unstable? I would try that first.