Hi everyone!!! Noob here…
I'll try to be very short and very clear:
NixOS is installed on a Steam Deck LCD and I love it (the system, I mean; the device is a PITA, but that's beside the point). Anyway, I keep getting this page fault every time I… simply exist: open wezterm, open mpv, open librewolf, ANYTHING that uses the GPU. Except, INTERESTINGLY, games! Games don't crash; in fact, they are the only condition under which I can guarantee my Deck isn't going to crash.
By the way, when I say "crash", I mean this: first, the rendered frame and menu elements freeze and/or disappear (rarely, they get corrupted), then the screen soon dims to about 75% (EDIT: that's Hyprland's unresponsive-window dimming feature), then my mouse freezes, and finally it kicks me out of the session back to the login screen (tuigreet). But sometimes it can't exit the session and waits forever for some PID to finish (one that doesn't exist anymore…), OR it doesn't exit the session at all and Hyprland resets its graphics successfully… It's always a coin flip, a random number generator. There is no way to force it to happen; IT JUST HAPPENS when it wants to.
So I had a look at dmesg…
[ 5133.879897] amdgpu 0000:04:00.0: amdgpu: Dumping IP State
[ 5133.880808] amdgpu 0000:04:00.0: amdgpu: Dumping IP State Completed
[ 5133.880899] amdgpu 0000:04:00.0: amdgpu: ring sdma0 timeout, signaled seq=3810, emitted seq=3812
[ 5133.880905] amdgpu 0000:04:00.0: amdgpu: Starting sdma0 ring reset
[ 5134.078180] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32778)
[ 5134.078193] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078199] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000800001600000 from client 0x1b (UTCL2)
[ 5134.078204] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501430
[ 5134.078208] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 5134.078212] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ 5134.078216] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ 5134.078219] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 5134.078222] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 5134.078225] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ 5134.078236] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078241] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078245] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a060000 from client 0x1b (UTCL2)
[ 5134.078249] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00541051
[ 5134.078252] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 5134.078255] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
[ 5134.078258] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ 5134.078261] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 5134.078265] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 5134.078268] amdgpu 0000:04:00.0: amdgpu: RW: 0x1
[ 5134.078272] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078276] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078280] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a061000 from client 0x1b (UTCL2)
[ 5134.078285] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078289] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078293] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a061000 from client 0x1b (UTCL2)
[ 5134.078298] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078302] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078306] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a060000 from client 0x1b (UTCL2)
[ 5134.078311] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078315] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078318] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a060000 from client 0x1b (UTCL2)
[ 5134.078323] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078327] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078331] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a061000 from client 0x1b (UTCL2)
[ 5134.078336] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078340] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078343] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a061000 from client 0x1b (UTCL2)
[ 5134.078348] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078352] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078356] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a062000 from client 0x1b (UTCL2)
[ 5134.078361] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:5 pasid:32778)
[ 5134.078365] amdgpu 0000:04:00.0: amdgpu: in process .librewolf-wrap pid 11512 thread .librewolf:cs0 pid 11606
[ 5134.078369] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x000080010a062000 from client 0x1b (UTCL2)
[ 5144.119856] amdgpu 0000:04:00.0: amdgpu: Dumping IP State
[ 5144.120877] amdgpu 0000:04:00.0: amdgpu: Dumping IP State Completed
[ 5144.130894] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=49196, emitted seq=49198
[ 5144.130899] amdgpu 0000:04:00.0: amdgpu: Process information: process .Hyprland-wrapp pid 1599 thread Hyprland:cs0 pid 1607
[ 5144.130903] amdgpu 0000:04:00.0: amdgpu: Starting gfx_0.1.0 ring reset
[ 5144.327026] amdgpu 0000:04:00.0: amdgpu: Ring gfx_0.1.0 reset failure
[ 5144.327029] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[ 5144.409954] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[ 5144.420100] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 5144.420540] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[ 5144.420572] amdgpu 0000:04:00.0: amdgpu: PSP is resuming...
[ 5144.442727] amdgpu 0000:04:00.0: amdgpu: reserve 0xa00000 from 0xf43e000000 for PSP TMR
[ 5145.318237] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[ 5145.319255] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[ 5145.319673] [drm] kiq ring mec 2 pipe 1 q 0
[ 5145.332059] [drm] DMUB hardware initialized: version=0x0300000A
[ 5145.410386] [drm] Failed to add display topology, DTM TA is not initialized.
[ 5145.439757] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 5145.439762] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[ 5145.439765] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[ 5145.439767] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[ 5145.439770] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 5145.439772] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 5145.439774] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 5145.439777] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 5145.439779] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 5145.439782] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 5145.439784] amdgpu 0000:04:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[ 5145.439787] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[ 5145.439789] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 5145.439792] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 5145.439794] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 5145.439796] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 5145.443634] amdgpu 0000:04:00.0: amdgpu: GPU reset(3) succeeded!
The faults are different and happen at random times, every single time. If I find a more interesting one, I will share it! I can get THIS exact scenario 10 seconds after boot, or while watching a video, etc.
So I looked up “nixos GCVM_L2_PROTECTION_FAULT_STATUS” and ended up on this page:
But I didn't want to install LACT, because I didn't have this issue before… Besides, I can undervolt via the BIOS (which is very dangerous and could lead to a black screen!!!). So then I found this site
But then it's definitely not an APU-only issue, is it? These guys own actual dedicated GPUs! My GPU is an AMD Custom GPU 0405 (RADV VANGOGH) of type PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU. By the way, I don't know if this is related or not, but I get this in vulkaninfo:
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /nix/store/j16gwk21wpzliw3slgglbdb7nk5hrcdm-mesa-25.1.2/lib/libvulkan_dzn.so. Skipping this driver.
EDIT: I just want to say really quick that when I installed the AMDVLK drivers (hardware.amdgpu.amdvlk.enable), these errors went away, BUT!!! they do NOT support APU 0405 (as written in the list of supported devices), so instead I stuck with regular mesa, i.e. hardware.graphics.enable
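For anyone following along, here is a minimal sketch of how those two options would look in configuration.nix. It is not a copy of my actual file, and it assumes a NixOS release that still ships the hardware.amdgpu.amdvlk module (if yours doesn't, the commented-out line won't evaluate).

```nix
{ ... }:
{
  # Regular Mesa (RADV) userspace drivers: the setup I went back to.
  hardware.graphics.enable = true;

  # AMDVLK as an extra Vulkan driver. Left commented out because the
  # 0405 APU is not on AMDVLK's list of supported devices.
  # Assumption: this option only exists on releases that still ship the module.
  # hardware.amdgpu.amdvlk.enable = true;
}
```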
So yeah, something is terribly wrong and I don't know what I did or who did what. BY THE WAY, I DO have every power management setting maxed out for maximum performance (I use my Steam Deck in a docking station, like a PC!!!), but I don't think that's relevant, because I get the same hangs, freezes and crashes with or without TLP and/or powerManagement.cpuFreqGovernor = "performance" or services.tlp.settings with every CPU/GPU preference set to "performance"
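To make the power settings concrete, here is a rough sketch of that kind of setup. The TLP keys below are real TLP settings, but which ones to list is my assumption for illustration; this is not my exact config.

```nix
{ ... }:
{
  # CPU governor set directly through NixOS.
  powerManagement.cpuFreqGovernor = "performance";

  # Alternative: let TLP handle it. Crashes happen with or without this block.
  services.tlp = {
    enable = true;
    settings = {
      # Illustrative selection of keys, all pushed towards performance on AC.
      CPU_SCALING_GOVERNOR_ON_AC = "performance";
      CPU_ENERGY_PERF_POLICY_ON_AC = "performance";
      PLATFORM_PROFILE_ON_AC = "performance";
      RADEON_DPM_PERF_LEVEL_ON_AC = "high";
    };
  };
}
```

Removing the whole services.tlp block, or the governor line, makes no difference to the page faults in my experience.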
Should I finally try the Steam Deck drivers from here?
OR PERHAPS try the chaotic-nyx flake to install pkgs.mesa-git?
Because I can't use the system like this! It's very annoying and it still HASN'T been fixed, despite some official claims that it was fixed in an update…
P.S. Sorry, I keep forgetting to set the tag to “Help”.
P.P.S. Yes, the dmesg output above is the one I got while writing this.