So I have a very new AMD-based ThinkPad that I’ve been tinkering with for a couple of weeks now. There are multiple significant problems that are causing usability issues on the laptop, but for this, I want to look at suspend/resume.
I’m frequently (but not 100% of the time) hitting a problem where the laptop doesn’t suspend (it remains at full power and the screen goes solid white). This can happen whether I close the laptop lid or suspend the system with systemctl suspend. This machine came pre-installed with Windows. I did not see any sleep/suspend issues under Windows, but I only ran it for one night.
Currently running kernel 5.3.9 (linuxPackages_latest, with linuxPackages_latest.acpi_call). I’ve set “acpi_backlight=none”, and enabled the kvm_amd module.
One of these failed attempts actually looks completely like a successful attempt in the log file:
Nov 09 10:23:24 garnet systemd-logind[1075]: Lid closed.
Nov 09 10:23:24 garnet systemd-logind[1075]: Suspending...
Nov 09 10:23:24 garnet systemd[1]: Starting Pre-Sleep Actions...
Nov 09 10:23:24 garnet systemd[1]: pre-sleep.service: Succeeded.
Nov 09 10:23:24 garnet systemd[1]: Started Pre-Sleep Actions.
Nov 09 10:23:24 garnet systemd[1]: Reached target Sleep.
Nov 09 10:23:24 garnet systemd[1]: Starting Suspend...
Nov 09 10:23:24 garnet systemd-sleep[2690]: Suspending system...
Nov 09 10:23:24 garnet kernel: PM: suspend entry (deep)
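For anyone who wants to find these attempts in a longer journal, here is a minimal Python sketch that flags every "PM: suspend entry" line that is never followed by a matching exit line. It assumes the kernel logs "PM: suspend exit" on a successful resume and that the input is in the same format as the excerpt above; the function name is made up for illustration.

```python
import re

# Matches the kernel line shown in the log excerpt above.
ENTRY = re.compile(r"kernel: PM: suspend entry \(([^)]+)\)")
# Assumption: a completed suspend/resume cycle ends with this kernel line.
EXIT = re.compile(r"kernel: PM: suspend exit")


def unmatched_suspend_entries(lines):
    """Return suspend-entry lines that have no later suspend-exit line
    before the next entry (or before the end of the input)."""
    unmatched = []
    pending = None  # the most recent entry line still waiting for its exit
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY.search(line):
            if pending is not None:
                # A new attempt started before the previous one logged an exit.
                unmatched.append(pending)
            pending = line
        elif EXIT.search(line):
            pending = None
    if pending is not None:
        unmatched.append(pending)
    return unmatched
```

The lines could come from something like `journalctl -k` saved to a file and read with `open(...).readlines()`. An attempt that shows up here is one where the log looks successful but the kernel never reported coming back.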
This is a ThinkPad X395, manufactured a few weeks ago.
AMD Ryzen 7 PRO 3700U processor with Radeon Vega Mobile Gfx (yes, it appears that the graphics adapter is integrated with the processor)
Intel Wireless-AC 9260 network controller
Realtek Semiconductor ethernet controller
And a lot of other things made by either Realtek or AMD.
I’ve researched this problem quite a lot and only found hints of things that might be related.
At least for my USB Realtek WiFi, that wasn’t enough; I also had to disable the standard rtl8xxxxu module completely.
I don’t have access to that computer to look up the actual module name (I’m not sure if it had 3 or 4 X’s in the name) and the config line to disable/blacklist kernel modules…
I get a lot of spammy notifications from network manager that the WiFi was disconnected and reconnected, but downloads or SSH connections are not affected or disrupted.
If I leave the 8xxxu module in, it is used for the USB WiFi, and I do see all the networks around me, but only with minimal signal strength. As soon as I try to connect, I’m told that the network is no longer available, yet it instantly shows up in the list of networks again.
I’m being nothing but serious in asking if you’ve considered using Windows + Nix on WSL2? Or waiting a year for things on Linux to stabilize? Solving hardware issues on Linux without an understanding of what’s going on would be a big time-sink (especially with ultra-new hardware). Familiarizing yourself with Windows would take less time, and you’d get to use some Windows software.
Windows is 100% irrelevant to both my life and my work. I’ve been doing macOS + Nix for a few years, and it’s definitely not as good as a NixOS platform.
I’m not sure that I understand why the conversation has gone in the rtl8192 direction. But, lsmod does not report any rtl or 8192 modules on my system.
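The same check that lsmod gives can be scripted, since lsmod is a formatted view of /proc/modules. A minimal sketch in Python follows; the function names are made up for illustration, and the search fragments are just the ones mentioned in this thread.

```python
def loaded_module_names(proc_modules_text):
    """Extract module names from text in /proc/modules format,
    where the module name is the first whitespace-separated field."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def modules_matching(proc_modules_text, fragments):
    """Return loaded module names containing any of the given substrings."""
    return [
        name
        for name in loaded_module_names(proc_modules_text)
        if any(fragment in name for fragment in fragments)
    ]


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of currently loaded modules (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a running system, `modules_matching(read_proc_modules(), ["rtl", "8192"])` returning an empty list corresponds to lsmod showing no such modules.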
I can’t offer any help at this point, but solidarity in that I’ve been having trouble with suspend/resume on my older Thinkpad Ryzen laptop ever since I upgraded to 19.09. I haven’t noticed a white screen. My main symptom is it seems like when it resumes the screen doesn’t turn back on and I have to hard reboot it by holding the power button.
I checked the Lenovo Arch wiki for your model and the only advice it has is,
Prevent amdgpu issues by updating to latest BIOS [2]
That might do it. I just now updated the BIOS, and so will let you know what happens in a few days.
For the record, though their BIOS app is “for windows” or whatever, it’s an ISO that boots off of a USB drive and is completely platform independent. With lots of scary verbiage about “full battery”, “on AC power” and “the main board may have to be replaced” if the process is interrupted…
So, while it is still not 100% reliable, I feel like updating the BIOS (from an August version to the October 30th version) helped a little, and then setting acpi_osi=Linux on the kernel command line helped a bit more.
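When changing kernel parameters like this, it is worth confirming that the running kernel actually booted with them. A small Python sketch that parses /proc/cmdline is below. The helper names are hypothetical, and it splits on whitespace only, so it assumes no parameter value contains quoted spaces.

```python
def parse_cmdline(text):
    """Parse a kernel command line into a dict of parameter -> value.

    Flags without '=' map to None. If a parameter is repeated,
    the last occurrence wins in this simple sketch.
    """
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

For example, `parse_cmdline(read_cmdline()).get("acpi_osi")` should give `"Linux"` once the setting has taken effect after a reboot.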
The “white screen” in particular turns out to be an artifact of screen locking with i3lock, which turns the screen solid white. acpi_osi=Linux broke my screen locking, and I’ve now seen the system lock with just a snapshot of what was on my screen when it locked.
It’s good enough to make the machine usable now. I just have to work on screen locking and the myriad wireless issues.
Well, it’s been 9 days. Averaging everything out, there is about a 50% chance of a hard crash any time I close my laptop. Either it will fail to sleep or it will fail to wake up.
I’ll get stretches of a day or so when the machine reliably goes to sleep every time I close it, and then stretches of a day in which it crashes every single time.
Every time I put my NixOS laptop to sleep while I’m out and about, there is a ~70% chance it won’t wake up. If I’m at work and the laptop is docked, there is about a 100% chance it will wake up properly.
Sometimes I wonder how I got to the point of accepting this as normal…
My E585 was having similar problems. I tried updating the BIOS and that didn’t help. Then I tried switching from the latest Linux kernel to the normal Linux kernel in nixpkgs, and it got a lot better. I’ll try to send you the kernel I’m on later.
I’m starting to think the problem is a GPU problem.
Flimsy evidence here, but every now and then when I boot the machine (which I have To. Do. So. Often.) X just fails to start and the machine actually hangs. No response to ctrl-alt-f1, ctrl-f1, no response to the power switch, etc.
Additionally, I’ve installed leela-zero on my machine, which makes heavy use of the GPU, and the app crashes while searching for the GPU. Works fine (but slowly) if I tell it to use only the CPU. The error message looks like this:
BLAS Core: built-in Eigen 3.3.7 library.
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL (autodetecting precision).
OpenCL: clGetPlatformIDs
terminate called after throwing an instance of 'cl::Error'
what(): clGetPlatformIDs
Aborted (core dumped)
(I think this is GPU-related because of this, but I could be wildly wrong)
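One possible cause of a failing clGetPlatformIDs, separate from the GPU driver itself, is that the OpenCL loader finds no registered vendor drivers (.icd files). A rough Python sketch for checking that is below. It assumes the conventional /etc/OpenCL/vendors directory and that the loader honours an OCL_ICD_VENDORS override pointing at a directory; both may differ on a given system, and the helper names are made up.

```python
import os
from pathlib import Path


def candidate_directories(environ=os.environ):
    """Directories where an OpenCL ICD loader may look for vendor files.

    Assumption: OCL_ICD_VENDORS, when set, names a directory; the
    default location is /etc/OpenCL/vendors.
    """
    candidates = []
    override = environ.get("OCL_ICD_VENDORS")
    if override:
        candidates.append(override)
    candidates.append("/etc/OpenCL/vendors")
    return candidates


def icd_files(directories):
    """Return the .icd files found in the given directories, sorted per directory.
    Directories that do not exist are skipped."""
    found = []
    for directory in directories:
        path = Path(directory)
        if path.is_dir():
            found.extend(sorted(p for p in path.iterdir() if p.suffix == ".icd"))
    return found
```

If `icd_files(candidate_directories())` comes back empty, no OpenCL platform is registered in those locations, which would point at a missing OpenCL runtime rather than a broken GPU.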
So, maybe the AMDGPU module is broken? But I also don’t know how to reconfigure X to start if I disable the driver, so I’ve made no real progress on that.
Other than that, the machine is fine. Definitely a powerhouse, compiles code quickly, links GTK libraries about as slowly as I think anything else would link GTK libraries, and performs fairly well when I’m doing photo editing on 6000x4000 photos.