Persistent USB drive not working after suspend

Hi folks, I’m very new to NixOS and not much of a Linux expert either.
I’ve been trying out NixOS on my desktop for some time, but I’m still keeping Windows on my main SSD for the time being, just in case.

$ nix-info -m
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.28, NixOS, 23.11 (Tapir), 23.11.6621.dd37924974b9`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(root): `"nixos-23.11, nixos-unstable"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

The problem

Since I wanted to try it out before fully committing, I figured that installing it on an SSD that’s connected via USB would make sense. Everything’s been working great, except for one issue: my PC can’t wake up from suspend.

I.e. if I do

  1. sudo systemctl suspend
  2. Press the power button, or wake up using keyboard
  3. Black screen appears with mouse cursor visible.
  • Sometimes the cursor can move, sometimes the mouse doesn’t seem to connect after suspend
  • A few times I actually got to see the login screen, but after typing the correct password it acts as if it’s the wrong password
  • Switching to a different TTY using Ctrl+Alt+F1 presents me with a nixos login: prompt, but no matter what I type there, it doesn’t react to the Return key. It just goes to the next line and I can keep on typing endlessly.

Yesterday I managed to reproduce the “faulty login screen” consistently. When I wake up to the black screen, if I disconnect the USB cable with the SSD, the login screen will appear. However, even after reconnecting the drive (to the same USB port) there’s no way for me to log in. Switching to another TTY doesn’t help either; it’s still the same “endless typing” experience.

My thoughts and guesses

My primary guess is that it has to do with some sort of power suspend happening to the USB port/device, and after wake up it can’t access the disk.
My secondary guess is that it’s some combination of other factors, maybe KDE + something + something + USB drive installation result in this case.

My primary question here for you more savvy folks is - how do I even start debugging this? There are no logs in journalctl, which makes sense if the OS can’t access the drive to persist the logs in the first place.
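
One way to get at kernel messages when nothing reaches the journal is to keep them on the console itself. The fragment below is a minimal sketch of that idea, not something tested in this thread: no_console_suspend is a standard kernel parameter and boot.consoleLogLevel is a standard NixOS option, but whether the resume errors actually show up on the TTY on this machine is an assumption.

```nix
{
  # Keep the kernel console active across suspend/resume, so messages
  # printed while waking up are not lost with the suspended console.
  boot.kernelParams = [ "no_console_suspend" ];

  # Raise console verbosity so informational and warning messages from
  # the USB and storage drivers are printed, not only errors.
  boot.consoleLogLevel = 7;
}
```

If boot.kernelParams is already set elsewhere in configuration.nix, the parameter has to be merged into that existing list rather than defined a second time.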

What I have tried to no avail

So far my attempts were targeted at disabling usb autosuspend, but to no avail. Just to make sure we have ruled out some weird rebuild issues, after every related configuration.nix change I did sudo nixos-rebuild switch followed by sudo systemctl reboot before testing out the suspend.

  # did not work
  boot.extraModprobeConfig = ''
    options usbcore autosuspend=0
  '';
  # did not work either
  boot.extraModprobeConfig = ''
    options usbcore autosuspend=-1
  '';
  # did not work either
  boot.kernelParams = [
    "usbcore.autosuspend=-1"
  ];
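
A further variant in the same spirit, untested here, is to rule out the uas driver (it is listed in boot.initrd.availableKernelModules) by making the enclosure fall back to plain usb-storage. The sketch below uses the kernel's usb-storage.quirks parameter; the 0000:0000 ID is a placeholder, and the real vendor:product pair would have to be read from lsusb (package usbutils).

```nix
{
  boot.kernelParams = [
    # Format: usb-storage.quirks=<vendor id>:<product id>:<flags>
    # The "u" flag tells the kernel to ignore uas for this device.
    # 0000:0000 is a hypothetical placeholder, not the enclosure's real ID.
    "usb-storage.quirks=0000:0000:u"
  ];
}
```

Whether uas plays any part in the failed resume is only a guess; the point of the experiment would be to see if the behaviour changes at all.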

Some (useless?) evidence

Here I will attach some information that might be relevant, but I really don’t know what else to try and where else to dig, especially since there are no easily obtainable log messages to nudge me in the right direction.

journalctl logs show nothing after the sleep entry.

May 03 01:28:23 nixos kernel: Linux version 6.6.28 (nixbld@localhost) (gcc (GCC) 12.3.0, GNU ld (GNU Binutils) 2.40) #1-NixOS SMP PREEMPT_DYNAMIC Wed Apr 17 09:>
-- Boot d325c44cb8cc4bf787f12370f1b4cca2 --
May 02 17:30:19 nixos kernel: PM: suspend entry (deep)
May 02 17:30:19 nixos systemd-sleep[3132]: Entering sleep state 'suspend'...
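
The "suspend entry (deep)" line shows the machine is using the deep (suspend-to-RAM) sleep state. A possible experiment, sketched below and not verified in this thread, is to switch the default to suspend-to-idle with the mem_sleep_default kernel parameter and check whether the USB disk survives that lighter state. It assumes s2idle is listed in /sys/power/mem_sleep on this hardware.

```nix
{
  boot.kernelParams = [
    # Use suspend-to-idle instead of deep (suspend-to-RAM) as the default
    # state for "systemctl suspend". Assumes the platform offers s2idle.
    "mem_sleep_default=s2idle"
  ];
}
```

After a rebuild and reboot, cat /sys/power/mem_sleep shows the active state in square brackets.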

The SSD in the usb enclosure

$ sudo smartctl --all /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.28] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ADATA SX8200PNP
Serial Number:                      2N0729A6NJFY
Firmware Version:                   32B3T8ED
PCI Vendor/Subsystem ID:            0x1cc1
IEEE OUI Identifier:                0x707c18
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Utilization:            96,688,001,024 [96.6 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            707c18 00020729a6
Local Time is:                      Fri May  3 01:34:08 2024 EEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    137,496 [70.3 GB]
Data Units Written:                 147,472 [75.5 GB]
Host Read Commands:                 4,932,525
Host Write Commands:                5,245,461
Controller Busy Time:               26
Power Cycles:                       29
Power On Hours:                     87
Unsafe Shutdowns:                   22
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Warning: NVMe Get Log truncated to 0x200 bytes, 0x200 bytes zero filled
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Warning: NVMe Get Log truncated to 0x200 bytes, 0x034 bytes zero filled
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

Relevant changes introduced to the initial (after installation) nixos configuration:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Bootloader (did not change, but just to show that I haven't touched this stuff)
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # Updated the kernel
  boot.kernelPackages = pkgs.linuxPackages_6_6;


  # Enable the X11 windowing system, KDE, nvidia

  services.xserver.enable = true;
  # Enable the KDE Plasma Desktop Environment.
  services.xserver.displayManager.sddm.enable = true;
  services.xserver.desktopManager.plasma5.enable = true;
  # Enable nvidia drivers
  services.xserver.videoDrivers = ["nvidia"];
  hardware.nvidia = {
    # seems to be needed for wayland compositors
    modesetting.enable = true;
    # disable experimental power management
    powerManagement.enable = false;       # <-- I wonder if this may be the cause somehow?
    powerManagement.finegrained = false;  # <-- I wonder if this may be the cause somehow?
    # use the open-source NVIDIA kernel module
    open = true;
    # enable settings menu
    nvidiaSettings = true;
  };
  # some opengl stuff to help run steam
  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
  };

  environment.systemPackages = with pkgs; [
    pciutils
    smartmontools
  ];
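
To test the question raised in the comments above, the hardware.nvidia block could be tried with power management switched on. This is only a sketch of that experiment: the option names are the ones already used in the config, but whether the setting has any effect together with open = true, or any bearing on the USB disk, is not established here.

```nix
{
  hardware.nvidia = {
    modesetting.enable = true;
    # Flipped from false for the experiment; enables the driver's
    # suspend/resume handling through systemd.
    powerManagement.enable = true;
    powerManagement.finegrained = false;
    open = true;
    nvidiaSettings = true;
  };
}
```

This replaces the existing hardware.nvidia block rather than being added next to it.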

start of the hardware configuration, the stuff relevant to disk config

  imports =
    [ (modulesPath + "/installer/scan/not-detected.nix")
    ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "uas" "usb_storage" "sd_mod" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-amd" ];
  boot.extraModulePackages = [ ];

  fileSystems."/" =
    { device = "/dev/disk/by-uuid/f31fa8a9-b3e5-4209-96fd-23a8c0bcfe33";
      fsType = "ext4";
    };

  fileSystems."/boot" =
    { device = "/dev/disk/by-uuid/4ED7-A486";
      fsType = "vfat";
      options = [ "fmask=0022" "dmask=0022" ];
    };

  swapDevices =
    [ { device = "/dev/disk/by-uuid/1d943944-7653-445a-84c2-dcc46690f2b1"; }
    ];

disk stuff in /etc/fstab

$ cat /etc/fstab
# This is a generated file.  Do not edit!
#
# To make changes, edit the fileSystems and swapDevices NixOS options
# in your /etc/nixos/configuration.nix file.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>

# Filesystems.
/dev/disk/by-uuid/f31fa8a9-b3e5-4209-96fd-23a8c0bcfe33 / ext4 x-initrd.mount 0 1
/dev/disk/by-uuid/4ED7-A486 /boot vfat fmask=0022,dmask=0022 0 2


# Swap devices.
/dev/disk/by-uuid/1d943944-7653-445a-84c2-dcc46690f2b1 none swap defaults

lsblk

$ lsblk -f
NAME        FSTYPE  FSVER            LABEL                      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1      vfat    FAT32                                       4ED7-A486                               479M     6% /boot
├─sda2      ext4    1.0                                         f31fa8a9-b3e5-4209-96fd-23a8c0bcfe33    1.7T     2% /nix/store
│                                                                                                                   /
└─sda3      swap    1                swap                       1d943944-7653-445a-84c2-dcc46690f2b1                [SWAP]

Some USB device power configs

$ cat /sys/bus/usb/devices/2-4/power/autosuspend
0

$ cat /sys/bus/usb/devices/2-4/power/autosuspend_delay_ms
0

$ cat /sys/bus/usb/devices/2-4/power/persist
1

$ cat /sys/bus/usb/devices/2-4/power/control
on

Another interesting thing that happened yesterday is that upon wake-up, the OS somehow logged me in after a few seconds (automatically, without me typing in my password, not sure how, as I haven’t managed to reproduce this ever since).

Since I still had the terminal open from before suspend, I tried to type some commands into it, but the only one that “worked” was (I don’t remember which) either journalctl or dmesg.
In the attached picture the output of the log is the long bunch of “command not found” messages.
Then I tried ls and some other command, but kept on getting the input/output error.

In the end, I still had to cut the power to reboot the system, as the UI was broken and there was no way to shut down otherwise. (One of these days I’ll bind a SysRq key in my keyboard firmware to do the REISUB thing)
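
For the REISUB sequence to work, the kernel also has to accept the SysRq functions involved. A minimal sketch, assuming the restriction comes from the kernel.sysrq sysctl and nothing else:

```nix
{
  # 1 enables all magic SysRq functions, which covers every step of
  # the REISUB sequence (Alt+SysRq followed by R, E, I, S, U, B).
  boot.kernel.sysctl."kernel.sysrq" = 1;
}
```

Enabling every SysRq function is a trade-off, since anyone with physical keyboard access can then trigger them.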

Got some confirmation that this has to do with the USB drive in one way or another.

What I did was (while having usbcore.autosuspend=-1)

  • sudo systemctl suspend
  • wake up
  • once it shows me the black screen, switch to tty1
  • on the nixos login prompt immediately press Ctrl+D to exit the prompt

This shows the following logs.

Leaving this until the morning gives more error logs related to the disk.


Another thing that occurred to me is that there may be a BIOS setting that’s messing up my USB power states. There indeed turned out to be several, but changing them all to “don’t turn anything off, always supply power” changed nothing.

At this point I’m about to give up on the whole USB thing. While it would be interesting to understand what’s going on, it doesn’t seem immediately solvable.
