NixOS hard locks driving me crazy!

For the last month or two, I’ve been getting a ton of hard locks, usually when my computer is idle. It’s the typical hard lock: the mouse cursor doesn’t move, caps lock doesn’t respond, etc. NixOS was quite stable on my computer (as were several other Linux distros) and I haven’t changed any hardware recently.

I was running 24.11, but I recently upgraded to 25.05 in hopes of getting a more stable system. Here’s what I know so far:

I’ve added the following configuration:

  services.journald.extraConfig = ''
    Storage=persistent
  '';

In hopes of getting some logs. After the hard lock, I’ve tried running journalctl -b -1 to see if there are any logs, but it appears the kernel locks before any logs can be written.

I’ve added some kernel parameters to see if anything helps:

  boot.kernelParams = [
    "amdgpu.aspm=0"
    "amdgpu.runpm=0"
    "amdgpu.dcdebugmask=0x10"
  ];

I’ve also been playing around with a few different kernel versions. Mostly I’m trying 6.12.57 since it’s LTS, but I’ve tried the latest 6.17 as well. Both have the same problem, which is several hard locks per day (it’s almost always locked when I wake up in the morning).
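For reference, switching between the two kernels is a one-line change in configuration.nix. This is a minimal sketch, assuming the nixpkgs attribute `pkgs.linuxPackages_6_12` is available on the channel in use; the exact patch level (such as 6.12.57) depends on the channel revision rather than on this setting.

```nix
{ pkgs, ... }:
{
  # Pin the 6.12 LTS kernel series (assumed attribute name in nixpkgs).
  boot.kernelPackages = pkgs.linuxPackages_6_12;

  # Alternatively, follow the newest kernel packaged in nixpkgs
  # (6.17 at the time of this thread):
  # boot.kernelPackages = pkgs.linuxPackages_latest;
}
```

Only one of the two assignments should be active at a time; after `nixos-rebuild switch` and a reboot, `uname -r` shows which kernel is actually running.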

A bit about the hardware courtesy of Fastfetch:

 🖥️ PC: Z690 AORUS ELITE AX DDR4
    ├ CPU: 12th Gen Intel(R) Core(TM) i9-12900K (24) @ 5.20 GHz
    ├ GPU: AMD Radeon RX 6600 [Discrete]
    ├ GPU: Intel AlderLake-S GT1 @ 1.55 GHz [Integrated]
    ├ Memory: 5.27 GiB / 62.57 GiB (8%)
    └ Disk: 93.42 GiB / 915.34 GiB (10%) - ext4

 🐧 OS: NixOS 25.05 (Warbler) x86_64
    ├ Kernel: Linux 6.12.57
    ├ BIOS: F6 (5.24)
    └ Packages: 1295 (nix-system), 7 (nix-user)

 ⌨️ DE: Hyprland 0.49.0 (Wayland)
    ├ Terminal: ghostty 1.1.3
    └ Shell: zsh 5.9

On occasion when it hard locks, I get a ton of loud static from the speakers. One thing to note is that I’m running audio over HDMI to my monitor, so I’m wondering if there’s some sort of audio driver issue. I haven’t tried disabling audio over HDMI. ChatGPT really seems to think it’s an HDMI audio issue, but ChatGPT goes down some crazy rabbit hole about half the time. I’ve messed with a bunch of PipeWire settings just to see if anything makes a difference.
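One way the untested "disable audio over HDMI" idea could be tried is through the amdgpu driver's `audio` module parameter. This is only a sketch of a possible diagnostic, not something attempted in this thread, and it assumes the parameter behaves as the kernel documents it (0 disables the GPU's HDMI/DP audio support). The audio device may still be listed, and sound would need another output while the test runs.

```nix
{
  # Assumption: amdgpu.audio=0 turns off HDMI/DP audio in the amdgpu driver.
  # Diagnostic only; remove it again if the lock-ups continue.
  boot.kernelParams = [ "amdgpu.audio=0" ];
}
```

If the machine stays up for noticeably longer with this set, that would point toward the HDMI audio path; if it still locks, the parameter can be dropped.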

I’ve run a few basic tests like disk checks and memory tests, nothing stands out.

I’m somewhat at a loss now as for what to try next. It just seems like any recent version of the Linux kernel just isn’t very stable, at least on my hardware. I probably haven’t provided enough information here for anyone to actually solve the problem, but I’m looking for some help on how to narrow down the issue and get some more clues.

I’m happy to post the output of any command, just let me know. Thanks!

Mike

Is it a kernel panic? Does your caps lock start flashing? If not I’d suspect a graphics issue rather than a “lock”, is there any chance you could run an ssh server and try to ssh in?

Do please share your configuration while you’re at it; much easier to see what could be going wrong if you know what software and configuration is involved.

I’d recommend removing the kernel parameters you added. Of the ones you’ve shared, disabling power management is a reasonable idea if you suspect the GPU shuts down when idle even though it shouldn’t, which would match the frozen frame and also explain the audio glitches. But it didn’t work, so get rid of it. And since you’re not getting any logs, making your GPU driver dump stuff into dmesg is at best causing churn. journald is also set to be persistent by default (and that’s not the “correct” way to change that setting), so you can get rid of that extraConfig.
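For the journald point, a minimal sketch of the dedicated option, assuming a NixOS release recent enough to provide `services.journald.storage` (where it is expected to default to "persistent", making the line itself optional):

```nix
{
  # Assumption: this option exists on the release in use and already
  # defaults to "persistent"; shown only to replace the extraConfig block.
  services.journald.storage = "persistent";
}
```

With this in place the `services.journald.extraConfig` block carrying `Storage=persistent` can be deleted.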

I… really don’t recommend taking half a year for that, by the way. You will literally not get any updates a month after a newer release comes out, so you’ve been affected by a myriad of browser CVEs for at least 5 months now. Make sure to actually upgrade again two weeks from now, and consider anything you’ve done in your browser for the last 5 months compromised.

Is it a kernel panic? Does your caps lock start flashing? If not I’d suspect a graphics issue rather than a “lock”, is there any chance you could run an ssh server and try to ssh in?

No caps lock flashing, just a complete lock. Next time it happens, I’ll see if the network stack is still running and ping it from another computer and try to ssh in. I have a feeling it’s totally dead though.

I’d recommend removing the kernel parameters you added

Sounds good, I’ll remove them. I think they were mostly ChatGPT suggestions, but you’re right: they don’t fix the problem, so there’s no use keeping them.

I… really don’t recommend taking half a year for that, by the way.

I do plan to upgrade to 25.11 once it’s out in a few weeks. In fact, at this point I’d be happy to get on the unstable channel since we’re pretty close to a new release. Either way, I’m guessing my problems are pretty low level and more to do with kernel versions and GPU support.

Current Configuration

# Edit this configuration file to define what should be installed on
# your system. Help is available in the configuration.nix(5) man page, on
# https://search.nixos.org/options and in the NixOS manual (`nixos-help`).

{ config, lib, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Use the systemd-boot EFI boot loader.
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # Some kernel parameters to track down hard locks (might get brave and remove these with 25.05)
  boot.kernelParams = [
    "idle=nomwait"   # avoid buggy C-states
    "pcie_aspm=off"  # disable PCIe power saving

    "amdgpu.aspm=0"
    "amdgpu.runpm=0" # disable runtime power management
    "amdgpu.dcdebugmask=0x10"
  ];

  # Use latest kernel, rather than default NixOS one
  boot.kernelPackages = pkgs.linuxPackages_latest; # Use 6.17 which is latest, but not LTS
  hardware.enableAllFirmware = true;

  hardware.graphics = {
    enable = true;
    enable32Bit = true;
  };

  # AMD Video card
  boot.initrd.kernelModules = [ "amdgpu" ];

  # Enable crash dump, which should provide a dump in /var/lib/kdump/
  boot.crashDump.enable = true;

  # Configure Wayland to use AMD GPU
  environment.variables = {
    # Force Vulkan to use AMD GPU (useful if you have multiple GPUs)
    "VK_ICD_FILENAMES" = "/etc/vulkan/icd.d/amd_icd64.json";
    "VK_LAYER_PATH" = "/etc/vulkan/explicit_layer.d";
  };

  # Enable persistent logs across reboots (While tracking down locks)
  services.journald.extraConfig = ''
    Storage=persistent
  '';

  # Enable HIP - https://nixos.wiki/wiki/AMD_GPU
  systemd.tmpfiles.rules = [
    "L+    /opt/rocm/hip   -    -    -     -    ${pkgs.rocmPackages.clr}"
  ];

  # OpenCL 
  hardware.graphics.extraPackages = with pkgs; [
    rocmPackages.clr.icd
    # amdvlk
  ];

  networking.hostName = "desktop"; # Define your hostname.
  networking.networkmanager.enable = true;  # Easiest to use and most distros use this by default.

  # Set your time zone.
  time.timeZone = "America/Los_Angeles";

  nixpkgs.config.allowUnfree = true;
  nixpkgs.config.permittedInsecurePackages = [
    "openssl-1.1.1w"
  ];

  # Enable printing and auto-discovery of printers
  services.printing = {
    enable = true;
    drivers = [ pkgs.hplip ];
  };

  hardware.printers = {
    ensurePrinters = [
      {
        name = "LaserJet";
        location = "Home";
        deviceUri = "ipp://laserjet.kitchenpc.net/ipp/print";
        model = "HP/hp-color_laserjet_m452d-ps.ppd.gz";
      }
    ];
    ensureDefaultPrinter = "LaserJet";
  };

  # Enable Locate service
  services.locate.enable = true;
  services.locate.package = pkgs.mlocate;

  # Enable sound using Pipewire
  security.rtkit.enable = true;
  services.pipewire = {
   enable = true;
   alsa.enable = true;
   alsa.support32Bit = true;
   pulse.enable = true;
  };

  # Define a user account.
  users.users.mike = {
    isNormalUser = true;
    home = "/home/mike";
    description = "Mike Christensen";
    shell = pkgs.zsh;
    extraGroups = [
      "wheel"
      "docker"
      "audio"
    ];
  };

  # This is needed for some reason since my account uses zsh
  programs.zsh.enable = true;

  # Enable Hyprland window manager
  programs.hyprland.enable = true;

  # Optional, hint Electron apps to use Wayland:
  environment.sessionVariables.NIXOS_OZONE_WL = "1";

  # Enable Gnome keyring (not sure if I need this?)
  services.gnome.gnome-keyring.enable = true;
  
  # Every possible program I want installed on my system.
  environment.systemPackages = with pkgs; [
    # sddm themes for login screens
    sddm-astronaut
    catppuccin-sddm-corners

    # GTK Themes
    glib
    catppuccin-gtk

    pavucontrol # GUI audio control
    neovim
    micro # way better text editor than nano
    zsh
    zsh-autosuggestions
    mlocate
    calc
    fastfetch
    tmux
    playerctl # command line media player controls (next/prev track, etc)
    cava # command line audio visualizer
    chafa # command line image viewer using Kitty protocol
    wget
    unzip
    bat
    lsd
    git
    gh
    jq
    mc # midnight commander
    fzf # fuzzy search command
    yazi # file search tool
    imagemagick # command line image conversions
    ffmpeg-full # command line mp3/mp4 utilities
    nodejs_24
    python313
    dotnet-sdk
    dotnet-runtime
    # brave
    vivaldi
    vlc
    docker
    docker-compose
    vscode
    ghostty # so far it's my favorite terminal emulator
    starship # customizable prompt for any shell
    warp-terminal
    sublime4
    postman
    jetbrains.rider
    _1password-gui
    spotify
    devbox
    openssl_legacy
    remmina # Remote Desktop client

    # Screen capture tools for Wayland
    grim
    slurp

    # Stuff for Hyprland
    wl-clipboard # wl-copy and wl-paste for copy/paste from stdin / stdout
    waybar # the top menu bar
    rofi-wayland # app launch - In 25.11 this will just be rofi
    #rofi
    swww # wallpaper daemon that you can control from the command line using swww img
    libnotify # ability to send notifications from CLI
    hyprlock # screen lock utility
    wlogout # Wayland logout menu
    swaynotificationcenter # notification manager
    pyprland # Hyprland plugin system (Not sure if I really like it)

    # Some AMD stuff
    blender-hip
    clinfo
  ];

  # Fonts to install on the system
  fonts.packages = with pkgs; [
    font-awesome
    powerline-fonts
    powerline-symbols
    noto-fonts
    noto-fonts-emoji # obsolete in 25.11
    #noto-fonts-color-emoji
    nerd-fonts.fira-code
    nerd-fonts.jetbrains-mono
    nerd-fonts.caskaydia-cove
    nerd-fonts.symbols-only
  ];

  # Docker stuff
  virtualisation.docker.enable = true;
  virtualisation.docker.rootless = {
    enable = true;
    setSocketVariable = true;
  };

  virtualisation.docker.daemon.settings = {
    userland-proxy = false;
    experimental = true;
    metrics-addr = "0.0.0.0:9323";
    ipv6 = true;
    fixed-cidr-v6 = "fd00::/80";
  };

  ## Garbage Collection
  nix.gc = {
    automatic = true;
    dates = "weekly";
    options = "--delete-older-than 3d";
  };

  # Enable the OpenSSH daemon.
  # services.openssh.enable = true;

  # disable the firewall since I'm at home
  networking.firewall.enable = false;

  # Copy the NixOS configuration file and link it from the resulting system
  # (/run/current-system/configuration.nix). This is useful in case you
  # accidentally delete configuration.nix.
  # system.copySystemConfiguration = true;

  # This option defines the first version of NixOS you have installed on this particular machine,
  # and is used to maintain compatibility with application data (e.g. databases) created on older NixOS versions.
  #
  # Most users should NEVER change this value after the initial install, for any reason,
  # even if you've upgraded your system to a new NixOS release.
  #
  # This value does NOT affect the Nixpkgs version your packages and OS are pulled from,
  # so changing it will NOT upgrade your system - see https://nixos.org/manual/nixos/stable/#sec-upgrading for how
  # to actually do that.
  #
  # This value being lower than the current NixOS release does NOT mean your system is
  # out of date, out of support, or vulnerable.
  #
  # Do NOT change this value unless you have manually inspected all the changes it would make to your configuration,
  # and migrated your data accordingly.
  #
  # For more information, see `man configuration.nix` or https://nixos.org/manual/nixos/stable/options#opt-system.stateVersion .
  system.stateVersion = "24.11"; # Did you read the comment?

}

A wild guess, but do you have any NTFS drives mounted? I experienced occasional hard locks when I mount my windows partition and leave it on idle for a couple days. Unmounting the partition when I’m not using it seems to have solved that issue.

I have the following network share. It’s a Synology RackStation, I don’t believe it’s NTFS. It’s basically a file share for Plex to run:

  fileSystems."/mnt/plex" = {
    device = "//rackstation.kitchenpc.net/plex";
    fsType = "cifs";
    options = [ "username=plex" "password=plex" "x-systemd.automount" "noauto" ];
  };

I guess I could try removing it. At this point, I’ll try anything.

Mike

I just removed all kernel parameters. I also enabled the SSH daemon and made sure I could SSH in from another computer while it was in a working state, so I’ll be able to try that next time I get a hard lock.
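In terms of the configuration posted earlier, enabling the daemon amounts to uncommenting one line. A minimal sketch, assuming the rest of the configuration is unchanged:

```nix
{
  # Start sshd at boot so the machine can be probed from another computer
  # while the display appears frozen.
  services.openssh.enable = true;
}
```

Because the posted configuration disables the firewall, no extra port rule is needed here; with the firewall enabled, the openssh module is expected to open port 22 by default.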

There’s another kind of lockup that happens when waking from sleep if there’s CUDA running in the background. I’m not sure about the details, but there’s a known issue with CUDA and sleep (Occasionally cuda stops working · Issue #348769 · NixOS/nixpkgs · GitHub) that results in the GPU becoming unrecognized after waking. I’m not quite sure what exactly is happening, but whenever I’m struggling with that issue, I’m also struggling with occasional freezes during the sleep/wake cycle. Turning off LLMs running in the background seems to solve both the GPU issue and the freezing during sleep/wake. So, make sure you don’t have any PyTorch running in the background if you put your PC to sleep?

Has the pattern or presentation of problems changed since the upgrade? I’m wondering if a potential culprit was the older hyprland version you had for a while.

If anything it’s gotten worse since the upgrade to 25.05.

I’m not doing anything with LLMs or PyTorch on this machine. Isn’t CUDA a platform for NVIDIA hardware though? I’m using an AMD GPU.

With SSH, sometimes it could be useful to SSH in in advance and ask for dmesg -w, sometimes you can get logs closer to a problem this way if the kernel does lock up.
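The idea can be sketched as a small helper installed on the second machine (assuming it also runs NixOS). The script name `follow-desktop-dmesg` and the log path are hypothetical; the user `mike` and host `desktop` are taken from the configuration posted above. Depending on kernel settings, `dmesg` may need root on the remote side.

```nix
{ pkgs, ... }:
{
  environment.systemPackages = [
    # Hypothetical helper: stream the remote kernel log over SSH and append it
    # to a local file, so the last lines before a lock-up survive elsewhere.
    (pkgs.writeShellScriptBin "follow-desktop-dmesg" ''
      ${pkgs.openssh}/bin/ssh mike@desktop dmesg -w \
        | tee -a "$HOME/desktop-dmesg.log"
    '')
  ];
}
```

Running `follow-desktop-dmesg` in a terminal (or inside tmux) and leaving it open overnight means whatever the kernel printed just before the freeze is already on the other machine when the SSH connection dies.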

Quick update on this: Amazingly, the OS hasn’t locked in the last 4 days since I’ve posted this! That’s the longest it’s gone in quite a while. Either removing some of those kernel parameters fixed it, or I’ve just gotten lucky. My current plan is still to try to SSH in if it happens again.

Quick Update #2: Okay, finally got a hard lock this morning around 5am. It was the same issue with a bunch of static and noise coming out of the speakers, which actually woke me up. I verified I could not ping the machine or SSH into the machine, so it’s definitely locked pretty hard. No kernel panic or flashing caps lock though.

Did you follow the suggestion from @7c6f434c and verify that an active shell also stops working? Sounds like the crashdump thing isn’t working either?

If so, seems like a pretty severe kernel bug; Certainly above my ability to debug without hooking up to a serial console or something.

It will stop working, but if you constantly follow the logs (including, but not limited to, dmesg) and save them elsewhere, this gives a pretty good balance between the effort to set up and access to the initial parts of the issue.

Hmmmm. Sounds very annoying… (If it makes you feel any better, my NixOS has recently started to reboot if I start vscode… I need to debug this also.)

If you do have a serial port, here’s some config to get the tty working on the serial. This took me a minute to get working. Obviously you’ll need another machine with serial and a null modem cable.

[das@l:~/nixos/qotom/nfb]$ cat serial-tty.nix 
#
# qotom/nfb/serial-tty.nix
#
# Serial console configuration for /dev/ttyS0
# Enables login via serial interface

# https://github.com/NixOS/nixpkgs/blob/5ae3b07d8d6527c42f17c876e404993199144b6a/nixos/modules/services/ttys/getty.nix
# https://github.com/NixOS/nixpkgs/issues/84105

{ config, lib, pkgs, ... }:

{
  # Enable serial console on ttyS0
  boot.kernelParams = [
    "console=ttyS0,115200"
  ];

  # Disable the upstream getty module's automatic configuration for serial-getty@
  # This prevents conflicts with our custom configuration
  systemd.services."serial-getty@" = {
    enable = false;
  };

  # Configure our own serial-getty@ttyS0 service
  systemd.services."serial-getty@ttyS0" = {
    enable = true;
    wantedBy = [ "getty.target" ];
    after = [ "systemd-user-sessions.service" ];
    wants = [ "systemd-user-sessions.service" ];
    serviceConfig = {
      Type = "idle";
      Restart = "always";
      Environment = "TERM=vt220";
      ExecStart = "${pkgs.util-linux}/bin/agetty --login-program ${pkgs.shadow}/bin/login --noclear --keep-baud ttyS0 115200,57600,38400,9600 vt220";
      UtmpIdentifier = "ttyS0";
      StandardInput = "tty";
      StandardOutput = "tty";
      TTYPath = "/dev/ttyS0";
      TTYReset = "yes";
      TTYVHangup = "yes";
      IgnoreSIGPIPE = "no";
      SendSIGHUP = "yes";
    };
  };

  # Enable early console output during boot
  #boot.consoleLogLevel = 7;  # Show all kernel messages
  boot.initrd.verbose = true;  # Show initrd messages
}