How to debug? Laptop suddenly extremely slow (starting at early boot process)

Hi!
I currently really struggle to pin down the issue. On Sunday I added the nix-unstable channel

# nix-channel --list
nixos https://channels.nixos.org/nixos-24.11
nixos-unstable https://nixos.org/channels/nixos-unstable

and added the unstable version of typst

{pkgs, config, ...}: {
  # Allow unstable packages.
  nixpkgs.config = {
    packageOverrides = pkgs: {
      unstable = import <nixos-unstable> {
        config = config.nixpkgs.config;
      };
    };
  };

  # List packages installed in system profile. To search, run:
  # $ nix search wget
  environment.systemPackages = with pkgs; [
#    typst
    unstable.typst

  ]
  # pkgs.lib.optional yields a one-element list (or [] when the condition
  # is false), so no null entry ends up in systemPackages - a bare
  # `if ... else null` element would fail to evaluate.
  ++ pkgs.lib.optional config.services.xserver.enable unstable.tinymist;
}

I don’t think this should have any impact on global system performance, since typst is just a normal application. The --upgrade probably also updated some other packages, but that should happen regularly via my systemd service anyway.

A few hours later, NixOS suddenly became really slow: RAM & swap usage were normal, but CPU was at 100%. I rebooted and it was still extremely slow; even unlocking my full-disk-encryption LUKS took ~10 s (it felt like that, compared to the usual ~1.5 s). I booted an old generation and it was still slow, although LUKS seemed to unlock in reasonable time. After a new update it is maybe a little faster, but still barely usable for office work (sometimes even scrolling in less is slow).

In journalctl there were a few errors, but I don’t think they are the cause - e.g. xapp-status segfaulting. htop itself is using ~40-60% CPU.

How do I best debug this? If even unlocking LUKS is slow, there seems to be a very fundamental issue. Or is it multiple issues?
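One thing worth ruling out first (not mentioned in the thread, just a common cause of "slow even before the OS is fully up"): the CPU being stuck at its minimum clock speed, e.g. due to thermal or firmware throttling - that would also explain a slow LUKS unlock, which is pure CPU work. A quick sketch, assuming the standard Linux sysfs cpufreq interface:

```shell
# Unique clock-speed readings across all cores; a machine pinned at e.g.
# 400 MHz under load is throttled, not misconfigured.
grep 'MHz' /proc/cpuinfo | sort -u

# Current vs. maximum frequency per cpufreq policy (values are in kHz):
for p in /sys/devices/system/cpu/cpufreq/policy*; do
  echo "$p: $(cat "$p/scaling_cur_freq") / $(cat "$p/cpuinfo_max_freq") kHz"
done
```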

# uname -a
Linux abc 6.12.18 #1-NixOS SMP PREEMPT_DYNAMIC Fri Mar  7 17:25:47 UTC 2025 x86_64 GNU/Linux

I use systemd-boot on UEFI with disabled Secure Boot.

SMART shows nearly perfect results for my SSD.

On a higher level I have LUKS → LVM → ZFS with ~300 GB of free space (of 1 TB), and Cinnamon on X11.
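Not from the thread, but when reading the boot logs it helps to cross-check that layering against the actual block-device stack; lsblk shows it directly (device names will of course differ per machine):

```shell
# Show the block-device stack: the LUKS mapping appears as a "crypt" row,
# the LVM logical volume as "lvm", with the ZFS pool sitting on top of it.
lsblk -o NAME,TYPE,SIZE,FSTYPE
```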

Edit: Log of the incredibly slow boot process up to mounting the volumes (the rest isn’t interesting): Log slow boot - Pastebin.com

Thanks!
Thomas131

If booting an older generation didn’t fix it, then it’s not a software configuration problem. Seems likely to be a hardware / firmware problem.

Probably caused by ZFS trim. On my system, ZFS trim is a bit slow, and trimming also consumes a lot of disk bandwidth, which is why booting is slow.

I think this might be a ZFS bug, but I’m not sure. With the same disk, trimming is very fast on Windows (I used Windows on it before).

To confirm this, boot into your system and run:

$ zpool status -t

htop hides kernel processes by default, so you’ll need to change its settings.
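A sketch of where that setting lives, assuming htop’s default config path:

```shell
# htop keeps its settings in ~/.config/htop/htoprc; kernel threads are
# hidden when this flag is 1 and shown when it is 0:
grep hide_kernel_threads ~/.config/htop/htoprc
# Interactively: F2 (Setup) -> Display options -> untick "Hide kernel
# threads", or press Shift+K inside htop to toggle it on the fly.
```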

Alternatively, you can use:

$ ps aux --sort=+%cpu

100% CPU usage is not normal. I don’t have CPU usage that high, but either way, you need to find out what is using your CPU.
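If neither ps nor htop shows a userspace culprit, the time may be going to the kernel or to interrupt handling. A rough way to check, using only /proc (standard on any Linux; interpreting the columns is an assumption about where the load is, not a diagnosis):

```shell
# The "cpu" line aggregates jiffies: user nice system idle iowait irq softirq.
# Sample it twice; if "system" or "irq" grows fastest, the load is in-kernel.
head -n1 /proc/stat
sleep 2
head -n1 /proc/stat

# Per-interrupt-source counters: the line whose numbers grow fastest between
# two samples is the noisy interrupt source.
cat /proc/interrupts
```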

“Probably” is a strong claim. I’ve never had the problem you’re describing in all my years of using ZFS on SSDs, with any combination of autotrim and zpool-trim.service enabled. I don’t think what you’re describing is normal. TRIM should not consume substantial disk bandwidth, not to mention ZFS tries to prioritize actual IO over TRIM commands.

There are several people who have the same symptoms. Though the actual cause is not clear, they are all using ZFS trim.

That’s why I think this might be a ZFS bug, and to determine if it’s a hardware, driver, or ZFS issue, we need more research.

Thanks for all the replies! I’m only replying now because my laptop was fast again yesterday, then partly unusably slow today (and now fast again) …

I think the issue already starts before the LUKS → LVM → ZFS setup is decrypted - so I don’t think it is ZFS.

Right now I’m focusing on these lines from today’s boot:

Mär 20 13:19:32 [censored] kernel: DMAR: DRHD: handling fault status reg 2
Mär 20 13:19:32 [censored] kernel: DMAR: [INTR-REMAP] Request device [f0:1f.0] fault index 0x40c5 [fault reason 0x22] Present field in the IRTE entry is clear

I only partly understand IOMMU, interrupt remapping, and INTR (interrupts), and I haven’t found good information online yet.
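To gauge how frequent these faults are - a constant stream of them can itself burn CPU in interrupt handling - something like this should work. The `intremap=off` part is a diagnostic experiment only, not a fix:

```shell
# Count DMAR / interrupt-remapping messages in the current boot's kernel log;
# thousands of them would point to an interrupt storm rather than a one-off.
journalctl -k -b 0 | grep -icE 'DMAR|INTR-REMAP'

# As an experiment, interrupt remapping can be disabled for one boot by adding
# the kernel parameter below (via boot.kernelParams in configuration.nix, or
# edited at the systemd-boot menu for a single boot), then checking whether
# the faults - and the slowness - disappear:
#   intremap=off
```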

In case you are interested in the ZFS thing:

$ zpool status -t
  pool: zpool
 state: ONLINE
  scan: scrub repaired 0B in 00:13:45 with 0 errors on Sat Mar  1 09:11:31 2025
config:

	NAME                                                                            STATE     READ WRITE CKSUM
	zpool                                                                           ONLINE       0     0     0
	  dm-uuid-LVM-[censored]  ONLINE       0     0     0  (trim unsupported)

errors: No known data errors


Anyways, thanks & Best Regards!
Thomas131