Debug a slow system

Yesterday evening my system became slower and slower. Restarting into the newest generation took more than an hour (I went to bed before it finished). This morning I wanted to quickly check something in my config and triggered a nix build of the dev shell through direnv. The build of the environment never finished and the system instantly got slow again. Looking at htop, I found a nix daemon process in “uninterruptible sleep”, and the system’s short-term load average was at 15+. The mid- and long-term averages were lower, of course, but rising. When the load reached 20 and the system was mostly unresponsive, I issued a reboot from a TTY. Reaching the end of the shutdown took more than 10 minutes, much of it spent waiting for disks to be unmounted and for the nix-related services to shut down.

I booted into the earlier available generations, and each one was quicker to boot than the one before, but all but the last had the problem that either SDDM or a nix daemon ended up in “uninterruptible sleep”, even after multiple tries.

What is the easiest way to inspect the differences between those generations? The only one I am aware of is the kernel version (5.4.x for all of them, but with differences in the “x”). I do not remember the exact versions.

I left the computer in the seemingly working generation to observe how it behaves after some hours of uptime, and want to retry the more recent generations again this evening.

Though let me ask in advance, do you have any tips on how to properly debug such issues?

If it is of any help, the system is configured through flakes, though the span of generations I have tried might still be from the non-flake times…
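As a starting point for the debugging question above, here is a minimal sketch that prints the load averages and lists every process currently in uninterruptible sleep (state `D`), which is what htop was showing for the nix daemon. It only reads `/proc`, so it assumes a Linux system; the function names are made up for illustration.

```python
import os
from pathlib import Path


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for all processes in state 'D'."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    found = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    print("load averages (1, 5, 15 min):", os.getloadavg())
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Running it a few times while the system is slow shows whether it is always the same processes that are stuck, which hints at the device or subsystem they are waiting on.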


What is the easiest way to inspect the differences between those generations? The only one I am aware of is the kernel version (5.4.x for all of them, but with differences in the “x”). I do not remember the exact versions.
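One way to compare two generations, sketched here under some assumptions: each system generation is a symlink like `/nix/var/nix/profiles/system-<N>-link`, and `nix-store --query --requisites` lists every store path it depends on. The script below diffs the `name-version` parts of two such closures. The helper names are hypothetical, and it deliberately ignores changes that only affect the hash.

```python
import subprocess
import sys


def closure(path):
    """Return the set of store paths that a generation depends on."""
    out = subprocess.run(
        ["nix-store", "--query", "--requisites", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def package_name(store_path):
    """Strip the '/nix/store/<hash>-' prefix, keeping 'name-version'."""
    base = store_path.rstrip("/").rsplit("/", 1)[-1]
    return base.split("-", 1)[1] if "-" in base else base


def diff_names(old_paths, new_paths):
    """Return (removed, added) package names between two closures."""
    old = {package_name(p) for p in old_paths}
    new = {package_name(p) for p in new_paths}
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__" and len(sys.argv) == 3:
    removed, added = diff_names(closure(sys.argv[1]), closure(sys.argv[2]))
    for name in removed:
        print("-", name)
    for name in added:
        print("+", name)
```

It would be called with two generation links as arguments, for example the last working one and the first slow one. Newer Nix releases also have a `nix store diff-closures` subcommand that does a similar comparison, if it is available on your version.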

Hmmm, sounds like it should be outside the range where the recent intel_pstate=active/passive/disable discussions apply, but maybe try booting with various values of the kernel parameter intel_pstate?
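To confirm which setting a given boot actually ended up with, a small sketch like the following may help. It assumes a Linux system where `/proc/cmdline` is readable and where the intel_pstate driver, if loaded, exposes its mode under `/sys/devices/system/cpu/intel_pstate/status`; missing files are simply reported as `None`. The helper names are made up.

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Return the value of the last 'name=value' token, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val
    return value


def read_or_none(path):
    """Read a small text file, returning None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_or_none("/proc/cmdline") or ""
    print("intel_pstate on cmdline:", kernel_param(cmdline, "intel_pstate"))
    print("driver status:",
          read_or_none("/sys/devices/system/cpu/intel_pstate/status"))
    print("scaling driver:",
          read_or_none("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver"))
```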

Yeah, pstate was my first hope, but then I noticed the actual kernel version in the bootloader entry when I rebooted this morning.

I might give it a try anyway.

My computer got slow again, though not as extreme as 2 weeks ago.

I did a more in-depth check of the system and even checked my zpool status. ZFS is trimming.

I skimmed the journals, and it indeed seems that trimming was running back when I had the problems and finished shortly after the last restart.

I need to keep an eye on this.
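For keeping an eye on it, a rough sketch: `zpool status -t` adds per-device TRIM state to the usual status output, and the script below just filters that output for lines mentioning trimming. The pool name `rpool` is only a default and can be passed as an argument; the exact wording of the status lines may differ between ZFS versions, so the filter is intentionally loose.

```python
import subprocess
import sys


def trim_lines(status_text):
    """Keep only the lines of `zpool status -t` output that mention trim."""
    return [line.strip()
            for line in status_text.splitlines()
            if "trim" in line.lower()]


def zpool_trim_status(pool):
    """Run `zpool status -t <pool>` and return the trim-related lines."""
    out = subprocess.run(
        ["zpool", "status", "-t", pool],
        check=True, capture_output=True, text=True,
    ).stdout
    return trim_lines(out)


if __name__ == "__main__":
    pool = sys.argv[1] if len(sys.argv) > 1 else "rpool"
    try:
        for line in zpool_trim_status(pool):
            print(line)
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not query zpool:", err)
```

Run periodically (or next to the load check) while the system feels slow, this shows whether a trim is in progress at that moment.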

Can ZFS trimming be throttled with ionice? If it can, you could make sure ZFS doesn’t saturate your disk subsystem, making a trim while you’re using the system a little more bearable.

I see why more people just have a large ZFS pool on a separate network-attached storage machine and connect to it via NFS, iSCSI, or network block devices… :-).

Not sure… I am not even sure since when trimming has been active at all. I cannot find any setting in my configuration that would have enabled it (and, even worse, I therefore cannot change its frequency).

Though as I understand the process, the unit just runs zpool trim rpool on a weekly basis. This returns within a split second, and the actual trim happens within the kernel module.

Though I also have to confess that I read the journal wrong. What I reported as “finished shortly after reboot” was the regular re-initialisation of the timer after a reboot. Anyway, today’s observed duration of ~7h makes it seem plausible that the slowness that day was related to trimming.

Is there any way to add an M.2 drive to your system?

Using ZFS on root with spinning disks seems like it would be really slow to begin with.

No one said anything about a spinning disk, it’s an SSD in the laptop. Though I won’t upgrade any of its components anymore unless I really have to. I hope to be able to buy a new laptop at the end of '21 instead :smiley:

End of '21? We’ll all be using RISC-V by then ;-).