Need help debugging my computers being unresponsive while having high disk i/o

Hi there :wave:

I’m a long-time NixOS user and really love using it on my laptop, desktop and servers. However, I do not consider myself an expert, or anywhere near one. This is why I’m asking here; hopefully someone can help me figure this out.

Ever since I began using NixOS on my desktop and laptop, the systems have become unresponsive whenever there is heavy disk I/O. For example, when running nixos-rebuild switch or home-manager switch, when updating Steam games, or when nix-collect-garbage runs, the system is basically unusable. Every time my HDD LED stays on for a longer period, the whole system freezes every few seconds, for anywhere from one to ~30 seconds. Even the mouse pointer cannot be moved and sound output stops.

I have tried to somehow debug this, looked at journalctl, tried to tweak my mount options, googled for issues and read like everything I could find on issues like this, but to this day I could not figure it out.

My NixOS config can be found here: https://git.sr.ht/~martinimoe/nixos-config/tree
Affected hosts are the ones with graphical user interfaces (or at least that's where I notice this): galactica and omnissiah. I am using btrfs on these hosts, by the way (maybe that matters?).

I’d be really happy and thankful if someone here has any ideas on what could cause this problem, or ideas on how to debug this and narrow it down :heart:

Thank you :slight_smile:

It might be worth checking your hardware and monitoring resource usage (e.g. run htop and keep it open while the problem occurs), at least to rule that possibility out.
I see similar symptoms on devices with few available resources, but there it is kind of expected.

Since you mentioned that you’re using btrfs. I had similar freezing issues years ago with a Fedora laptop with an encrypted btrfs drive.

I resolved those freezes by mounting btrfs partitions with noatime mount option. For more info see: BTRFS SPECIFIC MOUNT OPTIONS — BTRFS documentation

Also, before trying the noatime option, I had some reduction on the freezing when I removed read and write work queues from the encrypted drive. For this, see: dm-crypt/Specialties - ArchWiki

yeah! i have these in my configuration.nix:

  fileSystems = { # NOTE: btrfs also supports zlib and lzo; zstd is what I use
    "/".options = [ "compress=zstd" ];
    "/nix".options = [ "compress=zstd" "noatime" ];
  };

and i had no issues since! :smiley_cat:

Additionally, this sounds a lot like thrashing. You wouldn’t really expect that to look like this on modern systems, but with a particularly full SSD nearing the end of its life and little RAM… Keeping an eye on memory use with htop or btm sounds like a great idea. Also double check your backups.

linux_xanmod with its realtime scheduling might also hide the effects of IO work a little better from humans, if this turns out to be safe to ignore.

Edit: You’re using a swapfile-on-btrfs-on-luks? That might be worth double checking. clamav might also be a culprit, especially coupled with little ram and an… interesting swapfile. clamav is particularly memory hungry.

Thank you all so much for your answers! I really appreciate it <3

Let's go through your ideas:

  • I will keep btm running on my next rebuild and watch memory usage
  • I already have noatime as a mount option :confused:
  • I set boot.initrd.luks.devices.<name>.bypassWorkqueues to true for my LUKS device
  • I disabled ClamAV

Now I will test some heavy I/O tasks and give feedback here.

As for the swap file: that also sounds like a really good explanation! Unfortunately I am not sure how I can easily test this without repartitioning my disk :confused: But I will go for it if the other ideas do not help!

Again thank you all so much :pray: Awesome help here! :slight_smile:

Okay, so my first impression after applying the changes I described in the earlier post:

  • The system seems to temporarily freeze less often
  • Memory utilization was never above 60%
  • The freezes only occurred while swap was in use, but not every time swap was in use

So I will probably test this some more, and then maybe on the weekend I will repartition my disks to have a separate swap partition.
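
A rough way to check whether the freezes line up with I/O stalls rather than memory usage is the kernel's pressure stall information (PSI). The following is only a sketch: it assumes a kernel with PSI enabled so that /proc/pressure/io exists, and io_pressure is a helper name made up for illustration. The "some" line is the share of time at least one task was stalled on I/O, the "full" line the share of time all non-idle tasks were stalled.

```shell
# Print the 10-second average I/O pressure for the "some" and "full" lines.
# Reads a PSI-format file; defaults to /proc/pressure/io.
io_pressure() {
  awk '{
    for (i = 2; i <= NF; i++) {
      split($i, kv, "=")
      if (kv[1] == "avg10") printf "%s avg10=%s%%\n", $1, kv[2]
    }
  }' "${1:-/proc/pressure/io}"
}

# Take a single sample if the kernel exposes PSI.
if [ -r /proc/pressure/io ]; then
  io_pressure 2>/dev/null || echo "PSI is present but not readable"
fi
```

Calling io_pressure once per second in a second terminal during a rebuild (for example `while sleep 1; do date +%T; io_pressure; done`) should show whether the "full" value climbs around the time of a freeze.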

If possible, also check your drive's health and specs (read/write speed), because if they are not “good enough”, swap I/O may suffer and not give the expected performance.

You also don’t need to repartition your disks, although it is still a good idea to do so if you can confirm that swap helps performance and you want a clean install.

They’re already using a swapfile. Using a swapfile on btrfs has significant limitations, and you should really consider those before using a swapfile with this filesystem. That’s why I’m suggesting not doing that.
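
One of those limitations is that the swapfile must not be copy-on-write, which shows up as the C attribute in lsattr output. A minimal sketch for checking that, assuming a hypothetical path /swap/swapfile (substitute the real one) and a made-up helper name has_nocow_flag:

```shell
# Return success if an lsattr output line carries the C (No_COW) flag.
# Expects one line such as "---------------C------ /swap/swapfile".
has_nocow_flag() {
  # Keep only the attribute field, so a "C" in the path cannot match.
  case "${1%% *}" in
    *C*) return 0 ;;
    *)   return 1 ;;
  esac
}

# /swap/swapfile is a hypothetical path, not taken from the config.
swapfile=/swap/swapfile
if [ -e "$swapfile" ]; then
  if has_nocow_flag "$(lsattr "$swapfile")"; then
    echo "$swapfile: No_COW is set"
  else
    echo "$swapfile: No_COW is NOT set"
  fi
fi
```

If the flag turns out to be missing, that alone would be a reason to recreate the swapfile or move to a dedicated partition.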

If I read it correctly it’s also a really tiny swapfile (1KB?), so I’m not sure it serves any useful purpose. I think it’s worth repartitioning and having a dedicated partition for this if you really want swap (it’s useful for sleep, but you need more than 1KB).

That said, just running without any swap at all (just use swapoff <filename>) could also confirm or deny this being the culprit.
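
The experiment could look roughly like this. It is a sketch rather than a recipe: swap_used_kib is a helper name made up here, and the commented commands assume the swap area is listed in /etc/fstab so that swapon -a can bring it back. Note that swapoff has to move any swapped-out pages back into RAM, so it needs enough free memory.

```shell
# Print the amount of swap currently in use, in KiB.
# Reads a meminfo-format file; defaults to /proc/meminfo.
swap_used_kib() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END { print total - free }' "${1:-/proc/meminfo}"
}

# The experiment itself needs root, so it is left commented out:
#   swapon --show      # list active swap areas and their sizes
#   sudo swapoff -a    # disable all swap until the next swapon or reboot
#   ...run nixos-rebuild switch or another heavy I/O task...
#   sudo swapon -a     # re-enable the swap areas listed in /etc/fstab

# Show current swap usage once.
if [ -r /proc/meminfo ]; then
  swap_used_kib
fi
```

If the freezes disappear with swap turned off, the swapfile is a likely culprit; if they persist, it can probably be ruled out.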

Swap does have some purposes besides serving as a dumping ground for memory pages that don’t fit (e.g. the kernel sometimes uses it to store parsed data in its binary representation so parsing can be skipped down the line without taking up memory), but its performance impact if you have sufficient memory is incredibly minor.

Right, I am using a 16 GB swap file:

 ❯ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       4,9Gi       541Mi       1,2Gi        11Gi        10Gi
Swap:           15Gi       2,5Mi        15Gi

I did a short SMART test and read the results, the drive seems to be fine as far as I can tell?

 ❯ nix run 'nixpkgs#smartmontools' -- --test=short /dev/sdc
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.34] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Jun 27 10:01:34 2025 CEST
Use smartctl -X to abort test.



 ❯ nix run 'nixpkgs#smartmontools' -- -a /dev/sdc
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.34] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT480BX500SSD1
Serial Number:    2044E4C2FB58
LU WWN Device Id: 5 00a075 1e4c2fb58
Firmware Version: M6CR041
User Capacity:    480.103.981.056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.5/5706
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun 27 10:02:13 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			(0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	(   2) minutes.
Extended self-test routine
recommended polling time: 	(  10) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       911
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       398
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   097   097   000    Old_age   Always       -       24
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29
180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       46
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       4
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   066   053   000    Old_age   Always       -       34 (Min/Max 21/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   097   097   001    Old_age   Offline      -       3
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       11449376677
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       357793021
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       482046336
249 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       0
250 Read_Error_Retry_Rate   0x0032   100   100   000    Old_age   Always       -       0
251 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       4106454187
252 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       3
253 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       0
254 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       212
223 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       71

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       911         -

Selective Self-tests/Logging not supported

The above only provides legacy SMART information - try 'smartctl -x' for more

I also tested the write speed of my SSD, which seems to be fine, I guess:

 ❯ dd if=/dev/zero of=/tmp/test.img bs=1G count=20 oflag=dsync
20+0 records in
20+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 49,4217 s, 435 MB/s

Also some read testing with hdparm:

 ❯ nix run 'nixpkgs#hdparm' -- -Tt /dev/mapper/nixenc

/dev/mapper/nixenc:
 Timing cached reads:   19918 MB in  2.00 seconds = 9981.12 MB/sec
 Timing buffered disk reads: 1392 MB in  3.00 seconds = 463.31 MB/sec