How to avoid system freeze maybe connected to swap/RAM?

Hi,
I’m using NixOS as a desktop system. Often (roughly one time in three) when I run anything slightly more demanding, like nixos-rebuild or opening PyCharm, my system gets very slow and unresponsive, and after a short while it stops reacting at all until I restart via the power button.
After activating swap on an HDD (system and most files are on an SSD) I no longer get a total freeze, but I do get significant delays for things like a button noticing that the mouse hovers over it, or mouse movement in general. The HDD gets very loud, which indicates that the swap is being used, and since it is on an HDD, it is slow. But even after the resource-hungry tasks have ended (nixos-rebuild is finished, 40 tabs in Firefox are closed, PyCharm is closed), the HDD is still loud and the desktop environment is still very slow to respond.
My guess is therefore that I have memory problems but of course it could be something else entirely.

So my questions are:

  1. What diagnostic options do I have, especially after a forced restart? (When the system is not responding, I also cannot open a terminal or tty to look at swap/RAM usage.)
  2. Can I configure NixOS such that the desktop environment stays responsive even if some programs are too resource hungry?
  3. Why does the system not move away from HDD swap usage once the resource-hungry tasks are over? It looks to me like memory is not freed.

I would like to avoid the conclusion “just buy more RAM” since I did not have these problems on Fedora.

My system:

  • system and /home on SSD (470GB used/635GB)
  • 8 GB RAM
  • zram with 50% and priority 10 enabled
  • swap partition with 8GB on HDD
  • NixOS unstable with KDE
  • now (running thunderbird, firefox with a few tabs, system monitor, vim and 2 terminals): used 6.7 GB / 7.6 GB physical memory and 1.8 / 11 GiB swap

My nixos configuration.nix (excerpt):

  # memory hungry tasks often froze my computer. Maybe this helps
  # idea from https://discourse.nixos.org/t/desktop-often-freezes-during-cpu-intense-tasks/41583/5
  # additionally it seems to help to run nix resource hungry tasks with `nice -20` prepended. No idea if they might be killed then
  systemd.oomd = {
    enable = true;              # not really necessary because it's default
    enableRootSlice = true;
    enableUserSlices = true;
    # could try it out:
    # enableSystemSlice = true;   # not in Fedora.
    # extraConfig = {
    #   SwapUsedLimit = "60%";
    #   DefaultMemoryPressureDurationSec = "15s";
    # };
  };
  # boot.kernel.sysctl = { "vm.swappiness" = 10;};
       # default is 60. Range is 0-200. Lower number says use RAM rather than swap.
       # I considered it, didn't try it out yet
  zramSwap = {
    enable = true;
    priority = 10;  # higher than HDD swap
  };
  # it is annoyingly loud:
  swapDevices = [
    {
      device = "/dev/disk/by-partuuid/000<someNumbers>";
      randomEncryption.enable = true;
      options = [ "nofail" ];
      priority = 5;
    }
  ];

As you can see, I already tried to play around with some oomd options but haven’t noticed improvement.

Update: I have now tried it out. I ran nixos-rebuild and had to wait a while for reactions in the system monitor. When it was done, physical memory usage dropped to 4.8 GB but swap usage stayed at 4.1 GB, with 3.8 GB in zram and 325.6 MB (out of 8 GB) on the HDD. These 325.6 MB are actually in use, as I can hear the HDD spinning.


Probably because you weren’t running nix on Fedora :grin: nix is extremely memory hungry and swap won’t help here. (Also swap on an HDD is wild lol.) zram is making your problem worse, stop using it. If you want to limit mem usage during builds, set max-jobs/cores to a lower value. If it’s OOMing during eval then you really need to eval on another machine or buy more RAM. (Or switch off nix, if you prefer.)
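
As a minimal sketch of the build-limit suggestion above, assuming a NixOS configuration.nix: `nix.settings.max-jobs` and `nix.settings.cores` are the corresponding Nix settings, and the values 1 and 2 are only illustrative assumptions for an 8 GB machine, not tested recommendations.

```nix
{
  nix.settings = {
    # Number of derivations Nix builds in parallel (illustrative value).
    max-jobs = 1;
    # Cores each individual build may use, exposed to builders as
    # NIX_BUILD_CORES (illustrative value).
    cores = 2;
  };
}
```

Lower values trade build speed for a smaller peak memory footprint. They do not limit memory used during evaluation.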


Why does Zram Swap make things worse?

Enable sysrq and invoke the OOM killer interactively using SYSRQ+f. That will get you back to a usable system and the kernel will log which processes it killed.

I’m afraid no OS can remain responsive when running OOM.

Early OOM daemons should be able to mitigate this somewhat though.
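
One possible early OOM daemon on NixOS is earlyoom. The following is a sketch only: the thresholds are assumed example values, and whether running it alongside the systemd-oomd setup from the original post is a good idea has not been checked here.

```nix
{
  services.earlyoom = {
    enable = true;
    # earlyoom starts killing once available RAM and free swap are BOTH
    # below these percentages (assumed example values).
    freeMemThreshold = 5;
    freeSwapThreshold = 10;
  };
}
```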

Likely because there is no space in memory; making it immediately swap out again.

Upgrade that; it’s not enough for the modern web or electron apps these days.

2 things you shouldn’t do here:

Combine zram swap and physical swap

  • Though compressed, every page that is in zram swap must be backed by physical memory. In the worst case (pages aren’t compressible) you would actually not free any memory by swapping.
  • The kernel will never move pages out of zram swap into other swaps, even when running OOM. It would rather kill init and panic than move a single page out of zram swap. Seriously.
  • If your zram swap is filled with entirely inactive anonymous pages, additional “inactive” pages will be swapped to disk instead even if they’re much more active than those already in zram.
  • Worse yet, those pages in zram take up valuable memory that could have been used to hold regular anonymous pages instead.
  • Because it has higher priority, zram swap is likely to accumulate the most inactive pages over time and will hold onto them until they’re unswapped. That can only happen if they’re accessed which is of course unlikely given that they’re inactive. Once a truly inactive page is in zram, it’s there effectively forever, including during OOM
  • This effectively leads to LRU inversion under memory pressure which is the last thing you want in such an event

Use either zram swap XOR physical swap. Never both at the same time.
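
To illustrate the either/or advice, here is a sketch of a zram-only setup, assuming the disk swap entry from the original post is simply dropped. The `memoryPercent` value is an assumption that mirrors the 50% mentioned in the question.

```nix
{
  # Variant A: zram swap only, no disk-backed swap device.
  zramSwap = {
    enable = true;
    memoryPercent = 50;  # assumed, mirrors the 50% from the question
  };
  swapDevices = [ ];

  # Variant B would be the reverse: `zramSwap.enable = false;` and a
  # single disk-backed entry in `swapDevices`.
}
```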

Swap on an HDD

HDDs are typically a few orders of magnitude slower than SSDs at random reads, especially at low queue depths.

Unswap causes such unpredictable read operations. This is what you’re hearing; the actuator needs to jump around rapidly.

Put your swap on the SSD instead.
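
A sketch of what SSD-backed swap could look like as a swap file, assuming the root filesystem lives on the SSD. The path `/var/lib/swapfile` and the 8 GiB size are hypothetical choices for illustration.

```nix
{
  swapDevices = [
    {
      # Hypothetical path; it must be on the SSD-backed filesystem.
      device = "/var/lib/swapfile";
      # Size in MiB. When `size` is set, NixOS creates the file itself.
      size = 8 * 1024;
    }
  ];
}
```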

How many tabs exactly and how much memory are they using? You shouldn’t be using 8GB worth of memory with just those apps. I’m currently using that much with a bunch of tabs, two electron apps, another app and emacs running at the same time.


Less ram means more swapping occurs.
Swapping continuously leads to I/O thrashing (CPU is spending more cycles swapping than doing your desired task).
Thrashing is what OP is suffering from.

Thank you very much for the quick and thorough replies.
I’ll need some more time to understand and follow your advice. I can already answer some points, though:

If you want to limit mem usage during builds, set max-jobs/cores

I’ll do this.

Enable sysrq and invoke the OOM killer interactively using SYSRQ+f. That will get you back to a usable system and the kernel will log which processes it killed.

I started trying it out, but I’m not sure yet what my SysRq key is on a Lenovo L450 with a Swedish keyboard (which replaced the original German one) and the Neo keyboard layout. Twice, Alt+PrtSc did something: a quick black screen, then the login screen, then a black screen again when I tried to log in.
I’ll experiment with it more.

2 things you shouldn’t do here:

Combine zram swap and physical swap

I disabled zram. Now the HDD is running constantly. Currently with 2 Firefox tabs (almost idle) at 290MB and 3.1GB/7.6 GB RAM.

Swap on an HDD

While swap on an SSD would be much faster and quieter, the internet told me that it wears down the SSD quite quickly. Those advice websites also say that SSDs have improved over time (so that might be less of a concern?), but my 10-year-old SSD (Lenovo ThinkPad L450, not sure exactly which SSD) might fall into the old category.

My zram configuration, in case it helps.

{
  # https://www.kernel.org/doc/Documentation/blockdev/zram.txt
  # Compression algorithm. `lzo` has good compression,
  # but is slow. lz4 has bad compression, but is fast.
  # zstd is both good compression and fast, but requires newer kernel.
  # You can check what other algorithms are supported by your zram device with
  # {command} cat /sys/class/block/zram*/comp_algorithm
  zramSwap = {
    enable = true;
    # one of "lzo", "lz4", "zstd"
    algorithm = "zstd";
    # Priority of the zram swap devices.
    # It should be a number higher than the priority of your disk-based swap devices
    # (so that the system will fill the zram swap devices before falling back to disk swap).
    priority = 5;
    # Maximum total amount of memory that can be stored in the zram swap devices (as a percentage of your total memory).
    # Defaults to 1/2 of your total RAM. Run zramctl to check how well memory is compressed.
    # This doesn’t define how much memory will be used by the zram swap devices.
    memoryPercent = 100;
  };  

  # recommended settings from https://wiki.archlinux.org/title/Zram
  # Optimizing swap on zram
  # Since zram behaves differently than disk swap, 
  # we can configure the system's swap to take full potential of the zram advantages:
  boot.kernel.sysctl = {
    "vm.swappiness" = 180;
    "vm.watermark_boost_factor" = 0;
    "vm.watermark_scale_factor" = 125;
    "vm.page-cluster" = 0;
  };
}

There is no free lunch, but modern SSDs usually survive at least 10k writes per cell (a flash-optimized filesystem like f2fs will help even with old SSDs and stupid controllers), and I would guess that with normal swap usage the TBW will not be much worse than MS Windows with its default background activity (at least from observing my T400 ThinkPad).
My advice would be a proper backup strategy (regardless of anything else) and running regular SMART tests. SSDs are relatively inexpensive nowadays.


Try running a harmless operation such as disk sync via sysrq from a functioning system and observe the kernel log.

Please note that you must set the kernel.sysrq sysctl to 1 in order to be able to trigger any sysrq function other than sync using a keyboard.
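
In a NixOS configuration this could be sketched as follows, using the same `boot.kernel.sysctl` option that already appears in this thread. The value 1 enables all SysRq functions.

```nix
{
  # Allow every SysRq function from the keyboard, including the
  # interactive OOM kill (SysRq+f) mentioned earlier in the thread.
  boot.kernel.sysctl."kernel.sysrq" = 1;
}
```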

That’s more like it but still quite high. What’s using the other 3GiB?

What @zimward said.

I’d also recommend just buying a new one. I personally wouldn’t trust a 10-year-old SSD not to suddenly go poof.

That reminded me to play around with the swappiness.
Your swappiness of 180 resulted in a lot of HDD usage and lagging window behavior - it’s probably better with your zram configuration.
Beforehand I had swappiness of 10 which was similar.
Now I changed it to 0 and my computer is quiet, using the same amount of swap (1.3 GB) as before and responding well with 28 tabs, Thunderbird and Signal, while it had trouble beforehand with just Thunderbird.
So I’ll stay with swappiness = 0 and just HDD swap plus RAM (no zram) for now.
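
For reference, the state described in this last post might look roughly like this in configuration.nix. This is a sketch assembled from the snippets earlier in the thread, not the poster’s actual file; the HDD entry in `swapDevices` is assumed to stay as originally posted.

```nix
{
  # Tell the kernel to avoid swapping anonymous pages whenever it can.
  boot.kernel.sysctl."vm.swappiness" = 0;

  # zram was disabled earlier in the thread; only the HDD swap
  # partition from the original `swapDevices` entry remains.
  zramSwap.enable = false;
}
```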