Nix build ate my RAM 😭

Some libraries/packages can be quite RAM-hungry during compilation (for example, scientific/CUDA packages). Unfortunately, Linux is generally not great at handling memory pressure. This isn’t really a nix-specific problem; however, nix/NixOS appears to handle it especially poorly.

Case in point: just now I was rebuilding my NixOS system after bumping the nixpkgs version, which triggered rebuilds of most of my overlaid scientific packages. Unsurprisingly, the build eventually ran out of memory.

To add insult to injury, the OOM killer for some reason decided to kill .kwin_x11-wrapped, .plasmashell-wrapped, tilda (my terminal emulator), and a bunch of other random service processes instead of the build daemon.

Unsaved documents were lost. :sob:


My laptop has 32 GB of RAM, so this isn’t exactly a “thin” machine. I use ZFS, so it’s possible that the memory pressure was made slightly worse by the ARC not shrinking fast enough; however, zfs_arc_max defaults to 50% of RAM, so at least ~16 GB should still have been available for the build.

I should also mention that this is not the first time I have run into this issue (although none of the previous incidents killed the graphical session). After my first run-in with this problem, I started running all my full system builds with nix build --cores 10 (down from the default core count of 20) in the hope of reducing the overall build parallelism. I am hesitant to reduce this setting any further, as I actually want to build most packages in parallel; it’s just the huge scientific packages that are “poor tenants”.
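
For reference, the declarative equivalents of these knobs, as far as I understand them (cores limits parallelism within a single build, while max-jobs is the separate setting that caps how many derivations are built at once):

  # Sketch only; the values are illustrative.
  nix.settings = {
    cores = 10;    # per-build parallelism, exported as NIX_BUILD_CORES
    max-jobs = 4;  # number of derivations built concurrently
  };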


So I guess there are two issues here:

  1. Some derivations can be exceptionally memory-hungry. Is there any way to make nix play more nicely with such derivations (especially when there are multiple such packages in a closure)?

  2. Assuming that an OOM does occur during a build, can we reduce the blast radius of the OOM killer (at least on NixOS)?

Some possible ideas:

  • Assign build processes higher OOM killer scores (or maybe even monitor the memory usage in the daemon and eagerly kill any runaway builds before they trigger the OOM killer)?

  • Enforce resource limits via cgroups (or similar)? (A rough sketch follows this list.)

  • Assuming that we can contain/detect the OOM condition, maybe we could optionally retry building “fat” derivations one at a time (without resource contention due to parallel builds).
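
For the cgroups idea, here is a rough sketch of what a hard limit could look like using standard systemd resource-control settings (the 24G ceiling is purely illustrative):

  # Sketch: hard memory ceiling for all builds, assuming nix-daemon
  # (and hence its build children) is moved into a dedicated slice.
  systemd.slices."nix-build".sliceConfig.MemoryMax = "24G";
  systemd.services.nix-daemon.serviceConfig.Slice = "nix-build.slice";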

Any thoughts/recommendations?


Edit: For anyone finding this issue in the future, I ended up adding the following to my config. It’s probably a bit aggressive, but I haven’t had any dead sessions since making these changes.

  # OOM configuration:
  systemd = {
    # Create a separate slice for nix-daemon that is
    # memory-managed by the userspace systemd-oomd killer
    slices."nix-daemon".sliceConfig = {
      ManagedOOMMemoryPressure = "kill";
      ManagedOOMMemoryPressureLimit = "50%";
    };
    services."nix-daemon".serviceConfig.Slice = "nix-daemon.slice";

    # If a kernel-level OOM event does occur anyway,
    # strongly prefer killing nix-daemon child processes
    services."nix-daemon".serviceConfig.OOMScoreAdjust = 1000;
  };
3 Likes

There is rudimentary support for cgroups: nix.conf - Nix Reference Manual

The PR which added it (https://github.com/NixOS/nix/pull/3600) had an “in the future”:

  • It adds an experimental feature cgroups that causes builds to be executed in a cgroup. This allows getting some statistics from a build (such as CPU time) and in the future may allow setting resource limits. But it mainly exists because the uid-range feature requires it.
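
If you want to experiment with it, enabling it should look roughly like this (both settings are experimental, so treat this as a sketch):

  # Sketch: opt into the experimental cgroups support.
  nix.settings = {
    experimental-features = [ "cgroups" ]; # append to your existing list
    use-cgroups = true;
  };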
3 Likes

You can limit resources in the nix-daemon systemd service. In srvos, for example, we change the process priority: https://github.com/nix-community/srvos/blob/c89d0acb7c447a85f9f3d751321e9012ea21e8e1/nixos/common/nix.nix#L21 You may also want to adjust the OOM score to make it more likely that a build gets killed instead of other, more precious services (OOMScoreAdjust=, see systemd.exec).
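
For illustration, the combination might look like this (the concrete values are only illustrative; note that oom_score_adj is inherited by forked children, so the build processes spawned by the daemon get the same score):

  # Sketch: de-prioritise the daemon and make it (and its build
  # children, via inheritance) preferred OOM-kill targets.
  systemd.services.nix-daemon.serviceConfig = {
    Nice = 10;            # lower CPU priority, similar to what srvos does
    OOMScoreAdjust = 500; # higher score = killed first
  };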

4 Likes

Bazel has a concept of marking specific tasks as “big” (as well as small), which has implications for scheduling. I think it only does this for tests; presumably it has a fine-grained enough understanding of the build process itself to avoid scheduling too much memory-heavy work.

Since nix is not particularly fine-grained, it might be able to schedule things a bit better if derivations could contain a hint for how “large” a build is. We already kind of have some control over scheduling with runCommandLocal and whatnot, so it’s not entirely far-fetched.
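
To illustrate: requiredSystemFeatures and big-parallel exist today for exactly this kind of scheduling hint, while a “big-memory” feature would be new:

  # Sketch: big-parallel is a real system feature already used for
  # scheduling on Hydra; "big-memory" is purely hypothetical.
  stdenv.mkDerivation {
    pname = "huge-scientific-package"; # placeholder name
    version = "0.1";
    src = ./.;
    requiredSystemFeatures = [ "big-parallel" ];
    # hypothetical: requiredSystemFeatures = [ "big-memory" ];
  }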

2 Likes

Yup. Idk how to fix this properly (well, cgroups, but idk how), so I use:

  nix.settings.max-jobs = 16;
  nix.settings.max-silent-time = let minute = 60; in 120 * minute;
  services.hercules-ci-agent = {
    settings.concurrentTasks = 4;
  };
  services.earlyoom = {
    enable = true;
    enableNotifications = true;
    extraArgs =
      let
        catPatterns = patterns: builtins.concatStringsSep "|" patterns;
        preferPatterns = [
          ".firefox-wrappe"
          "hercules-ci-age"
          "ipfs"
          "java" # If it's written in java it's uninmportant enough it's ok to kill it
          ".jupyterhub-wra"
          "Logseq"
        ];
        avoidPatterns = [
          "bash"
          "mosh-server"
          "sshd"
          "systemd"
          "systemd-logind"
          "systemd-udevd"
          "tmux: client"
          "tmux: server"
        ];
      in
      [
        "--prefer '^(${catPatterns preferPatterns})$'"
        "--avoid '^(${catPatterns avoidPatterns})$'"
      ];
  };
5 Likes

Can’t you configure the OOM killer to favour your most critical processes and kill the build processes, like the children of the nix build daemon?

https://updates.virtuozzo.com/doc/pcs/en_us/virtuozzo/6/current/html/Virtuozzo_Users_Guide/35935.htm

In fact, I’m not sure why NixOS doesn’t come with this preconfigured… but I suppose it could be.

1 Like

I think this would be the best solution. There is also a kind of precedent for this with the big-parallel nix feature.

Nix could be made to assume big-parallel builds will use all the available resources and therefore schedule fewer builds while a big-parallel build is running (or even none).

2 Likes

cgroups won’t really help here. Limiting the memory of a build will only serve to get it killed, and the build will therefore fail.

The issue is that we must tell the build process (as in: make or ninja) how parallel it can be once, upfront, while the actually permissible parallelism during the build is very dynamic.

make (and likely other build tools as well) has a solution to that: it uses the system load factor (how many processes are waiting to be scheduled) to limit process spawning. We used to have that in Nixpkgs, but it turned out to be too limiting for Hydra, so it was removed: treewide: drop -l$NIX_BUILD_CORES by grahamc · Pull Request #192447 · NixOS/nixpkgs · GitHub
A solution to that is Allow configuration of load limit for nix builds · Issue #7091 · NixOS/nix · GitHub.
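
For anyone who wants the old behaviour back for a specific package, re-adding the load limit could look roughly like this (myPackage is a placeholder and the load limit of 8 is arbitrary; GNU make’s -l flag stops spawning jobs above the given load average):

  # Sketch: restore a load-average cap for a single package.
  myPackage.overrideAttrs (old: {
    makeFlags = (old.makeFlags or [ ]) ++ [ "-l8" ];
  })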

3 Likes

@Artturin yeah, I remember reading about this. However, I am not very familiar with cgroups, so I would prefer a built-in way to configure this from nix rather than having to bodge it together myself.


@Mic92 @nixinator I am assuming that you are suggesting setting systemd.services.nix-daemon.serviceConfig.OOMScoreAdjust? I am not sure if that would do the “right thing™”. Wouldn’t this result in killing the nix-daemon process itself instead of the “RAM-hungry” build process?

If this does indeed work like we want, maybe it should be exposed as nix.daemonOOMScoreAdjust and set to some sane default value out of the box?

This should work as a short-term fix for the “don’t let builds kill the graphical session” problem, but we might also want to improve “soft” OOM handling (not letting this happen in the first place, OR automatically retrying/recovering when a build fails due to parallel-build-induced memory pressure).


@SergeK I think I remember trying earlyoom back when I was using Arch. I’ll take another look at it. Thank you for the recommendation.


@TLATER @Atemu building big-parallel (or maybe some new class like big-memory) tasks one at a time actually makes a lot of sense.

Also, regarding cgroups/kernel OOM killer/userspace OOM killer configurations: I wouldn’t write them off so quickly. It’s true that they won’t allow the currently failing builds to succeed. However, there will always be some derivations that occasionally run out of memory, despite our best efforts.

I think that constraining such build processes ought to be included in the “build sandboxing” that nix provides (I am aware that the nix build sandboxing is meant more for reproducibility than as a security feature, but my point still stands).

2 Likes

An additional optimisation that just sprang to mind: builds marked preferLocalBuild could be run with a higher job count than the system default, as they’re assumed to be tiny.
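
For reference, preferLocalBuild is just an ordinary derivation attribute today, e.g.:

  # Sketch: mark a trivial derivation as cheap enough to always
  # build locally rather than on a remote builder.
  runCommand "tiny-wrapper" { preferLocalBuild = true; } ''
    mkdir -p $out
    echo ok > $out/marker
  ''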

3 Likes

Certainly. For properly killing a whole build, cgroups are an obvious boon. This would also allow the likes of systemd-oomd to kill builds pre-emptively.
Now, ideally, nix could even get smart enough to restart such a failed build, since it didn’t actually fail because of some property of the build itself but rather because of an “environmental” factor; an impurity.

1 Like

cgroups are also useful for freezing problematic processes to avoid complete memory exhaustion and a system reset.

Unfortunately, I think it’s not easy to put all of nix-daemon’s processes (in the multi-user case) in the same cgroup…

1 Like