Some libraries/packages can be quite RAM-hungry during compilation (for example, scientific/CUDA packages). Unfortunately, Linux is generally not great at handling memory pressure. This isn't really a nix-specific problem, but nix/NixOS seems to handle it especially poorly.
Case in point: just now I was rebuilding my NixOS system after bumping the nixpkgs version, which triggered most of my overlayed scientific packages to be rebuilt. Unsurprisingly, this build eventually ran out of memory.
To add insult to injury, the OOM killer for some reason decided to kill `.kwin_x11-wrapped`, `.plasmashell-wrapped`, `tilda` (my terminal emulator), and a bunch of other random service processes instead of the build daemon.
Unsaved documents were lost.
My laptop has 32 GB of RAM, so this isn't exactly a "thin" machine. I use ZFS, so it's possible the memory pressure was made slightly worse by the ARC not shrinking fast enough; however, `zfs_arc_max` defaults to 50% of RAM, so at least ~16 GB should still have been available for the build.
I should also mention that this is not the first time I have run into this issue (although none of the previous times resulted in a dead graphical session). After my first run-in with this problem, I started running all my full system builds with `nix build --cores 10` (down from the default core count of 20) in the hopes of reducing the number of simultaneous build processes. I am hesitant to reduce this setting any further, as I actually want to build most packages in parallel. It's just the huge scientific packages that are "poor tenants".
So I guess there are two issues here:
- Some derivations can be exceptionally memory-hungry. Is there any way to make nix play more nicely with such derivations (especially when there are multiple such packages in a closure)?
- Assuming that an OOM does occur during a build, can we reduce the blast radius of the OOM killer (at least on NixOS)?
Some possible ideas:
- Assign build processes higher OOM-killer scores (or maybe even monitor memory usage in the daemon and eagerly kill any runaway builds before they trigger the OOM)?
- Enforce resource limits via cgroups (or similar)?
- Assuming we can contain/detect the OOM condition, maybe we could optionally retry building "fat" derivations one at a time (without resource contention from parallel builds).
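For the cgroups idea, a minimal sketch on NixOS (assuming cgroup v2, and relying on the fact that builds run as children of `nix-daemon` and thus land in its cgroup) might look like the following; the limits are illustrative placeholders, not recommendations:

```nix
systemd.services.nix-daemon.serviceConfig = {
  # Soft limit: the kernel starts reclaiming/throttling the
  # daemon's cgroup above this threshold.
  MemoryHigh = "24G";
  # Hard cap: exceeding this triggers the OOM killer *inside*
  # the nix-daemon cgroup only, sparing the rest of the session.
  MemoryMax = "28G";
};
```

This contains the damage to the build processes themselves, though an individual huge build that hits the cap will still fail and need to be retried with less contention.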
Any thoughts/recommendations?
Edit: For anyone finding this issue in the future, I ended up adding the following to my config. It's probably a bit aggressive, but I haven't had any dead sessions since making these changes.
```nix
# OOM configuration:
systemd = {
  # Create a separate slice for nix-daemon that is
  # memory-managed by the userspace systemd-oomd killer
  slices."nix-daemon".sliceConfig = {
    ManagedOOMMemoryPressure = "kill";
    ManagedOOMMemoryPressureLimit = "50%";
  };
  services."nix-daemon".serviceConfig.Slice = "nix-daemon.slice";

  # If a kernel-level OOM event does occur anyway,
  # strongly prefer killing nix-daemon child processes
  services."nix-daemon".serviceConfig.OOMScoreAdjust = 1000;
};
```