A while ago I started getting errors like the ones below. How should I interpret them, and what should I do about them?
Jul 18 16:45:50 myserver systemd[1]: zfs-snapshot-frequent.service: Failed to add control inotify watch descriptor for control group /system.slice/zfs-snapshot-frequent.service: No space left on device
Jul 18 16:45:50 myserver systemd[1]: zfs-snapshot-frequent.service: Failed to add memory inotify watch descriptor for control group /system.slice/zfs-snapshot-frequent.service: No space left on device
Jul 23 18:39:27 myserver systemd[1]: systemd-tmpfiles-clean.service: Failed to add control inotify watch descriptor for control group /system.slice/systemd-tmpfiles-clean.service: No space left on device
Jul 23 18:39:27 myserver systemd[1]: systemd-tmpfiles-clean.service: Failed to add memory inotify watch descriptor for control group /system.slice/systemd-tmpfiles-clean.service: No space left on device
Jul 23 20:10:56 myserver systemd[1]: session-20.scope: Failed to add control inotify watch descriptor for control group /user.slice/user-1000.slice/session-20.scope: No space left on device
Jul 23 20:10:56 myserver systemd[1]: session-20.scope: Failed to add memory inotify watch descriptor for control group /user.slice/user-1000.slice/session-20.scope: No space left on device
Some background:
Searching online helped somewhat, but I did not have time to really understand the problem. I do not fully understand what this setting does, but the following made the errors go away temporarily:
boot.kernel.sysctl = {
  "fs.inotify.max_user_watches" = "16384"; # 2 times the default 8192
};
Recently the same kind of errors returned, and I have now made them go away (temporarily?) by increasing the value even more:
boot.kernel.sysctl = {
  "fs.inotify.max_user_watches" = "1048576"; # 128 times the default 8192
};
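Since raising the limit only hides whatever is consuming the watches, it may help to see which processes hold them. As far as I can tell from the inotify man pages, `inotify_add_watch` returns "No space left on device" (ENOSPC) when the per-user watch limit is reached, so the message is not about disk space. The following is a minimal sketch, not a definitive tool: it assumes the Linux `/proc/<pid>/fdinfo/<fd>` format, where each inotify watch appears as a line starting with `inotify wd:`. The function names `count_watches` and `watches_per_pid` are made up for illustration. It sums per process rather than per user, which is a simplification, since the kernel limit is counted per user.

```python
import os
from collections import Counter


def count_watches(fdinfo_text: str) -> int:
    """Count inotify watches listed in the text of one fdinfo file."""
    return sum(
        1 for line in fdinfo_text.splitlines() if line.startswith("inotify wd:")
    )


def watches_per_pid(proc_root: str = "/proc") -> Counter:
    """Return a Counter mapping PID -> number of inotify watches it holds."""
    totals = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return totals  # no procfs available (assumption: not on Linux)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fdinfo_dir = os.path.join(proc_root, entry, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    n = count_watches(f.read())
            except OSError:
                continue  # descriptor closed while we were reading
            if n:
                totals[int(entry)] += n
    return totals


def main() -> None:
    """Print the configured limit and the ten largest watch holders."""
    try:
        with open("/proc/sys/fs/inotify/max_user_watches") as f:
            print("fs.inotify.max_user_watches:", f.read().strip())
    except OSError:
        print("could not read the current limit")
    for pid, n in watches_per_pid().most_common(10):
        print(f"pid {pid}: {n} watches")


if __name__ == "__main__":
    main()
```

Running this as root should show all processes; as a normal user it only sees processes whose `fdinfo` it is allowed to read. If one process (or systemd itself, PID 1) holds most of the watches, that is probably a better lead than raising the limit again.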
Most complaints were from zfs-snapshot-frequent.service, but I assume that is only because it is the unit that runs most often. In any case, the machine has been running NixOS with two ZFS pools more or less flawlessly for several years. Neither pool is out of space, but one of them is currently at 90% usage (it was around 80% for a long time, and I think it had been around 90% for at least a few months before the errors appeared). My ZFS-related settings are:
boot.supportedFilesystems = [ "zfs" ];
services.zfs.autoSnapshot.enable = true;
services.zfs.autoScrub.enable = true;
services.zfs.autoScrub.interval = "Sun, 05:00";
fileSystems."/data" = {
  device = "data";
  fsType = "zfs";
};
fileSystems."/backup" = {
  device = "backup";
  fsType = "zfs";
};