I use an impermanence ("erase your darlings") setup and recently ran into issues with what I think is a recursive symlink. Does anyone have any ideas? I outlined the problems I'm facing in the link below. My theories right now are Postgres itself or a small change to systemd tmpfiles, linked in the PR below.
Okay, I narrowed it down to some commit between these two (I just worked off builds I found in Hydra to reduce my testing time): eb62e6aa39ea67e0b8018ba8ea077efe65807dc8 (2025-01-14) → b4622e7a25f1df3f40c9c649b990cb7f4820ed33 (2025-01-19)
Message I wrote on the linked PR, which isn't relevant anymore:
Is it possible that this could cause an infinite loop in a symlink (if you symlink your postgresql data dir)? I just upgraded nixpkgs and got this unpleasant error:
I use this for symlinking to my "permanent" state directory:
systemd.tmpfiles.rules = [
  "L /var/lib/postgresql - - - - /persist/var/lib/postgresql"
  "L /var/lib/iwd - - - - /persist/var/lib/iwd"
  "L /var/kolide-k2 - - - - /persist/var/kolide-k2"
];
I made a NixOS test
with import <nixpkgs> {};
testers.nixosTest {
  name = "foo";
  nodes.machine = {
    systemd.tmpfiles.rules = [
      "D /foo 0700 root root"
      "L /var/lib/foo - - - - /foo"
    ];
    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.StateDirectory = "foo";
      serviceConfig.ExecStart = "${coreutils}/bin/touch %S/foo/baz";
      serviceConfig.Type = "oneshot";
      serviceConfig.RemainAfterExit = true;
    };
  };
  testScript = ''
    machine.wait_for_unit("foo.service")
    print(machine.succeed("stat /var/lib/foo/baz"))
  '';
}
And indeed, this appears to be a regression in systemd 257. With NixOS 24.11, which uses systemd 256, this works fine. On unstable, which uses 257, it fails with "Too many levels of symbolic links".
You can just use a bind mount instead of a symlink though, which impermanence has NixOS options for. I'm surprised you weren't already using those.
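For readers following along, here is a rough sketch of what the impermanence options mentioned above might look like for the directories from the original post. It assumes the impermanence NixOS module is already imported into the system configuration (how it is imported is not shown), and that /persist is the persistent volume; hideMounts is optional and included only for illustration.

```nix
# Sketch only: assumes the impermanence NixOS module has been imported
# elsewhere and that /persist is mounted early enough during boot.
{
  environment.persistence."/persist" = {
    # Optional: hide the resulting bind mounts from file managers.
    hideMounts = true;
    # Each entry is bind-mounted from /persist/<path> onto <path>.
    directories = [
      "/var/lib/postgresql"
      "/var/lib/iwd"
      "/var/kolide-k2"
    ];
  };
}
```

With this in place the corresponding "L" lines in systemd.tmpfiles.rules would no longer be needed.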
Ahhhh, I figured it might be the systemd upgrade. Thank you so much for writing the test and confirming.
I am not actually using the impermanence project. I am just using my own hand-rolled minimal solution. I will look at switching to impermanence or just switching my setup to use bind mounts!
Bind mount should be easy
fileSystems."/var/lib/foo" = {
  device = "/persist/var/lib/foo";
  options = [ "bind" ];
};
Does this pass your NixOS test? I'm still getting the error with bind mounts.
with import <nixpkgs> {};
testers.nixosTest {
  name = "foo";
  nodes.machine = {
    virtualisation.fileSystems."/var/lib/foo" = {
      device = "/foo";
      options = [ "bind" ];
    };
    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.StateDirectory = "foo";
      serviceConfig.ExecStart = "${coreutils}/bin/touch %S/foo/baz";
      serviceConfig.Type = "oneshot";
      serviceConfig.RemainAfterExit = true;
    };
  };
  testScript = ''
    machine.wait_for_unit("foo.service")
    print(machine.succeed("stat /var/lib/foo/baz"))
    print(machine.succeed("stat /foo/baz"))
  '';
}
Yes (ignore that it's virtualisation.fileSystems instead of just fileSystems; that's an artifact of the test framework).
Is the bind mount's source also a symlink? Because obviously that'll just create the same problem.
Okay, the issue was that I needed to manually delete the symlinks that were previously created by the systemd tmpfiles rules. I assume they just weren't cleaned up for some reason!
Yes, removing a tmpfiles rule will not delete the result of said rule.
Oh, I see, the symlink was being dereferenced by the mount. Since the existing symlink already pointed at the /persist/var/lib/foo directory, you were effectively running mount --bind /persist/var/lib/foo /persist/var/lib/foo, which obviously isn't helpful.
A bit off topic: is there a reason why you don't use the dataDir option to set it directly to your persistent storage? I do this for all persistent storage whenever possible so I don't have to use either symlinks or bind mounts.
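A minimal sketch of that dataDir approach for PostgreSQL, assuming /persist is the persistent volume. The exact target path is hypothetical; it mirrors the module's default layout of one subdirectory per schema version. The tmpfiles rule rests on the assumption that a non-default dataDir is not created automatically, so check that against your NixOS release.

```nix
{ config, ... }:
let
  # Hypothetical location on the persistent volume, keeping the default
  # "one directory per PostgreSQL schema version" layout.
  pgData = "/persist/var/lib/postgresql/${config.services.postgresql.package.psqlSchema}";
in
{
  services.postgresql = {
    enable = true;
    dataDir = pgData;
  };

  # Assumption: with a custom dataDir the directory has to be created
  # and owned by the postgres user before the service starts.
  systemd.tmpfiles.rules = [
    "d ${pgData} 0750 postgres postgres -"
  ];
}
```

This avoids both the symlink and the bind mount for Postgres, at the cost of relying on each service exposing such an option.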
That would work for postgres (it was probably an oversight on my part), but I'd still need bind mounts for iwd and others.
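For the directories that still need bind mounts, one possible sketch generates the fileSystems entries from a list instead of writing each one by hand. The persistedDirs name is made up for illustration; the paths are the non-Postgres ones from the original tmpfiles rules, and /persist is assumed to be mounted already.

```nix
{ lib, ... }:
let
  # Hypothetical helper list: directories that should live on /persist.
  persistedDirs = [
    "/var/lib/iwd"
    "/var/kolide-k2"
  ];
in
{
  # Produces one bind mount per directory, for example
  # fileSystems."/var/lib/iwd".device = "/persist/var/lib/iwd".
  fileSystems = lib.genAttrs persistedDirs (dir: {
    device = "/persist${dir}";
    options = [ "bind" ];
  });
}
```

As noted earlier in the thread, any symlinks left behind by the old tmpfiles rules have to be removed by hand first; otherwise the mount point resolves into /persist and the directory ends up mounted onto itself.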