Having issues with recursive symlinks

I use an impermanence (erase your darlings) setup and recently ran into issues with what I think is a recursive symlink. Does anyone have any ideas? I outlined the problems I'm facing in the link below. My theories right now are Postgres itself or a small change to systemd tmpfiles, linked in the PR below.

Okay, I narrowed it down to a commit somewhere between these two (I just worked off builds I found in Hydra to reduce my testing time): eb62e6aa39ea67e0b8018ba8ea077efe65807dc8 (2025-01-14) → b4622e7a25f1df3f40c9c649b990cb7f4820ed33 (2025-01-19)

Message I wrote on the PR linked, which isn't relevant anymore:

Is it possible that this could cause an infinite loop in a symlink (if you symlink your postgresql data dir)? I just upgraded nixpkgs and got this unpleasant error:

I use this for symlinking to my "permanent" state directory:

  systemd.tmpfiles.rules = [
    "L /var/lib/postgresql - - - - /persist/var/lib/postgresql"
    "L /var/lib/iwd - - - - /persist/var/lib/iwd"
    "L /var/kolide-k2 - - - - /persist/var/kolide-k2"
  ];

I made a NixOS test

with import <nixpkgs> {};

testers.nixosTest {
  name = "foo";
  nodes.machine = {
    systemd.tmpfiles.rules = [
      "D /foo 0700 root root"
      "L /var/lib/foo - - - - /foo"
    ];

    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.StateDirectory = "foo";
      serviceConfig.ExecStart = "${coreutils}/bin/touch %S/foo/baz";
      serviceConfig.Type = "oneshot";
      serviceConfig.RemainAfterExit = true;
    };
  };

  testScript = ''
    machine.wait_for_unit("foo.service")
    print(machine.succeed("stat /var/lib/foo/baz"))
  '';
}

And indeed it appears this is a regression in systemd 257. With NixOS 24.11, which uses systemd 256, this works fine. On unstable, which uses 257, it fails with Too many levels of symbolic links.

You can just use a bind mount instead of a symlink though, which impermanence has NixOS options for. I'm surprised you weren't already using those.

Ahhhh, I figured it might be the systemd upgrade. Thank you so much for writing the test and confirming.

I am not actually using the impermanence project. I am just using my own hand-rolled minimal solution. I will look at switching to impermanence or just switching my setup to use bind mounts!

Bind mount should be easy

fileSystems."/var/lib/foo" = {
  device = "/persist/var/lib/foo";
  options = [ "bind" ];
};
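
For the three directories from the tmpfiles rules earlier in the thread, the same bind mount can be generated instead of written out three times. This is only a sketch: it assumes the persistent copies live under /persist at the same relative paths (as in those rules), and it uses lib.genAttrs from nixpkgs to build one fileSystems entry per path.

```nix
{ lib, ... }:
{
  # One bind mount per state directory; the list mirrors the old tmpfiles rules.
  fileSystems = lib.genAttrs [
    "/var/lib/postgresql"
    "/var/lib/iwd"
    "/var/kolide-k2"
  ] (path: {
    # Assumption: the persistent copy sits at the same path under /persist.
    device = "/persist${path}";
    options = [ "bind" ];
  });
}
```

Each source directory under /persist has to exist already; this module does not create them.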

Does this pass your NixOS test? I'm still getting the error with bind mounts.

with import <nixpkgs> {};

testers.nixosTest {
  name = "foo";
  nodes.machine = {
    virtualisation.fileSystems."/var/lib/foo" = {
      device = "/foo";
      options = [ "bind" ];
    };

    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.StateDirectory = "foo";
      serviceConfig.ExecStart = "${coreutils}/bin/touch %S/foo/baz";
      serviceConfig.Type = "oneshot";
      serviceConfig.RemainAfterExit = true;
    };
  };

  testScript = ''
    machine.wait_for_unit("foo.service")
    print(machine.succeed("stat /var/lib/foo/baz"))
    print(machine.succeed("stat /foo/baz"))
  '';
}

yes (ignore that it's virtualisation.fileSystems instead of just fileSystems; that's an artifact of the test framework)

Is the bind mount's source also a symlink? Because obviously that'll just create the same problem.

Okay, the issue was that I needed to manually delete the symlinks that were previously created by the systemd tmpfiles rules. I assume they just weren't cleaned up for some reason!

Yes, removing a tmpfiles rule will not delete the result of said rule.
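
If deleting them by hand on every machine is a nuisance, one possible approach is a small activation script that removes the leftover symlinks. This is an untested sketch, not something from the thread: the attribute name removeStaleStateSymlinks is made up for illustration, the paths are the three from the original tmpfiles rules, and it assumes the script runs before the bind mounts are set up. It only ever removes a path that is itself a symlink, so real directories and mount points are left alone.

```nix
{
  # Hypothetical name; any attribute name under system.activationScripts works.
  system.activationScripts.removeStaleStateSymlinks.text = ''
    for path in /var/lib/postgresql /var/lib/iwd /var/kolide-k2; do
      # -L is true only for symlinks, so directories and mounts are skipped.
      if [ -L "$path" ]; then
        rm -- "$path"
      fi
    done
  '';
}
```

Once the old symlinks are gone everywhere, the script can be dropped again.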

Oh, I see, the symlink was being dereferenced by the mount. So since the symlink that was there pointed at the /persist/var/lib/foo directory already, you were effectively mounting like mount --bind /persist/var/lib/foo /persist/var/lib/foo, which obviously isn't helpful.

A bit off topic: is there a reason why you don't use the dataDir option to set it directly to your persistent storage? I do this for all persistent storage whenever possible so I don't have to use either symlinks or bind mounts.
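
For PostgreSQL that could look roughly like the following. It is a sketch that assumes the persistent storage is mounted at /persist and keeps the same per-version subdirectory layout as the default dataDir, just under a different root.

```nix
{ config, ... }:
{
  services.postgresql = {
    enable = true;
    # Same layout as the default (/var/lib/postgresql/<schema version>),
    # but rooted in persistent storage. /persist is an assumption here.
    dataDir = "/persist/var/lib/postgresql/${config.services.postgresql.package.psqlSchema}";
  };
}
```

With this there is nothing under /var/lib/postgresql at all, so neither a symlink nor a bind mount is involved.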

That would work for postgres (it was probably an oversight on my part) but I'd still need bind mounts for iwd and others.