Having issues with recursive symlinks

I use an impermanence (erase your darlings) setup and recently ran into issues with what I think is a recursive symlink. Does anyone have any ideas? I outlined the problems I’m facing in the link below. My theories right now are Postgres itself or a small change to systemd tmpfiles, linked in the PR below.

Okay, I narrowed it down to a commit that landed between these two (I just worked off of builds I found in Hydra to reduce my testing time): eb62e6aa39ea67e0b8018ba8ea077efe65807dc8 (2025-01-14) → b4622e7a25f1df3f40c9c649b990cb7f4820ed33 (2025-01-19)

Message I wrote on the PR linked, which isn’t relevant anymore:

Is it possible that this could cause an infinite loop in a symlink (if you symlink your postgresql data dir)? I just upgraded nixpkgs and got this unpleasant error:

I use this for symlinking to my “permanent” state directory:

  systemd.tmpfiles.rules = [
    "L /var/lib/postgresql - - - - /persist/var/lib/postgresql"
    "L /var/lib/iwd - - - - /persist/var/lib/iwd"
    "L /var/kolide-k2 - - - - /persist/var/kolide-k2"
  ];

I made a NixOS test:

with import <nixpkgs> {};

testers.nixosTest {
  name = "foo";
  nodes.machine = {
    systemd.tmpfiles.rules = [
      "D /foo 0700 root root"
      "L /var/lib/foo - - - - /foo"
    ];

    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.StateDirectory = "foo";
      serviceConfig.ExecStart = "${coreutils}/bin/touch %S/foo/baz";
      serviceConfig.Type = "oneshot";
      serviceConfig.RemainAfterExit = true;
    };
  };

  testScript = ''
    machine.wait_for_unit("foo.service")
    print(machine.succeed("stat /var/lib/foo/baz"))
  '';
}

And indeed it appears this is a regression in systemd 257. With NixOS 24.11, which uses systemd 256, this works fine. On unstable, which uses 257, it fails with Too many levels of symbolic links.

You can just use a bind mount instead of a symlink though, which impermanence has NixOS options for. I’m surprised you weren’t already using those.
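For reference, a minimal sketch of what that could look like with the impermanence module. This assumes the module is already imported into your configuration (how you fetch it is up to you) and that /persist is your persistent volume; the directory list is just taken from the tmpfiles rules above.

```nix
# Sketch only: assumes the impermanence NixOS module is already in `imports`
# and that /persist is mounted early enough during boot.
{ ... }:
{
  environment.persistence."/persist" = {
    # Each entry is bind mounted from /persist/<path> onto <path>.
    directories = [
      "/var/lib/postgresql"
      "/var/lib/iwd"
      "/var/kolide-k2"
    ];
  };
}
```

Ownership and permissions of the persisted directories may still need attention (for example for the postgres user), so treat this as a starting point rather than a drop-in replacement.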

Ahhhh, I figured it might be the systemd upgrade. Thank you so much for writing the test and confirming.

I am not actually using the impermanence project. I am just using my own hand rolled minimal solution. I will look at switching to impermanence or just switching my setup to use bind mounts!

A bind mount should be easy:

fileSystems."/var/lib/foo" = {
  device = "/persist/var/lib/foo";
  options = [ "bind" ];
};

Does this pass your NixOS test? I’m still getting the error with bind mounts

with import <nixpkgs> {};

testers.nixosTest {
  name = "foo";
  nodes.machine = {
    virtualisation.fileSystems."/var/lib/foo" = {
      device = "/foo";
      options = [ "bind" ];
    };

    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig.StateDirectory = "foo";
      serviceConfig.ExecStart = "${coreutils}/bin/touch %S/foo/baz";
      serviceConfig.Type = "oneshot";
      serviceConfig.RemainAfterExit = true;
    };
  };

  testScript = ''
    machine.wait_for_unit("foo.service")
    print(machine.succeed("stat /var/lib/foo/baz"))
    print(machine.succeed("stat /foo/baz"))
  '';
}

Yes (ignore that it’s virtualisation.fileSystems instead of just fileSystems; that’s an artifact of the test framework).

Is the bind mount’s source also a symlink? Because obviously that’ll just create the same problem.

Okay, the issue was that I needed to manually delete the symlinks that had previously been created by the systemd tmpfiles rules. I assume they just weren’t cleaned up for some reason!

Yes, removing a tmpfiles rule will not delete the result of said rule.

Oh, I see, the symlink was being dereferenced by the mount. Since the existing symlink already pointed at the /persist/var/lib/foo directory, you were effectively running mount --bind /persist/var/lib/foo /persist/var/lib/foo, which obviously isn’t helpful.

A bit off topic: is there a reason why you don’t use the dataDir option to set it directly to your persistent storage? I do this for all persistent storage whenever possible so I don’t have to use either symlinks or bind mounts.

That would work for postgres (it was probably an oversight on my part), but I’d still need bind mounts for iwd and others.
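Putting the two suggestions together, a rough sketch of the combined setup might look like the following. The /persist paths mirror the tmpfiles rules from the first post; the psqlSchema suffix is an assumption that mirrors the layout of the default data directory, so adjust it to wherever your existing data actually lives.

```nix
# Sketch only: dataDir for PostgreSQL, plain bind mounts for everything else.
{ config, ... }:
{
  services.postgresql = {
    enable = true;
    # Assumption: keep the per-version subdirectory that the default
    # dataDir uses, just rooted under /persist instead of /var/lib.
    dataDir = "/persist/var/lib/postgresql/${config.services.postgresql.package.psqlSchema}";
  };

  # Services without a configurable state path still get bind mounts.
  fileSystems."/var/lib/iwd" = {
    device = "/persist/var/lib/iwd";
    options = [ "bind" ];
  };

  fileSystems."/var/kolide-k2" = {
    device = "/persist/var/kolide-k2";
    options = [ "bind" ];
  };
}
```

As noted above, any symlinks left behind by the old tmpfiles rules have to be removed by hand first, otherwise the bind mounts end up being resolved through them.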