[SOLVED] 23.11 broke systemd services running as lingering users

I’ve been using NixOS for about a year now, and for most of this time, I have been using rootless podman running as a podman user to persistently run some services for me. Unfortunately, with the upgrade to 23.11, I have found that nixos-rebuild switch will clobber all running containers, whereas 23.05 left them alone. I’ve been banging my head against this ever since I updated the other day, and I’m at a bit of a loss as to how to get this resolved…

Here is a snippet of some of the relevant configuration:

config = {
  virtualisation.podman.enable = true;
  users = {
    groups.podman = { gid = 31000; };
    users.podman = {
      uid = 31000;
      linger = true;
      group = "podman";
      home = "/home/podman";
      createHome = true;
      subUidRanges = [ { count = 65536; startUid = 615536; } ];
      subGidRanges = [ { count = 65536; startGid = 615536; } ];
    };
  };
  systemd.services = {
    "podman-compose@" = {
      enable = true;
      after = [ "podman.service" ];
      path = [ "/run/wrappers" ];
      serviceConfig = {
        ExecStart = [
          "${pkgs.podman-compose}/bin/podman-compose --podman-path ${pkgs.podman}/bin/podman --project-name %i up --detach --remove-orphans --build --force-recreate"
        ];
        ExecStop = [
          "${pkgs.podman-compose}/bin/podman-compose --podman-path ${pkgs.podman}/bin/podman --project-name %i down"
        ];
        RemainAfterExit = true;
        Type = "oneshot";
        User = "podman";
        Group = "podman";
        WorkingDirectory = "/etc/containers/compose/%i";
      };
    };
  };
};

Each of my compose files is set up with the following:

config = {
  environment.etc."containers/compose/heimdall/compose.yml".source = ./heimdall/compose.yml;
  systemd.services."podman-compose@heimdall" = {
    overrideStrategy = "asDropin";
    path = [ "/run/wrappers" ]; # https://github.com/NixOS/nixpkgs/issues/219013
    wantedBy = [ "machines.target" ];
  };
};

In my actual config, I have them more templated, but for this message, I’ve simplified it a bit so that it’s easier to glance at. This gives me the ability to have the containers all managed by an unprivileged user, allows me to have them run at boot, and allows me to manage them with something like systemctl status podman-compose@heimdall.service.
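For context, a rough sketch of how that templating might look. This is not my real config: the project names are made up, it assumes `lib` is in scope, and it assumes the compose files live next to the Nix file.

```nix
let
  # Hypothetical list of compose projects; each gets a compose.yml in /etc
  # and an asDropin instance of the podman-compose@ template.
  projects = [ "heimdall" "nextcloud" ];
in
{
  environment.etc = lib.listToAttrs (map (name: {
    name = "containers/compose/${name}/compose.yml";
    value.source = ./. + "/${name}/compose.yml";
  }) projects);

  systemd.services = lib.listToAttrs (map (name: {
    name = "podman-compose@${name}";
    value = {
      overrideStrategy = "asDropin";
      path = [ "/run/wrappers" ]; # https://github.com/NixOS/nixpkgs/issues/219013
      wantedBy = [ "machines.target" ];
    };
  }) projects);
}
```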

The error message that gets spit out varies sometimes, but this is one of the ones I have been seeing most persistently:

reloading user units for podman...
Failed to start nixos-activation.service: Transaction for nixos-activation.service/start is destructive (systemd-exit.service has 'start' job queued, but 'stop' is included in transaction).
See user logs and 'systemctl --user status nixos-activation.service' for details.
setting up tmpfiles

When this happens, all of the containers die and complain about losing their socket. I can go through and restart them all with something like systemctl restart podman-compose@heimdall.service, but it’s really annoying, since in 23.05 they worked without a problem: running nixos-rebuild switch there would leave them running, without so much as a restart.

I would be exceptionally grateful if anyone could help me figure out why nixos-activation.service demands that it kill my lingering session as of 23.11.

EDIT: It isn’t nixos-activation.service. I pulled out the switch-to-configuration Perl script and have been trying to narrow down when exactly it murders podman; it happens well before nixos-activation.service. I’m also realizing that I never listed the various things I tried. There isn’t much point in doing so right now, because everything I’ve tried besides this has been a red herring.

EDIT 2: It breaks on line 820 of switch-to-configuration:


In between then and line 940, any commands attempted by podman yell about not having crun (Error: default OCI runtime "crun" not found: invalid argument).

/nix/store/i0sdqs34r68if9s4sfmpixnnj36npiwj-systemd-254.6/bin/systemctl start -- basic.target cryptsetup.target getty.target local-fs.target machines.target multi-user.target network-interfaces.target network-online.target paths.target remote-fs.target slices.target sockets.target sound.target swap.target sysinit.target timers.target

EDIT 3: I got it!
23.11 added a new users.users.&lt;name&gt;.linger option for users that should linger. The activation script has a line which does the following:

ls /var/lib/systemd/linger | sort | comm -3 -1 /nix/store/pplsfrc0hqkqdfi7mj43z125ya0kdiy2-lingering-users -

That comm part strips out the users which have users.users.&lt;name&gt;.linger = true;, so they aren’t reset. When I throw that setting on my podman user, everything works exactly as it used to.
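To see what that comm invocation is doing, here is a small stand-alone demonstration (the file names and user names are made up; the real inputs are the lingering-users store file and the sorted listing of /var/lib/systemd/linger):

```shell
# comm compares two sorted inputs: column 1 is lines unique to the first,
# column 2 lines unique to the second, column 3 lines common to both.
# The activation script suppresses columns 1 and 3 (-1 -3), leaving users
# that are lingering on disk but NOT declared via users.users.<name>.linger —
# exactly the users whose linger state then gets reset.
printf 'alice\npodman\n' > declared   # stand-in for the lingering-users store file
printf 'bob\npodman\n'   > current    # stand-in for `ls /var/lib/systemd/linger | sort`
comm -3 -1 declared current           # prints: bob
```

So a podman user with the linger option set appears in the "declared" list and drops out of the reset set.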

EDIT 4: I updated the scripts at the top to what they should be for 23.11. If you want to see the original version for some reason, click the pencil in the top left of this post.


This isn’t mentioned as a breaking change in the release notes, wonder if that could be added retroactively?


I was wondering about that too, so much so that I went back and reread the breaking changes after I figured out what the issue was. I reasoned that because it was something hacky I did, which an option was later added for, and not an existing option whose behavior changed, it wasn’t a “breaking change.”


Also, after rebooting, I noticed that my lingering users were no longer working. I had fully removed my code that actually made the user linger at the system level. I added a tmpfiles rule for “f /var/lib/systemd/linger/podman”. Seems prudent to add that file if linger is specified. I will look into making a PR for it.
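For reference, a tmpfiles rule along those lines might look like this in a NixOS config. The mode and ownership values here are my guesses; an empty flag file at that path is all systemd-logind checks for.

```nix
systemd.tmpfiles.rules = [
  # Create the linger flag file for the podman user if it doesn't exist.
  # Format: type path mode user group age — "f" creates an empty file.
  "f /var/lib/systemd/linger/podman 0644 root root -"
];
```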

EDIT: Though, now that I say that out loud, I’m wondering about the purpose of the linger option. I’m going to dig into why that was added first.

I just upgraded to 23.11 and it broke things with podman…

Many thanks for pointing out the new users.users.&lt;name&gt;.linger = true option; it solved the error for me (I had been using activationScripts on 23.05).