Running Nix(OS) containers directly from the store with podman

I’ve been building containers with Nix and wanted to avoid running podman load < result every time I made a change.
I realized you can directly run the copyToRoot derivation of buildImage instead of building a full image and then running it.
For the classic hello world container:

pkgs.dockerTools.buildImage {
    name = "hello";
    tag = "latest";
    copyToRoot = pkgs.hello;
    config = { Cmd = [ "/bin/hello" ]; };
};

Instead of building and loading that image, you can also run it with:

$ nix build nixpkgs#hello
$ podman run -ti --rm -v /nix/store:/nix/store --rootfs ./result:O /bin/hello

The Nix store is bind-mounted into the container so the dependencies are available, and the :O option on --rootfs mounts the root as an overlay so that it is writable inside the container (changes are discarded when the container exits). You could share only the closure of the root instead of the whole Nix store; nix path-info --recursive nixpkgs#hello lists it.
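
For example, something like the following could turn that closure into individual bind mounts. This is an untested sketch assuming bash; closure_volume_args is a hypothetical helper name, and mounting each path read-only (:ro) is my choice, not part of the original command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of podman or Nix): read store paths, one per
# line, on stdin and print one read-only podman bind-mount flag per path.
closure_volume_args() {
  local path
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    # Skip empty lines; mount each path at the same location in the container.
    [ -n "$path" ] && printf '%s %s\n' -v "$path:$path:ro"
  done
  return 0
}
```

Usage, mirroring the command above:

$ nix build nixpkgs#hello
$ podman run -ti --rm $(nix path-info --recursive nixpkgs#hello | closure_volume_args) --rootfs ./result:O /bin/hello

The $(...) is left unquoted on purpose so that each flag becomes a separate argument; that is only safe because Nix store paths contain no whitespace.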

It also works with a NixOS root, but I had issues with /etc being a symlink, so I wrote a function that copies it into a real directory instead:

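system-container-root = (system:
  let
    pkgs = system.pkgs;
  in (pkgs.runCommand "nixos-container-root" {
    nixos = system.config.system.build.toplevel;
  } ''
    export PATH=${pkgs.coreutils}/bin
    # Copy the system's top-level directory and make the copy writable.
    cp -r $nixos/ $out
    chmod u+w $out
    # Replace the /etc symlink with a real directory holding the same entries.
    rm -f $out/etc
    mkdir $out/etc
    cp -r $nixos/etc/* $out/etc/
    # Provide /sbin/init so that podman recognises the command as systemd.
    mkdir $out/sbin
    cp $out/init $out/sbin/init
  '')
);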

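I also copy /init to /sbin/init so that podman automatically detects that the container runs systemd: with the default --systemd=true, podman switches to systemd mode when the command is /sbin/init (or one of a few other names such as systemd).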

You can then build a root for the container:

(system-container-root (nixpkgs.lib.nixosSystem {
  system = "x86_64-linux";
  modules = [
    ({ pkgs, modulesPath, ... }: {
      imports = [
        "${toString modulesPath}/virtualisation/docker-image.nix"
      ];
      boot.isContainer = true;
      services.journald.console = "/dev/console";
      services.getty.autologinUser = "root";
      services.nginx.enable = true;
    })
  ];
}));

I’ve put this example in a flake that you can build with:

$ nix build .#nixos-container

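And run it. Unless you run podman as root, you have to copy the root directory so that it is owned by your user: cp -r result/ root (the trailing slash makes cp copy the directory the symlink points to, not the symlink itself). I think this is due to a podman limitation; if someone has a clue, let me know.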

$ podman run -ti --rm -p 8080:80 -v /nix/store:/nix/store --rootfs ./root:O /sbin/init
$ curl http://localhost:8080
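
The copy step can be wrapped in a small function. This is a sketch assuming bash and GNU coreutils; copy_root_for_podman is a hypothetical name, and the final chmod is my addition so that the copy can later be deleted without permission errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a container root built by Nix (e.g. the ./result
# symlink) into a directory owned by the current user.
copy_root_for_podman() {
  if [ $# -ne 2 ]; then
    echo "usage: copy_root_for_podman SRC DST" >&2
    return 2
  fi
  local src=$1 dst=$2
  if [ -e "$dst" ]; then
    echo "copy_root_for_podman: $dst already exists" >&2
    return 1
  fi
  # Resolve the symlink first: GNU cp -r copies a symlink named on the
  # command line as a symlink, which would leave $dst pointing into the store.
  cp -r "$(readlink -f "$src")" "$dst" || return
  # Files copied out of the store keep their read-only modes; give the owner
  # write access. chmod -R does not follow symlinks it meets while recursing,
  # so nothing in /nix/store is touched.
  chmod -R u+w "$dst"
}
```

Usage: copy_root_for_podman result root, followed by the podman run command above.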

Hope it helps! I’m curious whether anyone has similar tricks.

Note that this will probably break if you modify the host’s Nix store while the container is running. On Linux, an overlay filesystem whose underlying storage has been modified will exhibit undefined behaviour (inconsistent VFS state).

As long as you have a gc root for the derivation that is used in the container, it should be fine?

I played around with this and did not need the system-container-root function. It is possible to boot a system derivation directly:

(import <nixpkgs/nixos> {
  configuration = {
    imports = [
      ({ modulesPath, ... }: { imports = [(modulesPath + "/profiles/minimal.nix")];})
      ({ pkgs, lib, ... }: {
        config = {
          environment.systemPackages = [
            pkgs.coreutils
            pkgs.python3
            pkgs.wget
          ];

          boot.specialFileSystems = lib.mkForce {};
        };
      })
    ];

    config = {
      boot.isContainer = true;
      networking.hostName = "";
      services.journald.console = "/dev/console";
      users.mutableUsers = false;
      #users.allowNoPasswordLogin = true;
      services.getty.autologinUser = "root";
      users.users.root.hashedPassword = "";
      #systemd.services.systemd-logind.enable = false;
      #systemd.services.console-getty.enable = false;
      # Setuid wrappers do not work without this hack:
      boot.postBootCommands = "mkdir /run/wrappers";

      # Disable a ton of stuff we don't need
      networking.dhcpcd.enable = false;
      systemd.oomd.enable = false;
      services.nscd.enableNsncd = false;
      networking.firewall.enable = false;
      services.openssh.startWhenNeeded = false;
      nix.enable = false;
      services.lvm.enable = false;

      system.stateVersion = "24.05";
    };
  };
}).system

nix-build container.nix

podman run \
       --rm \
       -it \
       --volume "/nix/store:/nix/store:ro" \
       --systemd=always \
       --env container=podman \
       --rootfs root:O \
       $(readlink result)/init

I found that login and anything PAM-related were broken because the /run/wrappers directory was missing. Creating that directory before systemd starts solves the problem.

Nice! It could be that podman or systemd fixed whatever prevented /etc from being a symlink. IIRC it had something to do with /etc/machine-id or something like that.

As for garbage collection, nix build does create a GC root (the result symlink), so that is not a problem.

I tried that just now, and got Error: running container create option: faccessat root: no such file or directory.

You need to create the directory that you pass to --rootfs as the container root; mkdir root should do. You may need to create the mount point for the store as well (I don’t recall whether podman does that for you); if so, mkdir -p root/nix/store.

Are you actually able to run the exact derivation you posted? Or would I need more things to have a working container?

With it, podman is able to mount the store and execute the init script, but I start getting lots of errors after that: https://pastejustit.com/xjomvqrsgs

In particular,

[...]
Welcome to NixOS 24.11 (Vicuna)!

Initializing machine ID from container UUID.
Failed to mount /run/machine-id (type n/a) on /etc/machine-id (MS_BIND ""): Operation not permitted

[...]

         Mounting /run/wrappers...
[584538.741428] systemd[1]: Mounting /run/wrappers...
[584538.741487] systemd[1]: Rule-based Manager for Device Events and Files was skipped because of an unmet condition check (ConditionPathIsReadWrite=/sys).
[584538.746480] mount[96]: mount: /run/wrappers: permission denied.
[584538.746542] mount[96]:        dmesg(1) may have more information after failed mount system call.
[584538.746598] systemd[1]: run-wrappers.mount: Mount process exited, code=exited, status=32/n/a
[584538.746644] systemd[1]: run-wrappers.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount /run/wrappers.
[584538.746743] systemd[1]: Failed to mount /run/wrappers.
See 'systemctl status run-wrappers.mount' for details.
[DEPEND] Dependency failed for Create SUID/SGID Wrappers.

[...]

Yes, it boots all the way to login, and I am able to sign in as root without a password. Be sure that you have the following in the system config:

      # Setuid wrappers do not work without this hack:
      boot.postBootCommands = "mkdir /run/wrappers";

Without that, I got an error as well, and login would not succeed. I could get to an emergency shell, but su and sudo would not work. There are a few things in the system configuration that are important; boot.isContainer = true; is required as well.

Also, I was using 24.05. I did not try with 24.11 yet. I pushed the scripts here if you want to try the exact version. I think it’s the same as pasted above.

I ran it just now and put the boot log here.

I do use the exact derivation you have posted with all options you have included. However, I have been testing on unstable instead. When I use 24.05, it does work fine. So I guess something has changed between then and now that breaks the boot.postBootCommands = "mkdir /run/wrappers";.

Do you know why NixOS fails to create that directory by itself in the first place?

I don’t know. Perhaps it is because we remove all the special file system mounts.

Nothing significant changed for the service that creates the wrappers. I’m not entirely sure what is going on.

BTW, this:

podman run \
       --rm \
       -it \
       --volume "/nix/store:/nix/store:ro" \
       --mount=type=tmpfs,tmpfs-size=512M,destination=/run \
       --mount=type=tmpfs,tmpfs-size=512M,destination=/run/wrappers \
       --systemd=always \
       --env container=podman \
       --rootfs root:O \
       $(readlink result)/init

is a less hacky way to set up the wrappers directory than postBootCommands, and it also avoids the problems I reported above on 24.11 and unstable.
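
The flags above can be wrapped in a small bash function. This is a sketch, not tested against every podman version: run_nixos_container and the PODMAN override variable are hypothetical names, and any extra arguments (for example --cap-add=CAP_SYS_ADMIN if a service needs it) are forwarded to podman run before --rootfs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the podman invocation above.
#   $1: NixOS system closure, e.g. "$(readlink result)" after nix-build
#   $2: rootfs directory (created if missing)
#   remaining arguments: extra podman run flags
# PODMAN is an assumed override hook (defaults to podman), e.g. PODMAN=echo
# prints the command instead of running it.
run_nixos_container() {
  if [ $# -lt 2 ]; then
    echo "usage: run_nixos_container SYSTEM ROOTFS [PODMAN_ARGS...]" >&2
    return 2
  fi
  local system=$1 rootfs=$2
  shift 2
  # podman needs the --rootfs directory to exist; also create the store mount
  # point in case podman does not create it by itself.
  mkdir -p "$rootfs/nix/store" || return
  "${PODMAN:-podman}" run \
    --rm \
    -it \
    --volume "/nix/store:/nix/store:ro" \
    --mount=type=tmpfs,tmpfs-size=512M,destination=/run \
    --mount=type=tmpfs,tmpfs-size=512M,destination=/run/wrappers \
    --systemd=always \
    --env container=podman \
    "$@" \
    --rootfs "$rootfs:O" \
    "$system/init"
}
```

Usage, after nix-build container.nix: run_nixos_container "$(readlink result)" root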

This is working pretty nicely. The only issue I’ve found so far is that all systemd services with serviceConfig.DynamicUser = true; fail inside podman with "Failed to set up mount namespacing: Operation not supported". I would greatly appreciate it if someone found a solution.

Maybe it’s related to this issue: systemd/systemd issue #29860 on GitHub, "systemd-journal-upload.service not starting in podman container".

So for now you would need to grant the container the CAP_SYS_ADMIN capability…

Thanks for the link, that was indeed the problem. It seems the security implications of this are not as bad as they sound: see the containers/podman GitHub discussion #23558, "Security implications of --cap-add=CAP_SYS_ADMIN for rootless containers?"
