Nixifying Kubernetes with nix-csi, easykubenix and dinix

{
  pkgs ? import <nixpkgs> { },
}:
let
  sysMap = {
    "x86_64-linux" = "aarch64-linux";
    "aarch64-linux" = "x86_64-linux";
  };
  pkgsCrossish = import pkgs.path { system = sysMap.${builtins.currentSystem}; };

  # You can use flakes, npins, niv, fetchTree, fetchFromGitHub or whatever.
  easykubenix = builtins.fetchTree {
    type = "github";
    owner = "lillecarl";
    repo = "easykubenix";
  };
  ekn = import easykubenix {
    inherit pkgs;
    modules = [
      {
        kluctl = {
          discriminator = "demodeploy"; # Used for kluctl pruning (removing resources not in generated manifests)
          pushManifest = {
            enable = true; # Push manifest (which depends on pkgs.hello) before deploying
            to = "ssh://root@192.168.88.20"; # Shouldn't be root but here we are currently, maybe shouldn't be a module option either?
          };
        };
        kubernetes.resources.none.Pod.hello.spec = {
          containers = {
            _namedlist = true; # This is a meta thing to use attrsets instead of lists
            hello = {
              image = "quay.io/nix-csi/scratch:1.0.1"; # 1.0.1 sets PATH to /nix/var/result/bin
              command = [ "hello" ];
              volumeMounts = {
                _namedlist = true;
                nix.mountPath = "/nix";
              };
            };
          };
          volumes = {
            _namedlist = true;
            nix.csi = {
              driver = "nix.csi.store";
              volumeAttributes.${pkgs.system} = pkgs.hello; # stringified into a store path
              volumeAttributes.${pkgsCrossish.system} = pkgsCrossish.hello; # same for the other architecture
              # The manifest now depends on both hello builds, so pushing it also pushes them and nix-csi can fetch them.
            };
          };
        };
      }
    ];
  };
in
ekn
nix run --file . deploymentScript
[01:39:34]❯ nix run --file pod.nix deploymentScript
+ nix copy --to ssh://root@192.168.88.20 /nix/store/cr80hnrfl72nc943cr17v6fj2x0vpbaq-manifest.json
+ /nix/store/zpimjgx9k9drc2yvyx0v2kzbrds12y9z-kluctl-2.27.0/bin/kluctl deploy --no-update-check --target local --discriminator demodeploy --project-dir /nix/store/5jkji22hgsabx5qdrx19k7qzl2y8yp2r-kluctlProject
⚠ Failed to detect git project root. This might cause follow-up errors
✓ Initializing k8s client
✓ Rendering templates
✓ Rendering Helm Charts
✓ Building kustomize objects
✓ Postprocessing objects
✓ Writing rendered objects
✓ Getting remote objects by discriminator
✓ Getting 1 additional remote objects
✓ Getting namespaces
✓ prio-10: Nothing to apply.
✓ Finished waiting
✓ default: Applied 1 objects.

New objects:
  default/Pod/hello
✓ The diff succeeded, do you want to proceed? (y/N) y
✓ prio-10: Nothing to apply.
✓ Finished waiting
✓ default: Applied 1 objects.
✓ Writing command result

New objects:
  default/Pod/hello
[01:39:40]❯ kubectl logs pods/hello 
Hello, world!

There’s a bit of boilerplate and it’s still a moving target, but if you have nix-csi installed, this is how you’d deploy an application to your cluster in the current iteration of the project(s).
There’s more work to be done to make it this easy for more use cases. This also ignores multi-arch clusters by only targeting pkgs.system, but you could easily add a pkgsCross thingy. I want to make it easy to push to Cachix, Attic and S3 (with signing) too, but this is the user experience using the built-in cache (exposed through Kubernetes as a LoadBalancer on 192.168.88.20).

Edit: The deployment log is always quite verbose; kluctl isn’t really meant for “dump your manifests here” deployments, they wanna render kustomizations and Helm charts and all the bits and bobs. I’ve discussed it with the kluctl maintainer and he thinks it’s a good idea to separate the (great) deployment engine from the rendering bits, soon :tm:
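For readers wondering what the discriminator setting in the example does: as far as I understand kluctl, it tags everything it deploys with the discriminator, and on later deploys it treats remote objects carrying the same discriminator that are no longer rendered as orphans to prune (the “Getting remote objects by discriminator” step in the log above). Here’s a rough Python sketch of that idea, not kluctl’s implementation; the label key and the find_orphans helper are hypothetical.

```python
"""Sketch of discriminator-based pruning; NOT kluctl's real code."""

# Hypothetical label key; kluctl's actual mechanism may differ.
DISCRIMINATOR_LABEL = "example.invalid/discriminator"


def object_key(obj: dict) -> tuple:
    """Identify an object by kind, namespace and name."""
    meta = obj.get("metadata", {})
    return (obj.get("kind", ""), meta.get("namespace", "default"), meta.get("name", ""))


def find_orphans(rendered: list, remote: list, discriminator: str) -> list:
    """Return keys of remote objects we own but no longer render."""
    wanted = {object_key(o) for o in rendered}
    orphans = []
    for obj in remote:
        labels = obj.get("metadata", {}).get("labels", {})
        if labels.get(DISCRIMINATOR_LABEL) != discriminator:
            continue  # not ours: never prune objects owned by another deployment
        if object_key(obj) not in wanted:
            orphans.append(object_key(obj))
    return sorted(orphans)


def pod(name: str, discriminator: str = "") -> dict:
    """Build a minimal Pod object, optionally labeled with a discriminator."""
    labels = {DISCRIMINATOR_LABEL: discriminator} if discriminator else {}
    return {"kind": "Pod", "metadata": {"name": name, "namespace": "default", "labels": labels}}


rendered = [pod("hello")]
remote = [pod("hello", "demodeploy"), pod("old", "demodeploy"), pod("foreign", "otherdeploy")]
orphans = find_orphans(rendered, remote, "demodeploy")
```

Only “old” is pruned: “hello” is still rendered, and “foreign” belongs to a different discriminator.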


Thanks @Lillecarl, this was quite insightful. Now I understand better.


Passing store paths rather than expressions didn’t exist this Wednesday, so I couldn’t have made that example until now :wink: Previously I was focused on building expressions in-cluster. That’s still a goal, but now that both easykubenix and nix-csi are “maturing” from alpha1 into alpha2(ish), I realize it’s probably not going to be a widely used feature; building ahead of time (AOT) is more important :smile:

Edit: I updated the example to show how to support a mixed arch cluster with this mode too :smile:


I was able to run NixOS in unprivileged containers, though I had to mount /sys/fs/cgroup read-write, which is cursed. Luckily, within a Kubernetes release or two we’ll have KEP-5474, which should let the CRI set up a writable cgroupfs for us without security complications.

One big downside is still that it’s impossible to remount /nix/store read-only in the guests, so either the CSI provides a read-only mount and you won’t be able to run nix commands in the container, or it provides a read-write mount and /nix/store is writable by root processes :smile:


Updates

Cache

nix-csi now bundles a StatefulSet that acts as a central cache. It’s just ssh-ng with the dumbest SSH key setup ever (reusing the same keys like a madman).
WIP: a little patch to Lix that updates a path’s registrationTime in the Nix database whenever it’s queried, which means we can garbage-collect based on registrationTime and keep the cache hot (kinda like Attic, but with just Nix and OpenSSH).
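To make the registrationTime idea concrete: the Nix SQLite database (/nix/var/nix/db/db.sqlite) keeps a registrationTime per path in the ValidPaths table and the reference graph in the Refs table. If queries bump registrationTime, a cache GC could keep everything reachable from recently queried paths and collect the rest. A minimal Python sketch under those assumptions, using a tiny in-memory stand-in with only the id, path and registrationTime columns; it only lists candidates, which you’d hand to nix-store --delete rather than editing the database directly.

```python
import sqlite3
import time
from typing import Optional


def gc_candidates(db: sqlite3.Connection, max_age_s: int, now: Optional[float] = None) -> list:
    """List store paths not reachable from any path queried within max_age_s."""
    cutoff = int(time.time() if now is None else now) - max_age_s
    rows = db.execute("SELECT id, path, registrationTime FROM ValidPaths").fetchall()
    path_of = {pid: path for pid, path, _ in rows}

    refs = {}  # referrer id -> list of referenced ids
    for referrer, reference in db.execute("SELECT referrer, reference FROM Refs"):
        refs.setdefault(referrer, []).append(reference)

    # Everything in the closure of a "hot" path must stay, even if it is old itself.
    live = set()
    stack = [pid for pid, _, reg_time in rows if reg_time >= cutoff]
    while stack:
        pid = stack.pop()
        if pid not in live:
            live.add(pid)
            stack.extend(refs.get(pid, []))
    return sorted(path for pid, path in path_of.items() if pid not in live)


def make_demo_db(now: int) -> sqlite3.Connection:
    """Tiny stand-in using a subset of the real schema's columns."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ValidPaths (id INTEGER PRIMARY KEY, path TEXT UNIQUE, registrationTime INTEGER)")
    db.execute("CREATE TABLE Refs (referrer INTEGER, reference INTEGER)")
    db.executemany(
        "INSERT INTO ValidPaths VALUES (?, ?, ?)",
        [
            (1, "/nix/store/" + "1" * 32 + "-app", now - 60),       # queried a minute ago
            (2, "/nix/store/" + "2" * 32 + "-lib", now - 86400),    # old, but app needs it
            (3, "/nix/store/" + "3" * 32 + "-stale", now - 86400),  # old and unreferenced
        ],
    )
    db.execute("INSERT INTO Refs VALUES (1, 2)")
    return db


NOW = 1_700_000_000
candidates = gc_candidates(make_demo_db(NOW), max_age_s=3600, now=NOW)
```

Walking the closure matters: the old lib survives because the recently queried app still references it.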

Builders

The cache maintains a list of all builders (CSI Pods) in /etc/nix/machines. With some SSH configuration on your client you can now use all CSI Pods as your own builders; it works with aarch64-linux and x86_64-linux, so it can be your own little build cluster :smile:
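Each line of /etc/nix/machines describes one remote builder: store URI, comma-separated systems, SSH identity file, max jobs, speed factor, supported features, mandatory features and base64 public host key, with - leaving a field at its default. Here’s a sketch of how the cache could render that file from the CSI Pods; the Builder type, the pod addresses and the job counts are hypothetical, and the real nix-csi code may do it differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Builder:
    """A CSI Pod we can build on (hypothetical shape)."""
    host: str                  # e.g. "root@10.0.0.5"
    systems: Tuple[str, ...]   # e.g. ("x86_64-linux",)
    max_jobs: int = 1
    speed_factor: int = 1
    features: Tuple[str, ...] = ()


def machines_line(b: Builder, ssh_key: str = "-") -> str:
    """Render one /etc/nix/machines entry; '-' leaves a field at its default."""
    return " ".join([
        f"ssh-ng://{b.host}",
        ",".join(b.systems),
        ssh_key,
        str(b.max_jobs),
        str(b.speed_factor),
        ",".join(b.features) or "-",
        "-",  # mandatory features: none
        "-",  # public host key: rely on the normal SSH known_hosts check
    ])


def machines_file(builders: List[Builder]) -> str:
    """One line per builder, newline-terminated."""
    return "".join(machines_line(b) + "\n" for b in builders)


pods = [
    Builder("root@10.0.0.5", ("x86_64-linux",), max_jobs=4, features=("big-parallel",)),
    Builder("root@10.0.0.6", ("aarch64-linux",), max_jobs=2),
]
print(machines_file(pods), end="")
```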

CSI

Not much news here, it just works :tm:

Misc

There’s an undeploy option you can set in the easykubenix settings that spawns a DaemonSet to clear /var/lib/nix-csi off the hosts, in case you want to get rid of the thing entirely. It is, however, important that you decommission any pods mounting nix-csi volumes before removing the CSI (unless you plan on reinstalling it); otherwise Kubernetes will have stuck pods, since there’s nothing left to respond to NodeUnpublishVolume requests. You don’t want stuck pods :smile:
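Before turning undeploy on, it’s worth checking that nothing still mounts a nix-csi volume. A small sketch that scans the JSON from kubectl get pods --all-namespaces -o json for inline CSI volumes using the nix.csi.store driver (the driver name from the hello Pod example at the top of the thread). It only looks at inline spec.volumes[].csi entries; volumes provided through PersistentVolumeClaims would need a separate check.

```python
import json
import subprocess

NIX_CSI_DRIVER = "nix.csi.store"  # driver name from the hello Pod example


def pods_using_driver(pod_list: dict, driver: str = NIX_CSI_DRIVER) -> list:
    """Return "namespace/name" for every pod with an inline CSI volume from `driver`."""
    hits = []
    for pod in pod_list.get("items", []):
        volumes = pod.get("spec", {}).get("volumes") or []
        if any((vol.get("csi") or {}).get("driver") == driver for vol in volumes):
            meta = pod["metadata"]
            hits.append(f"{meta.get('namespace', 'default')}/{meta['name']}")
    return sorted(hits)


def live_pod_list() -> dict:
    """Fetch all pods from the current kubectl context."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Offline sample shaped like kubectl's PodList output.
sample = {"items": [
    {"metadata": {"name": "hello", "namespace": "default"},
     "spec": {"volumes": [{"name": "nix", "csi": {"driver": "nix.csi.store"}}]}},
    {"metadata": {"name": "web", "namespace": "prod"},
     "spec": {"volumes": [{"name": "tmp", "emptyDir": {}}]}},
]}
leftovers = pods_using_driver(sample)
```

Against a live cluster you’d call pods_using_driver(live_pod_list()) and only undeploy once it returns an empty list.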


Updates on nix-csi:

  • Now extracts store paths from the podspec and puts them in the volume
  • Emits Kubernetes events
  • Robustness improvements
  • Working on implementing the NRI protocol, which allows manipulating the container at the CRI level while still being “a part of Kubernetes” (emitting events and such), which is pretty cool. Still very much a POC
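Roughly, “extracting store paths from the podspec” means finding every /nix/store/<hash>-<name> string anywhere in the spec. A minimal sketch, assuming a plain regex scan over all strings (the hash is 32 characters of Nix’s base32 alphabet, which omits e, o, t and u); it isn’t nix-csi’s implementation, and a real one would also need each path’s runtime closure (for example via nix-store --query --requisites) before populating the volume. The sample spec mirrors the NRI test pod posted later in this thread.

```python
import re
from typing import Any, Iterator, List

# /nix/store/<32 chars of Nix base32 (no e, o, t, u)>-<name>; the name stops at the first "/".
STORE_PATH_RE = re.compile(r"/nix/store/[0-9a-df-np-sv-z]{32}-[A-Za-z0-9+\-._?=]+")


def walk_strings(node: Any) -> Iterator[str]:
    """Yield every string (keys and values) in a JSON-like podspec."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from walk_strings(key)
            yield from walk_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_strings(item)


def store_paths(podspec: Any) -> List[str]:
    """Collect the distinct store path roots referenced anywhere in the spec."""
    found = set()
    for text in walk_strings(podspec):
        found.update(STORE_PATH_RE.findall(text))
    return sorted(found)


spec = {
    "containers": [{
        "name": "1771867320",
        "image": "gcr.io/distroless/static:latest",
        "command": [
            "/nix/store/a0b6nf74d71qqbd5qq9f9nhbfd8af7yx-tini-0.19.0/bin/tini", "--",
            "/nix/store/i2vmgx46q9hd3z6rigaiman3wl3i2gc4-coreutils-9.9/bin/sleep", "infinity",
        ],
        "env": [
            {"name": "AFILE", "value": "/nix/store/vbhnr2jbkg0m5p3rkk620ddlp2cc2lrn-afile"},
            {"name": "PATH", "value": "/nix/store/x12lw455sq6qy2wcya85d7rb88ybc3df-bash-interactive-5.3p9/bin"},
        ],
    }],
}
paths = store_paths(spec)
```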

for what it’s worth, here are a few thoughts investigating the viability of NixOS’s (modular) services on k8s, reasoning that we could maybe generate container images from them

Being able to reuse services from NixOS would be great in some distant future, if that’s what you’re referring to.

The issue with container images is that they’re limited to 12X layers, so any Nix tool building images will stuff multiple store paths into one “slop layer”. You’re also forced into overlayFS, which doesn’t share inodes and so can’t save RAM the way nix-csi does :smiley:

Being able to use Nix cache infrastructure instead of OCI is nice too.
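To illustrate the “slop layer”: tools like nixpkgs’ dockerTools.buildLayeredImage sort store paths by popularity, give the most popular ones a layer each, and put everything else into the final layer once the layer budget (maxLayers) runs out. A simplified sketch of that packing rule only; it assumes the input is already sorted by popularity and skips how popularity is computed.

```python
from typing import List


def pack_layers(paths_by_popularity: List[str], max_layers: int) -> List[List[str]]:
    """One layer per popular path; whatever doesn't fit goes into a final 'slop layer'."""
    if max_layers < 1:
        raise ValueError("need at least one layer")
    if len(paths_by_popularity) <= max_layers:
        return [[path] for path in paths_by_popularity]
    own = paths_by_popularity[: max_layers - 1]
    slop = paths_by_popularity[max_layers - 1 :]
    return [[path] for path in own] + [slop]


layers = pack_layers(["glibc", "openssl", "python3", "app-deps", "app"], max_layers=3)
```

Since layers are shared by content, two images whose slop layers differ by a single path share nothing from that layer, even the paths they have in common.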


i take it docker has been using nix internally these days; is there no workaround for the layers thing yet?

edit: related discourse thread

I’ve never heard of Docker using Nix internally; the limit is imposed by the kernel. From my understanding there are newer APIs that would allow raising the limit, but in practice nobody but Nix people cares about it, so it’s not prioritized, and you’re still stuck with overlayFS, which wastes page cache.

I’m really happy with the easykubenix & nix-csi workflow where I build packages into the manifests, ship the manifests to the cache and deploy them to Kubernetes in one go: nix-csi fetches the paths from the cache and Bob’s your uncle.

I just got the NRI protocol working in nix-csi (so the name nix-csi might have to go). It allows me to mutate the container creation request to add new mounts, so the following manifest brings store paths into the container :slight_smile:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    nix-nri/test: "true"
  name: nritest
  namespace: nix-csi
spec:
  containers:
    - command:
        - /nix/store/a0b6nf74d71qqbd5qq9f9nhbfd8af7yx-tini-0.19.0/bin/tini
        - --
        - /nix/store/i2vmgx46q9hd3z6rigaiman3wl3i2gc4-coreutils-9.9/bin/sleep
        - infinity
      env:
        - name: AFILE
          value: /nix/store/vbhnr2jbkg0m5p3rkk620ddlp2cc2lrn-afile
        - name: PATH
          value: /nix/store/x12lw455sq6qy2wcya85d7rb88ybc3df-bash-interactive-5.3p9/bin
      image: gcr.io/distroless/static:latest
      name: "1771867320"

This is the same podspec as written in easykubenix:

kubernetes.resources.nix-csi.Pod.nritest = {
  metadata.annotations."nix-nri/test" = "true";
  spec = {
    containers = lib.mkNamedList {
      ${toString builtins.currentTime} = { # change an immutable field for redeploy
        image = "gcr.io/distroless/static:latest"; # useful until we have out of store mounting

        env = lib.mkNamedList {
          AFILE.value = pkgs.writeText "afile" "this is a file";
          PATH.value = lib.makeBinPath [ pkgs.bash ];
        };
        command = [
          (lib.getExe pkgs.tini)
          "--"
          (lib.getExe' pkgs.coreutils "sleep")
          "infinity"
        ];
      };
    };
  };
};
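Comparing the Nix source with the rendered YAML above, lib.mkNamedList (like the _namedlist = true marker in the first example) lets you write Kubernetes’ name-keyed lists as attrsets: each attribute becomes a list element with its key as name. A Python illustration of that transformation, inferred from the output rather than taken from easykubenix; entries are ordered by name, the way builtins.attrNames orders Nix attrsets.

```python
from typing import Any, Dict, List


def named_list(attrs: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """{"NAME": {...}} -> [{"name": "NAME", ...}], ordered like Nix attrset keys."""
    return [{"name": name, **attrs[name]} for name in sorted(attrs)]


env = named_list({
    "PATH": {"value": "/nix/store/x12lw455sq6qy2wcya85d7rb88ybc3df-bash-interactive-5.3p9/bin"},
    "AFILE": {"value": "/nix/store/vbhnr2jbkg0m5p3rkk620ddlp2cc2lrn-afile"},
})
```

The attrset form is what makes these lists mergeable in the module system: two modules can both set env.PATH or containers.hello without fighting over list positions.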


do i understand correctly that, given what you’ve figured out on this, if we can somehow fit nixos modules into this all then we’re good, more or less?

I’m doubtful NixOS modules will practically make it into easykubenix in my lifetime unless systemd is entirely abstracted away into wrapper scripts (Modular modules?) but yes.

You can run NixOS in Kubernetes today in one of two ways: enable writable cgroups in containerd, or make the container privileged. nix-csi can already build NixOS configurations with boot.isContainer = true; set. Once KEP-5474 lands, you’ll be able to run systemd in Kubernetes without reconfiguring the CRI or making the container privileged.

Here’s a (dirty) example: nix-csi/demo/nixos.nix at commit c765b49d24f2d22dba7d358af1a95db651e0e65b in Lillecarl/nix-csi on GitHub.
It’s a very unhappy systemd but it works :smiley:

easykubenix uses the NixOS module system, so you can create “Kubernetes modules” which, paired with nix-csi, could create arbitrary configurations.

from what i understand, running systemd within kubernetes isn’t necessarily desirable, in the sense that if you run separate services, kubernetes can evaluate health, restart or scale out at that level as needed.


so, the modular services feature you mentioned, which was recently introduced, does target this level, so we’re kind of trying to figure out how to make that work better, to support the functionality of more of the existing service modules.

now, our existing nixos service modules may of course both use multiple services and contain non-systemd stuff. this poses challenges as you noted, tho (as per the linked threads) at least for the former we have a few ideas already.

for what it’s worth, we have a vested interest in figuring those things out, given this could make the modules generic enough for reuse across environments; see also the RFC that inspired modular services. there’s a Matrix channel on this as well.

I didn’t mean to dismiss the effort to standardize modular services, and some parts could very well be reused (argv, config and env), but systemd’s confinement features differ quite a bit from what Kubernetes offers.

Thanks for pointing in that direction, it’s a much needed change to enable sharing as much as possible of the implementation efforts :slight_smile:


I’ve been working on a Python implementation of the Nix daemon protocol for two weekends. It’s intended to be the cache and build scheduler for nix-csi.

The code lives here. The architecture is pretty simple: the Nix daemon protocol on both the frontend and the backend, plus a local store that’s the “source of truth”. I took inspiration from some things I remember Rickard from nixbuild.net telling me, from rio-build, and from the snix daemon protocol docs.

It implements enough of protocol version 1.32 to use nixbuild as a backend, plus 1.35 and 1.38. I haven’t tested CA or dynamic derivations yet. It can be used over a Unix domain socket (UDS) or SSH, with about 50% overhead compared to local builds when run locally over UDS against a test UDS builder (you can’t build against the same store you’re initiating from, because it deadlocks).
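For anyone curious what speaking protocol 1.38 involves at the lowest level: the daemon protocol sends every integer as a 64-bit little-endian word and every string as a length word followed by the bytes, zero-padded to a multiple of 8. The handshake opens with the client sending WORKER_MAGIC_1 and the daemon answering with WORKER_MAGIC_2 and its version, encoded as major << 8 | minor; both sides then effectively stick to the lower of the two versions. A sketch of just those pieces, not taken from the project; the later, version-dependent handshake fields are left out.

```python
import struct
from typing import Tuple

WORKER_MAGIC_1 = 0x6E697863  # "nixc": first word the client sends
WORKER_MAGIC_2 = 0x6478696F  # "dxio": the daemon's reply


def encode_u64(n: int) -> bytes:
    return struct.pack("<Q", n)


def decode_u64(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    (n,) = struct.unpack_from("<Q", buf, offset)
    return n, offset + 8


def encode_bytes(data: bytes) -> bytes:
    """Length word, payload, then zero padding up to a multiple of 8."""
    return encode_u64(len(data)) + data + b"\0" * (-len(data) % 8)


def decode_bytes(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Inverse of encode_bytes; returns the payload and the offset after padding."""
    length, offset = decode_u64(buf, offset)
    data = buf[offset:offset + length]
    if len(data) != length:
        raise ValueError("truncated string")
    return data, offset + length + (-length % 8)


def proto_version(major: int, minor: int) -> int:
    return (major << 8) | minor


def version_str(version: int) -> str:
    return f"{version >> 8}.{version & 0xFF}"


def negotiate(client: int, daemon: int) -> int:
    """Both sides restrict themselves to features of the older version."""
    return min(client, daemon)


frame = encode_bytes(b"/nix/store")
```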

I’m able to serve just over 100 MB/s of large NARs over SSH, or a couple hundred small NARs per second. It also implements an HTTP cache with the daemon protocol as the backend. It’s all single-threaded using asyncio, so it’s never going to be “high-performance”, but it spreads builds nicely and can easily handle hundreds of client connections by pooling connections to the local and backend stores :slight_smile:
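Connection pooling is what lets one asyncio process serve hundreds of clients without opening hundreds of backend connections. A generic sketch of the idea, not the project’s code: the hypothetical Pool caps concurrent backend connections with a semaphore and reuses idle ones from a queue, while connect() stands in for dialing a local or remote store.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """At most `size` connections in use at once; idle ones are reused (hypothetical helper)."""

    def __init__(self, connect: Callable[[], Awaitable[T]], size: int) -> None:
        self._connect = connect
        self._idle: asyncio.Queue[T] = asyncio.Queue()
        self._slots = asyncio.Semaphore(size)
        self.opened = 0  # how many connections were actually dialed

    @asynccontextmanager
    async def connection(self) -> AsyncIterator[T]:
        async with self._slots:
            try:
                conn = self._idle.get_nowait()
            except asyncio.QueueEmpty:
                conn = await self._connect()
                self.opened += 1
            try:
                yield conn
            finally:
                # A real pool would discard connections that failed mid-request.
                self._idle.put_nowait(conn)


async def demo(clients: int = 20, size: int = 2) -> int:
    dialed = 0

    async def connect() -> int:
        nonlocal dialed
        dialed += 1
        await asyncio.sleep(0)  # pretend to do a network handshake
        return dialed

    pool: Pool[int] = Pool(connect, size)

    async def client() -> None:
        async with pool.connection():
            await asyncio.sleep(0.005)  # pretend to serve a request

    await asyncio.gather(*(client() for _ in range(clients)))
    return pool.opened


opened = asyncio.run(demo())
```

Twenty concurrent clients end up sharing at most two backend connections; the semaphore bounds concurrency and the queue provides reuse.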
