How to set up Kubernetes k3d on NixOS?

Hi there! Today was the first time I needed to try something out in a cluster. I had kind installed before, since that’s what I knew before @azazel75 pointed me to k3d, which is supposed to be much more lightweight than kind.

I went ahead and installed it and tried some stuff out. However, I ran into two problems:

  1. kubectl commands execute painfully slowly; about 10 seconds per command, and sometimes they even time out.
  2. The k8s server that was created when I executed k3d cluster create test never left NotReady status! It has been sitting there for 24 minutes now, and I think I might have configured something wrong.

My setup of everything related to containers and clusters comes down to:

virtualisation.docker.enable = true;
users.extraUsers.tim.extraGroups = [ "docker" ];

# and in home-manager-config:
home.packages = with pkgs; [ kube3d kubectl ];

Do I have to add some configuration for Kubernetes to get it running? There has recently been this issue and the following PR. Could this be related? It just got merged into master 6 days ago, so I would just have to wait until it reaches unstable to use it.

I’m thankful for any help! Also here is my configuration if anyone wants to see the settings in context.

I did not use kind or k3d.

I mostly just do

{
  services.k3s.enable = true;
  # You need at least port 6443 for the API server.
  # Some other k3s services need other ports.
  # I am sometimes too lazy to debug why stuff cannot be reached.
  networking.firewall.enable = false;
}

and:

$ sudo k3s kubectl get pod
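
If you don’t want the firewall fully off, opening just the API port via networking.firewall.allowedTCPPorts = [ 6443 ]; may be enough for single-node use (that’s an assumption on my side; agents and some addons want more ports). Either way, a quick reachability check:

# Should return a small JSON version blob (or at worst a 401);
# "connection refused" means the port is blocked or k3s isn't up:
$ curl -k https://127.0.0.1:6443/version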

Hey @tim-hilt, I maintain kube3d/k3d for nixpkgs. Sorry you’re having issues. The reason it’s slow is that k3s is repeatedly crashing; there have been a couple of these issues recently :frowning_face:

The k3s used by k3d should be completely unrelated to the k3s that’s packaged in nixpkgs, since k3d pulls k3s in as a container image:

λ docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED        STATUS                          PORTS                             NAMES
5824c5c858c0   rancher/k3d-proxy:v4.4.4   "/bin/sh -c nginx-pr…"   26 hours ago   Up 26 hours                     80/tcp, 0.0.0.0:44471->6443/tcp   k3d-k3s-default-serverlb
d74323e94175   rancher/k3s:v1.20.6-k3s1   "/bin/entrypoint.sh …"   26 hours ago   Restarting (1) 20 seconds ago                                     k3d-k3s-default-server-0

This is also how you can see whether the containers k3d spins up are having issues. (You can also run docker logs k3d-k3s-default-server-0.)
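
A couple of plain-Docker commands that make a crash loop easy to spot:

# Current state and how often the container has restarted:
$ docker inspect -f '{{.State.Status}} restarts={{.RestartCount}}' k3d-k3s-default-server-0

# Follow the logs live across restarts:
$ docker logs -f k3d-k3s-default-server-0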


The v2 cgroups change did cause issues with k3d, but that was resolved with a flag in v4.4.3 (here) and by default in v4.4.4 (here).


If you’re on the latest version of k3d and are still getting crashes, it’s probably the new error:

F0605 09:33:43.803005 7 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

Details are in the k3d FAQ here; it should be fixed in k3s (and the k3s Docker images) soonish.
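
If you want to confirm you’re hitting this one: as I understand the FAQ, kube-proxy inside the container tries to raise this sysctl and gets the permission denied above, so you can compare against the value on your host:

# The sysctl kube-proxy fails to write from inside the container:
$ sysctl net.netfilter.nf_conntrack_max
# equivalently:
$ cat /proc/sys/net/netfilter/nf_conntrack_max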

For now the fix is to use the following to create your clusters:

λ k3d cluster create \
  --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --image rancher/k3s:v1.20.6-k3s

I’ve given it a go and this fixed the crashes for me!

Hope this helps
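
(To double-check the crash loop is gone after recreating the cluster, watching the node status is the quickest test; this assumes k3d merged the new cluster into your default kubeconfig, which it does by default:)

# The server node should flip from NotReady to Ready within a minute or so:
$ kubectl get nodes -w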

Great to have a maintainer respond directly! :slight_smile: Thanks for your efforts and the suggestions! Weirdly, the given command failed for me when I just tried it out:

❯ k3d cluster create \
        --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
        --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
        --image rancher/k3s:v1.20.6-k3s
INFO[0000] Prep: Network
INFO[0000] Re-using existing network 'k3d-k3s-default' (1c6d23b91642d33572b4ebfdbc6f6b035c9d9a289030597853c86eb7f9a0bca1)
INFO[0000] Created volume 'k3d-k3s-default-images'
INFO[0001] Creating node 'k3d-k3s-default-server-0'
ERRO[0003] Failed to pull image 'rancher/k3s:v1.20.6-k3s'
ERRO[0003] Failed to create container 'k3d-k3s-default-server-0'
ERRO[0003] Failed to create node 'k3d-k3s-default-server-0'
ERRO[0003] Failed to create node
ERRO[0003] Failed Cluster Creation: Error response from daemon: manifest for rancher/k3s:v1.20.6-k3s not found: manifest unknown: manifest unknown
ERRO[0003] Failed to create cluster >>> Rolling Back
INFO[0003] Deleting cluster 'k3s-default'
ERRO[0003] No nodes found for given cluster
FATA[0003] Cluster creation FAILED, also FAILED to rollback changes!

To be honest, I didn’t take much time to research why this happened, but do you know of a solution by any chance?

Failed to pull the image.
Sorry, the FAQ refers to an image that doesn’t exist. I also had the pull issue and fixed it, but then I pasted you the old command :man_facepalming:

https://hub.docker.com/r/rancher/k3s/tags?page=1&ordering=last_updated&name=v1.20.6-k3s

k3d cluster create \
        --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
        --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
        --image rancher/k3s:v1.20.6-k3s1 # added a 1 to the end here lol
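
(If in doubt whether a tag exists, pulling it by hand fails faster, with the same "manifest unknown" error you saw:)

$ docker pull rancher/k3s:v1.20.6-k3s1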

You can just do this too and let it pick an image:

k3d cluster create \
        --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
        --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0"

(Also, when doing k3d cluster delete you won’t need these args, as they don’t exist there; they’re only needed for creation.)

Great, it works flawlessly now :slight_smile: Is there an issue or PR I can subscribe to on GitHub to see if there are any changes?

OK, next issue: I can’t delete resources via kubectl delete -f filename.yaml… When I researched the problem, I didn’t find anything similar. Is this an issue with k3d or with the Nix packaging? :grimacing: Do you know of this problem?

EDIT:

I’ve tried out kubectl delete with kind, and it also fails for me. It seems to be an issue with kubectl rather than with k3d/kind.

Yeah, I’m not sure what the issue could be without logs or the yaml.
If it were a k3d-related issue, it would show up in the server logs.

I guess just make sure that the YAML refers to an existing resource, and check your namespace? :man_shrugging:
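
One generic way to get more detail: kubectl’s verbosity flag shows every API request it makes, which usually tells you whether it’s a connectivity problem or a missing resource (plain kubectl, nothing k3d-specific):

# -v=6 logs each HTTP request and status code; -v=8 adds request/response bodies:
$ kubectl delete -f filename.yaml -v=6

# And confirm which cluster kubectl is actually talking to:
$ kubectl config current-context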

I replicated my setup on an Arch Linux installation and everything worked as expected. The same holds true on macOS. Unfortunately, I don’t have the time to look deeper into this right now, but I will keep it in mind and report back if I have something new!

I tried to create a cluster:

k3d cluster create mycluster
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-mycluster'
INFO[0000] Created image volume k3d-mycluster-images
INFO[0000] Starting new tools node...
INFO[0000] Starting Node 'k3d-mycluster-tools'
INFO[0001] Creating node 'k3d-mycluster-server-0'
INFO[0001] Creating LoadBalancer 'k3d-mycluster-serverlb'
INFO[0001] Using the k3d-tools node to gather environment information
INFO[0001] HostIP: using network gateway 172.23.0.1 address
INFO[0001] Starting cluster 'mycluster'
INFO[0001] Starting servers...
INFO[0001] Starting Node 'k3d-mycluster-server-0'
INFO[0005] All agents already running.
INFO[0005] Starting helpers...
INFO[0005] Starting Node 'k3d-mycluster-serverlb'
INFO[0012] Injecting records for hostAliases (incl. host.k3d.internal) and for 2 network members into CoreDNS configmap...
INFO[0014] Cluster 'mycluster' created successfully!
INFO[0014] You can now use it like this:
kubectl cluster-info

which works, but somehow the node cannot connect to 127.0.0.1.

The log from docker logs k3d-mycluster-server-0 is here: pastebin

The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
(the same line repeats over and over)

I just installed k3d in my packages, not via services.k3s.enable, because this is not a server but my work machine…

Does anybody know what the problem might be?
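
One thing worth ruling out first (a guess, based on the port mapping visible in the docker ps output earlier in this thread): k3d publishes the API through the serverlb container on a host port that isn’t necessarily 6443, so "connection refused on 127.0.0.1:6443" can simply mean kubectl isn’t using the kubeconfig k3d wrote. A quick way to compare:

# Which host port maps to 6443 inside the serverlb,
# e.g. "0.0.0.0:44471->6443/tcp":
$ docker ps --filter name=k3d-mycluster-serverlb --format '{{.Names}}: {{.Ports}}'

# Compare the server: line in the kubeconfig k3d generated
# with what kubectl is currently using:
$ k3d kubeconfig get mycluster | grep server
$ kubectl config view --minify | grep server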