I'm not super well versed in the Kubernetes ecosystem, so I might be misunderstanding the problem, but I think there are a few gaps in the current NixOS wiki page that I'd like clarified:
It seems like the default cfssl config and the default Kubernetes config disagree on where `ca.pem` (the CA root certificate) should go: cfssl puts it in `/var/lib/cfssl/ca.pem`, while something dumps an empty file at `/var/lib/kubernetes/secrets/ca.pem`. I fixed this by manually copying the cfssl `ca.pem` into `/var/lib/kubernetes/secrets`.
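Roughly like this (I'm reconstructing the exact command, but those are the paths involved):

```console
# cp /var/lib/cfssl/ca.pem /var/lib/kubernetes/secrets/ca.pem
```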
That fixed the first `certmgr` failure, which had looked like this:
Jul 17 12:01:55 carrot-cake certmgr-pre-start[80356]: 2022/07/17 12:01:55 [INFO] certmgr: loading from config file /nix/store/4r56vfg2skz4hm1ymvmwmrwn3sa486k0-certmgr.yaml
Jul 17 12:01:55 carrot-cake certmgr-pre-start[80356]: 2022/07/17 12:01:55 [INFO] manager: loading certificates from /nix/store/m4z0digyigrqps9p68vxqyjkx91cbjbx-certmgr.d
Jul 17 12:01:55 carrot-cake certmgr-pre-start[80356]: 2022/07/17 12:01:55 [INFO] manager: loading spec from /nix/store/m4z0digyigrqps9p68vxqyjkx91cbjbx-certmgr.d/addonManager.json
Jul 17 12:01:55 carrot-cake certmgr-pre-start[80356]: 2022/07/17 12:01:55 [ERROR] cert: failed to fetch remote CA: failed to parse rootCA certs
With the CA in place, certmgr got further, but its pre-start step then timed out:
Jul 17 12:29:02 carrot-cake certmgr-pre-start[138552]: 2022/07/17 12:29:02 [INFO] certmgr: loading from config file /nix/store/4r56vfg2skz4hm1ymvmwmrwn3sa486k0-certmgr.yaml
Jul 17 12:29:02 carrot-cake certmgr-pre-start[138552]: 2022/07/17 12:29:02 [INFO] manager: loading certificates from /nix/store/m4z0digyigrqps9p68vxqyjkx91cbjbx-certmgr.d
Jul 17 12:29:02 carrot-cake certmgr-pre-start[138552]: 2022/07/17 12:29:02 [INFO] manager: loading spec from /nix/store/m4z0digyigrqps9p68vxqyjkx91cbjbx-certmgr.d/addonManager.json
Jul 17 12:30:32 carrot-cake systemd[1]: certmgr.service: start-pre operation timed out. Terminating.
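My guess is that the timeout was certmgr simply not being able to reach the cfssl remote behind my API hostname. A quick reachability check would be something like the following (assuming cfssl sits on its default port 8888; adjust if your setup differs):

```console
# nc -vz api.kube 8888
```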
Then I manually added my `api.kube` IP to the loopback interface:
ip addr add 10.1.1.2 dev lo
and certmgr seemed happy.
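It's also worth confirming that the hostname actually resolves to the address you just pinned, otherwise the loopback trick won't help:

```console
# getent hosts api.kube
```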
After that, `kube-apiserver` was still failing to start, so I had to chown `/var/lib/kubernetes/secrets/ca.pem` to `kubernetes:nogroup` and set its mode to 644.
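In other words, something like (reconstructed, not copied from my shell history):

```console
# chown kubernetes:nogroup /var/lib/kubernetes/secrets/ca.pem
# chmod 644 /var/lib/kubernetes/secrets/ca.pem
```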
Finally, I was able to tickle `etcd` and the full system seems to be up:
# systemctl start etcd
# kubectl cluster-info
Kubernetes control plane is running at https://api.kube:6443
CoreDNS is running at https://api.kube:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
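Beyond cluster-info, the individual control-plane units and the node itself can be sanity-checked directly (unit names as they appear on my machine; adjust if yours differ):

```console
# systemctl status kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy
# kubectl get nodes -o wide
# kubectl get pods -A
```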
Hopefully someone searching for these errors can find this page. It would be nice for the NixOS Kubernetes module to set all of this up properly, though. If someone can point me to the right spot in the nixos-modules forest, I'm happy to submit a PR to nixpkgs.
This might be the same issue I’m having with the k3s service.
opened 02:56AM - 17 Jul 22 UTC · label: 0.kind: bug
### Describe the bug
This is going to be a long description since I'm not entirely sure where the bounds of k3s are when it comes to statefulness. I'm currently running the latest version of k3s in nixpkgs and I am unable to stand up the cluster without all pods failing after helm deploys traefik. I have searched upstream's issues and there doesn't seem to be anything relevant there. I have spent quite a lot of time scouring the logs of both the pods and k3s itself. I also tried reverting the package to a previous commit, and that did not work either.
I can't see anything obvious in the k3s logs; there are quite a few warnings and errors, but I'm not sure which are genuine errors and which are just the kubelet complaining about the state not being ready. The pods don't seem to indicate anything explicit either.
### Steps To Reproduce
Here is the current config I have:
```nix
{ lib, pkgs, ... }:
let
  # https://github.com/NixOS/nixpkgs/pull/176520
  k3s = pkgs.k3s.overrideAttrs
    (old: rec { buildInputs = old.buildInputs ++ [ pkgs.ipset ]; });
in {
  networking.firewall.allowedTCPPorts = [ 6443 80 443 10250 ];
  networking.firewall.allowedUDPPorts = [ 8472 ];
  services.k3s = {
    enable = true;
    role = "server";
    package = k3s;
  };
  environment.systemPackages = [
    (pkgs.writeShellScriptBin "k3s-reset-node"
      (builtins.readFile ./k3s-reset-node))
  ];
}
```
### Expected behavior
Pods come up in the kube-system namespace.
### Screenshots
journalctl
[k3s_logs.txt](https://github.com/NixOS/nixpkgs/files/9126972/k3s_logs.txt)
pods
```console
$ k get pods -A
[sudo] password for collin:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system helm-install-traefik-crd-ntl24 0/1 Completed 0 124m
kube-system helm-install-traefik-xg6tn 0/1 Completed 1 124m
kube-system traefik-7cd4fcff68-8s2tg 0/1 CrashLoopBackOff 28 (3m38s ago) 124m
kube-system local-path-provisioner-7b7dc8d6f5-rqd5j 0/1 CrashLoopBackOff 23 (2m46s ago) 124m
kube-system coredns-b96499967-4vhc8 0/1 CrashLoopBackOff 26 (116s ago) 124m
kube-system svclb-traefik-1628ae6b-vc7j8 0/2 CrashLoopBackOff 50 (41s ago) 124m
kube-system metrics-server-668d979685-m5mwm 1/1 Running 34 (5m54s ago) 124m
```
```console
$ k logs metrics-server-668d979685-m5mwm -p -n kube-system
I0717 00:24:27.948191 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0717 00:24:27.948198 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:27.948240 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0717 00:24:27.948243 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:27.948204 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:27.948258 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:27.948381 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0717 00:24:27.948519 1 secure_serving.go:202] Serving securely on [::]:4443
I0717 00:24:27.948563 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0717 00:24:28.049036 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:28.049050 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:28.049088 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0717 00:24:28.170843 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:29.171668 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:29.362833 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:31.363846 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:33.363524 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:35.363869 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:37.362718 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:39.363017 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:41.362617 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:26:05.487063 1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController
I0717 00:26:05.487084 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:26:05.487091 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:26:05.487177 1 tlsconfig.go:255] Shutting down DynamicServingCertificateController
I0717 00:26:05.487229 1 secure_serving.go:246] Stopped listening on [::]:4443
I0717 00:26:05.487237 1 dynamic_serving_content.go:145] Shutting down serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
```
```console
$ k logs local-path-provisioner-7b7dc8d6f5-rqd5j -p -n kube-system
I0717 00:26:57.595741 1 controller.go:773] Starting provisioner controller rancher.io/local-path_local-path-provisioner-7b7dc8d6f5-rqd5j_0677ce3a-05da-446b-9492-3a9eb140c921!
I0717 00:26:57.696339 1 controller.go:822] Started provisioner controller rancher.io/local-path_local-path-provisioner-7b7dc8d6f5-rqd5j_0677ce3a-05da-446b-9492-3a9eb140c921!
time="2022-07-17T00:29:50Z" level=info msg="Receive terminated to exit"
time="2022-07-17T00:29:50Z" level=info msg="stop watching config file"
```
```console
$ k logs coredns-b96499967-4vhc8 -p -n kube-system
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
.:53
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/reload: Running configuration SHA512 = b941b080e5322f6519009bb49349462c7ddb6317425b0f6a83e5451175b720703949e3f3b454a24e77f3ffe57fd5e9c6130e528a5a1dd00d9000e4afd6c1108d
CoreDNS-1.9.1
linux/amd64, go1.17.8, 4b597f8
[INFO] SIGTERM: Shutting down servers then terminating
```
```console
$ k logs svclb-traefik-1628ae6b-vc7j8 -p -n kube-system
Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
+ trap exit TERM INT
+ echo 10.43.12.129
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '!=' 1 ]
+ iptables -t nat -I PREROUTING '!' -s 10.43.12.129/32 -p TCP --dport 80 -j DNAT --to 10.43.12.129:80
+ iptables -t nat -I POSTROUTING -d 10.43.12.129/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause
/usr/bin/entry: line 27: can't open /pause: Interrupted system call
+
+ exit
```
```console
$ k logs traefik-7cd4fcff68-8s2tg -p -n kube-system
time="2022-07-17T02:34:13Z" level=info msg="Configuration loaded from flags."
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:8000: use of closed network connection" entryPointName=web
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:9100: use of closed network connection" entryPointName=metrics
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:9000: use of closed network connection" entryPointName=traefik
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:8000: use of closed network connection" entryPointName=web
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:9100: use of closed network connection" entryPointName=metrics
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:9000: use of closed network connection" entryPointName=traefik
```
```console
$ k describe node zombie
Name: zombie
Roles: control-plane,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=k3s
beta.kubernetes.io/os=linux
egress.k3s.io/cluster=true
kubernetes.io/arch=amd64
kubernetes.io/hostname=zombie
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=true
node-role.kubernetes.io/master=true
node.kubernetes.io/instance-type=k3s
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"ca:10:b7:59:64:03"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.1.164
k3s.io/hostname: zombie
k3s.io/internal-ip: 192.168.1.164
k3s.io/node-args: ["server","--kubelet-arg","cgroup-driver=systemd"]
k3s.io/node-config-hash: 6FBPL7ZNB3BFG32NEVA44JMMFWZ34N6AY4YKS3OHY45APNNIJBJQ====
k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/a2527f9db03e21bee3a56f440fb6ea3cd6e7796abf2bb0c3428db9f447b522e5"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 16 Jul 2022 15:57:20 -0400
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: zombie
AcquireTime: <unset>
RenewTime: Sat, 16 Jul 2022 22:35:52 -0400
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sat, 16 Jul 2022 22:33:16 -0400 Sat, 16 Jul 2022 15:57:20 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 16 Jul 2022 22:33:16 -0400 Sat, 16 Jul 2022 15:57:20 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 16 Jul 2022 22:33:16 -0400 Sat, 16 Jul 2022 15:57:20 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sat, 16 Jul 2022 22:33:16 -0400 Sat, 16 Jul 2022 15:57:31 -0400 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.1.164
Hostname: zombie
Capacity:
cpu: 24
ephemeral-storage: 479081160Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 65780800Ki
pods: 110
Allocatable:
cpu: 24
ephemeral-storage: 466050152083
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 65780800Ki
pods: 110
System Info:
Machine ID: 440d6715398c4a968f8638b885267bf9
System UUID: b9c28570-b664-0000-0000-000000000000
Boot ID: 9a01a61d-e374-446d-8595-5fc100db7d98
Kernel Version: 5.15.53
OS Image: NixOS 22.11 (Raccoon)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.5.13-k3s1
Kubelet Version: v1.24.2+k3s2
Kube-Proxy Version: v1.24.2+k3s2
PodCIDR: 10.42.0.0/24
PodCIDRs: 10.42.0.0/24
ProviderID: k3s://zombie
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system metrics-server-668d979685-m5mwm 100m (0%) 0 (0%) 70Mi (0%) 0 (0%) 6h38m
kube-system svclb-traefik-1628ae6b-vc7j8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h38m
kube-system local-path-provisioner-7b7dc8d6f5-rqd5j 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h38m
kube-system traefik-7cd4fcff68-8s2tg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h38m
kube-system coredns-b96499967-4vhc8 100m (0%) 0 (0%) 70Mi (0%) 170Mi (0%) 6h38m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 200m (0%) 0 (0%)
memory 140Mi (0%) 170Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
```
### Additional context
If you want to look at my entire config, it's hosted here: https://github.com/collinarnett/brew
### Notify maintainers
@euank
@superherointj
@Mic92
@kalbasit
### Metadata
```console
$ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 5.15.53, NixOS, 22.11 (Raccoon)`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.9.1`
- channels(collin): `""`
- channels(root): `"nixos-21.11.335130.386234e2a61"`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
```
Thank you so much for documenting your fixes!
Here is where the Kubernetes service is defined, `nixos/modules/services/cluster/kubernetes/default.nix` in nixpkgs (excerpt below):
{ config, lib, options, pkgs, ... }:

with lib;

let
  cfg = config.services.kubernetes;
  opt = options.services.kubernetes;

  defaultContainerdSettings = {
    version = 2;
    root = "/var/lib/containerd";
    state = "/run/containerd";
    oom_score = 0;

    grpc = {
      address = "/run/containerd/containerd.sock";
    };

    plugins."io.containerd.grpc.v1.cri" = {
      sandbox_image = "pause:latest";
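(The embed cuts the file off there.) To browse the whole module tree locally, something along these lines works (path from memory, so double-check it); the certmgr/cfssl wiring that tripped up the original poster lives in the same directory:

```console
$ git clone --depth 1 https://github.com/NixOS/nixpkgs
$ ls nixpkgs/nixos/modules/services/cluster/kubernetes/
$ grep -rln certmgr nixpkgs/nixos/modules/services/cluster/kubernetes/
```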