Kubernetes: network malfunction after upgrading to 19.09

Hi,

I’ve upgraded a three-machine test Kubernetes cluster to NixOS 19.09. It’s now using the bare 19.09 k8s module (Kubernetes 1.15, flannel 0.11); before, it was using 1.14 from unstable with the stabilization-across-machines patch (#56789) applied, from before the PR that reverts it (#67563) landed.

The only problem that I’m facing now is that network communication between the machines is spotty after they finish the bootstrap phase. If I manually restart the flannel daemon on each machine, the network layer resumes normal functionality. Have you experienced any similar symptoms?

The only things that seem to stand out in the flannel logs are these lines:

   github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to watch *v1.Node: Get https://belial.etour.tn.it:6443/api/v1/nodes?resourceVersion=43657342&timeoutSeconds=414&watch=true: dial tcp 192.168.122.102:6443: connect: connection refused
   github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: nodes is forbidden: User "flannel-client" cannot list resource "nodes" in API group "" at the cluster scope
   github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to watch *v1.Node: Get https://belial.etour.tn.it:6443/api/v1/nodes?resourceVersion=43658446&timeoutSeconds=384&watch=true: dial tcp 192.168.122.102:6443: connect: connection refused
   github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: nodes is forbidden: User "flannel-client" cannot list resource "nodes" in API group "" at the cluster scope

My suspicion is that this happens because flannel starts while kube-apiserver is not yet completely ready (flannel is configured to use “kubernetes” storage).

Anyway, when I restart the daemons manually (so well after kube-apiserver has become ready), those lines do not end up in the logs.

I’m thinking about how to resolve this issue… starting the flannel service after the apiserver is easy on the master, but I don’t know how to do that cleanly on the other machines; one idea is sketched below. If you have any comments or suggestions, please speak up!
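
One idea (an untested sketch: it assumes the flannel unit is the plain “flannel” systemd service from the NixOS module, and that getting any HTTP response back from the apiserver endpoint, even a 401/403, is a good-enough readiness signal) is to have flannel on every node poll the apiserver URL before starting, instead of relying on local systemd ordering:

{ pkgs, ... }:
{
  # Untested sketch: block flannel's startup until the apiserver answers.
  # The URL is the endpoint from the logs above; adjust it to your setup.
  # curl exits non-zero on "connection refused" and zero as soon as it gets
  # any HTTP response, which is all we check for here.
  systemd.services.flannel.preStart = ''
    until ${pkgs.curl}/bin/curl --insecure --silent --output /dev/null https://belial.etour.tn.it:6443/healthz; do
      echo "waiting for kube-apiserver to answer..."
      sleep 2
    done
  '';
}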

cc @johanot @srhb @globin @calbrecht

P.S. I’ve thought of filing a bug, but I have yet to establish how reproducible this is.

Does the error stabilize once the apiserver is up?

I had a bunch of trouble getting kube to work the way I wanted and ended up using a different approach based on kubeadm.
There was a comment with some code on one of the PRs which led me to the implementation below. I’m quite happy with it. I had some pain getting flannel to work properly with the 19.09 Kubernetes module (I bind all my services to tinc interfaces and couldn’t get the flannel implementation to play nicely), but with this setup I now deploy the network overlay the ‘vanilla’ way, through a simple kubectl apply -f. This also means I’m not tied to flannel, which I don’t use myself.
I suppose you could hook in a one-off systemd boot script that also configures the network overlay by running a kubectl apply (something like the sketch just below); I don’t bootstrap the cluster that often, so I never bothered.
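
Such a one-off unit could look something like this (untested sketch; /etc/kubernetes/overlay.yaml is just a placeholder for whatever overlay manifest you deploy, and it assumes kubeadm init has already produced /etc/kubernetes/admin.conf):

{ pkgs, ... }:
{
  # Untested sketch: apply the network overlay manifest once the kubeadm
  # bootstrap has run. kubectl apply is idempotent, so re-running it on
  # later boots is harmless.
  systemd.services.kube-network-overlay = {
    description = "One-off apply of the Kubernetes network overlay";
    wantedBy = [ "multi-user.target" ];
    after = [ "kubeadm.service" ];
    path = [ pkgs.kubernetes ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      export KUBECONFIG=/etc/kubernetes/admin.conf
      kubectl apply -f /etc/kubernetes/overlay.yaml
    '';
  };
}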

This won’t fix the issues you are having; I’m mostly posting it here in case you or others are keen to try a different/simpler implementation.


{ pkgs, lib, config, ... }: let cfg = config.services.kubeadm; in {
  options.services.kubeadm = {
    enable = lib.mkEnableOption "kubeadm";
    role = lib.mkOption {
      type = lib.types.enum [ "master" "worker" ];
    };
    apiserverAddress = lib.mkOption {
      type = lib.types.str;
      description = ''
        The address at which the master(s) can be reached; could be a load balancer.
      '';
    };
    bootstrapToken = lib.mkOption {
      type = lib.types.str;
      description = ''
        The master will print this to stdout after being set up.
      '';
    };
    nodeip = lib.mkOption {
      type = lib.types.str;
      description = ''
        IP address this node's kubelet should bind to and advertise
        (passed to the kubelet's --address and --node-ip flags).
      '';
    };

    discoveryTokenCaCertHash = lib.mkOption {
      type = lib.types.str;
      description = ''
        Hash of the cluster CA certificate, passed on workers to
        kubeadm join as --discovery-token-ca-cert-hash.
      '';
    };
  };
  config = lib.mkIf cfg.enable {

    boot.kernelModules = [ "br_netfilter" ];
    boot.kernel.sysctl = {
      "net.ipv4.ip_forward" = 1;
      "net.bridge.bridge-nf-call-iptables" = 1;
    };

    environment.systemPackages = with pkgs; [
      gitMinimal
      openssh
      docker
      utillinux
      iproute
      ethtool
      thin-provisioning-tools
      iptables
      socat
    ];

    virtualisation.docker.enable = true;

    systemd.services.kubeadm = {
      wantedBy = [ "multi-user.target" ];
      after = [ "kubelet.service" ];
      postStart = lib.mkIf (cfg.role == "master")
        ''
          KUBECONFIG=/etc/kubernetes/admin.conf kubectl -n kube-public get cm cluster-info -o json | jq -r '.data.kubeconfig' > /etc/kubernetes/cluster-info.cfg
          chmod a+r /etc/kubernetes/cluster-info.cfg
        '';

      # These paths are needed to convince kubeadm to bootstrap
      path = with pkgs; [ kubernetes jq gitMinimal openssh docker utillinux iproute ethtool thin-provisioning-tools iptables socat ];
      unitConfig = {
        # Make sure that it's only started once, during bootstrap.
        # (Condition* settings belong in the [Unit] section, hence unitConfig.)
        ConditionPathExists = "!/var/lib/kubelet/config.yaml";
      };
      serviceConfig = {
        Type = "oneshot";
        RemainAfterExit = true;
        StateDirectory = "kubelet";
        ConfigurationDirectory = "kubernetes";
        ExecStart = {
          master = "${pkgs.kubernetes}/bin/kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=${cfg.apiserverAddress} --ignore-preflight-errors='all' --token ${cfg.bootstrapToken} --token-ttl 0 --upload-certs";
          worker = "${pkgs.kubernetes}/bin/kubeadm join ${cfg.apiserverAddress} --token ${cfg.bootstrapToken}  --discovery-token-unsafe-skip-ca-verification --ignore-preflight-errors all --discovery-token-ca-cert-hash ${cfg.discoveryTokenCaCertHash}";
        }.${cfg.role};
      };
    };
    systemd.services.kubelet = {
      description = "Kubernetes Kubelet Service";
      wantedBy = [ "multi-user.target" ];

      path = with pkgs; [ gitMinimal openssh docker utillinux iproute ethtool thin-provisioning-tools iptables socat cni ];

      serviceConfig = {
        StateDirectory = "kubelet";

        # This populates $KUBELET_KUBEADM_ARGS and is provided
        # by kubeadm init and join
        EnvironmentFile = "-/var/lib/kubelet/kubeadm-flags.env";

        Restart = "always";
        StartLimitInterval = 0;
        RestartSec = 10;

        ExecStart = ''
          ${pkgs.kubernetes}/bin/kubelet \
            --kubeconfig=/etc/kubernetes/kubelet.conf \
            --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
            --config=/var/lib/kubelet/config.yaml \
            --fail-swap-on=false \
            --cni-bin-dir="/opt/cni/bin" \
            --address="${cfg.nodeip}" \
            --node-ip="${cfg.nodeip}" \
            $KUBELET_KUBEADM_ARGS
        '';
      };
    };
  };
}

and then in my configuration.nix,

on the workers:

  imports = [
    ./kubeadm.nix
  ];

   .
   .
   .

      kubeadm.enable = true;
      kubeadm.role = "worker";
      kubeadm.apiserverAddress = "172.16.254.60:6443";
      kubeadm.bootstrapToken = "8tipwo.tst0nvf7wcaqjcj0";
      kubeadm.discoveryTokenCaCertHash = "sha256:c1c70671f3d765fdadc3bcc3ef544222256bc7b89df5e86b7dcceb68336774da";
      kubeadm.nodeip = "172.16.254.63";

and on the master:

      kubeadm.enable = true;
      kubeadm.role = "master";
      kubeadm.apiserverAddress = "172.16.254.60";
      kubeadm.bootstrapToken = "8tipwo.tst0nvf7wcaqjcj0";
      kubeadm.discoveryTokenCaCertHash = "sha256:c3e9efd010c793d2c983ea17f1e7f9346adf6018d524db0793bf550e39b1a402";
      kubeadm.nodeip = "172.16.254.60";
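
In case it helps: the bootstrapToken and discoveryTokenCaCertHash values above come from the master. kubeadm init prints them at the end of the bootstrap, and, if I remember the usual kubeadm/openssl incantations correctly, you can also recover them later on the master with something like:

   # list existing bootstrap tokens (or mint a new one with `kubeadm token create`)
   kubeadm token list

   # recompute the discovery token CA cert hash from the cluster CA
   openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
     | openssl rsa -pubin -outform der 2>/dev/null \
     | openssl dgst -sha256 -hex \
     | sed 's/^.* //'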

If you are asking whether the issue disappears after a while, I’m not completely sure. It doesn’t go away within half an hour or a few hours, but today I found the cluster functioning as expected after the three VMs had been shut down before, and restarted after, the borg backup job that saves them. I’ll wait for a few more cycles of the same before saying anything :wink:

Hi @Azulinho, thanks for sharing this! I’m quite curious to test it out :wink: What do you use instead of flannel? I have another client who set up their own cluster using Calico, but I must say that on the surface flannel seems a bit simpler than Calico, and since there are already so many things to learn about Kubernetes, a simpler tool makes at least that part easier to manage.

I use weave, mostly because it just works with my ‘tricky’ network setup.
I reckon if you install flannel ‘the normal way’ it should work fine too.

@arianvp now I’m sure: it doesn’t disappear after some time; as of now, only restarting the flannel daemons seems to solve it.