0 votes

I am creating a 3-node cluster with Kind inside an Ubuntu VM running on my Mac. The nodes are up as they should be:

NAME                 STATUS   ROLES    AGE   VERSION
kind-control-plane   Ready    master   20h   v1.17.0
kind-worker          Ready    <none>   20h   v1.17.0
kind-worker2         Ready    <none>   20h   v1.17.0

I have installed Consul following the official tutorial with the default Helm chart. The problem is that the Consul pods are either Running or Pending, and none of them ever become ready:

NAME                        READY   STATUS    RESTARTS   AGE
busybox-6cd57fd969-9tzmf    1/1     Running   0          17h
hashicorp-consul-hgxdr      0/1     Running   0          18h
hashicorp-consul-server-0   0/1     Running   0          18h
hashicorp-consul-server-1   0/1     Running   0          18h
hashicorp-consul-server-2   0/1     Pending   0          18h
hashicorp-consul-vmsmt      0/1     Running   0          18h

Here is the full description of the pods:

Name:         busybox-6cd57fd969-9tzmf
Namespace:    default
Priority:     0
Node:         kind-worker2/172.17.0.4
Start Time:   Tue, 14 Jan 2020 17:45:03 +0800
Labels:       pod-template-hash=6cd57fd969
              run=busybox
Annotations:  <none>
Status:       Running
IP:           10.244.2.11
IPs:
  IP:           10.244.2.11
Controlled By:  ReplicaSet/busybox-6cd57fd969
Containers:
  busybox:
    Container ID:  containerd://710eba6a12607021098e3c376637476cd85faf86ac9abcf10f191126dc37026b
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    Port:          <none>
    Host Port:     <none>
    Args:
      sh
    State:          Running
      Started:      Tue, 14 Jan 2020 21:00:50 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zszqr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-zszqr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zszqr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>


Name:         hashicorp-consul-hgxdr
Namespace:    default
Priority:     0
Node:         kind-worker2/172.17.0.4
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=client
              controller-revision-hash=6bc54657b6
              hasDNS=true
              pod-template-generation=1
              release=hashicorp
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.2.10
IPs:
  IP:           10.244.2.10
Controlled By:  DaemonSet/hashicorp-consul
Containers:
  consul:
    Container ID:  containerd://2209cfeaa740e3565213de6d0653dabbe9a8cbf1ffe085352a8e9d3a2d0452ec
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    8500/TCP, 8502/TCP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -node="${NODE}" \
        -advertise="${ADVERTISE_IP}" \
        -bind=0.0.0.0 \
        -client=0.0.0.0 \
        -node-meta=pod-name:${HOSTNAME} \
        -hcl="ports { grpc = 8502 }" \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -domain=consul

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:29 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADVERTISE_IP:   (v1:status.podIP)
      NAMESPACE:     default (v1:metadata.namespace)
      NODE:           (v1:spec.nodeName)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-client-token-4r5tv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-client-config
    Optional:  false
  hashicorp-consul-client-token-4r5tv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-client-token-4r5tv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                   From                   Message
  ----     ------     ----                  ----                   -------
  Warning  Unhealthy  96s (x3206 over 14h)  kubelet, kind-worker2  Readiness probe failed:


Name:         hashicorp-consul-server-0
Namespace:    default
Priority:     0
Node:         kind-worker2/172.17.0.4
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=server
              controller-revision-hash=hashicorp-consul-server-98f4fc994
              hasDNS=true
              release=hashicorp
              statefulset.kubernetes.io/pod-name=hashicorp-consul-server-0
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.2.9
IPs:
  IP:           10.244.2.9
Controlled By:  StatefulSet/hashicorp-consul-server
Containers:
  consul:
    Container ID:  containerd://72b7bf0e81d3ed477f76b357743e9429325da0f38ccf741f53c9587082cdfcd0
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:27 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-default (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data-default:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-default-hashicorp-consul-server-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-server-config
    Optional:  false
  hashicorp-consul-server-token-hhdxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-server-token-hhdxc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                   Message
  ----     ------     ----                   ----                   -------
  Warning  Unhealthy  97s (x10686 over 14h)  kubelet, kind-worker2  Readiness probe failed:


Name:         hashicorp-consul-server-1
Namespace:    default
Priority:     0
Node:         kind-worker/172.17.0.3
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=server
              controller-revision-hash=hashicorp-consul-server-98f4fc994
              hasDNS=true
              release=hashicorp
              statefulset.kubernetes.io/pod-name=hashicorp-consul-server-1
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.1.8
IPs:
  IP:           10.244.1.8
Controlled By:  StatefulSet/hashicorp-consul-server
Containers:
  consul:
    Container ID:  containerd://c1f5a88e30e545c75e58a730be5003cee93c823c21ebb29b22b79cd151164a15
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:36 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-default (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data-default:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-default-hashicorp-consul-server-1
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-server-config
    Optional:  false
  hashicorp-consul-server-token-hhdxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-server-token-hhdxc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                  Message
  ----     ------     ----                   ----                  -------
  Warning  Unhealthy  95s (x10683 over 14h)  kubelet, kind-worker  Readiness probe failed:


Name:           hashicorp-consul-server-2
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=consul
                chart=consul-helm
                component=server
                controller-revision-hash=hashicorp-consul-server-98f4fc994
                hasDNS=true
                release=hashicorp
                statefulset.kubernetes.io/pod-name=hashicorp-consul-server-2
Annotations:    consul.hashicorp.com/connect-inject: false
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/hashicorp-consul-server
Containers:
  consul:
    Image:       consul:1.6.2
    Ports:       8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    Readiness:  exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-default (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  data-default:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-default-hashicorp-consul-server-2
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-server-config
    Optional:  false
  hashicorp-consul-server-token-hhdxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-server-token-hhdxc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  63s (x434 over 18h)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity.


Name:         hashicorp-consul-vmsmt
Namespace:    default
Priority:     0
Node:         kind-worker/172.17.0.3
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=client
              controller-revision-hash=6bc54657b6
              hasDNS=true
              pod-template-generation=1
              release=hashicorp
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.1.9
IPs:
  IP:           10.244.1.9
Controlled By:  DaemonSet/hashicorp-consul
Containers:
  consul:
    Container ID:  containerd://d502870f3476ea074b059361bc52a2c68ced551f5743b8448926bdaa319aabb0
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    8500/TCP, 8502/TCP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -node="${NODE}" \
        -advertise="${ADVERTISE_IP}" \
        -bind=0.0.0.0 \
        -client=0.0.0.0 \
        -node-meta=pod-name:${HOSTNAME} \
        -hcl="ports { grpc = 8502 }" \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -domain=consul

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:35 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADVERTISE_IP:   (v1:status.podIP)
      NAMESPACE:     default (v1:metadata.namespace)
      NODE:           (v1:spec.nodeName)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-client-token-4r5tv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-client-config
    Optional:  false
  hashicorp-consul-client-token-4r5tv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-client-token-4r5tv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                   From                  Message
  ----     ------     ----                  ----                  -------
  Warning  Unhealthy  88s (x3207 over 14h)  kubelet, kind-worker  Readiness probe failed:

For the sake of completeness, here is my kubelet status:

   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2020-01-15 10:59:06 +08; 1h 5min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 11910 (kubelet)
    Tasks: 17
   Memory: 50.3M
      CPU: 1min 16.431s
   CGroup: /system.slice/kubelet.service
           └─11910 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml 

Jan 15 12:04:41 ubuntu kubelet[11910]: E0115 12:04:41.610779   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:42 ubuntu kubelet[11910]: W0115 12:04:42.370702   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:46 ubuntu kubelet[11910]: E0115 12:04:46.612639   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:47 ubuntu kubelet[11910]: W0115 12:04:47.371621   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:51 ubuntu kubelet[11910]: E0115 12:04:51.614925   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:52 ubuntu kubelet[11910]: W0115 12:04:52.372164   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:56 ubuntu kubelet[11910]: E0115 12:04:56.616201   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:57 ubuntu kubelet[11910]: W0115 12:04:57.372364   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:05:01 ubuntu kubelet[11910]: E0115 12:05:01.617916   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:05:02 ubuntu kubelet[11910]: W0115 12:05:02.372698   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d

Any help is much appreciated.

What CNI have you installed? What is the output of kubectl get pods -n kube-system? – Arghya Sadhu
Is the problem only with the Consul pods? Can you try running another pod, such as nginx, and see if that works? – Arghya Sadhu

2 Answers

0 votes

For Consul cluster fault tolerance, the recommended number of servers is either 3 or 5; refer to the Deployment Table in the Consul documentation.

The default value for the server replica count in the Helm chart is 3:

replicas (integer: 3) - The number of server agents to run.

affinity (string) - This value defines the affinity for server pods. It defaults to allowing only a single pod on each node, which minimizes the risk of the cluster becoming unusable if a node is lost.

See the affinity setting in the chart documentation: if you need to run more server pods per node, set this value to null.

So, for a production-grade deployment with the default quorum of 3, you need a minimum of 3 schedulable worker nodes that satisfy the anti-affinity requirement (and effectively 5 worker nodes if you raise that value to 5).

The chart's values.yaml clearly documents which values to change so it can run on a system with fewer nodes.

By reducing the replica count:

~/test/consul-helm$ cat values.yaml | grep -i replica
  replicas: 3
  bootstrapExpect: 3 # Should <= replicas count
    # replicas. If you'd like a custom value, you can specify an override here.
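
For example, here is a sketch of an override file (the name custom-values.yaml is just an illustration; it assumes replicas and bootstrapExpect sit under the server block, as in the excerpt above):

# custom-values.yaml - run fewer Consul servers so fewer worker nodes are needed
server:
  replicas: 1
  bootstrapExpect: 1   # keep this <= replicas, per the comment above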

By disabling affinity:

~/test/consul-helm$ cat values.yaml | grep -i -A 8 affinity

  # Affinity Settings
  # Commenting out or setting as empty the affinity variable, will allow
  # deployment to single node services such as Minikube
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: {{ template "consul.name" . }}
              release: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
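
A sketch of this second approach (again, the override file name is illustrative; it assumes the chart reads the affinity string from the server block as shown in its values.yaml above): clearing the value lets several server pods share a node.

# custom-values.yaml - drop the default podAntiAffinity so server pods may share a node
server:
  affinity: null   # or an empty string, per the comment in values.yaml

Then apply it with the chart, assuming you run this from the directory containing the consul-helm checkout (the release name hashicorp matches the pods shown in the question):

helm upgrade --install hashicorp ./consul-helm -f custom-values.yaml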
1 vote

I replicated your setup by creating a 3-node cluster (1 master and 2 workers) and deploying Consul with Helm, and I saw the same thing you do: all pods were running except one, which stayed pending.

In the StatefulSet object you can see a podAntiAffinity rule that disallows scheduling two or more server pods on the same node. This is why you see one pod stuck in the Pending state.

There are 4 ways I can think of to make it work:

  1. The master node has a taint, node-role.kubernetes.io/master:NoSchedule, which disallows scheduling any pods on the master node. You can delete this taint by running kubectl taint node kind-control-plane node-role.kubernetes.io/master:NoSchedule- (notice the minus sign; it tells Kubernetes to remove the taint). The scheduler will then be able to place the remaining consul-server pod on that node.

  2. You can add one more worker node.

  3. You can remove the podAntiAffinity rule from the consul-server StatefulSet object so the scheduler won't care where the pods get scheduled.

  4. Change requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution so the affinity rule no longer has to be satisfied, only preferred (see the sketch below).
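
For option 4, here is a sketch of what the overridden affinity could look like if you feed it back through the chart's server.affinity value (the labels and topologyKey mirror the chart's default shown in the other answer; the weight of 100 is an arbitrary choice):

server:
  affinity: |
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: {{ template "consul.name" . }}
                release: "{{ .Release.Name }}"
                component: server
            topologyKey: kubernetes.io/hostname

With the preferred rule the scheduler still tries to spread the server pods across nodes, but it will put the third one on an already-used worker instead of leaving it Pending.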

Let me know if it helped.