
My goal is to model a hybrid/heterogeneous Kubernetes cluster with the following setup:

  • Master node runs on AWS (cloud) - ip-172-31-28-6
  • Slave node runs on my laptop - osboxes
  • Slave node runs on a Raspberry Pi - edge-1

Running a Kubernetes cluster with three VMs locally on my laptop is no problem and works fine with Weave Net. However, there seem to be communication problems when I model my Kubernetes cluster as depicted above.

Since Kubernetes is designed to run on nodes that are all located in the same network, I set up an OpenVPN server on AWS and connected both my laptop and the Raspberry Pi to it. I was hoping this would be enough to run Kubernetes in a heterogeneous setup where the slave nodes sit in different networks. Of course, this assumption was incorrect.
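
For reference, basic reachability over the tunnel can be checked with something like the following (10.8.0.1 is the server's tunnel address from the ifconfig output further down; the client address is a hypothetical example):

# From the laptop or the Raspberry Pi: can the master be reached over the VPN?
$ ping -c 3 10.8.0.1

# From the master: can a VPN client be reached? (10.8.0.6 is a hypothetical client address)
$ ping -c 3 10.8.0.6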

If I run the Kubernetes dashboard on a slave node and try to access it, I get a timeout. If I run it on the Master node, everything works as expected.

I set up the cluster on AWS with kubeadm init --apiserver-advertise-address= and joined the nodes with kubeadm join.
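
For completeness, that corresponds roughly to the following (a sketch; <ADVERTISE_IP>, <TOKEN> and <HASH> are placeholders for values not shown here, and the Weave Net install command is the standard one from its documentation):

# On the master (AWS)
$ sudo kubeadm init --apiserver-advertise-address=<ADVERTISE_IP>

# Install the pod network (Weave Net) from the master
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

# On each slave node, using the token and hash printed by kubeadm init
$ sudo kubeadm join <ADVERTISE_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>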

$ kubectl get pods --all-namespaces -o wide:

NAMESPACE     NAME                                     READY     STATUS              RESTARTS   AGE       IP              NODE
kube-system   etcd-ip-172-31-28-6                      1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   kube-apiserver-ip-172-31-28-6            1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   kube-controller-manager-ip-172-31-28-6   1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   kube-dns-6f4fd4bdf-w6ctf                 0/3       ContainerCreating   0          15h       <none>          osboxes
kube-system   kube-proxy-2pl2f                         1/1       Running             0          15h       172.31.28.6     ip-172-31-28-6
kube-system   kube-proxy-7b89c                         0/1       CrashLoopBackOff    15         15h       192.168.2.106   edge-1
kube-system   kube-proxy-qg69g                         1/1       Running             1          15h       10.0.2.15       osboxes
kube-system   kube-scheduler-ip-172-31-28-6            1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   weave-net-pqxfp                          1/2       CrashLoopBackOff    189        15h       172.31.28.6     ip-172-31-28-6
kube-system   weave-net-thhzr                          1/2       CrashLoopBackOff    12         36m       192.168.2.106   edge-1
kube-system   weave-net-v69hj                          2/2       Running             7          15h       10.0.2.15       osboxes

$ kubectl -n kube-system logs --v=7 kube-dns-6f4fd4bdf-w6ctf -c kubedns:

...
I0321 09:04:25.620580   23936 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-dns-6f4fd4bdf-w6ctf/log?container=kubedns
I0321 09:04:25.620605   23936 round_trippers.go:421] Request Headers:
I0321 09:04:25.620611   23936 round_trippers.go:424]     Accept: application/json, */*
I0321 09:04:25.620616   23936 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:04:25.713821   23936 round_trippers.go:439] Response Status: 400 Bad Request in 93 milliseconds
I0321 09:04:25.714106   23936 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "container \"kubedns\" in pod \"kube-dns-6f4fd4bdf-w6ctf\" is waiting to start: ContainerCreating",
  "reason": "BadRequest",
  "code": 400
}]
F0321 09:04:25.714134   23936 helpers.go:119] Error from server (BadRequest): container "kubedns" in pod "kube-dns-6f4fd4bdf-w6ctf" is waiting to start: ContainerCreating

$ kubectl -n kube-system logs --v=7 kube-proxy-7b89c:

...
I0321 09:06:51.803852   24289 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-proxy-7b89c/log
I0321 09:06:51.803879   24289 round_trippers.go:421] Request Headers:
I0321 09:06:51.803891   24289 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:06:51.803900   24289 round_trippers.go:424]     Accept: application/json, */*
I0321 09:08:59.110869   24289 round_trippers.go:439] Response Status: 500 Internal Server Error in 127306 milliseconds
I0321 09:08:59.111129   24289 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
  "code": 500
}]
F0321 09:08:59.111156   24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out

$ kubectl -n kube-system logs --v=7 weave-net-pqxfp -c weave:

...
I0321 09:12:08.047206   24847 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-pqxfp/log?container=weave
I0321 09:12:08.047233   24847 round_trippers.go:421] Request Headers:
I0321 09:12:08.047335   24847 round_trippers.go:424]     Accept: application/json, */*
I0321 09:12:08.047347   24847 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:12:08.062494   24847 round_trippers.go:439] Response Status: 200 OK in 15 milliseconds
DEBU: 2018/03/21 09:11:26.847013 [kube-peers] Checking peer "fa:10:a4:97:7e:7b" against list &{[{6e:fd:f4:ef:1e:f5 osboxes}]}
Peer not in list; removing persisted data
INFO: 2018/03/21 09:11:26.880946 Command line options: map[expect-npc:true ipalloc-init:consensus=3 db-prefix:/weavedb/weave-net http-addr:127.0.0.1:6784 ipalloc-range:10.32.0.0/12 nickname:ip-172-31-28-6 host-root:/host name:fa:10:a4:97:7e:7b no-dns:true status-addr:0.0.0.0:6782 datapath:datapath docker-api: port:6783 conn-limit:30]
INFO: 2018/03/21 09:11:26.880995 weave  2.2.1
FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again

$ kubectl -n kube-system logs --v=7 weave-net-thhzr -c weave:

...
I0321 09:15:13.787905   25113 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-thhzr/log?container=weave
I0321 09:15:13.787932   25113 round_trippers.go:421] Request Headers:
I0321 09:15:13.787938   25113 round_trippers.go:424]     Accept: application/json, */*
I0321 09:15:13.787946   25113 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:17:21.126863   25113 round_trippers.go:439] Response Status: 500 Internal Server Error in 127338 milliseconds
I0321 09:17:21.127140   25113 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
  "code": 500
}]
F0321 09:17:21.127167   25113 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out

$ ifconfig (Kubernetes master on AWS):

datapath  Link encap:Ethernet  HWaddr ae:90:9a:b2:7e:d9
          inet6 addr: fe80::ac90:9aff:feb2:7ed9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:29 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1904 (1.9 KB)  TX bytes:1188 (1.1 KB)

docker0   Link encap:Ethernet  HWaddr 02:42:50:39:1f:c7
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 06:a3:d0:8e:19:72
          inet addr:172.31.28.6  Bcast:172.31.31.255  Mask:255.255.240.0
          inet6 addr: fe80::4a3:d0ff:fe8e:1972/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:10323322 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9418208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3652314289 (3.6 GB)  TX bytes:3117288442 (3.1 GB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:11388236 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11388236 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:2687297929 (2.6 GB)  TX bytes:2687297929 (2.6 GB)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.8.0.1  P-t-P:10.8.0.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:97222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:164607 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:13381022 (13.3 MB)  TX bytes:209129403 (209.1 MB)

vethwe-bridge Link encap:Ethernet  HWaddr 12:59:54:73:0f:91
          inet6 addr: fe80::1059:54ff:fe73:f91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1476 (1.4 KB)  TX bytes:2940 (2.9 KB)

vethwe-datapath Link encap:Ethernet  HWaddr 8e:75:1c:92:93:0d
          inet6 addr: fe80::8c75:1cff:fe92:930d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:36 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2940 (2.9 KB)  TX bytes:1476 (1.4 KB)

vxlan-6784 Link encap:Ethernet  HWaddr a6:02:da:5e:d5:2a
          inet6 addr: fe80::a402:daff:fe5e:d52a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65485  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:8 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

$ sudo systemctl status kubelet.service (on AWS):

Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: W0321 09:34:59.202058   19676 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: E0321 09:34:59.202452   19676 kubelet.go:2109] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.535541   19676 kuberuntime_manager.go:514] Container {Name:weave Image:weaveworks/weave-kube:2.2.1 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:weave-net-token-vn8rh ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536504   19676 kuberuntime_manager.go:758] checking backoff for container "weave" in pod "weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536636   19676 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: E0321 09:35:01.536664   19676 pod_workers.go:186] Error syncing pod c6450070-2c61-11e8-a50d-06a3d08e1972 ("weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"

$ sudo systemctl status kubelet.service (on Laptop):

Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.662670     715 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663412     715 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663869     715 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.664295     715 pod_workers.go:186] Error syncing pod 11886465-2c61-11e8-a50d-06a3d08e1972 ("kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 21 05:47:20 osboxes kubelet[715]: W0321 05:47:20.536161     715 pod_container_deletor.go:77] Container "bbf490835face43b70c24dbcb67c3f75872e7831b5e2605dc8bb71210910e273" not found in pod's containers

$ sudo systemctl status kubelet.service (on Raspberry Pi):

Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.188199     339 kuberuntime_manager.go:514] Container {Name:kube-proxy Image:gcr.io/google_containers/kube-proxy-amd64:v1.9.5 Command:[/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:kube-proxy ReadOnly:false MountPath:/var/lib/kube-proxy SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:kube-proxy-token-px7dt ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.189023     339 kuberuntime_manager.go:758] checking backoff for container "kube-proxy" in pod "kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.190174     339 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:29:01 edge-1 kubelet[339]: E0321 09:29:01.190518     339 pod_workers.go:186] Error syncing pod 5bebafa1-2c61-11e8-a50d-06a3d08e1972 ("kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:02 edge-1 kubelet[339]: W0321 09:29:02.278342     339 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:29:02 edge-1 kubelet[339]: E0321 09:29:02.282534     339 kubelet.go:2120] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

2 Answers

1 vote

You definitely have a networking problem between the Kubernetes master and the nodes.

But first of all, this kind of hybrid installation is not the best idea. You need stable networking between the master(s) and the nodes, otherwise you will run into many problems, and that is hard to achieve over an Internet connection.

If you want a hybrid installation, you can use Federation between a Kubernetes cluster on AWS and a cluster on your local hardware.
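
Roughly, the kubefed workflow from that Kubernetes generation looked like this (a sketch only; the federation name, contexts and DNS settings are placeholders, and the exact flags vary between versions):

# Initialise the federation control plane in the AWS cluster
$ kubefed init myfed --host-cluster-context=aws-context --dns-provider="aws-route53" --dns-zone-name="example.com."

# Join the local cluster to the federation
$ kubefed join local-cluster --host-cluster-context=aws-context --cluster-context=local-context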

But regarding your actual problem: Weave Net is failing both on the master and on the edge-1 node. It is not clear from the logs what kind of problem it is; try running the Weave container with the WEAVE_DEBUG=1 environment variable. Without a working pod network, other pods such as kube-dns will not work properly.
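
One way to set that variable, assuming the DaemonSet is named weave-net (as the pod names above suggest), is to edit it, add the variable to the weave container, and recreate the failing pod:

$ kubectl -n kube-system edit daemonset weave-net
#   ...in the weave container spec, add:
#   env:
#   - name: WEAVE_DEBUG
#     value: "1"

# Delete the crashing pod so it is recreated with the new environment, then follow its logs
$ kubectl -n kube-system delete pod weave-net-pqxfp
$ kubectl -n kube-system logs -f <new-weave-net-pod> -c weave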

Also, how did you set up OpenVPN? You need routing to the AWS subnet as well as client-to-client routing, so that all the addresses used to set up the cluster are routable from every node. Check again which addresses the Kubernetes components and Weave bind to, and whether those addresses are reachable from the other nodes.
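
As a rough illustration of what that means on the OpenVPN side (an example server.conf excerpt; the VPC subnet is taken from the eth0 output above, the rest are assumptions):

# /etc/openvpn/server.conf (excerpt)
server 10.8.0.0 255.255.255.0             # tunnel network, matches tun0 above
client-to-client                          # allow VPN clients to reach each other
push "route 172.31.16.0 255.255.240.0"    # push the AWS VPC subnet to the clients
# To reach a network behind a client (e.g. 192.168.2.0/24 behind edge-1), a
# client-config-dir entry for that client with an iroute directive is also needed:
#   iroute 192.168.2.0 255.255.255.0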

1 vote
1. This message explains one of the crashes:

FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again

Since it is slightly complicated to run the weave command on a Kubernetes node, just reboot the node; the bridge should then be recreated from scratch.
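
For instance, something like this (a minimal sketch):

# On the AWS master, which is the node logging the "Inconsistent bridge state" error
$ sudo reboot

# Once it is back, check that its weave-net pod leaves CrashLoopBackOff
$ kubectl -n kube-system get pods -o wide | grep weave-net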

2. This message says the API server could not contact the node to fetch logs:

F0321 09:08:59.111156 24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out

Consider whether those hosts can reach each other on their regular network.
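
For example, from the master (the address and port are taken from the error above; the VPN address of edge-1 is a hypothetical placeholder):

# Is the kubelet port on edge-1 reachable via the address the API server is using?
$ nc -vz -w 5 192.168.2.106 10250

# And via the node's VPN address?
$ nc -vz -w 5 <edge-1-vpn-ip> 10250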