0 votes

OVERVIEW:: I am studying for the Kubernetes Administrator certification. To complete the training course, I created a two-node Kubernetes cluster on Google Cloud, 1 master and 1 worker. As I don't want to leave the instances alive all the time, I took snapshots of them so I can deploy new instances with the Kubernetes cluster already set up. I am aware that I would need to update the ens4 IP used by kubectl, as this will have changed, which I did.

ISSUE:: When I run "kubectl get pods --all-namespaces" I get the error "The connection to the server localhost:8080 was refused - did you specify the right host or port?"

QUESTION:: Has anyone had similar issues, and do you know if it's possible to recreate a Kubernetes cluster from snapshots?

Adding -v=10 to the command shows that the URL matches the info in the .kube/config file:

kubectl get pods --all-namespaces -v=10
I0214 17:11:35.317678    6246 loader.go:375] Config loaded from file: /home/student/.kube/config
I0214 17:11:35.321941    6246 round_trippers.go:423] curl -k -v -XGET -H "User-Agent: kubectl/v1.16.1 (linux/amd64) kubernetes/d647ddb" -H "Accept: application/json, */*" 'https://k8smaster:6443/api?timeout=32s'
I0214 17:11:35.333308    6246 round_trippers.go:443] GET https://k8smaster:6443/api?timeout=32s in 11 milliseconds
I0214 17:11:35.333335    6246 round_trippers.go:449] Response Headers:
I0214 17:11:35.333422    6246 cached_discovery.go:121] skipped caching discovery info due to Get https://k8smaster:6443/api?timeout=32s: dial tcp 10.128.0.7:6443: connect: connection refused
I0214 17:11:35.333858    6246 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.16.1 (linux/amd64) kubernetes/d647ddb" 'https://k8smaster:6443/api?timeout=32s'
I0214 17:11:35.334234    6246 round_trippers.go:443] GET https://k8smaster:6443/api?timeout=32s in 0 milliseconds
I0214 17:11:35.334254    6246 round_trippers.go:449] Response Headers:
I0214 17:11:35.334281    6246 cached_discovery.go:121] skipped caching discovery info due to Get https://k8smaster:6443/api?timeout=32s: dial tcp 10.128.0.7:6443: connect: connection refused
I0214 17:11:35.334303    6246 shortcut.go:89] Error loading discovery information: Get https://k8smaster:6443/api?timeout=32s: dial tcp 10.128.0.7:6443: connect: connection refused

To me it looks like a kubeconfig is missing. Please make sure you have a .kube/config file and that it contains proper configuration. – Matt
If you created your cluster with kubeadm, copy the file /etc/kubernetes/admin.conf to ~/.kube/config. – Matt
Hey, I checked the .kube/config file to verify it was using the correct IP, and it exists. – liam08
I also checked, and the file /etc/kubernetes/admin.conf matches ~/.kube/config. – liam08
Run the same kubectl command but with the -v=10 parameter and add the output to your question. – Matt

1 Answer

2 votes

I replicated your issue and wrote up this step-by-step debugging process so you can follow my thinking.

I created a 2-node cluster (master + worker) with kubeadm and took snapshots. Then I deleted the nodes and recreated them from the snapshots.

After recreating the master node from the snapshot, I started seeing the same error you are seeing:

@kmaster ~]$ kubectl get po -v=10
I0217 11:04:38.397823    3372 loader.go:375] Config loaded from file:  /home/user/.kube/config
I0217 11:04:38.398909    3372 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.17.3 (linux/amd64) kubernetes/06ad960" 'https://10.156.0.20:6443/api?timeout=32s'
^C

The connection was hanging, so I interrupted it (Ctrl+C). The first thing I noticed was that the IP address kubectl was connecting to was different from the node's IP, so I modified the .kube/config file to use the proper IP.
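Concretely, the value to change is the server: entry in the kubeconfig. Here is a minimal sketch of that edit, shown on a scratch copy so nothing real is touched (the IPs are the ones from this walkthrough; on a real node you would edit ~/.kube/config itself, after backing it up):

```shell
# Demo on a scratch copy of a kubeconfig; the IPs below are the old and
# new node IPs from this walkthrough -- substitute your own.
mkdir -p /tmp/kubedemo
printf 'server: https://10.156.0.20:6443\n' > /tmp/kubedemo/config

# Swap the old node IP for the new one in the server: line (GNU sed)
sed -i 's|https://10.156.0.20:6443|https://10.156.0.23:6443|' /tmp/kubedemo/config
grep 'server:' /tmp/kubedemo/config
```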

After doing this, here is what running kubectl showed:

$ kubectl get po -v=10
I0217 11:26:57.020744   15929 loader.go:375] Config loaded from file:  /home/user/.kube/config
...
I0217 11:26:57.025155   15929 helpers.go:221] Connection error: Get https://10.156.0.23:6443/api?timeout=32s: dial tcp 10.156.0.23:6443: connect: connection refused
F0217 11:26:57.025201   15929 helpers.go:114] The connection to the server 10.156.0.23:6443 was refused - did you specify the right host or port?

As you can see, the connection to the apiserver was being refused, so I checked whether the apiserver was running:

$ sudo docker ps -a | grep apiserver
5e957ff48d11        90d27391b780             "kube-apiserver --ad…"   24 seconds ago      Exited (2) 3 seconds ago                           k8s_kube-apiserver_kube-apiserver-kmaster_kube-system_997514ff25ec38012de6a5be7c43b0ae_14
d78e179f1565        k8s.gcr.io/pause:3.1     "/pause"                 26 minutes ago      Up 26 minutes                                      k8s_POD_kube-apiserver-kmaster_kube-system_997514ff25ec38012de6a5be7c43b0ae_1

The apiserver was exiting for some reason. I checked its logs (only the relevant lines are included for readability):

$ sudo docker logs 5e957ff48d11
...
W0217 11:30:46.710541       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
panic: context deadline exceeded

Notice that the apiserver was trying to connect to etcd (note port 2379) and getting connection refused. My first guess was that etcd wasn't running, so I checked the etcd container:

$ sudo docker ps -a | grep etcd
4a249cb0743b        303ce5db0e90             "etcd --advertise-cl…"   2 minutes ago        Exited (1) 2 minutes ago                           k8s_etcd_etcd-kmaster_kube-system_9018aafee02ebb028a7befd10063ec1e_19
b89b7e7227de        k8s.gcr.io/pause:3.1     "/pause"                 30 minutes ago       Up 30 minutes                                      k8s_POD_etcd-kmaster_kube-system_9018aafee02ebb028a7befd10063ec1e_1

I was right: Exited (1) 2 minutes ago. I checked its logs:

$ sudo docker logs 4a249cb0743b
...
2020-02-17 11:34:31.493215 C | etcdmain: listen tcp 10.156.0.20:2380: bind: cannot assign requested address

etcd was trying to bind to the old IP address.

I modified /etc/kubernetes/manifests/etcd.yaml, changing the old IP address to the new one everywhere in the file.
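That substitution can be done in one pass with sed. A sketch on a scratch copy of the kind of lines the manifest contains (the flag lines below are illustrative, not your exact manifest; on the real master the file is /etc/kubernetes/manifests/etcd.yaml and the edit needs sudo):

```shell
# Scratch file mimicking the etcd manifest entries that carry the node IP
# (hypothetical excerpt; IPs are the old/new ones from this walkthrough).
cat > /tmp/etcd-demo.yaml <<'EOF'
    - --advertise-client-urls=https://10.156.0.20:2379
    - --initial-advertise-peer-urls=https://10.156.0.20:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://10.156.0.20:2379
    - --listen-peer-urls=https://10.156.0.20:2380
EOF

# Replace every occurrence of the old IP with the new one
sed -i 's/10\.156\.0\.20/10.156.0.23/g' /tmp/etcd-demo.yaml
grep -c '10.156.0.23' /tmp/etcd-demo.yaml
```

The kubelet watches the manifests directory, so saving the real file is enough to make it restart the static etcd pod with the new flags.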

A quick sudo docker ps | grep etcd showed it was running. After a while, the apiserver also started running.

Then I tried running kubectl:

$ kubectl get po
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.156.0.20, not 10.156.0.23

Invalid apiserver certificate. The SSL certificate was generated for the old IP, which means I needed to generate a new certificate with the new IP.

$ sudo kubeadm init phase certs apiserver
...
[certs] Using existing apiserver certificate and key on disk

That's not what I expected. I wanted to generate new certificates, not use the old ones.

So I deleted the old certificates:

$ sudo rm /etc/kubernetes/pki/apiserver.crt \
          /etc/kubernetes/pki/apiserver.key

And tried to generate certificates one more time:

$ sudo kubeadm init phase certs apiserver
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kmaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.156.0.23]

Looks good. Now let's try using kubectl:

$ kubectl get no
NAME          STATUS   ROLES    AGE    VERSION
instance-21   Ready    master   102m   v1.17.3
instance-22   Ready    <none>   95m    v1.17.3

As you can see, it's working now.
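As a side note, you can confirm which IPs and DNS names a serving certificate covers by inspecting its Subject Alternative Name extension with openssl. A sketch using a throwaway self-signed cert (on the master you would point the second command at /etc/kubernetes/pki/apiserver.crt instead; the SAN values mirror the ones above):

```shell
# Generate a throwaway cert with a SAN list like the regenerated apiserver
# cert above (requires OpenSSL 1.1.1+ for -addext/-ext).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt -subj '/CN=kube-apiserver' \
  -addext 'subjectAltName=DNS:kmaster,IP:10.96.0.1,IP:10.156.0.23'

# Print only the SAN extension; this list is what the x509 error earlier
# compares the connection address against.
openssl x509 -in /tmp/demo.crt -noout -ext subjectAltName
```

If the IP you connect with is missing from that list, you will get exactly the "certificate is valid for ..." error shown earlier.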