I accidentally drained all the nodes in my Kubernetes cluster, including the master. How can I bring the cluster back? kubectl no longer works:
kubectl get nodes
Result:
The connection to the server 172.16.16.111:6443 was refused - did you specify the right host or port?
Here is the output of systemctl status kubelet on the master node (node1):
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-06-23 21:42:39 UTC; 25min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 15541 (kubelet)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/kubelet.service
└─15541 /usr/local/bin/kubelet --logtostderr=true --v=2 --node-ip=172.16.16.111 --hostname-override=node1 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.1 --runtime-cgroups=/systemd/system.slice --cpu-manager-policy=static --kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi --system-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin
Jun 23 22:08:34 node1 kubelet[15541]: I0623 22:08:34.330009 15541 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Jun 23 22:08:34 node1 kubelet[15541]: I0623 22:08:34.330201 15541 setters.go:73] Using node IP: "172.16.16.111"
Jun 23 22:08:34 node1 kubelet[15541]: I0623 22:08:34.331475 15541 kubelet_node_status.go:472] Recording NodeHasSufficientMemory event message for node node1
Jun 23 22:08:34 node1 kubelet[15541]: I0623 22:08:34.331494 15541 kubelet_node_status.go:472] Recording NodeHasNoDiskPressure event message for node node1
Jun 23 22:08:34 node1 kubelet[15541]: I0623 22:08:34.331500 15541 kubelet_node_status.go:472] Recording NodeHasSufficientPID event message for node node1
Jun 23 22:08:34 node1 kubelet[15541]: I0623 22:08:34.331661 15541 policy_static.go:244] [cpumanager] static policy: RemoveContainer (container id: 6dd59735cabf973b6d8b2a46a14c0711831daca248e918bfcfe2041420931963)
Jun 23 22:08:34 node1 kubelet[15541]: E0623 22:08:34.332058 15541 pod_workers.go:191] Error syncing pod 93ff1a9840f77f8b2b924a85815e17fe ("kube-apiserver-node1_kube-system(93ff1a9840f77f8b2b924a85815e17fe)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-node1_kube-system(93ff1a9840f77f8b2b924a85815e17fe)"
Jun 23 22:08:34 node1 kubelet[15541]: E0623 22:08:34.427587 15541 kubelet.go:2267] node "node1" not found
Jun 23 22:08:34 node1 kubelet[15541]: E0623 22:08:34.506152 15541 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: Get https://172.16.16.111:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.16.16.111:6443: connect: connection refused
Jun 23 22:08:34 node1 kubelet[15541]: E0623 22:08:34.527813 15541 kubelet.go:2267] node "node1" not found
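The line that stands out in this output is the CrashLoopBackOff for kube-apiserver, which would explain why port 6443 refuses connections. As a minimal sketch for pulling such lines out of a longer kubelet log, assuming the log format shown above, something like the following could be used. The function name crashlooping_pods is hypothetical and only for illustration.

```shell
# Hypothetical helper: print the pods that kubelet reports as crash-looping.
# Reads kubelet log lines on stdin, for example:
#   journalctl -u kubelet --no-pager | crashlooping_pods
crashlooping_pods() {
  # Keep only CrashLoopBackOff lines, extract the "pod=<name>_<namespace>" token,
  # strip the "pod=" prefix, and de-duplicate the result.
  grep 'CrashLoopBackOff' \
    | grep -o 'pod=[^ ()]*' \
    | sed 's/^pod=//' \
    | sort -u
}
```

If the node uses Docker as its container runtime (an assumption; the kubelet flags above suggest it but do not confirm it), the failing container could then be looked up with docker ps -a --filter name=kube-apiserver and its output read with docker logs on the reported container ID. Note that kubectl uncordon itself goes through the API server, so it presumably cannot work until kube-apiserver is running again.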
I'm using Ubuntu 18.04, and there are 7 compute nodes in my cluster, all of them drained (more or less accidentally). I installed the cluster with Kubespray.
Is there any way to uncordon any of these nodes so that the pods Kubernetes needs can be scheduled again?
Any help would be appreciated.
Update:
I asked a separate question about how to connect to etcd here: Can't connect to the ETCD of Kubernetes