I have a few clusters in my GCP project, each with 3 nodes in its node pool, and all of them have auto-upgrade and auto-repair enabled.
The auto-upgrade to GKE version 1.12.10-gke.17 began approximately 3 days ago and is still running.
Since all my clusters are opted in to auto-upgrade and auto-repair, some of them are being upgraded without issues, while others are hitting problems during the upgrade.
On my first cluster, a few of my pods became unschedulable, and the possible actions suggested by GCP are:
- Enable Autoscaling in one or more node pools that have autoscaling disabled.
- Increase size of one or more node pools manually.
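For reference, a minimal sketch of how the unschedulable pods can be listed, assuming kubectl is already pointed at the affected cluster. Unschedulable pods stay in the Pending phase, so the sketch filters on that; the function name pending_pods is just an illustrative helper, not something GCP provides.

```shell
#!/usr/bin/env bash
# pending_pods is a hypothetical helper name used only for this sketch.
# It prints namespace and name for every pod stuck in the Pending phase.
pending_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    --no-headers \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
    2>/dev/null
}
```

Running "kubectl describe pod" on any pod it lists shows the scheduler events that explain why the pod could not be placed.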
When I run "gcloud container clusters describe <clustername> --zone <zone>",
I get the details of the cluster. However, under the nodePools section I see:
status: RUNNING_WITH_ERROR
statusMessage: 'asia-south1-a: Timed out waiting for cluster initialization; cluster
API may not be available: k8sclient: 7 - 404 status code returned. Requested resource
not found.'
version: 1.12.10-gke.17
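To check every node pool at once instead of reading the full describe output, something like the following might help. It is only a sketch: unhealthy_node_pools is a made-up helper name, and the cluster name and zone are passed in as arguments.

```shell
#!/usr/bin/env bash
# unhealthy_node_pools is a hypothetical helper for this sketch.
# Usage: unhealthy_node_pools <clustername> <zone>
# Prints every node pool whose status is anything other than RUNNING.
unhealthy_node_pools() {
  local cluster="$1" zone="$2"
  # value(...) prints the selected fields separated by tabs, one pool per line.
  gcloud container node-pools list --cluster "$cluster" --zone "$zone" \
    --format="value(name,status,version)" |
    awk -F '\t' '$2 != "RUNNING" { print $1 ": " $2 " (" $3 ")" }'
}
```

An empty result would mean all node pools in that cluster report RUNNING.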
NOTE:
I also see that GCP suggests the following:
- Enable autoscaling in one or more node pools that have autoscaling disabled.
- Shrink one or more node pools manually.
because resource requests are low.
Please let me know what other logs I can provide to resolve this issue.
UPDATE:
We went through these logs, and Google support believes that the kubelet might be failing to submit a Certificate Signing Request (CSR), or that it might have old, invalid credentials. To assist with the troubleshooting, they asked for the output of the following commands:
- sudo journalctl -u kubelet > kubelet.log
- sudo journalctl -u kube-node-installation > kube-node-installation.log
- sudo journalctl -u kube-node-configuration > kube-node-configuration.log
- sudo journalctl -u node-problem-detector > node-problem-detector.log
- sudo journalctl -u docker > docker.log
- sudo journalctl -u cloud-init > cloud-init.log
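The commands above can be run in one go on an affected node with a small loop. This is only a sketch: collect_node_logs and the JOURNAL_CMD override are names I made up, while the unit names are the ones from the list.

```shell
#!/usr/bin/env bash
# Units taken from the list of commands above.
UNITS="kubelet kube-node-installation kube-node-configuration node-problem-detector docker cloud-init"

# collect_node_logs is a hypothetical helper for this sketch.
# Usage: collect_node_logs [output_dir]
# JOURNAL_CMD is an assumed override hook; it defaults to "sudo journalctl".
collect_node_logs() {
  local out_dir="${1:-.}" unit
  mkdir -p "$out_dir"
  for unit in $UNITS; do
    # Write one <unit>.log per unit; keep going if a unit cannot be read.
    ${JOURNAL_CMD:-sudo journalctl} -u "$unit" > "$out_dir/$unit.log" 2>&1 \
      || echo "warning: could not read journal for $unit" >&2
  done
}
```

Calling "collect_node_logs ./node-logs" would leave six .log files in ./node-logs, ready to attach to the support case.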
Any node that starts running 1.13.12-gke.13 fails to connect to the master. Anything else that is happening to the nodes (e.g. recreation) is a result of the repair loop trying to fix them, and doesn't seem to be causing additional problems.
1.13.11-gke.14
? - Patrick W