I created an azure AKS with 3 nodes(Standard DS3 v2 (4 vcpus, 14 GB memory)). I was fiddling with the cluster and created a Deployment with 1000 replicas.After this complete cluster went down.
azureuser@saa:~$ k get cs
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: getsockopt: connection refused
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: getsockopt: connection refused
etcd-0 Healthy {"health": "true"}
From debugging it seems both Scheduler and Controller-manager went down. How to Fix this?
What exactly happened when created a Deployment with 1000 replicas? Should it be taken care by k8s?
Few debugging commands output:
kubectl cluster-info
Kubernetes master is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443
Heapster is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
Logs for kubectl cluster-info dump @ http://termbin.com/e6wb
azureuser@sim:~$ az aks scale -n cg -g cognitive-games -c 4 --verbose
Deployment failed. Correlation ID: 4df797b2-28bf-4c18-a26a-4e341xxxxx. Operation failed with status: 200. Details: Resource state Failed
no nodes displayed
azureuser@si:~$ k get nodes
No resources found
kubectl get events- Suresh Vishnoikubectl get eventssaysNo resources found. - StateLesskubectl cluster-info- Suresh Vishnoi