I have a sandbox GKE cluster with some services and some internal load balancers.
Services are mostly defined like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: my-app
  name: my-app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
    cloud.google.com/load-balancer-type: "Internal"
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: my-app
  sessionAffinity: None
  type: LoadBalancer
But about twice a week someone reports that an endpoint is not working anymore; when I go investigate, the load balancer has no instance groups attached to it anymore.
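For reference, this is roughly how I've been checking it when it breaks (the region and the backend service name below are placeholders, not the real names from my project):

# List the internal forwarding rules created for the LoadBalancer services
gcloud compute forwarding-rules list --filter="loadBalancingScheme=INTERNAL"

# Show the backend service behind one of them; the backends section is empty when this happens
gcloud compute backend-services describe <backend-service-name> --region=us-central1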
The only "weird" things we do are to scale all our app's pods down to 0 replicas when out of business hours and to use preemptible instances on the node pool... I thought it could be related to the first, but I forced scale down some services now and their load balancers still fine.
It may be related to the preemptible instances, though: it seems that if all the pods end up on one instance (especially the kube-system pods), then when that node goes down the pods all go down at once, and it looks like the cluster can't recover properly from that.
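When it happens I can at least confirm whether everything was packed onto a single node with something like:

# Shows which node each pod is scheduled on, including the kube-system pods
kubectl get pods --all-namespaces -o wide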
Another weird thing I see happening is the k8s-ig--foobar instance group ending up with 0 instances.
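That one I can see directly with something like the following (the zone is a placeholder):

# Lists the VMs currently in the instance group GKE created for load balancing
gcloud compute instance-groups list-instances k8s-ig--foobar --zone=us-central1-a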
Has anyone experienced something like this? I couldn't find any docs about this.
What does kubectl describe service say? Any interesting annotations or logs? – Yves Junqueira