I am for this example running the "echoheaders" Nginx in a deployment with 2 replicas. When I delete 1 pod, I sometimes get slow responses and errors for ~40 seconds.
We are running our API-gateway in Kubernetes, and need to be able to allow the Kubernetes scheduler to handle the pods as it sees fit.
We recently wanted to introduce session affinity, and for that, we wanted to migrate to the new and shiny NEG's: Network Endpoint Groups: https://cloud.google.com/load-balancing/docs/negs/
When using NEG we experience problems during failover. Without NEG we're fine.
deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: echoheaders
labels:
app: echoheaders
spec:
replicas: 2
selector:
matchLabels:
app: echoheaders
template:
metadata:
labels:
app: echoheaders
spec:
containers:
- image: brndnmtthws/nginx-echo-headers
imagePullPolicy: Always
name: echoheaders
readinessProbe:
httpGet:
path: /
port: 8080
lifecycle:
preStop:
exec:
# Hack: wait for kube-proxy to remove endpoint before exiting, and
# gracefully shut down
command: ["bash", "-c", "sleep 10; nginx -s quit; sleep 40"]
restartPolicy: Always
terminationGracePeriodSeconds: 60
service.yaml
apiVersion: v1 kind: Service metadata: name: echoheaders labels: app: echoheaders annotations: cloud.google.com/neg: '{"ingress": true}' spec: ports: - port: 80 protocol: TCP targetPort: 8080 selector: app: echoheaders
ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
annotations:
kubernetes.io/ingress.global-static-ip-name: echoheaders-staging
name: echoheaders-staging
spec:
backend:
serviceName: echoheaders
servicePort: 80
When deleting a pod I get errors as shown in this image of
$ httping -G -K 35.190.69.21
(https://i.imgur.com/u14MvHN.png)
This is new behaviour when using NEG. Disabling NEG gives the old behaviour with working failover.
Any way to use Google LB, ingress, NEG and Kubernetes without errors during pod deletion?