1 vote

I have a Spring Boot app running with Spring Actuator enabled. I am using the Spring Actuator health endpoint for the readiness and liveness checks. Everything works fine with a single replica. When I scale out to 2 replicas, both pods crash: they fail their readiness checks and end up in an endless destroy/re-create loop. If I scale back to 1 replica, the cluster recovers and the Spring Boot app becomes available again. Any ideas what might be causing this?

Here is the deployment config (the context root of the Spring Boot app is /dept):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gl-dept-deployment
  labels:
    app: gl-dept
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: gl-dept
  template:
    metadata:
      labels:
        app: gl-dept
    spec:
      containers:
      - name: gl-dept
        image: zmad5306/gl-dept:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /dept/actuator/health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 10
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /dept/actuator/health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 10
          successThreshold: 1
          failureThreshold: 5
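
For context, the /dept/actuator/health path in the probes is the app's context root plus Actuator's default health endpoint. A minimal sketch of the Spring Boot 2 application.properties that would produce it; the exact properties here are illustrative assumptions, not taken from the question:

# Context root referenced by the probe paths above
server.servlet.context-path=/dept
# Expose the Actuator health endpoint over HTTP (Boot 2 exposes health by default; shown for clarity)
management.endpoints.web.exposure.include=health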
This is running in minikube, in case that matters. – zmad5306
When you curl the /dept/actuator/health endpoint (and you should be able to via kubectl exec, since the pods will live at least 50 seconds), what is the error text that accompanies the non-200 response? – mdaniel
The curl command hangs. It appears the entire minikube server hangs; the dashboard quits responding. I do see this on the dashboard: Readiness probe failed: Get 172.17.0.2:8080/dept/actuator/health: dial tcp 172.17.0.2:8080: getsockopt: connection refused. Other Spring Boot pods start failing their Actuator-based health checks as well, with a slightly different message: Readiness probe failed: Get 172.17.0.10:8080/list/actuator/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers). I wonder if I need to increase the health check timeout... – zmad5306
This is quite strange: the dashboard locks up until I scale this deployment from 2 to 1. Once I issue kubectl scale deployment gl-dept-deployment --replicas=1, the dashboard starts responding immediately. Also, scaling my web server (Apache) doesn't have this effect; it's just the Spring Boot apps. – zmad5306
Separately, that very problem is why the inclusion of resources: limits: memory: in a PodSpec's containers: is a great, great idea -- not just for minikube, but for the cluster, too, since Kubernetes cannot make intelligent scheduling decisions without knowing how big each moving part is. – mdaniel
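
For illustration, such limits would slot into the container spec above roughly as follows; the 256Mi/512Mi figures are placeholder assumptions, not values from this thread, so size them to the app's real footprint:

        resources:
          requests:
            memory: "256Mi"   # what the scheduler reserves on the node
          limits:
            memory: "512Mi"   # hard cap; the container is OOM-killed if it exceeds this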

1 Answer

1 vote

The curl command hangs. It appears the entire minikube server hangs; the dashboard quits responding

So in that case, I would guess the VM backing minikube is sized too small to handle everything deployed inside it. I haven't played with minikube enough to know how much it carries over from its libmachine underpinnings, but in the case of docker-machine, one can provide --virtualbox-memory=4096 (or set the environment variable: env VIRTUALBOX_MEMORY_SIZE=4096 docker-machine ...). And, of course, one should use the memory setting that corresponds to the driver in use by minikube (HyperKit, xhyve, Hyper-V, whatever).
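
With minikube specifically, the VM's memory is generally fixed when the machine is created, so increasing it usually means re-creating the VM. A sketch, where the 4096 figure is an assumption; pick a size that fits everything you deploy:

minikube stop
minikube delete                  # memory is applied at VM creation, so drop the old VM
minikube start --memory=4096     # re-create the node VM with 4 GB of RAM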