Readiness Probe does not allow access to an internal kubernetes service while pod is not ready

Question

The readiness probe keeps the application in a non-ready state. While in this state, the application cannot connect to any Kubernetes service.

I'm using Ubuntu 18 for both the master and the worker nodes of my Kubernetes cluster. (The problem still appeared when the cluster consisted of only the master, so I don't think this is specific to the master node.)

I set up my Kubernetes cluster with a Spring application that uses Hazelcast to manage its cache. With a readiness probe in place, the application can't access the Kubernetes service I created to connect the application instances via Hazelcast using the hazelcast-kubernetes plugin.

When I remove the readiness probe, the application connects to the service as soon as it can, the Hazelcast cluster forms successfully, and everything works properly.

The readiness probe calls a REST endpoint whose only response is a 200 status code. However, partway through startup the application starts its Hazelcast cluster and tries to connect to the Kubernetes Hazelcast service, which links the app's cache with the other pods. At that point the readiness probe has not yet passed, so the pod is still in a non-ready state. The application cannot connect to the Kubernetes service, and it either fails or gets stuck, depending on the configuration I use.

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: my-app-cluster-hazelcast
spec:
  selector:
    app: my-app
  ports:
  - name: hazelcast
    port: 5701

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 180
      containers:
      - name: my-app
        image: my-repo:5000/my-app-container
        imagePullPolicy: Always
        ports:
        - containerPort: 5701
        - containerPort: 9080
        readinessProbe:
          httpGet:
            path: /app/api/excluded/sample
            port: 9080
          initialDelaySeconds: 120
          periodSeconds: 15
        securityContext:
          capabilities:
            add:
              - SYS_ADMIN
        env:
          - name: container
            value: docker

hazelcast.xml:

<?xml version="1.0" encoding="UTF-8"?>

<hazelcast
        xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.11.xsd"
        xmlns="http://www.hazelcast.com/schema/config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <properties>
        <property name="hazelcast.jmx">false</property>
        <property name="hazelcast.logging.type">slf4j</property>
    </properties>

    <network>
        <port auto-increment="false">5701</port>
        <outbound-ports>
            <ports>49000,49001,49002,49003</ports>
        </outbound-ports>
        <join>
            <multicast enabled="false"/>
            <kubernetes enabled="true">
                <namespace>default</namespace>
                <service-name>my-app-cluster-hazelcast</service-name>
            </kubernetes>
        </join>
    </network>
</hazelcast>

hazelcast-client.xml:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast-client
        xsi:schemaLocation="http://www.hazelcast.com/schema/client-config http://www.hazelcast.com/schema/client-config/hazelcast-client-config-3.11.xsd"
        xmlns="http://www.hazelcast.com/schema/client-config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <properties>
        <property name="hazelcast.logging.type">slf4j</property>
    </properties>

    <connection-strategy async-start="false" reconnect-mode="ON">
        <connection-retry enabled="true">
            <initial-backoff-millis>1000</initial-backoff-millis>
            <max-backoff-millis>60000</max-backoff-millis>
        </connection-retry>
    </connection-strategy>

    <network>
        <kubernetes enabled="true">
            <namespace>default</namespace>
            <service-name>my-app-cluster-hazelcast</service-name>
        </kubernetes>
    </network>
</hazelcast-client>

Expected result:

The service is able to connect to the pods, creating endpoints in its description.

$ kubectl describe service my-app-cluster-hazelcast

Name:              my-app-cluster-hazelcast
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector:          app=my-app
Type:              ClusterIP
IP:                10.244.28.132
Port:              hazelcast  5701/TCP
TargetPort:        5701/TCP
Endpoints:         10.244.4.10:5701,10.244.4.9:5701
Session Affinity:  None
Events:            <none>

The application runs properly and shows two members in its Hazelcast cluster. The deployment is shown as ready and the application can be fully accessed:

logs:

2019-08-26 23:07:36,614 TRACE [hz._hzInstance_1_dev.InvocationMonitorThread] (com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor): [10.244.4.10]:5701 [dev] [3.11] Broadcasting operation control packets to: 2 members

$ kubectl get deployments

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-app-deployment   2/2     2            2           2m27s

Actual Result:

The service doesn't get any endpoint.

$ kubectl describe service my-app-cluster-hazelcast

Name:              my-app-cluster-hazelcast
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector:          app=my-app
Type:              ClusterIP
IP:                10.244.28.132
Port:              hazelcast  5701/TCP
TargetPort:        5701/TCP
Endpoints:
Session Affinity:  None
Events:            <none>

With the connection-strategy enabled in hazelcast-client.xml, the application gets stuck with the following logs. It keeps its own single-member cluster with no communication, and the deployment stays in a non-ready state forever:

logs:

22:54:11.236 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 57686 ms later, attempt 52 , cap retrytimeout millis 60000
22:55:02.036 [hz._hzInstance_1_dev.cached.thread-4] DEBUG com.hazelcast.internal.cluster.impl.MembershipManager - [10.244.4.8]:5701 [dev] [3.11] Sending member list to the non-master nodes:

Members {size:1, ver:1} [
        Member [10.244.4.8]:5701 - 6a4c7184-8003-4d24-8023-6087d68e9709 this
]

22:55:08.968 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 51173 ms later, attempt 53 , cap retrytimeout millis 60000
22:56:00.184 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 55583 ms later, attempt 54 , cap retrytimeout millis 60000

$ kubectl get deployments

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-app-deployment   0/2     2            0           45m

It seems your probe doesn't work; try to find out why. It could be a timeout if the endpoint takes more than 1 second to respond, or some other reason. — EAT
@EAT_Py It is as intended, since the application is not up yet. For the application to run, hazelcast has to work, but since the readiness probe knows the application is not up, it seems that the app does not connect to the internal hazelcast kubernetes cluster, and because hazelcast does not work, the application never goes up, ending up in a deadlock. — Cristian Cordova
I guess the question would be: why can't I connect to a Kubernetes service while the readiness probe hasn't passed? — Cristian Cordova
If the readiness probe is not OK, where would the service connect to (as there is no instance of your application running in the cluster)? — Bimal
I read in the docs that the readiness probe blocks all access to the pod, so even if the pod tries to connect to the service, the service won't connect back to anything, because the readiness probe is working as it should. — Cristian Cordova

Mark Mark · Accepted Answer · 2019-08-27T15:10:52

Just to clarify:

As described by OP with reference to readiness probe:

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
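
The quoted behavior explains why the service in the question shows an empty Endpoints list: neither pod has passed its readiness probe, so neither is published as a backend. A minimal sketch of one possible way around this is the Service field publishNotReadyAddresses, which asks Kubernetes to publish pod addresses regardless of readiness. Setting it is an assumption here and does not appear in the original manifest. Whether the Hazelcast discovery plugin then picks up those addresses may depend on the plugin version and its configuration, so treat this as something to try rather than a confirmed fix.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-cluster-hazelcast
spec:
  # Assumption: this field is not in the original service.yaml.
  # It publishes the addresses of matching pods even before
  # their readiness probes have passed.
  publishNotReadyAddresses: true
  selector:
    app: my-app
  ports:
  - name: hazelcast
    port: 5701
```

Since this service exists only for member discovery on port 5701, publishing not-ready pods should not expose the application's HTTP port 9080 through it. After applying the change, kubectl describe service my-app-cluster-hazelcast can be used to check whether the Endpoints field is populated while the pods are still starting.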
