Readiness Probe keeps the application in at a non-ready state. While being in this state the application cannot connect to any kubernetes service.
I'm using Ubuntu 18 for both master and nodes for my kubernetes cluster. (The problem still appeared when I used only master in the cluster, so I don't think this is a master node kind of problem).
I set up my kubernetes cluster with an Spring application, which uses hazelcast in order to manage cache. So, while using readiness probe, the application can't access a kubernetes service I created in order to connect the applications via hazelcast using the hazelcast-kubernetes plugin.
When I take out the readiness-probe, the application connects as soon as it can to the service creating hazelcast clusters successfully and everything works properly.
The readiness probe will connect to a rest api which its only response is a 200 code. However, while the application is going up, in the middle of the process it will start the hazelcast cluster, and as such, it will try to connect to the kubernetes hazelcast service which connects the app's cache with other pods, while the readiness probe hasn't been cleared and the pod is in a non-ready state due to the probe. This is when the application will not be able to connect to the kubernetes service and it will either fail or get stuck as a consequence of the configuration I add.
service.yaml:
apiVersion: v1
kind: Service
metadata:
name: my-app-cluster-hazelcast
spec:
selector:
app: my-app
ports:
- name: hazelcast
port: 5701
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app-deployment
labels:
app: my-app-deployment
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 2
replicas: 2
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
terminationGracePeriodSeconds: 180
containers:
- name: my-app
image: my-repo:5000/my-app-container
imagePullPolicy: Always
ports:
- containerPort: 5701
- containerPort: 9080
readinessProbe:
httpGet:
path: /app/api/excluded/sample
port: 9080
initialDelaySeconds: 120
periodSeconds: 15
securityContext:
capabilities:
add:
- SYS_ADMIN
env:
- name: container
value: docker
hazelcast.xml:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast
xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.11.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<properties>
<property name="hazelcast.jmx">false</property>
<property name="hazelcast.logging.type">slf4j</property>
</properties>
<network>
<port auto-increment="false">5701</port>
<outbound-ports>
<ports>49000,49001,49002,49003</ports>
</outbound-ports>
<join>
<multicast enabled="false"/>
<kubernetes enabled="true">
<namespace>default</namespace>
<service-name>my-app-cluster-hazelcast</service-name>
</kubernetes>
</join>
</network>
</hazelcast>
hazelcast-client.xml:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast-client
xsi:schemaLocation="http://www.hazelcast.com/schema/client-config http://www.hazelcast.com/schema/client-config/hazelcast-client-config-3.11.xsd"
xmlns="http://www.hazelcast.com/schema/client-config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<properties>
<property name="hazelcast.logging.type">slf4j</property>
</properties>
<connection-strategy async-start="false" reconnect-mode="ON">
<connection-retry enabled="true">
<initial-backoff-millis>1000</initial-backoff-millis>
<max-backoff-millis>60000</max-backoff-millis>
</connection-retry>
</connection-strategy>
<network>
<kubernetes enabled="true">
<namespace>default</namespace>
<service-name>my-app-cluster-hazelcast</service-name>
</kubernetes>
</network>
</hazelcast-client>
Expected result:
The service is able to connect to the pods, creating endpoints in its description.
$ kubectl describe service my-app-cluster-hazelcast
Name: my-app-cluster-hazelcast
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector: app=my-app
Type: ClusterIP
IP: 10.244.28.132
Port: hazelcast 5701/TCP
TargetPort: 5701/TCP
Endpoints: 10.244.4.10:5701,10.244.4.9:5701
Session Affinity: None
Events: <none>
The application runs properly and shows two members in its hazelcast cluster and the deployment is shown as ready, the application can be fully accessed:
logs:
2019-08-26 23:07:36,614 TRACE [hz._hzInstance_1_dev.InvocationMonitorThread] (com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor): [10.244.4.10]:5701 [dev] [3.11] Broadcasting operation control packets to: 2 members
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
my-app-deployment 2/2 2 2 2m27s
Actual Result:
The service doesn't get any endpoint.
$ kubectl describe service my-app-cluster-hazelcast
Name: my-app-cluster-hazelcast
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector: app=my-app
Type: ClusterIP
IP: 10.244.28.132
Port: hazelcast 5701/TCP
TargetPort: 5701/TCP
Endpoints:
Session Affinity: None
Events: <none>
The application gets stuck with the connection-strategy enabled in hazelcast-client.xml with the following logs, keeping its own cluster with no communication and the deployment in a non-ready state forever:
logs:
22:54:11.236 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 57686 ms later, attempt 52 , cap retrytimeout millis 60000
22:55:02.036 [hz._hzInstance_1_dev.cached.thread-4] DEBUG com.hazelcast.internal.cluster.impl.MembershipManager - [10.244.4.8]:5701 [dev] [3.11] Sending member list to the non-master nodes:
Members {size:1, ver:1} [
Member [10.244.4.8]:5701 - 6a4c7184-8003-4d24-8023-6087d68e9709 this
]
22:55:08.968 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 51173 ms later, attempt 53 , cap retrytimeout millis 60000
22:56:00.184 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 55583 ms later, attempt 54 , cap retrytimeout millis 60000
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
my-app-deployment 0/2 2 0 45m