Split-brain discovery in Hazelcast cluster in Kubernetes

Question

I have the following setup.

My Vert.x verticles are clustered with Hazelcast and deployed on a Kubernetes cluster with the following network info:

------------------------------------------------
           TCP/IP NETWORK INFORMATION
------------------------------------------------
IP Entered = ..................: 10.60.0.0
CIDR = ........................: /14
Netmask = .....................: 255.252.0.0
Netmask (hex) = ...............: 0xfffc0000
Wildcard Bits = ...............: 0.3.255.255
------------------------------------------------
Network Address = .............: 10.60.0.0
Broadcast Address = ...........: 10.63.255.255
Usable IP Addresses = .........: 262,142
First Usable IP Address = .....: 10.60.0.1
Last Usable IP Address = ......: 10.63.255.254

Hazelcast's cluster.xml has the following section:

<join>
  <multicast enabled="true">
    <multicast-group>224.2.2.3</multicast-group>
    <multicast-port>54327</multicast-port>
  </multicast>
</join>

All seems fine. When I start verticles in pods, I get the output (abbreviated):

>kubectl get pods --namespace develop -o wide

READY   STATUS    RESTARTS   AGE   IP        
1/1     Running   0          52m   10.60.4.18
1/1     Running   0          4m    10.60.6.19
1/1     Running   0          4m    10.60.1.16
1/1     Running   0          4m    10.60.1.18
1/1     Running   0          4m    10.60.6.18  
1/1     Running   0          4m    10.60.1.17
1/1     Running   0          4m    10.60.4.23
1/1     Running   0          8m    10.60.6.17
1/1     Running   0          4m    10.60.4.22
1/1     Running   0          4m    10.60.4.21
1/1     Running   0          4m    10.60.6.20
1/1     Running   0          5d    10.60.4.9

The problem is that the clusters are grouped not by the specified group name, but by the third octet of the IP address. So I'm getting a cluster of:

                      masterAddress=[10.60.1.17]:5701
                      Members[
                              [10.60.1.17]:5701
                              [10.60.1.16]:5701
                              [10.60.1.18]:5701]]

Then there are 5 members in the 10.60.4.* "cluster", 4 members in 10.60.6.*, and so on, and they never merge...

What am I missing?

TIA

Most likely the cluster members don't have access to the other subnets. Can you please share the full log of a member? Then we can see whether it discovered the other members or not. — Alparslan Avci
I wouldn't use multicast on Kubernetes. The Vert.x Hazelcast cluster manager has documentation for Kubernetes deployment. It's not on the official website yet (in master branch, will be when 3.6 is released). — tsegismont
@tsegismont Well, we used to have it running with the com.hazelcast:hazelcast-kubernetes:1.0.0 plugin, but it had a limitation that all pods would have to run on the same IP address (as our devops guy explained to me). With multicast we could use different IPs. What I don't understand now is why multicast works but creates several clusters instead of one. — injecteer

Rafał Leszko · Accepted Answer · 2018-11-23T15:24:30

Hazelcast has a dedicated plugin for the discovery in Kubernetes. Please check: hazelcast-kubernetes.

Multicast may or may not work, since it depends on the underlying network. In my experience on GKE, it sometimes works and sometimes doesn't. That is why multicast-based discovery is never recommended for Kubernetes.
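
As a rough illustration of what switching from multicast to the plugin could look like, here is a minimal sketch of a cluster.xml in the style used by the hazelcast-kubernetes 1.x plugin with Hazelcast 3.x. The service name `my-hazelcast-service` is a hypothetical placeholder, and the namespace `develop` is taken from the kubectl command in the question. It assumes the plugin jar is on the classpath and that the pods are allowed to query the Kubernetes API for the service's endpoints; please check the plugin README for the version you actually use.

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <properties>
    <!-- Enables the discovery SPI that the plugin hooks into -->
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
  <network>
    <join>
      <!-- Multicast is switched off; members are found via the Kubernetes API instead -->
      <multicast enabled="false"/>
      <tcp-ip enabled="false"/>
      <discovery-strategies>
        <discovery-strategy enabled="true"
            class="com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy">
          <properties>
            <!-- Hypothetical name of the Kubernetes Service that selects the Hazelcast pods -->
            <property name="service-name">my-hazelcast-service</property>
            <!-- Namespace from the question's kubectl command -->
            <property name="namespace">develop</property>
          </properties>
        </discovery-strategy>
      </discovery-strategies>
    </join>
  </network>
</hazelcast>
```

With a setup along these lines, members are looked up through the endpoints of the named service rather than through multicast packets, so discovery should not depend on whether multicast traffic crosses node subnets.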
