I ran into a problem with Hazelcast IAtomic cluster logic. My configuration has 4 nodes, and the first node is the Hazelcast master. The second node becomes master if the first node goes down. The scenario steps are:

  1. I kill the first node and the third node at the same time.
  2. Hazelcast selects a new master node because the master has died. The new master is Node 2.
  3. Node 2 tries to run the migration, and at this point the system runs out of memory because of Hazelcast. Hazelcast tries to connect to the third node again and again (like an infinite loop). In the end no memory is left and the system throws an OutOfMemoryError.

The Hazelcast logs are below:

    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnection
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Connection[id=21282, /198.168.10.14:50491->/198.168.10.14:5702, endpoint=[198.168.10.14]:5702, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnector
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Connecting to /198.168.10.14:5702, timeout: 0, bind-any: true
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpAcceptor
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Accepting socket connection from /198.168.10.14:56774
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Established socket connection between /198.168.10.14:5702 and /198.168.10.14:56774
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Established socket connection between /198.168.10.14:56774 and /198.168.10.14:5702
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
    WARNING: [198.168.10.11]:5702 [dev] [3.10.2] Wrong bind request from [198.168.10.11]:5701! This node is not the requested endpoint: [198.168.10.14]:5702
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnection
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Connection[id=21284, /198.168.10.14:5702->/198.168.10.14:55083, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [198.168.10.11]:5701! This node is not the requested endpoint: [198.168.10.14]:5702
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpAcceptor
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Accepting socket connection from /198.168.10.14:51198
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
    INFO: [198.168.10.11]:5702 [dev] [3.10.2] Established socket connection between /198.168.10.14:5702 and /198.168.10.14:51198
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
    WARNING: [198.168.10.11]:5702 [dev] [3.10.2] Wrong bind request from [198.168.10.11]:5701! This node is not the requested endpoint: [198.168.10.14]:5702
    Jul 10, 2018 2:52:36 PM com.hazelcast.nio.tcp.TcpIpConnection

Node 1 and Node 3 run on the same server. Node 2 and Node 4 run together on another server.

The configuration used to initialize Hazelcast is:

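    import com.hazelcast.config.Config;
    import com.hazelcast.config.JoinConfig;
    import com.hazelcast.config.NetworkConfig;

    Config config = new Config();
    NetworkConfig network = config.getNetworkConfig();
    JoinConfig join = network.getJoin();
    // Discover members via multicast; TCP/IP discovery is disabled.
    join.getMulticastConfig().setEnabled(true);
    join.getTcpIpConfig().setEnabled(false);

    config.setNetworkConfig(network);
    config.setInstanceName("instance");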
I'm using Hazelcast version 3.10.2 - Okay Atalay
It's possible that the remaining 2 members do not have enough heap space to carry all the data of 4 members, hence the OOME. Did you consider the total data size you have on the cluster (including backups), and also the total heap size you have? - sertug
The remaining nodes each have 2 GB of memory when they are initialized, and the server has 4 GB of extra memory, so 6 GB is used in total. I think that is enough. If I kill only node 1 or only node 3, there is no problem. The problem is reproduced only when node 1 (master) and node 3 are killed at the same time. - Okay Atalay
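To put numbers behind the heap question, here is a minimal sketch using only the JDK. It reports the heap limit the JVM actually received and estimates how much partitioned data each member would hold before and after two members are lost. The class name `HeapCheck`, the even-distribution assumption, and the 1 GB data size in `main` are hypothetical, not taken from the thread.

```java
public class HeapCheck {

    /** Maximum heap the running JVM will attempt to use, in bytes (reflects -Xmx if set). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Rough per-member load, assuming primary and backup copies are
     * spread evenly across the members (an assumption for illustration).
     */
    public static long estimatedPerMemberBytes(long primaryDataBytes, int backupCount, int members) {
        long totalStored = primaryDataBytes * (1L + backupCount);
        return totalStored / members;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024L * 1024L; // hypothetical primary data size
        System.out.println("max heap bytes:       " + maxHeapBytes());
        System.out.println("per member, 4 nodes:  " + estimatedPerMemberBytes(oneGb, 1, 4));
        System.out.println("per member, 2 nodes:  " + estimatedPerMemberBytes(oneGb, 1, 2));
    }
}
```

Under these assumptions, going from 4 members to 2 doubles the data each survivor has to hold, so the comparison that matters is that doubled figure against the per-member heap limit.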
Can anyone suggest a way to fix this issue? - Okay Atalay
The use case and the description are not clear. It's best if you can attach the logs of all members, and also take a heap dump with -XX:+HeapDumpOnOutOfMemoryError during the OOME, so it can be investigated. You can attach the logs and maybe a screenshot of the heap dump in a Google Groups post at groups.google.com/forum/#!forum/hazelcast - sertug
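If restarting the members with the JVM flag is inconvenient, the same option can usually be checked and switched on at runtime on HotSpot-based JVMs. The following is a minimal sketch, assuming a HotSpot JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `HeapDumpSketch` and its method names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch {

    private static HotSpotDiagnosticMXBean diagnosticBean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Current value of the HeapDumpOnOutOfMemoryError flag, as "true" or "false". */
    public static String heapDumpOnOomeFlag() {
        return diagnosticBean().getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    /** Turns the flag on at runtime; it is a manageable option on HotSpot. */
    public static void enableHeapDumpOnOome() {
        diagnosticBean().setVMOption("HeapDumpOnOutOfMemoryError", "true");
    }

    /** Writes a heap dump immediately; the target file must not already exist. */
    public static void dumpNow(String hprofPath) throws IOException {
        diagnosticBean().dumpHeap(hprofPath, true); // true = only live objects
    }

    public static void main(String[] args) {
        System.out.println("before: " + heapDumpOnOomeFlag());
        enableHeapDumpOnOome();
        System.out.println("after:  " + heapDumpOnOomeFlag());
    }
}
```

The resulting `.hprof` file can then be opened in a heap analyzer such as MAT.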

1 Answer

I took the heap dump and analyzed it with MAT. The result is: (image: MAT result)