Apache Ignite - data rebalancing doesn't work properly

Question

I have a cluster of 2 nodes (A & B).

From Node A, I put two entries into the cache (for example, firstName and lastName).

I then read those entries from the cache on Node B, and Node B reads both of them successfully.

Then I shut down Node A. After that, Node B can no longer read one of the entries from the cache. I have published a fully working application that reproduces the issue, including a README.md file with the exact steps to reproduce it.

https://github.com/manish-panwar/ignite-data-rebalancing-issue

I have made sure that the backup count is set to 1, and I can see that both nodes join the topology, as the logs below show. These logs are from the younger node, Node B:

    Topology snapshot [ver=2, servers=2, clients=0, CPUs=9, heap=3.7GB]
    Added listener for disabled event type: CACHE_OBJECT_REMOVED
    Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=1d01e477-172d-4c57-aade-6abe9773aa99]

These logs are from the older node, Node A:

    Added new node to topology: TcpDiscoveryNode [id=bea211c9-8806-4c5c-91f3-c07dab543de9, addrs=[10.44.72.188], sockAddrs=[/10.44.72.188:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1474674697922, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false]
    Topology snapshot [ver=2, servers=2, clients=0, CPUs=9, heap=3.7GB]
    Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2, minorTopVer=0], evt=NODE_JOINED, node=bea211c9-8806-4c5c-91f3-c07dab543de9]
    Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=1d01e477-172d-4c57-aade-6abe9773aa99]

Valentin Kulichenko · Accepted Answer · 2016-09-23T22:53:59

You set ipFinder.setShared(true), which is wrong. When TcpDiscoveryVmIpFinder is used in shared mode, nodes discover each other only if they run in the same JVM and share the same instance of the finder. So if you used this exact code, the nodes were not discovering each other. Check that the older node's log has a line like the one below; it is printed when the second node joins the topology.

Topology snapshot [ver=2, servers=2, clients=0, CPUs=4, heap=7.1GB]
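
For illustration, here is a minimal sketch of a non-shared discovery setup, assuming both nodes are started with the same static address list. The class name DiscoveryConfigSketch and the address range 127.0.0.1:47500..47509 are placeholders, not taken from the linked project; replace the addresses with the hosts your nodes actually run on.

```java
import java.util.Arrays;

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

// Hypothetical helper class, used only for this sketch.
public class DiscoveryConfigSketch {
    static IgniteConfiguration discoveryConfig() {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

        // Non-shared mode: every node is configured with the same address list.
        ipFinder.setShared(false);

        // Placeholder addresses (assumption); list the hosts of all your nodes here.
        ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47509"));

        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        discoverySpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoverySpi);
        return cfg;
    }
}
```

Passing the returned configuration to Ignition.start(...) on each node should produce the two-server topology snapshot line in the older node's log once the second node joins.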

Also make sure that you have at least one backup configured. Otherwise you will most likely lose part or all of the data after losing a node.

--UPDATE--

IgniteCacheConfig sets groupName as the cache name (SEG by default). This means that Ignite creates a cache with this name on startup. But the app then uses a cache named someCache. Since there is no configuration for this cache, the default settings are used, which means no backups. When I call setName("someCache") in the cache configuration, I never lose data when stopping one of the nodes.
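
As a rough sketch of that fix, assuming a plain Java configuration rather than the project's own IgniteCacheConfig class: the cache configuration is registered under the same name the application later asks for, with one backup. The class name CacheConfigSketch and the sample values are hypothetical.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical class, used only for this sketch.
public class CacheConfigSketch {
    public static void main(String[] args) {
        // The configured name must match the name the application uses,
        // otherwise the cache falls back to default settings (no backups).
        CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("someCache");
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);
        cacheCfg.setBackups(1); // keep one backup copy of every partition

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Same name as in the configuration above.
            IgniteCache<String, String> cache = ignite.cache("someCache");

            // Sample values are placeholders (assumption).
            cache.put("firstName", "Jane");
            cache.put("lastName", "Doe");
        }
    }
}
```

With one backup per partition in a two-node cluster, each entry should exist on both nodes, so stopping either node leaves a full copy on the other.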
