I am currently doing a POC for a distributed, fault-tolerant ETL ecosystem. I have selected Hazelcast for my clustering (data + notification) needs. Googling through Hazelcast resources led me to this link, which matches exactly how I was planning to proceed, using a map-based solution.
I need to understand one point. Before that, allow me to give a canonical idea of our architecture:
Say we have two nodes, A and B, running our server instances clustered through Hazelcast. One of them, say A, is the listener accepting requests (though this can change on failover).
A gets a request and puts it to a distributed map. This map is write-through backed by a persistent store and a single memory backup is configured on nodes.
Each instance has a local map entry listener which, on an entryAdded event, processes that entry asynchronously (via a queue) and then removes it from the distributed map.
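For context, the listener wiring described above can be sketched roughly as follows. This is a minimal sketch, not our actual code: the `Request` type, the `process` method, and the map name `"requests"` are hypothetical placeholders, and it assumes Hazelcast's `IMap.addLocalEntryListener`, which notifies only for entries owned by the local member.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.hazelcast.core.EntryEvent;
import com.hazelcast.core.EntryListener;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.IMap;

public class RequestProcessor {

    // Placeholder for the application's request type.
    public static class Request { }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Distributed map; write-through persistence and backup count
    // are assumed to be configured in hazelcast.xml.
    private final IMap<String, Request> requests = Hazelcast.getMap("requests");

    public void start() {
        // Local listener: fires only for entries owned by this member.
        requests.addLocalEntryListener(new EntryListener<String, Request>() {
            public void entryAdded(final EntryEvent<String, Request> event) {
                worker.submit(new Runnable() {
                    public void run() {
                        process(event.getValue());       // hypothetical ETL work
                        requests.remove(event.getKey()); // done: drop from map (and store)
                    }
                });
            }
            public void entryRemoved(EntryEvent<String, Request> event) { }
            public void entryUpdated(EntryEvent<String, Request> event) { }
            public void entryEvicted(EntryEvent<String, Request> event) { }
        });
    }

    private void process(Request request) { /* ETL work goes here */ }
}
```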
This is working as expected.
Question:
Say 10 requests have been received and distributed, with 5 on each node. Two entries on each node have been processed, and then both instances crash.
So there are a total of 6 entries present in the backing datastore now.
Now we bring up both instances. As per the documentation: "As of 1.9.3 MapLoader has the new MapLoader.loadAllKeys API. It is used for pre-populating the in-memory map when the map is first touched/used"
We implement loadAllKeys() by simply loading all the keys present in the store.
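To make the setup concrete, a minimal MapStore sketch along those lines might look like the following. The `RequestDao` and its methods are hypothetical placeholders for whatever persistence layer actually backs the map; the `MapStore` interface (which extends `MapLoader` with the write-through methods) is Hazelcast's.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import com.hazelcast.core.MapStore;

public class RequestMapStore implements MapStore<String, Request> {

    private final RequestDao requestDao = new RequestDao(); // hypothetical DAO

    // Called when the map is first touched, to pre-populate it.
    public Set<String> loadAllKeys() {
        return requestDao.findAllKeys();
    }

    public Request load(String key) {
        return requestDao.findByKey(key);
    }

    public Map<String, Request> loadAll(Collection<String> keys) {
        Map<String, Request> result = new HashMap<String, Request>();
        for (String key : keys) {
            result.put(key, requestDao.findByKey(key));
        }
        return result;
    }

    // Write-through persistence: called on map.put().
    public void store(String key, Request value) {
        requestDao.save(key, value);
    }

    public void storeAll(Map<String, Request> entries) {
        for (Map.Entry<String, Request> e : entries.entrySet()) {
            store(e.getKey(), e.getValue());
        }
    }

    // Called on map.remove(), i.e. after an entry is processed.
    public void delete(String key) {
        requestDao.deleteByKey(key);
    }

    public void deleteAll(Collection<String> keys) {
        for (String key : keys) {
            delete(key);
        }
    }
}
```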
So does that mean there is a possibility that both instances will now load the 6 entries and process them (thus resulting in duplicate processing)? Or is loading handled in a synchronized way, so that it is done only once in the cluster?
On server startup I need to process the pending entries (if any). I see that the data is loaded, however the entryAdded event is not fired. How can I make the entryAdded event fire (or is there any other elegant way by which I will know that there are pending entries on startup)?
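One possible startup sweep, sketched under the assumption that `IMap.localKeySet()` returns only the keys owned by the local member (so if every node runs the sweep, each pending entry would be handled once rather than relying on entryAdded firing):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.IMap;

public class StartupSweep {

    // Placeholder for the application's request type.
    public static class Request { }

    public static void sweep() {
        IMap<String, Request> requests = Hazelcast.getMap("requests");
        // localKeySet() returns only the keys this member owns, so each
        // node sweeps its own share of the pre-populated entries.
        for (String key : requests.localKeySet()) {
            Request pending = requests.get(key);
            if (pending != null) {
                process(pending);     // same hypothetical processing as the listener path
                requests.remove(key); // write-through: also deletes from the store
            }
        }
    }

    private static void process(Request request) { /* ETL work goes here */ }
}
```

Whether this is safe with respect to concurrent rebalancing, or whether there is a more idiomatic hook, is exactly what I am asking about.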
Requesting suggestions.
Thanks, Sutanu