I am running a 12-node Apache Ignite cluster, with each JVM running on its own VMware node. The Ignite nodes use TCP discovery, with ZooKeeper keeping them in sync.

I have been seeing a lot of node failures in the ZooKeeper logs even though the Java processes are still running, and I don't know why some Ignite nodes leave the cluster with "node failed" errors. VMware uses vMotion to perform what it calls "migration"; I am assuming that is some kind of filesystem sync process between VMware nodes. I am also seeing fairly frequent "dumping pending object" and "Failed to wait for partition map exchange" messages in the Ignite JVM logs.

My environment is as follows:
- Apache Ignite 1.9.0
- RHEL 7.2 (Maipo) runs on each of the 12 nodes
- Oracle JDK 1.8
- ZooKeeper 3.4.9
Please let me know your thoughts.
TIA