3
votes

I am running a 12-node Ignite cluster, one JVM per node. Each JVM runs on its own VMware node. I am using ZooKeeper with TCP discovery to keep these Ignite nodes in sync. I have been seeing a lot of node failures in the ZooKeeper logs even though the Java processes are still running, and I don't know why some Ignite nodes leave the cluster with "node failed" errors. VMware uses vMotion to do what it calls "migration". I am assuming that is some kind of filesystem sync process between VMware nodes. I am also seeing fairly frequent "dumping pending object" and "Failed to wait for partition map exchange" messages in the Ignite JVM logs. My environment is as follows:

  • Apache Ignite 1.9.0
  • RHEL 7.2 (Maipo) runs on each of the 12 nodes
  • Oracle JDK 1.8
  • ZooKeeper 3.4.9

Please let me know your thoughts.

TIA

2
Could NTP settings cause any weird behavior? Some of the 12 nodes have NTP turned on and some don't, so some are NTP-synchronized and others are not. - ZeroGraviti

2 Answers

1
votes

There are generally two possible reasons:

0
votes

VM migrations sometimes involve suspending the VM. While the VM is suspended, it has no way to communicate with the rest of the cluster, so the other nodes will see it as down even though its Java process is still there.
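One way to check whether this is what is happening is to run a small stall detector on each node and compare its output with the timestamps of the "node failed" messages. The sketch below is only an illustration: the class name `PauseDetector`, its methods, and the sample values are made up for this example and are not part of Ignite or ZooKeeper. It sleeps for a short interval and measures how much longer the sleep actually took. Note that a long gap can also come from a GC pause or CPU starvation, and whether a hypervisor-level suspend is visible at all depends on how the guest clock is virtualized.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

public class PauseDetector {

    /** Milliseconds by which an observed sleep overshot the requested interval (never negative). */
    static long excessMillis(long intervalMillis, long startNanos, long endNanos) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMillis - intervalMillis);
    }

    /**
     * Sleeps `samples` times for `intervalMillis` each and returns the largest
     * overshoot seen. Overshoots of at least `thresholdMillis` are printed with
     * a wall-clock timestamp so they can be matched against the cluster logs.
     */
    static long longestStallMillis(long intervalMillis, int samples, long thresholdMillis) {
        long worst = 0L;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            long excess = excessMillis(intervalMillis, start, System.nanoTime());
            if (excess >= thresholdMillis) {
                System.out.println(Instant.now() + " stall of ~" + excess + " ms detected");
            }
            worst = Math.max(worst, excess);
        }
        return worst;
    }
}
```

For example, calling `PauseDetector.longestStallMillis(100, 600, 1000)` in a loop from a background thread samples roughly once every 100 ms and reports any stall of a second or more. If the reported stalls line up with vMotion events and with nodes being dropped, the suspension explanation becomes much more likely.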