We have configured a 3-node Cassandra cluster on RHEL 7.2 and are doing cluster testing. When we start Cassandra on all 3 nodes, they form a cluster and work fine.
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes a long time to rejoin the cluster. However, if we manually kill and restart the Cassandra process, the node rejoins the cluster immediately without any issues.
We have provided all 3 IPs as seed nodes, the cluster name is the same on all 3 nodes, and each node uses its own IP as the listen address.
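For reference, the relevant part of cassandra.yaml on each node looks roughly like the sketch below. The cluster name and the 10.0.0.x addresses are placeholders for illustration, not our real values; only listen_address differs from node to node.

```yaml
# cassandra.yaml on Node 1 (placeholder values, not the real ones)
cluster_name: 'TestCluster'   # identical on all 3 nodes
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"   # all 3 nodes listed as seeds
listen_address: 10.0.0.1      # this node's own IP; differs per node
```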
Please help us in resolving this issue.
Thanks
Update: we are using Cassandra version 3.9.
While investigating the issue further, we noticed that Node 1 (the rebooted node) is able to send "SYN" and "ACK2" messages to both of the other nodes (Node 2 and Node 3), even though nodetool status shows Node 2 and Node 3 as "DN", and only on Node 1.
After 10-15 minutes, we noticed a "Connection Timeout" exception on Node 2 and Node 3, thrown from OutboundTcpConnection.java (line 311), which triggers a state change event to Node 1 and changes the state to "UN".
if (logger.isTraceEnabled()) logger.trace("error writing to {}", poolReference.endPoint(), e);
Please let us know what triggers the "Connection Timeout" exception on Node 2 and Node 3, and how to resolve it.
We believe this issue is similar to https://issues.apache.org/jira/browse/CASSANDRA-9630