
I am deploying Cassandra across two public networks. When the nodes start, I can see that all of them have joined the ring, and nodetool describecluster shows all nodes as reachable.

After some time, the nodes can no longer connect to each other, and nodetool describecluster lists every node as unreachable.

FYI, I have used public_ip as BROADCAST_ADDRESS and RPC_ADDRESS. The listen address is private_ip.

Hmm, you might want to tail the system.log file on each node and see why they are becoming unresponsive. – Aaron
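Following that suggestion, a minimal sketch to run on each node. It assumes a package install that logs to /var/log/cassandra/system.log (pass a different path if yours differs) and assumes the gossiper reports state changes with its usual "is now DOWN" / "is now UP" wording. The function name gossip_state_changes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print gossip UP/DOWN transitions from a Cassandra log.
# Assumes the default package-install log path; override with the first argument.
gossip_state_changes() {
  local log_file="${1:-/var/log/cassandra/system.log}"
  # Match the gossiper's state-change lines (wording assumed, case-insensitive).
  grep -iE 'is now (DOWN|UP)' "$log_file"
}
```

To watch live instead, something like tail -f /var/log/cassandra/system.log | grep -iE 'is now (DOWN|UP)' works with the same assumptions. Comparing the timestamps of the DOWN transitions across nodes can show whether they all drop at roughly the same time.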

1 Answer


One reason this can happen is that firewalls are sometimes configured to find and kill idle connections. The Linux kernel has TCP "keepalive" settings that it can use to refresh long-running connections. You can view their current values, which are the defaults on an untuned system, with sysctl:

$ sudo sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

To combat this problem, DataStax recommends lowering these values in production deployments:

$ sudo sysctl -w \
net.ipv4.tcp_keepalive_time=60 \
net.ipv4.tcp_keepalive_probes=3 \
net.ipv4.tcp_keepalive_intvl=10
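To see what these values change, a rough sketch of the arithmetic. For a socket with keepalive enabled, the kernel sends the first probe after tcp_keepalive_time seconds of idleness, then up to tcp_keepalive_probes probes spaced tcp_keepalive_intvl seconds apart, before declaring the peer dead. The helper name keepalive_window is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate seconds before the kernel drops an idle,
# unresponsive TCP connection, given the three keepalive sysctls.
# Timeline: first probe after <time> idle seconds, then <probes> probes
# spaced <intvl> seconds apart.
keepalive_window() {
  local time=$1 probes=$2 intvl=$3
  echo $(( time + probes * intvl ))
}

echo "default:     $(keepalive_window 7200 9 75) s"   # 7200 + 9*75
echo "recommended: $(keepalive_window 60 3 10) s"     # 60 + 3*10
```

This prints 7875 s for the defaults and 90 s for the recommended values. For the firewall problem the first number matters most: assuming the firewall counts keepalive probes as traffic, a probe after 60 idle seconds keeps the connection alive in its state table as long as its idle timeout is longer than that. On a live node you can feed in the current values from /proc/sys/net/ipv4/tcp_keepalive_time, tcp_keepalive_probes, and tcp_keepalive_intvl.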

Settings applied with sysctl -w do not survive a reboot. To make them persistent, add each of those values to your system's equivalent of the /etc/sysctl.conf file (without the backslashes) and load them with sysctl:

sudo sysctl -p /etc/sysctl.conf
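For reference, a minimal sketch of the three lines in sysctl.conf syntax (one key = value per line, no backslashes). It assumes a distribution that also reads drop-in files from /etc/sysctl.d/; the file name 99-cassandra-keepalive.conf and the function name keepalive_sysctl_conf are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the recommended keepalive settings in
# sysctl.conf syntax, suitable for a file under /etc/sysctl.d/.
keepalive_sysctl_conf() {
  cat <<'EOF'
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
EOF
}

keepalive_sysctl_conf
```

To install and load it (requires root): keepalive_sysctl_conf | sudo tee /etc/sysctl.d/99-cassandra-keepalive.conf, then sudo sysctl -p /etc/sysctl.d/99-cassandra-keepalive.conf. Afterwards, sysctl net.ipv4.tcp_keepalive_time should report 60.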