2 votes

I have a nimbus server and 3 zookeeper nodes.

My storm.yaml file looks like this:

storm.zookeeper.servers:
 - "server1"
 - "server2"
 - "server3"

nimbus.host: "nimbus-server"

storm.local.dir: "/var/storm"

My zoo.cfg files all look like this:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.3=server1:2888:3888
server.4=server2:2888:3888
server.5=server3:2888:3888
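
For completeness, ZooKeeper also requires a myid file in each node's dataDir that matches its server.N id from this file, so a mismatch there can break the ensemble too. On server1 (server.3 above) that would be:

# repeat with 4 and 5 on server2 and server3 respectively
echo 3 > /var/zookeeper/myid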

When all three zookeeper nodes are running, everything is fine according to the Storm UI. If I shut down one of the three nodes, the nimbus server complains that it can't connect to the zookeeper cluster, and then it dies. I can't find anything that explains why this is happening. The documentation says that with three zookeeper nodes, the ensemble should tolerate one of them dying. Is there something that has to be set in one of these files for this to work?
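
One way I know of to sanity-check whether the ensemble has actually formed (a sketch; it assumes you run it from the ZooKeeper install directory on each node) is:

# a healthy 3-node ensemble shows one "Mode: leader" and two "Mode: follower"
bin/zkServer.sh status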

Hi @numb3rs1x, do you use a supervisory process to manage your zookeepers? – jumping_monkey

Hi @jumping_monkey, I wish I could remember. We abandoned the project pretty shortly after this, if I recall correctly. If I had to do it today and wasn't going to do it in Kubernetes, I would use systemd on CentOS, since that's what I'm familiar with. I guess that's a long way of saying "yes, I would". – numb3rs1x

Thanks a lot for getting back, @numb3rs1x. Unfortunately, my box is Solaris 11, so I can't use systemd; from what I read (a quick Google), the distros that adopted it are all Linux distros. I also don't want to consider SMF on Solaris, in case we move from Solaris to Linux in the future. Again, thanks! – jumping_monkey

1 Answer

2 votes

This turned out to be iptables. There was never a quorum between all three zookeeper servers, because one of them had its ports blocked; in effect only two nodes had ever joined the ensemble, so once I stopped one of those, the single remaining node couldn't hold quorum and the cluster behaved just as it should have. I opened ports 2181, 2888, and 3888 on the one server that didn't have them open, and now I can kill any one of them with Storm still alive.
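
For anyone hitting the same thing, the rules I added were roughly along these lines (a sketch assuming classic iptables with an INPUT chain, as on CentOS 6; your chain names and persistence mechanism may differ):

# allow zookeeper client, quorum, and leader-election traffic
iptables -I INPUT -p tcp --dport 2181 -j ACCEPT   # client connections
iptables -I INPUT -p tcp --dport 2888 -j ACCEPT   # follower-to-leader (quorum)
iptables -I INPUT -p tcp --dport 3888 -j ACCEPT   # leader election
service iptables save   # persist the rules (CentOS 6 style)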