I'm trying to achieve HA with three machines, running masters and slaves as laid out below. I'm using VMs for a local test setup, and my observations follow.
Case 1:
m1 -> leader master
m2 -> non-leader master, slave1
m3 -> non-leader master, slave2
Case 1.1: When I power off the m1 VM, one of the non-leader masters becomes the leader and the cluster stays accessible and works properly.
Case 1.2: When I power off m2 or m3 (either VM running a non-leader master and a slave), the Mesos web page on m1 and on the remaining machine (m3 or m2) shows the message 'No Master is currently leading'.
Case 2:
m1 -> non-leader master
m2 -> leader master, slave1
m3 -> non-leader master, slave2
Case 2.1: When I power off the m1 VM, the master on m2 remains the leader and the cluster works properly.
Case 2.2: When I power off m2 (leader with slave), the cluster becomes unavailable with the error message 'No Master is currently leading' on the web page.
Case 2.3: When I power off m3 (non-leader with slave), the cluster becomes unavailable with the error message 'No Master is currently leading' on the web page.
Apologies for trying HA with only 3 machines and for the lengthy problem explanation.
Questions :
Will killing a machine that runs both a master (leading or non-leading) and a slave always make the cluster unavailable? (Cases 1.2, 2.2, 2.3)
Can we achieve HA with three machines as above, i.e. 3 masters and 2 slaves, with masters and slaves on the same machines?
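For reference, this is the quorum arithmetic I am assuming, which is why I expected the cluster to survive the loss of any single machine. It is only a sketch in Python; the function names are my own for illustration and are not part of Mesos or ZooKeeper.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can be lost while a majority is still alive."""
    return total_nodes - majority(total_nodes)


def has_quorum(total_nodes: int, alive_nodes: int) -> bool:
    """True if the surviving nodes still form a majority."""
    return alive_nodes >= majority(total_nodes)
```

With 3 masters and 3 ZooKeeper servers, `majority(3)` is 2 (matching `--quorum=2`) and `tolerated_failures(3)` is 1, so `has_quorum(3, 2)` should hold after powering off any one VM.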
The configuration is as follows.
Masters :
m1 : mesos-master --ip=192.168.1.36 --hostname=192.168.1.36 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs
m2 : mesos-master --ip=192.168.1.42 --hostname=192.168.1.42 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs
m3 : mesos-master --ip=192.168.1.45 --hostname=192.168.1.45 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs
Slaves :
m2 : mesos-slave --ip=192.168.1.42 --hostname=192.168.1.42 --executor_registration_timeout=10mins --systemd_enable_support=false --master=zk://192.168.1.42:2181,192.168.1.45:2181,192.168.1.36:2181/mesos --containerizers=mesos,docker
m3 : mesos-slave --ip=192.168.1.45 --hostname=192.168.1.45 --executor_registration_timeout=10mins --systemd_enable_support=false --master=zk://192.168.1.42:2181,192.168.1.45:2181,192.168.1.36:2181/mesos --containerizers=mesos,docker
Zookeeper Config :
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/ncms/zkWorkDir
clientPort=2181
server.1=192.168.1.42:2888:3888
server.3=192.168.1.36:2888:3888
server.5=192.168.1.45:2888:3888
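
To check whether the ZooKeeper ensemble itself still has a leader after a VM is powered off, something like the following Python sketch could be run from any surviving machine. It sends ZooKeeper's `srvr` four-letter command to the client port and reads the `Mode:` line of the reply. The helper names (`query_srvr`, `parse_mode`, `report`) are hypothetical, and it assumes the `srvr` command is enabled on the servers.

```python
import socket


def query_srvr(host: str, port: int = 2181, timeout: float = 2.0) -> str:
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def parse_mode(reply: str):
    """Extract the value of the 'Mode:' line, or None if it is absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def report(hosts):
    """Print the reported mode of each host, or 'unreachable' on error."""
    for host in hosts:
        try:
            mode = parse_mode(query_srvr(host))
        except OSError:
            mode = "unreachable"
        print(host, mode)
```

Calling `report(["192.168.1.36", "192.168.1.42", "192.168.1.45"])` after a power-off would show which servers still answer and whether one of them reports itself as leader.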
Setup :
Host: Windows 7 (64 GB RAM, 24 cores)
VirtualBox: each VM (m1, m2, m3) has 2 cores and 2 GB RAM, running RHEL 7.2