I’m building an Ansible recipe to deploy a mesos/marathon cluster (https://github.com/gridpocket/ansible-mesos-cluster).

Once everything is set up, the Mesos and Marathon UIs are up, but I have 2 problems:
- in the Mesos UI I cannot see any registered slave
- the same UI also indicates "No master is currently leading..."

The setup is as follows:
- 3 Mesos masters (192.168.1.191, 192.168.1.192, 192.168.1.193): each running mesos-master, zookeeper, marathon
- 3 Mesos slaves (192.168.1.194, 192.168.1.195, 192.168.1.196): each running mesos-slave, docker

Slaves configuration

On each slave:

/etc/mesos/zk:
zk://192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181/mesos

Masters configuration

On each master:

/etc/mesos/zk:
zk://192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181/mesos

/etc/mesos-master/quorum:
2

/etc/mesos-master/hostname and /etc/mesos-master/ip:
IP_OF_THE_MASTER
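
For reference, here is a minimal sketch of how the master-side files listed above could be written by hand. ROOT and MASTER_IP are hypothetical names introduced only for this illustration; ROOT defaults to a scratch directory so the sketch can be tried without touching the real /etc. It is not what the Ansible recipe itself does.

```shell
#!/bin/sh
# Sketch: write the master-side configuration files described above.
# ROOT and MASTER_IP are hypothetical; ROOT defaults to a scratch directory
# so that nothing under the real /etc is modified.
set -eu

ROOT="${ROOT:-$(mktemp -d)}"
MASTER_IP="${MASTER_IP:-192.168.1.191}"
ZK_URL="zk://192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181/mesos"

mkdir -p "$ROOT/etc/mesos" "$ROOT/etc/mesos-master"

# ZooKeeper URL shared by masters and slaves
printf '%s\n' "$ZK_URL" > "$ROOT/etc/mesos/zk"

# Quorum of 2 is a majority of the 3 masters
printf '%s\n' 2 > "$ROOT/etc/mesos-master/quorum"

# Each master advertises its own address
printf '%s\n' "$MASTER_IP" > "$ROOT/etc/mesos-master/hostname"
printf '%s\n' "$MASTER_IP" > "$ROOT/etc/mesos-master/ip"
```

Running it with ROOT unset only populates a temporary directory, which makes it easy to compare the generated files with what is actually on a master.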

Am I missing something in the configuration?

EDIT

I rebuilt the whole cluster and corrected a ZooKeeper configuration setting (dataDir). Now:
- the Mesos master interface is working and shows the master node
- the Marathon UI is working

On a slave machine, however, the mesos-slave process stops as soon as I start it.

The mesos-slave log is not very verbose about this problem:

Log file created at: 2015/07/09 15:51:15
Running on machine: vagrant-ubuntu-trusty-64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0709 15:51:15.487542  8133 logging.cpp:172] INFO level logging started!
I0709 15:51:15.488011  8133 main.cpp:156] Build: 2015-05-05 06:15:50 by root
I0709 15:51:15.488081  8133 main.cpp:158] Version: 0.22.1
I0709 15:51:15.488137  8133 main.cpp:161] Git tag: 0.22.1
I0709 15:51:15.488190  8133 main.cpp:165] Git SHA: d6309f92a7f9af3ab61a878403e3d9c284ea87e0

EDIT 2

When I start a slave manually, passing the zk string on the command line, the slave starts correctly:

sudo /usr/sbin/mesos-slave --master=zk://192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181/mesos

But "sudo service mesos-slave start" does not start the slave.

EDIT 3

I've changed the state from "latest" to "present" in the Ansible playbook:

- name: install mesos + zookeeper
  apt: name=mesos state=present

- name: install marathon
  apt: name=marathon state=present

It works now: the slaves appear in the activated state in the Mesos UI.

Was it due to a version problem?

Mind sharing a master log? - rukletsov
The problem with the "No master is currently leading..." error message seems to be solved now (it was a ZooKeeper configuration problem). I've updated the question with the slave log, as the mesos-slave process cannot be started. - Luc
To get more verbose logs, start the slave with GLOG_v=1 (or even 2). You can prefix this to your command like GLOG_v=1 ./bin/mesos-slave --master=[...] or add it under /etc/default/mesos-slave/ - Adam

2 Answers

2
votes

Any Mesos command-line parameter can be set as a file, such as /etc/mesos-slave/master (for mesos-slave --master). This is how the service startup script finds the Mesos parameters.

You can also use /etc/default/mesos-slave/ (or -master/) for environment variables, or /etc/mesos/ for general parameters.
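
To illustrate the idea of files becoming flags, here is a rough sketch of how a startup script could turn each file in a directory into a --name=value argument. flags_from_dir is a hypothetical helper written for this example, not part of Mesos, and the real service wrapper may well differ in its details; the demo runs against a scratch directory rather than the real /etc/mesos-slave.

```shell
#!/bin/sh
# Sketch: map every regular file in a directory to a --<filename>=<contents> flag.
# flags_from_dir is a hypothetical helper, not the actual Mesos service wrapper.
flags_from_dir() {
    dir="$1"
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob)
        [ -f "$f" ] || continue
        printf '%s\n' "--$(basename "$f")=$(cat "$f")"
    done
}

# Demo against a scratch directory standing in for /etc/mesos-slave
demo_dir="$(mktemp -d)"
printf '%s\n' "zk://192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181/mesos" > "$demo_dir/master"

flags="$(flags_from_dir "$demo_dir")"
echo "$flags"

rm -rf "$demo_dir"
```

If the service refuses to start while the manual command works, comparing the flags derived from the files this way with the ones passed by hand is one way to spot a difference.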

-1
votes

The slaves showed up as activated in the Mesos UI once I used the "present" state instead of the "latest" state in the Ansible playbook when installing Mesos.

- name: install mesos + zookeeper
  apt: name=mesos state=present

- name: install marathon
  apt: name=marathon state=present