1
votes

I'm running Mesosphere on three hosts with Ubuntu 14.04, set up as follows:

  • one with the Mesos master
  • two with a Mesos slave each

Everything works fine, but after restarting all the physical hosts, all scheduled jobs were lost. Is that normal? I expected ZooKeeper to store the current jobs, so that when the system has to be restarted, all jobs are rescheduled once the master boots.

Update: I'm running Marathon and Mesos on the same node, and I run Marathon with the --zk flag.

3
What scheduler are you using? - KirkSpaziani
@KirkSpaziani, I'm using marathon - enrique-carbonell
Could you check the zookeeper state? - drexin

3 Answers

0
votes

With Marathon's --zk and --ha flags enabled, Marathon should store its state in ZooKeeper and recover it on restart, as long as Mesos allows it to re-register with the same framework ID.
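
As a rough sketch of that setup, the snippet below assembles a Marathon command line with both flags. The ZooKeeper address `localhost:2181`, the `/mesos` and `/marathon` znode paths, and the helper name `marathon_command` are assumptions for illustration only; adjust them to your own cluster.

```python
def marathon_command(zk_hosts="localhost:2181"):
    """Build an argv list for Marathon with ZooKeeper-backed state and HA mode.

    zk_hosts is an assumed ZooKeeper address, not taken from the question.
    """
    return [
        "marathon",
        "--master", f"zk://{zk_hosts}/mesos",   # where Marathon finds the Mesos master
        "--zk", f"zk://{zk_hosts}/marathon",    # where Marathon stores its own state
        "--ha",                                 # high-availability mode
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(" ".join(marathon_command()))
```

The command is only printed, not executed, so it can be checked against the flags your installed Marathon version actually accepts.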

However, you'll also need to enable the Mesos registry (even for a single master), so that Mesos persists information about which framework IDs are registered in the event of a master failover. This can be accomplished by setting --registry=replicated_log (the default), --quorum=1 (since you only have one master), and --work_dir=/path/to/registry (where the state is stored).
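
A minimal sketch of the resulting master invocation, assuming a single master: the three registry flags come from the answer above, while the ZooKeeper address `localhost:2181`, the example work directory `/var/lib/mesos`, and the helper name `mesos_master_command` are placeholders I've made up.

```python
def mesos_master_command(zk_hosts="localhost:2181",
                         work_dir="/var/lib/mesos",
                         quorum=1):
    """Build an argv list for mesos-master with the replicated-log registry.

    zk_hosts and work_dir are assumed example values; use your own.
    """
    return [
        "mesos-master",
        f"--zk=zk://{zk_hosts}/mesos",
        "--registry=replicated_log",   # persist registry state (the default)
        f"--quorum={quorum}",          # 1 for a single master
        f"--work_dir={work_dir}",      # directory that holds the registry
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(" ".join(mesos_master_command()))
```

The work directory has to survive reboots (so not a tmpfs path), otherwise the registry is lost along with everything else.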

0
votes

Although you found a solution, I'd like to explain this issue in a bit more detail :)

From the official documentation: http://mesos.apache.org/documentation/latest/slave-recovery/

Note that if the operating system on the slave is rebooted, all executors and tasks running on the host are killed and are not automatically restarted when the host comes back up.

So all frameworks running on Mesos will be killed after a reboot. One way to bring them back is to run all frameworks on Marathon, which manages the other frameworks and restarts them as needed.

However, you then need Marathon itself to restart automatically when it is killed. In the DigitalOcean link you mentioned, Marathon is installed with an init.d script, so it is restarted after a reboot. Otherwise, if you installed Marathon from source, you can use a tool such as supervisord to monitor it.
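
To illustrate what "monitoring" means here, this is a toy restart loop in Python. It is not how supervisord is implemented and not a replacement for it; the function name `supervise` and its parameters are hypothetical, and the demo supervises a short-lived Python child process rather than Marathon itself.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=1.0):
    """Run argv and restart it each time it exits, up to max_restarts restarts.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child process exits, then records its exit code.
        exit_codes.append(subprocess.call(argv))
        if attempt < max_restarts:
            time.sleep(delay)  # brief back-off before restarting
    return exit_codes


if __name__ == "__main__":
    # Demo: a child that exits immediately with status 1 is started twice.
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"],
                    max_restarts=1, delay=0))
```

A real process supervisor also handles logging, signals, and starting at boot, which is why a tool like supervisord or an init script is the better choice in practice.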