
I am running a Jenkins server on DCOS as documented here https://docs.mesosphere.com/1.7/usage/tutorials/jenkins/.

The Jenkins server is able to spawn new Mesos slaves when new jobs are scheduled and kill them when the jobs are completed.

But if a cluster node crashes while a Jenkins job is running on it, the Jenkins server doesn't re-run the job on the other available nodes.

Is the Jenkins service on DCOS fault tolerant? Can we re-run a job (on some other available node) that failed because a cluster node crashed in the middle of its execution?

1.7 is a really, really old version of DC/OS. Are you already running it in production? If not, I'd get started with DC/OS 1.11.1 (the most recent version). I'm not entirely sure about Jenkins, but in case you haven't found the service docs, they're here. – Judith Malnick
@JudithMalnick, I am using version 1.8; the link I shared was just for reference. I am not running it in production, I'm doing a POC of Jenkins on DCOS. What I really want to know is whether Jenkins jobs running on DCOS are fault tolerant: will they be re-triggered automatically on some other node if they fail because a cluster node crashed? – justcodeit

1 Answer


Jenkins itself does not re-run jobs that disappear. This is not specific to DC/OS or Mesos; it's just the way Jenkins works.

DC/OS and Mesos will make sure that the Jenkins server itself stays running and available to send jobs to. In that sense it is "fault tolerant", but not in the sense you are asking about: a job interrupted by a node crash will not be re-run automatically.
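Since Jenkins will not re-run such a job on its own, the retry decision has to live somewhere outside that default behaviour. Below is a minimal Python sketch of that decision logic only. It assumes you can tell from a finished build whether its agent node was lost. `BuildOutcome`, `agent_lost` and `run_with_retry` are hypothetical names. `trigger` stands in for whatever actually starts a build, for example a call made through the Jenkins remote access API. None of this is a built-in Jenkins or DC/OS feature.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BuildOutcome:
    result: str               # e.g. "SUCCESS", "FAILURE", "ABORTED"
    agent_lost: bool = False  # hypothetical flag: did the agent node vanish mid-build?


def run_with_retry(trigger: Callable[[], BuildOutcome],
                   max_attempts: int = 3) -> List[BuildOutcome]:
    """Re-trigger a build only when it died because its agent disappeared."""
    history: List[BuildOutcome] = []
    for _ in range(max_attempts):
        outcome = trigger()
        history.append(outcome)
        if not outcome.agent_lost:
            # A real success or a real failure of the job itself: do not retry.
            break
    return history


# Simulated run: the first attempt is lost with its node, the second succeeds.
outcomes = iter([BuildOutcome("FAILURE", agent_lost=True), BuildOutcome("SUCCESS")])
history = run_with_retry(lambda: next(outcomes))
print([o.result for o in history])  # ['FAILURE', 'SUCCESS']
```

Only agent loss triggers a retry, and the retry is capped by `max_attempts`. A job that genuinely fails is therefore not re-run, and a job that keeps losing its node is not retried forever.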