2
votes

We have a Jenkins pipeline with several stages (pipeline as code). One of these stages creates ~40-50 downstream jobs and starts them in a parallel step. Unfortunately, our Jenkins master reboots every night. After this reboot, every job in the queue is lost and the currently running downstream jobs are stopped with an error. After the child nodes reconnect, the pipeline is in the resume state (console output: "Resuming build"), but nothing happens.

I have the following questions:

  • What exactly happens when the pipeline tries to resume? Does the pipeline start from stage 1 again?
  • Is it possible to requeue the downstream jobs that were in the queue before the reboot?
1
Have you added checkpoints to your build? The resume option will restart from the last checkpoint before the failure. - Amol Manthalkar
The Checkpoints plugin only seems to be available in the CloudBees version, not the open source version of Jenkins. - Andrew Gray

1 Answer

2
votes

In each job used in the flow, select "Do not allow the pipeline to resume if the master restarts". That way the build will not get stuck in the resume state after a restart. In a Jenkinsfile the same setting is:

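// Declarative Pipeline: place this inside the pipeline { ... } block.
// Scripted Pipeline equivalent: properties([disableResume()])
options {
  disableResume()
}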

To handle the lost builds themselves, use some kind of "queue" checker. For example:

  1. For each build request, create a uniquely named JSON/YAML file (the configuration of the build to launch) in a folder that your main job checks for such files.
  2. If such a file is found, launch the main job (configured with "Do not allow concurrent builds") and set a timeout for the job.
  3. At the end of the main job's run, delete the file.

Alternatively, if you are on AWS, use an SQS queue instead of files.
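
The file-based steps above could look roughly like the following Python sketch. It is only an illustration under a few assumptions: the function names `enqueue` and `drain`, the use of JSON, and the `launch` callback (which stands in for whatever actually triggers the Jenkins job) are all hypothetical and not part of any Jenkins API. The key point is that a request file is removed only after `launch` returns without raising, so requests that were still pending when the master rebooted remain on disk and are picked up by the next check.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


def enqueue(queue_dir: Path, config: dict) -> Path:
    """Step 1: write one uniquely named JSON request file per build."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    request = queue_dir / f"{uuid.uuid4().hex}.json"
    # Write under a temporary name first so the checker never sees a
    # half-written request file, then move it into place.
    tmp = request.with_suffix(".tmp")
    tmp.write_text(json.dumps(config))
    tmp.replace(request)
    return request


def drain(queue_dir: Path, launch: Callable[[dict], None]) -> int:
    """Steps 2 and 3: launch every pending request, then delete its file."""
    if not queue_dir.is_dir():
        return 0
    launched = 0
    # Oldest request first, based on file modification time.
    pending = sorted(queue_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    for request in pending:
        config = json.loads(request.read_text())
        launch(config)    # hypothetical hook: trigger the downstream job here
        request.unlink()  # only reached if launch() did not raise
        launched += 1
    return launched
```

A periodic job (or a cron-triggered script) would call `drain` with a `launch` function that starts the build, while whatever requests builds calls `enqueue`. If `launch` fails, the exception propagates and the request file stays in the folder for the next attempt.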