
We have the following setup for our Jenkins master/slave nodes: one static, fixed master (no jobs execute on the master) and dynamically spun-up slaves; we terminate an AWS EC2 Jenkins slave if it sits idle for 30 minutes.

When Jenkins needs to execute a new job, it provisions a slave based on the configuration defined under Manage Jenkins ---> Configure System ---> Cloud ---> Amazon EC2. That configuration also has an init script section that installs and configures the instance according to our needs.

When a new node is spun up, it initially shows as offline on the following page: https://jenkins.xxxxx.xxxxx.com/computer/

While the init script is still running (it takes around 10 minutes in our case to complete all of its commands), a Jenkins job that was waiting in the build queue gets assigned to this new node. People then see the following message in the job console output and abort their builds, thinking the job was assigned to an offline slave/node that no longer exists:

Still waiting to schedule task

‘EC2 (Jenkins Slave) - jenkins-dev-docker-slave (i-xxxxxxxxx)’ is offline; ‘EC2 (Jenkins Slave) - jenkins-dev-docker-slave (i-xxxxxxxxxx)’ is offline

Is there any way to specify a configuration that waits until the slave node/agent is ready (agent successfully connected and online) before jobs are executed on it, instead of jobs being allotted before the init script has finished running?

Note: these are all pipeline jobs running on Linux slaves/nodes. Jenkins version: 2.190.1.
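
For illustration, the affected jobs are shaped roughly like the following (a hypothetical Jenkinsfile; the label name is assumed to match the EC2 slave template configured above, and the build step is a placeholder):

    // Hypothetical Jenkinsfile; the label is assumed to match the EC2
    // slave template configured in the cloud settings above.
    pipeline {
        agent { label 'jenkins-dev-docker-slave' }
        stages {
            stage('Build') {
                steps {
                    sh 'make build'   // placeholder for the real build commands
                }
            }
        }
    }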

Please let me know if you need any other details.

Regards, Pavan

I have looked at this link: stackoverflow.com/questions/29650602/… (a comment about the EC2 plugin), but I still need help with this. - Pavan Tatikonda

1 Answer


In our experience, a job or stage scheduled on a label will wait until at least one slave with that label is online, and only then start executing there. If no slave ever becomes available, the job/stage will wait forever unless a timeout is specified on the job/stage. We can start a pipeline, let it wait for a label, spin up a slave, assign it that label, and the job will start executing on it.
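
For example, here is a minimal scripted-pipeline sketch of that timeout (the label is taken from your question; the 20-minute value is just an assumption, chosen to be comfortably longer than your ~10-minute init script):

    // Abort loudly if no matching agent comes online within 20 minutes,
    // instead of waiting in the queue forever.
    timeout(time: 20, unit: 'MINUTES') {
        // Blocks here until an online agent with this label has a free executor.
        node('jenkins-dev-docker-slave') {
            sh 'echo running on a fully provisioned agent'
        }
    }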

So your experience here is very different from ours.

It may be that your slave was online (and so a job was scheduled on it) but then went offline for some reason. If that is the case, you may want to watch its status, or experiment with the pipeline durability settings (e.g., the MAX_SURVIVABILITY hint).
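
One way to set that hint explicitly is per pipeline; here is a minimal declarative sketch (whether it helps depends on why the slave dropped offline; the label is reused from your question and the stage body is a placeholder):

    pipeline {
        agent { label 'jenkins-dev-docker-slave' }
        options {
            // Persist pipeline state as aggressively as possible so a
            // running build can better survive transient disconnects.
            durabilityHint('MAX_SURVIVABILITY')
        }
        stages {
            stage('Build') {
                steps {
                    sh 'make build'   // placeholder
                }
            }
        }
    }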