13 votes

A Jenkins slave goes offline during the build. How can I fix this? I have seen a lot of related questions on SO and in the Jenkins issue tracker, but none of them provided a solution.

My configuration:

Jenkins version 1.651.1, Zuul version 2.1.1.dev393, with one Jenkins master (Ubuntu) and 2 slaves (Ubuntu), each with 16 GB of RAM, running builds in parallel.

The Jenkins master, devstack and both nodepool slaves are in the same IP range.

I'm facing an issue where, when one of the slaves completes its build, the java process on both slaves gets killed, so the other slave goes offline.

I found this by listing the processes running on the slaves and observed that the java process is killed simultaneously on both slaves when one slave has completed its build while the other is still running its build.

I had this issue previously and resolved it by switching from OpenJDK to Oracle's JDK. The slaves are now using Oracle Java 1.8.0_111, but we are getting the same issue with Oracle Java 8 as well.
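Since the java process on the slaves is being killed, it is worth checking whether the kernel's OOM killer is responsible before blaming the JDK. A minimal check, assuming Ubuntu's default log locations and the Jenkins 1.x remoting agent name (slave.jar):

    # on each slave: confirm the remoting agent is running while a build is active
    ps -ef | grep '[s]lave.jar'

    # look for OOM-killer activity around the time the slave dropped
    dmesg | grep -i 'killed process'
    sudo grep -i 'out of memory' /var/log/syslog

If the OOM killer shows up here, it points to a memory problem on the slave rather than a JDK problem.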

Build logs:

01:42:07 Slave went offline during the build
01:42:07 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
01:42:07    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
01:42:07 Caused by: java.io.EOFException
01:42:07    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
01:42:07    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2820)
01:42:07    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
01:42:07    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:302)
01:42:07    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
01:42:07    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
01:42:07    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
01:42:07 
01:42:07 Build step 'Execute shell' marked build as failure 
Did you look into the system messages log? Try to see if this issue and workaround are relevant to your case. – Fedor Losev
We saw this very regularly when the master got very busy. We then allocated more "CPU"s to it and have not seen it since (2 months so far). – Jayan
How are you running the master? Docker? What is the resource allocation for the master node? – Jayan

4 Answers

11 votes

A slave goes offline for one of these reasons:

  1. The jobs running on it consume more RAM than the slave has, or no memory is left. If this is the case, configure fewer executors on the slave or give the node more CPU/RAM.

  2. A slave cleanup process or some orphan process may be running in the background, which breaks the connection. Stop the cleanup process or kill the orphan process that is consuming the memory.

  3. The SSH keys between the master and the slaves may have changed. Send the SSH keys to the slaves again via scp and set the connection up once more.

Please try these steps (a short sketch of points 1 and 3 follows below) and also read the below articles for more help.
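A brief sketch of points 1 and 3 above, assuming the slave is reachable as slave1 and Jenkins connects as the user jenkins (both names are placeholders):

    # point 1: check memory pressure on the slave; if it is swapping,
    # reduce the number of executors or add RAM
    free -m

    # point 3: re-send the master's public key to the slave, then verify the login
    ssh-copy-id jenkins@slave1
    ssh jenkins@slave1 java -version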

1 vote

I had similar difficulty with Jenkins slave connections on Linux: they would either fail to start, or would drop instead of idling.

I discovered the problem was with the Linux shell, and the way it handled remote connections.

After much effort, my solution was:

  • Create a separate user for Jenkins on the master and slave machines.
  • Delete (rm) the ~/.bashrc files for these Jenkins users
  • Bounce the servers, done.

The existence of the bashrc files (even empty ones) corrupted the cluster. That was the only solution that would make the slaves federate in our environment. The docs did not cover this.
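As an illustration only, the steps above roughly correspond to the following on the master and on each slave (the user name jenkins and its home directory are assumptions, not part of the original answer):

    # create a dedicated Jenkins user
    sudo useradd -m -s /bin/bash jenkins

    # remove the .bashrc that interfered with the remoting channel
    sudo rm -f /home/jenkins/.bashrc

    # bounce the server
    sudo reboot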

You can imagine the "much effort" was basically bouncing the entire cluster with different combinations of bashrc files until finally just deleting them all in frustration.

The environment was CentOS and Jenkins CI integrated with IBM ClearCase.

Hopefully this solution might help shake something loose in your problem.

0 votes

I fixed this by assigning a static IP to my build node in the router's configuration. There may have been too many devices behind the router, and the node's IP address was being reassigned irregularly.

0 votes

I ran into the same problem, and finally found that it was caused by the Energy Saver configuration. After I checked "Prevent computer from sleeping automatically when the display is off" and unchecked "Put hard disks to sleep when possible", the problem was gone, for your information.
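For reference, roughly equivalent settings can be applied from the command line on macOS with pmset (a sketch; setting names can vary between macOS versions):

    # prevent the machine from sleeping automatically
    sudo pmset -a sleep 0

    # do not put hard disks to sleep when possible
    sudo pmset -a disksleep 0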