
Our backend server has been using embedded Jetty 8.1.15 for a few years now. We had no issues with it until recently, when we started running a load-test scenario with a lot of concurrent users. We can now reproduce the issue even with a small number of users using JMeter (HTTP Sampler with a concurrent pool of 1000 and KeepAlive enabled). The communication between client and server is over TLS, with connection pooling on the client side:

Client (connection pooling) -> TLS -> Server

The behavior we see: at some point, a lot of threads get "stuck" (waiting on a monitor in different methods) with the following stack traces:

Thread "qtp438123546-99":
at java.security.SecureRandom.nextBytes(byte[]) (line: 457)
at sun.security.ssl.RandomCookie.<init>(java.security.SecureRandom) (line: 53)
at sun.security.ssl.ServerHandshaker.clientHello(sun.security.ssl.HandshakeMessage$ClientHello) (line: 522)
at sun.security.ssl.ServerHandshaker.processMessage(byte, int) (line: 213)
at sun.security.ssl.Handshaker.processLoop() (line: 925)
at sun.security.ssl.Handshaker$1.run() (line: 865)
at sun.security.ssl.Handshaker$1.run() (line: 862)
at java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext)
at sun.security.ssl.Handshaker$DelegatedTask.run() (line: 1302)
at org.eclipse.jetty.io.nio.SslConnection.process(org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 375)
at org.eclipse.jetty.io.nio.SslConnection.access$900(org.eclipse.jetty.io.nio.SslConnection, org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.fill(org.eclipse.jetty.io.Buffer) (line: 678)
at org.eclipse.jetty.http.HttpParser.fill() (line: 1044)
at org.eclipse.jetty.http.HttpParser.parseNext() (line: 280)
at org.eclipse.jetty.http.HttpParser.parseAvailable() (line: 235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle() (line: 82)
at org.eclipse.jetty.io.nio.SslConnection.handle() (line: 196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() (line: 696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() (line: 53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) (line: 608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() (line: 543)
at java.lang.Thread.run() (line: 745)

or sometimes on other synchronized Java security API methods:

Thread "qtp438123546-993":
at sun.security.ssl.SignatureAndHashAlgorithm.getSupportedAlgorithms(java.security.AlgorithmConstraints) (line: 155)
at sun.security.ssl.Handshaker.getLocalSupportedSignAlgs() (line: 422)
at sun.security.ssl.ServerHandshaker.clientHello(sun.security.ssl.HandshakeMessage$ClientHello) (line: 700)
at sun.security.ssl.ServerHandshaker.processMessage(byte, int) (line: 213)
at sun.security.ssl.Handshaker.processLoop() (line: 925)
at sun.security.ssl.Handshaker$1.run() (line: 865)
at sun.security.ssl.Handshaker$1.run() (line: 862)
at java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext)
at sun.security.ssl.Handshaker$DelegatedTask.run() (line: 1302)
at org.eclipse.jetty.io.nio.SslConnection.process(org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 375)
at org.eclipse.jetty.io.nio.SslConnection.access$900(org.eclipse.jetty.io.nio.SslConnection, org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.fill(org.eclipse.jetty.io.Buffer) (line: 678)
at org.eclipse.jetty.http.HttpParser.fill() (line: 1044)
at org.eclipse.jetty.http.HttpParser.parseNext() (line: 280)
at org.eclipse.jetty.http.HttpParser.parseAvailable() (line: 235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle() (line: 82)
at org.eclipse.jetty.io.nio.SslConnection.handle() (line: 196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() (line: 696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() (line: 53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) (line: 608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() (line: 543)
at java.lang.Thread.run() (line: 745)
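The traces above show threads waiting on a monitor rather than blocked in I/O: on JDK 8, `SecureRandom.nextBytes` is synchronized, so every handshake contends for the same lock when a single instance is shared. The pattern can be exercised in isolation with a small standalone sketch (the class and method names below are ours, not from the server code) that hammers one shared `SecureRandom` from many threads versus per-thread instances:

```java
import java.security.SecureRandom;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SecureRandomContention {
    // Runs `threads` workers, each drawing 10,000 x 32 random bytes,
    // either through one shared SecureRandom or per-thread instances.
    // Returns the elapsed wall-clock time in milliseconds.
    static long run(int threads, boolean shared) throws Exception {
        SecureRandom sharedRng = new SecureRandom();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                SecureRandom rng = shared ? sharedRng : new SecureRandom();
                byte[] buf = new byte[32];
                for (int j = 0; j < 10_000; j++) {
                    rng.nextBytes(buf); // synchronized on JDK 8
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared:     " + run(8, true) + " ms");
        System.out.println("per-thread: " + run(8, false) + " ms");
    }
}
```

The absolute numbers depend on the JDK's default `SecureRandom` algorithm and the OS, but the comparison illustrates the serialization point the thread dumps land on.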

When everything is fine, before the problem appears, the client and server use pooled persistent connections (visible in netstat). When the problem appears, however, a lot of connections sit in states other than ESTABLISHED:

On the client side:

tcp6       0      0 client-host:39962       server-host:443         FIN_WAIT2
tcp6       0      1 client-host:45266       server-host:443         SYN_SENT
tcp6       0      0 client-host:46234       server-host:443         FIN_WAIT2
tcp6       0      0 client-host:38892       server-host:443         FIN_WAIT2
tcp6       0      0 client-host:39160       server-host:443         FIN_WAIT2
tcp6       0      0 client-host:39188       server-host:443         FIN_WAIT2
tcp6       0      1 client-host:43496       server-host:443         SYN_SENT
tcp6       0      0 client-host:46122       server-host:443         FIN_WAIT2
tcp6       0      1 client-host:44938       server-host:443         SYN_SENT
tcp6       0      0 client-host:46446       server-host:443         ESTABLISHED

On the server side:

tcp        0   2980 server-host:443         client-host-1:34964       LAST_ACK
tcp        0   2980 server-host:443         client-host-3:52430       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:35922       LAST_ACK
tcp        0      0 server-host:443         client-host-1:38362       CLOSE_WAIT
tcp      236      0 server-host:443         client-host-3:58296       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-1:34980       LAST_ACK
tcp        0   2980 server-host:443         client-host-2:55748       LAST_ACK
tcp        0   2980 server-host:443         client-host-3:53376       LAST_ACK
tcp        0      0 server-host:443         client-host-1:40104       SYN_RECV
tcp        0      0 server-host:443         client-host-1:38718       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-2:54142       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:50766       LAST_ACK
tcp        0      0 server-host:443         client-host-1:38604       CLOSE_WAIT
tcp      236      0 server-host:443         client-host-3:57604       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-2:55502       LAST_ACK
tcp        0   2980 server-host:443         client-host-2:58254       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:38042       LAST_ACK
tcp        0      0 server-host:443         client-host-1:38222       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-3:47812       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:60532       LAST_ACK
tcp        0   2980 server-host:443         client-host-2:54282       LAST_ACK
tcp        0      0 server-host:443         client-host-1:40978       SYN_RECV

Almost all the threads are unresponsive, CPU usage is very high, and the GC is working constantly.


We also set the following flag on the JVM:

-Djava.security.egd=file:/dev/./urandom

so that SecureRandom is non-blocking (as opposed to reading from /dev/random).
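Whether entropy starvation is actually in play can be checked at runtime: on Linux, the kernel exposes its entropy-pool estimate under /proc. A minimal sketch (Linux-specific path, our own class name):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class EntropyCheck {
    // Reads the kernel's current entropy-pool estimate (Linux only).
    static int available() throws Exception {
        String s = new String(Files.readAllBytes(
                Paths.get("/proc/sys/kernel/random/entropy_avail"))).trim();
        return Integer.parseInt(s);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("entropy_avail = " + available());
    }
}
```

If this value stays low while the threads are stuck, blocking on /dev/random is a plausible culprit; if it is healthy, the bottleneck is more likely lock contention on a shared SecureRandom instance.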

java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

Kernel: 4.14.94-89.73.amzn2.x86_64 (the problem also appears on a system with kernel 2.6.32-696.20.1.el6.x86_64)

limits:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 151551
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

When a component enters this state it becomes unresponsive, and logs stop being written to the log file (by the way, we use log4j2).

When the load on the component is stopped, it takes a few minutes for the component to recover and become responsive again.

Has anyone seen similar behavior in your Java backend components? Please comment/suggest a direction for investigation and/or a solution.

Jetty 8 has been EOL (End of Life) since November 2014. It is not recommended for production use, especially with SSL/TLS. Consider upgrading to a supported and stable release soon. — Joakim Erdfelt

1 Answer


SecureRandom attempts to read random bytes from an OS-provided randomness source (such as /dev/random); however, this operation can block if the system doesn't have enough entropy available.

A workaround is to use another randomness source that doesn't block (such as /dev/urandom). This can be configured by updating $JAVA_HOME/jre/lib/security/java.security:

securerandom.source=file:/dev/./urandom
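Alternatively, the same effect can be approximated without editing the JRE config file. A sketch, assuming a Linux JDK 8+ where the SUN provider registers the NativePRNGNonBlocking algorithm (setting securerandom.source programmatically only helps if done before the first SecureRandom seeding):

```java
import java.security.SecureRandom;
import java.security.Security;

public class NonBlockingRandom {
    static String algorithm() throws Exception {
        // Same property as in java.security; must be set before first use
        Security.setProperty("securerandom.source", "file:/dev/./urandom");

        // Or request a non-blocking implementation explicitly (Linux/Unix JDKs)
        SecureRandom rng = SecureRandom.getInstance("NativePRNGNonBlocking");
        byte[] buf = new byte[16];
        rng.nextBytes(buf); // never blocks waiting for entropy
        return rng.getAlgorithm();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("algorithm = " + algorithm());
    }
}
```

Note that this only changes SecureRandom instances your own code creates; the JSSE handshaker uses its own instance, so for TLS the java.security file or the -Djava.security.egd flag is the reliable lever.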

Another option is to install haveged, a userspace daemon that feeds the kernel entropy pool and can keep reads from /dev/random from blocking.

This bug report has additional details: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6521844