4
votes

We are using Jersey Server-Sent Events (SSE) to allow remote components of our application to listen to events raised by our Jersey/Tomcat server. This works great.

However, it is crucial that our server have an accurate list of currently-connected listeners (our remote components). To this end, our server sends a tiny message to each caller (via eventOutput.write) once every five seconds. If our remote component is shut down while SSE-connected, or if the remote computer is powered off while SSE-connected, our server's eventOutput.write throws the ClientAbortException/SocketException exception shown below. That's perfect: we catch the exception, mark that caller as no longer connected, and move on.

Now, for the problem. As I mentioned, eventOutput.write throws an exception in cases where our remote component software is not running, or where the computer it runs on has been powered down. However, there are two cases where calling eventOutput.write to a no-longer-connected computer does NOT throw an exception: 1) if the Ethernet cable of the remote computer is simply pulled while the caller is SSE-connected, and 2) if the network adapter in the remote computer is turned off (i.e., by an administrative action) while the caller is SSE-connected. In these two cases, we can call eventOutput.write to the remote computer every five seconds for hours and no exception is thrown. This makes it impossible to detect that the remote computer is no longer connected.

I see that EventOutput (and ChunkedOutput) has very few methods and properties, but I wonder if there is any way to configure or use it that will cause an exception to be thrown when writing to a remote computer that has been disconnected by having its Ethernet cable pulled or network adapter turned off.

And here is the (good/useful) exception we get in cases where eventOutput.write DOES throw the exception we want:

org.apache.catalina.connector.ClientAbortException: null
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371) ~[catalina.jar:7.0.53]
    at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333) ~[catalina.jar:7.0.53]
    at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101) ~[catalina.jar:7.0.53]
    at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.flush(ResponseWriter.java:303) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.message.internal.CommittingOutputStream.flush(CommittingOutputStream.java:292) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:240) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.internal.Errors.process(Errors.java:242) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:347) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.server.ChunkedOutput.flushQueue(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
    at org.glassfish.jersey.server.ChunkedOutput.write(ChunkedOutput.java:180) ~[jaxrs-ri-2.13.jar:2.13.]
    at com.appserver.webservice.AgentSsePollingManager$ConnectionChecker.run(AgentSsePollingManager.java:174) ~[AgentSsePollingManager$ConnectionChecker.class:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_71]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_71]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_71]
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_71]
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_71]
    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) ~[tomcat-coyote.jar:7.0.53]
    at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) ~[tomcat-coyote.jar:7.0.53]
    at org.apache.coyote.http11.InternalOutputBuffer.flush(InternalOutputBuffer.java:119) ~[tomcat-coyote.jar:7.0.53]
    at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:799) ~[tomcat-coyote.jar:7.0.53]
    at org.apache.coyote.Response.action(Response.java:174) ~[tomcat-coyote.jar:7.0.53]
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:366) ~[catalina.jar:7.0.53]
    ... 19 common frames omitted
1

1 Answers

0
votes

I do not think it will be possible to account for all the possible failures by adding the code around SSE sockets even if Jersey adds all information accessible from the socket interface. The only viable solution is proper two way communication. In case of SSE chunked output stream, pulled cable does not cause any interruption because nothing is supposed to tell it that remote host is now unreachable (until OS closes the socket).

Your first step is right - implement heartbeats every N seconds. Then all you need to do is to report back with another tiny http call every so often that you still listen. It is up to you to do acknowledgments every 5 seconds or every minute - depends on how fast do you need a problem detection.

You can do it in the same Jersey Resource by implementing @POST (in RESTful terms it reads "create new ack that you receive events").

Note: browsers are good at re-establishing SSE connection on their own in case of network interrupts, no need to fiddle with it.