0
votes

We recently saw this error. The server sent a response message (MSG1, larger than 64 KB) to one of its clients. Something went wrong at the client's read and the message was never received; neither a SocketTimeoutException nor an IOException was thrown. The server then tried to send another message (MSG2) but blocked in its write (the TCP/IP-level buffers may have been full with MSG1). Two hours elapsed before we realised this and had to restart everything. We managed to simulate the same thing by making the client read more slowly (pausing for 2 s) and having the server send similar messages (larger than 64 KB) as usual.

In the first place, we can see no reason why the client would read slowly; for months it has coped with such messages without a problem. I would like to know: (a) What causes this kind of deadlock, say, if the client's read

getInputStream().read(byBuf)
is slower than the server's write
getOutputStream().write(MSG1)
(byBuf is a byte buffer of 512 bytes)? (b) Could a socket error or network error otherwise cause such a blockage?

We are using JDK 1.6.0.

Many thanks!

2
If you can reproduce this with your test, you should fire up Wireshark and look at the network trace for what's going on. - nos

2 Answers

0
votes

A write blocks when the socket send buffer is full, which implies that the receiver's socket receive buffer is also full, which in turn implies that the reader is slower than the writer. As your reader was itself blocked in a read, this suggests a network or kernel problem.
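
To make that chain concrete, here is a minimal loopback sketch (not the poster's code): a writer pushes 64 KB chunks to a peer that never reads, and the write stalls once the receive and send buffers are both full. The class name, the chunk size, the 64 MB total and the half-second sampling interval are all assumptions chosen for illustration, and the sketch uses Java 8 syntax rather than JDK 1.6.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

class BlockedWriteDemo {
    static final int CHUNK = 64 * 1024;            // assumed message size
    static final long TOTAL = 64L * 1024 * 1024;   // assumed to exceed the OS buffers

    /** Returns true if the writer stopped making progress before sending TOTAL bytes. */
    static boolean writerStalls() {
        try {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            ServerSocket listener = new ServerSocket(0, 1, loopback);
            Socket client = new Socket(loopback, listener.getLocalPort()); // never reads
            Socket server = listener.accept();
            AtomicLong written = new AtomicLong();

            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[CHUNK];
                try {
                    OutputStream out = server.getOutputStream();
                    while (written.get() < TOTAL) {
                        out.write(chunk);          // blocks once both buffers are full
                        written.addAndGet(CHUNK);
                    }
                } catch (IOException e) {
                    // Expected here: closing the socket below releases the blocked write.
                }
            });
            writer.setDaemon(true);
            writer.start();

            // Sample the byte count twice; no progress means the write is blocked.
            Thread.sleep(500);
            long first = written.get();
            Thread.sleep(500);
            long second = written.get();
            boolean stalled = writer.isAlive() && first == second && second < TOTAL;

            server.close();
            client.close();
            listener.close();
            writer.join(2000);
            return stalled;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling BlockedWriteDemo.writerStalls() should return true on a typical machine, since the combined buffers are normally far smaller than 64 MB. It only shows the mechanism; it does not explain why a reader that is already blocked in read() would stop receiving data.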

0
votes

Sounds like a network problem - perhaps a congested link or broken firewall between the two hosts. "Reproducing" the problem by adding delays to the client won't tell you anything interesting, merely that the OS-level buffering works as designed :)

You need to find out why the packets are being delayed, which generally means running tcpdump/wireshark at both ends. If the issue only occurs every few months, though, this is probably overkill; focus instead on improving how the app handles this scenario and/or how you detect it if it recurs.

As an aside, Java doesn't allow a write timeout to be set on a socket, so to recover from this scenario the reader should call setSoTimeout before reading, then close the socket if a read throws SocketTimeoutException. The server's write should then fail with a SocketException ("connection reset by peer"), although this may take a while if the link is slow or intermittent.
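
A rough sketch of that reader-side recovery, assuming a plain blocking java.net.Socket. The class name TimedReader, the helper readOrClose, the sentinel TIMED_OUT and the 200 ms timeout are made up for illustration; only setSoTimeout, read, close and SocketTimeoutException come from the JDK.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TimedReader {
    static final int TIMED_OUT = -2;   // hypothetical sentinel, distinct from read()'s -1

    /**
     * Performs one read with a timeout. Returns the byte count, -1 at end of
     * stream, or TIMED_OUT after closing the socket because nothing arrived in time.
     */
    static int readOrClose(Socket socket, byte[] buf, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);        // applies to subsequent reads
        try {
            return socket.getInputStream().read(buf);
        } catch (SocketTimeoutException e) {
            socket.close();                        // lets the peer's write fail instead of hanging
            return TIMED_OUT;
        }
    }

    /** Loopback check: the peer never sends, so the read is expected to time out. */
    static int demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket silentPeer = listener.accept()) {
            return readOrClose(client, new byte[512], 200);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a real client the timeout would need to be longer than the largest gap you expect between messages, otherwise an idle but healthy connection gets closed too.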