0
votes

As the title suggests, I've worked with both winsock & boost sockets. I'm having an incredibly hard time detecting disconnections.

First, I know that a disconnection can be discovered by the following:

  1. recv() / async_read() return a socket error or 0.
  2. send() / async_write() ... ... ...
  3. Whether the client closed manually, got interrupted / program closed - whatever.

So here are the problem scenarios:

I close my connection with closesocket(). The client detects the disconnect - all fine.

I close the program - there's a 50/50 chance the client fails to detect the disconnection. For some reason my overlapped IO WSARecv() isn't guaranteed to detect it.

I kill the process. The chance of detection increases to about 80%. But the remaining 20% is what's bothering me. I implemented a keep-alive ping mechanism which sends data to the server. Even after I kill the program, the server is still calling async_write() on the connection - even though the connection is dead and the disconnect goes undetected.

Is this something I have to live with? I'm kind of lost since I tried everything in my power to detect disconnections... yet they're still a problem.

1
What protocol are you implementing on top of TCP? Does it have a protocol specification? - David Schwartz
When a connection is killed/lost abnormally, the OS has no way of knowing it is gone for a (long) while, and so during that time it does not report errors, and it buffers outgoing data while waiting for the peer to accept it. That is where socket-level keepalives and protocol-level pings come into play. If a keepalive/ping times out because the peer fails to respond in a timely manner, just close the socket regardless of its actual state. - Remy Lebeau
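
The ping-timeout rule in the comment above can be sketched as a small deadline tracker. This is only an illustration under assumptions: `LivenessTracker` and its method names are hypothetical and are not part of winsock or Boost.Asio, and the current time is passed in explicitly so the logic stays independent of any particular I/O loop.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not a winsock or Boost.Asio type): remembers when the
// peer was last heard from and reports when it has been silent for too long.
class LivenessTracker {
public:
    using Clock = std::chrono::steady_clock;

    LivenessTracker(Clock::duration timeout, Clock::time_point now)
        : timeout_(timeout), last_heard_(now) {
        assert(timeout > Clock::duration::zero());
    }

    // Call whenever any bytes (data or a ping reply) arrive from the peer.
    void on_receive(Clock::time_point now) { last_heard_ = now; }

    // True once the peer has been silent for longer than the timeout.
    bool expired(Clock::time_point now) const {
        return now - last_heard_ > timeout_;
    }

private:
    Clock::duration timeout_;
    Clock::time_point last_heard_;
};
```

One way to use it: call `on_receive(Clock::now())` from the receive completion handler, check `expired(Clock::now())` from a periodic timer, and close the socket when it returns true, whatever the OS currently reports about the connection.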

1 Answer

3
votes

TCP doesn't guarantee that a side that is only receiving can detect a loss of connection. The protocol you're implementing on top of TCP should have taken this into account in its design. If not, the protocol is broken and you should complain loudly to whoever designed it.

If you're designing a protocol yourself, do not skip the step of documenting the protocol. This should always include whether it supports application-level messages, how they're framed, who transmits when, how disconnections are detected, any timeouts, and so on. An "off the cuff" protocol implemented without an actual design is basically doomed to fail and, even if it happens to work, will never be maintainable because it will be literally impossible to determine how it's supposed to work.
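
To make "how they're framed" concrete, here is a minimal sketch of one possible framing. Everything in it is an assumption for illustration, not the asker's protocol: the frame layout (1 type byte, 4-byte big-endian payload length, payload), the `MsgType` values, and the `encode` / `try_decode` names are all hypothetical. Having explicit `Ping` / `Pong` message types is one way a specification can say how disconnections are detected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message types; Ping/Pong exist only for liveness checks.
enum class MsgType : std::uint8_t { Data = 0, Ping = 1, Pong = 2 };

struct Message {
    MsgType type = MsgType::Data;
    std::string payload;
};

// Assumed frame layout: [type:1][length:4, big-endian][payload:length]
std::vector<std::uint8_t> encode(const Message& m) {
    assert(m.payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(m.payload.size());
    std::vector<std::uint8_t> out;
    out.reserve(5 + m.payload.size());
    out.push_back(static_cast<std::uint8_t>(m.type));
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    out.insert(out.end(), m.payload.begin(), m.payload.end());
    return out;
}

// Removes one complete frame from the front of buf and stores it in out.
// Returns false (leaving buf untouched) if more bytes are still needed.
// A real implementation would also validate the type byte and cap the length.
bool try_decode(std::vector<std::uint8_t>& buf, Message& out) {
    const std::size_t header = 5;
    if (buf.size() < header) return false;
    std::uint32_t n = 0;
    for (std::size_t i = 1; i <= 4; ++i) n = (n << 8) | buf[i];
    if (buf.size() - header < n) return false;
    out.type = static_cast<MsgType>(buf[0]);
    out.payload.assign(buf.begin() + header, buf.begin() + header + n);
    buf.erase(buf.begin(), buf.begin() + header + n);
    return true;
}
```

Because TCP is a byte stream, a single recv() may deliver half a frame or several frames at once, so the receiver appends whatever arrives to a buffer and calls `try_decode` in a loop until it returns false.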

Once you have a protocol specification, it is at least possible to identify where the problem lies by following these simple steps:

  1. Does the server follow the specification? If not, stop. The server is broken.

  2. Does the client follow the specification? If not, stop. The client is broken.

  3. Stop. The specification is broken.

Without a specification, these questions are unanswerable. Much pain is caused in this way.