0
votes

I am trying to upgrade my C++ TCP client program to be compatible with Windows 7. The program uses a non-blocking socket and works fine on Windows XP. However, when I ran the same code on Windows 7, I found that the recv function in the socket class behaves differently.

When I disconnect the Ethernet cable from the PC, both Windows XP and Windows 7 show in the system tray that the network is disconnected. However:

In Windows XP, recv returns SOCKET_ERROR and WSAGetLastError returns WSAECONNRESET

In Windows 7, recv returns SOCKET_ERROR but WSAGetLastError returns WSAEWOULDBLOCK

I am curious why the recv function still thinks the socket is connected (i.e. returns WSAEWOULDBLOCK) and only detects the disconnection after the keepalive has timed out.

Also, is there any alternative way to detect that the network is disconnected using the TCP socket library, apart from checking the value of WSAGetLastError after calling recv?

Many thanks!

2
It's sorta a philosophical design problem. If a link becomes electrically disconnected at L1, do you immediately signal a TCP disconnect at L2/3, or wait to see if the disconnected link, or some other link, becomes available? - Martin James
Put another way - it's very common to implement a 'ping' at application-protocol level to try and monitor overall reachability. I usually use a default recv() timeout of 5 seconds and a 'polling' flag. If the recv() times out and the flag is reset, I set the flag and 'ping' the peer with a 'just acknowledge' request. If the recv() times out and the flag is already set, I close the socket and signal 'disconnected' to the user. If any data at all is received, I reset the flag. This is easy to do with blocking designs, not sure about non-blocking. - Martin James
@Martin James My program needs to report the TCP disconnect back to the application layer almost immediately, so I don't want to wait until the keepalive timeout occurs. Also, my program is non-blocking, so I probably won't use the 'ping' approach. - chesschi
As Martin said, doing an application layer ping is common... and it's common for a reason: it's the best/fastest way to determine the connectivity. If you wait on the TCP stack to error out, you could be waiting a long time. - mark
@chesschi - problem is, removing a cable is not a TCP disconnect, it's an L1 link failure. See EJP's answer - the TCP protocol will try very hard to deliver your data somehow, and will take some time to give up. - Martin James

2 Answers

1
votes

I would say that Windows 7 is behaving correctly here, and that XP was wrong to react immediately to a cable pull. TCP was intended from the beginning to be highly fault tolerant, and specifically to allow cable pulls or indeed router outages without disrupting existing connections.

Is there any other way [etc]

The most reliable way to detect a TCP connection outage is to send to it. After enough buffering, retries, etc, TCP will give up sending and deliver a reset to the caller of send(). But it can take a while, and several calls.
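To make the distinction concrete, here is a minimal sketch of how the result of a non-blocking recv() or send() might be interpreted. The names IoOutcome, classifyRecv and classifySend are hypothetical, and the two error codes are copied in as plain constants (assuming the usual Winsock values) so the sketch compiles off Windows too; in a real program you would use the macros from <winsock2.h> and pass in the value of WSAGetLastError(). The point is that WSAEWOULDBLOCK only means "nothing to do right now", not "the peer is reachable".

```cpp
// Assumed Winsock values, duplicated here only to keep the sketch portable.
constexpr int kWsaeWouldBlock = 10035;  // WSAEWOULDBLOCK
constexpr int kWsaeConnReset  = 10054;  // WSAECONNRESET

enum class IoOutcome {
    Progress,    // bytes were transferred
    WouldBlock,  // nothing to do right now; says nothing about reachability
    PeerClosed,  // recv() returned 0: the peer shut down gracefully
    Failed       // any other error: treat the socket as unusable
};

// Interpret the return value of recv() together with WSAGetLastError().
constexpr IoOutcome classifyRecv(int result, int lastError) {
    return result > 0                     ? IoOutcome::Progress
         : result == 0                    ? IoOutcome::PeerClosed
         : lastError == kWsaeWouldBlock   ? IoOutcome::WouldBlock
                                          : IoOutcome::Failed;
}

// Same idea for send(); a return of 0 is not a close indication here.
constexpr IoOutcome classifySend(int result, int lastError) {
    return result >= 0                    ? IoOutcome::Progress
         : lastError == kWsaeWouldBlock   ? IoOutcome::WouldBlock
                                          : IoOutcome::Failed;
}
```

With this split, the Windows 7 behaviour from the question maps to WouldBlock (keep waiting, or probe the peer), while the Windows XP behaviour maps to Failed.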

0
votes

You may be using a looping select to wait for data, in which case the pseudo-code idea is this:

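bool ping_outstanding = false;
while (running) {
   waitForDataMaxSeconds(5); // e.g. select() with a 5-second timeout
   if (dataAvailable()) {
      // All is good: any incoming data proves the peer is reachable
      ping_outstanding = false;
      getData(); // Could be normal data or a ping response
   }
   else if (!ping_outstanding) {
      // Nothing arrived in time, so ask the peer to respond
      ping_outstanding = true;
      sendPing();
      // I should get data in response to this
   }
   else {
      // The ping went unanswered for a full wait period:
      // I've got connection trouble!
   }
}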

You can, of course, adjust how long you wait before sending a ping request, and how long you allow for a response, based on your system requirements.

If you're using some other "data waiting" indication, you can use an async timer instead of the loop, but the idea is the same: if you don't receive data, send some that requests data. If you still don't receive data, assume there's a problem.
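Since the flag logic does not depend on how you wait, it can be pulled out of the loop entirely. The following is only a sketch under that assumption: Event, Action, Step and nextStep are hypothetical names, and the function does no I/O itself. It just says what to do next, so it can be driven from a select() loop, an async timer callback, or any other notification mechanism.

```cpp
// What the I/O layer observed, and what it should do in response.
enum class Event  { DataReceived, WaitTimedOut };
enum class Action { None, SendPing, ReportDisconnected };

struct Step {
    bool   pingOutstanding;  // new value of the flag
    Action action;           // what the caller should do now
};

// Pure transition function for the ping flag described above.
constexpr Step nextStep(bool pingOutstanding, Event event) {
    return event == Event::DataReceived
               // Any data at all (normal traffic or a ping reply) clears the flag.
               ? Step{false, Action::None}
               : (pingOutstanding
                      // A ping was already sent and nothing came back in time.
                      ? Step{true, Action::ReportDisconnected}
                      // First quiet period: probe the peer.
                      : Step{true, Action::SendPing});
}
```

In a non-blocking program you would feed it Event::WaitTimedOut when select() returns 0 after its timeout (or when your timer fires) and Event::DataReceived whenever recv() returns more than 0 bytes, then act on the returned Action.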