15
votes

I'm connecting a server process and a client process over a TCP connection, and I have to detect
when the physical connection between the two machines goes down. I'm trying to do this with TCP keepalive,
lowering the values from the system-wide defaults to:

TCP_KEEPIDLE = 5
TCP_KEEPCNT = 5
TCP_KEEPINTVL = 1

When the connection goes down (I disconnect the cable), only the server detects, within about 10 seconds, that the connection has been lost; the client just hangs on the send.

This is the client code:

#include <iostream>
#include <string.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <unistd.h>   // sleep(), close()
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/tcp.h>

int main(int argc, char** argv) {
  char myVector[1600] = {};  // zero-filled payload; the contents do not matter

  int mySocket = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
  if (mySocket < 0) {
    std::cout << "Error creating the socket: " << strerror(errno) << std::endl;
    ::exit(-1);
  }

  struct sockaddr_in sin;
  memset(&sin, 0, sizeof(sin));
  sin.sin_addr.s_addr = inet_addr("192.168.21.27");
  sin.sin_port   = htons(7788);
  sin.sin_family = AF_INET;

  if (connect(mySocket, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
    std::cout << "Error on connection: " << strerror(errno) << std::endl;
    ::exit(-1);
  }

  int optval = 1;
  socklen_t optlen = sizeof(optval);

  /* Enabling keep alive */
  if (setsockopt(mySocket, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen) < 0) {
    std::cout << "Error setting SO_KEEPALIVE: " << strerror(errno) << std::endl;
  }

  /* Seconds of idle time before the first probe */
  optval = 5;
  optlen = sizeof(optval);
  if (setsockopt(mySocket, SOL_TCP, TCP_KEEPIDLE, &optval, optlen) < 0) {
    std::cout << "Error setting TCP_KEEPIDLE: " << strerror(errno) << std::endl;
  }

  /* Number of unanswered probes before the connection is dropped */
  optval = 5;
  optlen = sizeof(optval);
  if (setsockopt(mySocket, SOL_TCP, TCP_KEEPCNT, &optval, optlen) < 0) {
    std::cout << "Error setting TCP_KEEPCNT: " << strerror(errno) << std::endl;
  }

  /* Seconds between probes */
  optval = 1;
  optlen = sizeof(optval);
  if (setsockopt(mySocket, SOL_TCP, TCP_KEEPINTVL, &optval, optlen) < 0) {
    std::cout << "Error setting TCP_KEEPINTVL: " << strerror(errno) << std::endl;
  }

  for (;;) {
    ssize_t myRet = ::send(mySocket, myVector, sizeof(myVector), 0);
    if (myRet < 0) {
      std::cout << "Error: " << strerror(errno) << std::endl;
      break;
    }
    std::cout << myRet << "." << std::flush;
    sleep(1);
  }

  ::close(mySocket);
  return 0;
}

I'm sure I'm missing something, but what?

5 Answers

9
votes

TCP Keepalive isn't intended for this use.

If you want to detect outages at the application layer, do what protocols like SSH, IMAP and IRC do - implement an echo/ping-type message at the application layer. Send these messages on a regular basis, and if you don't get a timely reply, the connection can be assumed to be down.
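
As a rough illustration of the echo/ping idea, here is a minimal sketch for Linux. The one-byte protocol ('P' for ping, 'A' for the answer) and the name peerAnswersPing are made up for this example and are not from any of the protocols mentioned; a real application would fold the ping into its own message format.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical one-byte protocol (not from the original posts):
// 'P' is a ping and 'A' is the answer the peer is expected to send back.
// Returns true only if the answer arrives within timeoutMs milliseconds.
bool peerAnswersPing(int fd, int timeoutMs) {
    assert(timeoutMs >= 0);  // a negative timeout would make poll() wait forever

    const char ping = 'P';
    // MSG_DONTWAIT: fail instead of blocking when the send buffer is full,
    // which is what happens once the peer has stopped acknowledging data.
    // MSG_NOSIGNAL: report a broken connection as an error, not as SIGPIPE.
    if (::send(fd, &ping, 1, MSG_DONTWAIT | MSG_NOSIGNAL) != 1) {
        return false;
    }

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    // poll() returns 0 on timeout and -1 on error; both count as "no answer".
    if (::poll(&pfd, 1, timeoutMs) <= 0) {
        return false;
    }

    char reply = 0;
    return ::recv(fd, &reply, 1, MSG_DONTWAIT) == 1 && reply == 'A';
}
```

The caller would invoke this periodically and treat one or more consecutive false results as a lost connection; how many misses to tolerate is an application decision.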

3
votes

We asked ourselves the same question at our company a while ago: "how do you detect that a connection went down?". To address this issue reliably, we had to implement a "heartbeat" system, i.e. the client regularly checks (every second in our case) that the server is still there by sending a pseudo-ping. If you don't want to do that, you can wait for the OS to detect that the connection went down, but don't expect that to be reliable...

3
votes

So, after further investigation: even if TCP keepalive is not intended for this use, I have discovered that keepalive probes only start being sent on an idle connection. The question then becomes: when is a connection considered idle? A connection is considered idle when no data is being transmitted, so if one of the two peers is blocked in a send(...), there is still data being transmitted and the connection is not considered idle. I guess the only option I have now is to do a ping/pong using send/recv with timeouts, declaring the connection "lost" when those timers expire.
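
One way to get send/recv with timeouts, sketched under the assumption of a Linux/POSIX socket: the SO_SNDTIMEO and SO_RCVTIMEO socket options. The helper name setIoTimeouts and the millisecond parameter are choices made for this example only.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: bound both send() and recv() on fd to timeoutMs milliseconds.
// Afterwards a call that would block longer fails with errno EAGAIN/EWOULDBLOCK
// instead of hanging indefinitely.
bool setIoTimeouts(int fd, long timeoutMs) {
    assert(timeoutMs > 0);  // a zero timeval means "no timeout" for these options

    struct timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    if (::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) return false;
    if (::setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) return false;
    return true;
}
```

Note that a timed-out send() may already have queued part of the buffer and return a short count rather than -1, so the caller has to check the return value as well as errno. A timeout by itself only says the call stalled; deciding that the connection is lost remains up to the application.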

0
votes

Gaetano, IMO, TCP keep-alives can be used to detect dead connections. In your example, the client might actually be hanging in the send waiting for the TCP retries to exhaust themselves. Depending on the back-off algorithm and TCP stack state machine, this can last several minutes without any keep-alive probes, and thus no way to exhaust keepcnt.

I assume that the server is mostly read-blocked, in which case, its keep-alives would be sent out every keepidle/slowhz seconds (slowhz is often 2 instead of 1), and it will detect the connection loss fairly quickly.

If you capture a packet trace with tcpdump, you'll see exactly what's happening on the wire.
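
If the client is indeed stuck waiting for retransmissions, Linux offers a separate knob that is not mentioned in the posts above: the TCP_USER_TIMEOUT socket option, which limits how long transmitted data may remain unacknowledged. The sketch below assumes Linux 2.6.37 or later; the helper name setUserTimeout is made up for this example.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Assumption: Linux 2.6.37 or later, where TCP_USER_TIMEOUT is available.
// Limits (in milliseconds) how long sent data may stay unacknowledged before
// the kernel closes the connection and pending calls fail with ETIMEDOUT.
bool setUserTimeout(int fd, unsigned int timeoutMs) {
    assert(fd >= 0);
    return ::setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                        &timeoutMs, sizeof(timeoutMs)) == 0;
}
```

For example, setUserTimeout(mySocket, 10000) would ask the kernel to give up after roughly 10 seconds of unacknowledged data, which is in the same range as the keepalive settings in the question.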

0
votes

You should replace SOL_TCP with IPPROTO_TCP.
For more information follow these links
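
A minimal sketch of the keepalive setup with IPPROTO_TCP as the option level, assuming Linux option names (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT). The helper names are invented for this example. The second function only does the arithmetic for an idle connection: with the question's values, 5 + 1 * 5 = 10 seconds, which matches what the server observed.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: enable keepalive on fd with per-socket timings in seconds.
bool enableKeepalive(int fd, int idleSec, int intervalSec, int probeCount) {
    assert(idleSec > 0 && intervalSec > 0 && probeCount > 0);
    int on = 1;
    return ::setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
        && ::setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idleSec, sizeof(idleSec)) == 0
        && ::setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intervalSec, sizeof(intervalSec)) == 0
        && ::setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probeCount, sizeof(probeCount)) == 0;
}

// Approximate time until an idle, unreachable peer is declared dead:
// the idle period, then probeCount probes spaced intervalSec apart.
int keepaliveDetectionSeconds(int idleSec, int intervalSec, int probeCount) {
    return idleSec + intervalSec * probeCount;
}
```

On Linux, SOL_TCP and IPPROTO_TCP have the same numeric value, so this change is about portability rather than behavior; it would not by itself explain a client that hangs in send().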