47
votes

I open a TCP socket and connect it to another socket somewhere else on the network. I can then successfully send and receive data. I have a timer that sends something to the socket every second.

I then rudely interrupt the connection by forcibly losing the connection (pulling out the Ethernet cable in this case). My socket is still reporting that it is successfully writing data out every second. This continues for approximately 1hour and 30 minutes, where a write error is eventually given.

What specifies this time-out where a socket finally accepts the other end has disappeared? Is it the OS (Ubuntu 11.04), is it from the TCP/IP specification, or is it a socket configuration option?

2
Maybe this gives answer for you.SKi

2 Answers

74
votes

Pulling the network cable will not break a TCP connection(1) though it will disrupt communications. You can plug the cable back in and once IP connectivity is established, all back-data will move. This is what makes TCP reliable, even on cellular networks.

When TCP sends data, it expects an ACK in reply. If none comes within some amount of time, it re-transmits the data and waits again. The time it waits between transmissions generally increases exponentially.

After some number of retransmissions or some amount of total time with no ACK, TCP will consider the connection "broken". How many times or how long depends on your OS and its configuration but it typically times-out on the order of many minutes.

From Linux's tcp.7 man page:

   tcp_retries2 (integer; default: 15; since Linux 2.2)
          The maximum number of times a TCP packet is retransmitted in
          established state before giving up.  The default value is 15, which
          corresponds to a duration of approximately between 13 to 30 minutes,
          depending on the retransmission timeout.  The RFC 1122 specified
          minimum limit of 100 seconds is typically deemed too short.

This is likely the value you'll want to adjust to change how long it takes to detect if your connection has vanished.

(1) There are exceptions to this. The operating system, upon noticing a cable being removed, could notify upper layers that all connections should be considered "broken".

1
votes

If want a quick socket error propagation to your application code, you may wanna try this socket option:

TCP_USER_TIMEOUT (since Linux 2.6.37) This option takes an unsigned int as an argument. When the value is greater than 0, it specifies the maximum amount of time in milliseconds that transmitted data may remain unacknowledged before TCP will forcibly close the corresponding connection and return ETIMEDOUT to the application. If the option value is specified as 0, TCP will use the system default.

See full description on linux/man/tcp(7). This option is more flexible (you can set it on the fly, just right after a socket creation) than tcp_retries2 editing and exactly applies to a situation when you client's socket doesn't aware about server's one state and may get into so called half-closed state.