Why Socket Connection Blocked and TCP Kernel Keeps Retransmitting [ACK] packets

Question

We are facing a problem where, after some time, a specific socket connection stalls and the client-side TCP stack keeps retransmitting [ACK] packets.

The topology flow is as below:

   Client A ←→ Switch A ← Router A:NAT ← .. Internet .. 
               → Router B:NAT → Switch B ←→ Server B

Here are the packets captured with Wireshark:
A) Server

1. 8013 > 6757 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55     
2. 6757 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0

B) Client

//lines 3 and 4 are exactly the same as line 1 and 2      
3. 8013 > 13000 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55      
4. 13000 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0     
5. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17     

[TCP Retransmission]          
6. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17

8013 is the server port, 6757 is the client's NAT port, and 13000 is the client's internal port.

Why does the client's TCP stack keep sending [ACK] packets to tell the server it has received packet 1 (see packets 4, 5, and 6), even though the server has already received one [ACK] packet (see packet 2)? Neither side of the connection closes the socket when the problem happens.

After packet 6, the connection is lost, and we can't send anything to the server via that socket anymore.

         pseudocode:
         // client
         serverAddr.port = htons(8013);
         serverAddr.ip = inet_addr(publicIPB);
         connect(fdA, serverAddr, ...);

         // server
         listenfd = socket(..., SOCK_STREAM, ...);
         localAddr.port = htons(8013);
         localAddr.ip = htonl(INADDR_ANY);   // INADDR_ANY is an integer constant, not a string for inet_addr()
         bind(listenfd, localAddr, ...);
         listen(listenfd, 100);

         ...
         // using the select model; maxfd is the highest descriptor in fdSet
         select(maxfd + 1, &fdSet, NULL, NULL, NULL);
         for (...)
         {
             if (FD_ISSET(listenfd, &fdSet))
             {
                 ...
             }
             ...
         }

UPDATE
UP1. Here are the concrete steps to reproduce the problem

1. Given three computers, PC1, PC2 and PC3. All three are behind RouterA, while Server is behind RouterB.
2. Given two users, U1 and U2. U1 logs in from PC1 and U2 logs in from PC3. Both U1 and U2 build a TCP connection to the Server. U1 can now send data over its TCP connection to Server, and Server relays all of it to U2. Everything works fine up to this point.

Denote the socket descriptor for the Server endpoint of this TCP connection between U1 and Server as U1-OldSocketFd.
3. Without logging out U1, unplug the cable of PC1. Then U1 logs in from PC2, which establishes a new TCP connection to the Server.

Denote the socket descriptor for the Server endpoint of this new TCP connection between U1 and Server as U1-NewSocketFd.

4. On the Server side, when it updates its session with U1, it calls close(U1-OldSocketFd).

4.1. About 30 seconds after step 3, we found that U1 is NOT able to send any data to Server over its new TCP connection.

4.2. If, in step 3, Server does not call close(U1-OldSocketFd) immediately (within the same second the new connection between U1 and Server is established) but instead calls it more than 70-80 seconds later, then everything works fine.

UP2. Router B uses Port Forwarding on port 8013.
UP3. Some parameters of the Linux OS which Server runs on.

    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
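
To confirm what the running kernel actually uses, these settings can be read back from /proc/sys. A minimal sketch in C, assuming a Linux host; the helper name read_sysctl_int is hypothetical and not part of any library:

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys-style file.
 * Returns 0 and stores the value in *out on success, -1 if the file
 * cannot be opened or does not start with an integer. */
int read_sysctl_int(const char *path, int *out) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    int ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, read_sysctl_int("/proc/sys/net/ipv4/tcp_tw_recycle", &value) should store 1 on the Server described here. On kernels from Linux 4.12 onward the tcp_tw_recycle setting no longer exists, so the call returns -1 there.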

what are the client NAT ports (those from server point of view) of U1 from PC1 and U1 from PC2? Just give example from one trial. And what are the internal IPs of PC1 and PC2, and the internal client ports? — Tomas
@Tomas The clients' NAT ports are allocated by NAT and seem to be random, in my example RouterA allocates the port number --6757 -- for the first connection between U1 and Server. The internal client ports are usually 13000, and if this port number is in use (by other application) the client tries to bind the next number which is 13001, if the port is still in use then 13002, 13003.. — Wallace
Steve, I want to see all of these numbers from one trial. It is important for the diagnostic. Include the local IPs please. — Tomas
@Tomas I will update this post later, adding the ports and ips for every tcp connection. I don't have them at this time. — Wallace

Celada · Accepted Answer · 2013-03-04T01:06:29

After packets 1 (same as 3) and 2 (same as 4) have gone by, your client seems to be transmitting 17 bytes of data to the server (packet 5). I don't know how long after the first exchange packet 5 is sent. Your pseudocode doesn't clarify this because it shows only the socket initialization, not which side attempts to transmit what data at what time. A ladder diagram might be useful here to represent your protocol exchanges.

In any case, the server apparently doesn't acknowledge the 17 bytes of data so they are transmitted again (packet 6).
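
The sequence numbers in the capture bear this out. Below is a minimal sketch in C, assuming the values shown are Wireshark's relative sequence numbers and that no SYN or FIN is involved; the helper names expected_ack and segment_acked are illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Acknowledgment number the receiver should send back for a data segment.
 * SYN and FIN flags, which also consume one sequence number, are ignored. */
uint32_t expected_ack(uint32_t seq, uint32_t len) {
    return seq + len; /* unsigned arithmetic wraps modulo 2^32, like TCP */
}

/* True if `ack` covers the whole segment. The signed difference keeps the
 * comparison correct when sequence numbers wrap around. */
bool segment_acked(uint32_t seq, uint32_t len, uint32_t ack) {
    return (int32_t)(ack - expected_ack(seq, len)) >= 0;
}
```

With the values from the capture, segment_acked(56, 55, 111) is true, so packet 2 acknowledges packet 1. For packet 5, expected_ack(132, 17) is 149, but the last Ack value seen from the server is 132, so segment_acked(132, 17, 132) is false and the client retransmits.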

Unless you have some problem with the network or with a firewall or NAT router or something else dropping packets, there shouldn't be any reason why the server is able to receive the earlier parts of the TCP exchange but apparently cannot receive packets 5 or 6. Once again, is there a large amount of time elapsed between the prior exchange of data and packet 5 (such as, enough time for a NAT router, firewall, or load balancer to expire the connection)?
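
If an idle timeout in a NAT router or firewall does turn out to be the cause (nothing in the capture confirms this), one common mitigation is to enable TCP keepalive so that probes pass through the middlebox during quiet periods. The following is a sketch only, assuming Linux socket options; the function name enable_keepalive and any timing values passed to it are hypothetical:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive probes on a socket (Linux-specific options).
 * idle_s:     seconds of silence before the first probe is sent
 * interval_s: seconds between unanswered probes
 * count:      unanswered probes before the connection is reported dead
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int count) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

A call such as enable_keepalive(fdA, 30, 10, 3) on the client socket would send the first probe after 30 idle seconds. The idle value would need to be shorter than the middlebox's timeout, which is not known here.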
