4
votes

We are facing a problem where, after some time, a specific socket connection becomes blocked and the client-side TCP stack keeps retransmitting [ACK] packets.

The network topology is as follows:

   Client A ←→ Switch A ← Router A:NAT ← .. Internet .. 
               → Router B:NAT → Switch B ←→ Server B

Here are the packets captured by WireShark:
A) Server

1. 8013 > 6757 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55     
2. 6757 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0     

B) Client

// packets 3 and 4 are exactly the same as packets 1 and 2
3. 8013 > 13000 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55      
4. 13000 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0     
5. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17     

[TCP Retransmission]          
6. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17         

8013 is the server port, 6757 is the client's NAT port, and 13000 is the client's internal port.

Why does the client's TCP stack keep transmitting [ACK] packets to tell the server that it has received packet 1 (see packets 4, 5, and 6), even though the server has already received one [ACK] packet (see packet 2)? Neither side of the connection closes the socket when the problem happens.

After packet 6, the connection is lost, and we can't send anything to the server via that socket anymore.
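
The capture can be read by checking each ACK number against the segment it acknowledges. A minimal sketch of that arithmetic in C follows; the function name `expected_ack` is made up for illustration, and the inputs are the relative sequence numbers shown in the capture above.

```c
#include <stdint.h>

/* Next sequence number the receiver expects after a segment:
 * the segment's sequence number plus its payload length, plus one
 * for each SYN or FIN flag it carries. Unsigned arithmetic wraps
 * modulo 2^32, as TCP sequence numbers do. */
uint32_t expected_ack(uint32_t seq, uint32_t len, int syn, int fin)
{
    return seq + len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}
```

For packet 1, `expected_ack(56, 55, 0, 0)` is 111, which matches the Ack in packets 2 and 4. For packet 5, `expected_ack(132, 17, 0, 0)` is 149; no segment carrying Ack=149 appears in the client capture, which is consistent with the retransmission in packet 6.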

         pseudocode:
         // client
         serverAddr.port = htons(8013);
         serverAddr.ip = inet_addr(publicIPB);
         connect(fdA, serverAddr, ...);

         // server
         listenfd = socket(, SOCK_STREAM, );
         localAddr.port = htons(8013);
         localAddr.ip = htonl(INADDR_ANY);   // INADDR_ANY is a number, not a string for inet_addr()
         bind(listenfd, localAddr, ...);
         listen(listenfd, 100);

         ...
         // using the select model; the first argument is the highest fd + 1
         select(maxfd + 1, &fdSet, NULL, NULL, NULL);
         for (...)
         {
             if (FD_ISSET(listenfd, &fdSet))
             {
                 ...
             }
             ...
         }
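
For reference, the server side of the pseudocode above could be fleshed out roughly as follows. This is only a sketch under assumptions: IPv4, blocking sockets, and helper names (`make_listener`, `serve_once`) that are not from the original code. The real server's session handling is not shown.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on all interfaces.
 * Returns the listening fd, or -1 on failure. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 100) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One pass of the select loop: accept new clients, read from ready ones.
 * `all` holds every watched fd and `maxfd` the highest of them.
 * Returns what select() returned (0 on timeout, -1 on error). */
int serve_once(int listenfd, fd_set *all, int *maxfd, struct timeval *timeout)
{
    fd_set ready = *all;            /* select() overwrites its argument */
    int top = *maxfd;               /* scan only fds that were watched */
    int n = select(top + 1, &ready, NULL, NULL, timeout);
    if (n <= 0)
        return n;

    for (int fd = 0; fd <= top; fd++) {
        if (!FD_ISSET(fd, &ready))
            continue;
        if (fd == listenfd) {
            int client = accept(listenfd, NULL, NULL);
            if (client >= 0 && client < FD_SETSIZE) {
                FD_SET(client, all);
                if (client > *maxfd)
                    *maxfd = client;
            } else if (client >= 0) {
                close(client);      /* too many fds for select() */
            }
        } else {
            char buf[512];
            ssize_t got = recv(fd, buf, sizeof buf, 0);
            if (got <= 0) {         /* peer closed or error */
                close(fd);
                FD_CLR(fd, all);
            }
            /* application-specific handling of buf would go here */
        }
    }
    return n;
}
```

A caller would create the listener with `make_listener(8013)`, add it to an `fd_set`, and call `serve_once` in a loop.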

UPDATE
UP1. Here are the concrete steps to reproduce the problem

  1. There are three computers: PC1, PC2, and PC3. All three are behind RouterA, while the Server is behind RouterB.

  2. There are two users, U1 and U2. U1 logs in from PC1 and U2 logs in from PC3. Each of them establishes a TCP connection to the Server. U1 can now send data over its TCP connection to the Server, and the Server relays all of it to U2. Everything works fine up to this point.

    Denote the Server-side socket of this first TCP connection between U1 and the Server as U1-OldSocketFd.

  3. Without logging U1 out, unplug the network cable of PC1. U1 then logs in from PC2, which establishes a new TCP connection to the Server.

    Denote the Server-side socket of this new TCP connection between U1 and the Server as U1-NewSocketFd.

    On the Server side, when the Server updates its session for U1, it calls close(U1-OldSocketFd).

4.1. About 30 seconds after step 3, U1 is NOT able to send any data to the Server over its new TCP connection.

4.2. If, in step 3, the Server does not call close(U1-OldSocketFd) immediately (within the same second in which the new connection between U1 and the Server is established) but instead calls it more than 70-80 seconds later, everything works fine.
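
The timing in 4.2 can be reproduced with a small deferred-close list on the server. This is a hypothetical sketch for experimenting with the delay, not the server's actual code; `defer_close`, `close_due`, and the fixed-size table are all assumptions.

```c
#include <time.h>
#include <unistd.h>

#define MAX_PENDING 64

/* A socket waiting to be closed once its deadline has passed. */
struct pending_close {
    int fd;
    time_t due;
};

static struct pending_close pending[MAX_PENDING];
static int pending_count = 0;

/* Schedule fd to be closed delay_seconds after `now`.
 * Returns 0 on success, -1 if the table is full. */
int defer_close(int fd, time_t now, int delay_seconds)
{
    if (pending_count == MAX_PENDING)
        return -1;
    pending[pending_count].fd = fd;
    pending[pending_count].due = now + delay_seconds;
    pending_count++;
    return 0;
}

/* Close every socket whose deadline is at or before `now`.
 * Returns the number of sockets closed. */
int close_due(time_t now)
{
    int closed = 0;
    for (int i = 0; i < pending_count; ) {
        if (pending[i].due <= now) {
            close(pending[i].fd);
            pending[i] = pending[--pending_count]; /* fill the gap */
            closed++;
        } else {
            i++;
        }
    }
    return closed;
}
```

The server would call `defer_close(oldFd, time(NULL), 80)` at the point where it currently calls close(U1-OldSocketFd), and `close_due(time(NULL))` once per pass of its select loop.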

UP2. Router B uses Port Forwarding on port 8013.
UP3. Some kernel parameters of the Linux OS that the Server runs on:

    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
Someone voted this off topic but I feel it is on topic. - Celada
What are the client NAT ports (as seen from the server) of U1 from PC1 and U1 from PC2? Just give an example from one trial. And what are the internal IPs of PC1 and PC2, and the internal client ports? - Tomas
@Tomas The clients' NAT ports are allocated by the NAT and seem to be random; in my example RouterA allocated port number 6757 for the first connection between U1 and the Server. The internal client port is usually 13000. If that port is in use (by another application), the client tries to bind the next number, 13001, and if that is also in use, then 13002, 13003, and so on. - Wallace
Steve, I want to see all of these numbers from one trial. It is important for the diagnosis. Include the local IPs please. - Tomas
@Tomas I will update this post later, adding the ports and IPs for every TCP connection. I don't have them at this time. - Wallace

2 Answers

1
votes

After packets 1 (same as 3) and 2 (same as 4) have gone by, your client seems to be transmitting 17 bytes of data to the server (packet 5). The capture shows no timestamps, so I can't tell how long after the first exchange packet 5 is sent. Your pseudocode doesn't clarify it either, because it only shows the socket initialization, not which side transmits what data at what time. A ladder diagram would be useful here to represent your protocol exchanges.

In any case, the server apparently doesn't acknowledge the 17 bytes of data so they are transmitted again (packet 6).

Unless something on the network (a firewall, a NAT router, or similar) is dropping packets, there is no obvious reason why the server can receive the earlier parts of the TCP exchange but apparently cannot receive packets 5 or 6. Once again, did a large amount of time elapse between the prior exchange of data and packet 5 (for example, enough time for a NAT router, firewall, or load balancer to expire the connection)?

1
votes

Based on your steps to reproduce the issue and UP3, it may be due to

net.ipv4.tcp_tw_recycle = 1

The reason is that the kernel is trying to recycle a TIME_WAIT connection before due time (thanks to tw_recycle).

This answer explains how tw_reuse and tw_recycle behave (NAT section is of interest here).

According to the steps to reproduce and observations 4.1 and 4.2, when you immediately call close(), the connection enters the TIME_WAIT state, at which point tw_recycle can take over and assume that, since this side has closed the connection, the socket can be recycled. Because both connections come from the same host from the server's point of view (they share RouterA's public address), tw_recycle kicks in.

When you instead wait before calling close(), no disconnect is triggered from the server's POV, so it assumes that the old connection is still alive. That prevents tw_recycle from kicking in, possibly/probably forcing the creation of a brand new connection.

According to 1, to be safe from protocol POV, you have 2 cases:

  • Disable both tw_reuse and tw_recycle
  • Enable tw_reuse, enable TCP timestamps, disable tw_recycle

tw_recycle will probably always trigger the no-connectivity condition, given your network topology.
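
To check which of these cases a given server falls into, the three settings can be read from `/proc/sys` and compared against the list above. The following is a rough sketch, not an official tool: `read_sysctl_int` and `tw_settings_safe` are illustrative names, and the check only encodes the two "safe" cases listed in this answer.

```c
#include <stdio.h>

/* Read an integer setting from a /proc/sys file.
 * Returns 0 on success, -1 if the file is missing or unparsable
 * (tcp_tw_recycle, for example, was removed in Linux 4.12). */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if the combination matches one of the two safe cases:
 *   - tw_reuse and tw_recycle both disabled, or
 *   - tw_reuse enabled with TCP timestamps, tw_recycle disabled.
 * Returns 0 otherwise. */
int tw_settings_safe(int tw_reuse, int tw_recycle, int timestamps)
{
    if (tw_recycle != 0)
        return 0;
    if (tw_reuse == 0)
        return 1;
    return timestamps != 0;
}
```

The relevant files are `/proc/sys/net/ipv4/tcp_tw_reuse`, `/proc/sys/net/ipv4/tcp_tw_recycle`, and `/proc/sys/net/ipv4/tcp_timestamps`. With the values from UP3 (tw_reuse = 1, tw_recycle = 1), `tw_settings_safe` returns 0 regardless of the timestamps setting.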