We are facing a problem where, after some time, a specific socket connection becomes blocked and the client-side TCP stack keeps retransmitting [ACK] packets.
The network topology is as follows:
Client A ←→ Switch A ← Router A:NAT ← .. Internet ..
→ Router B:NAT → Switch B ←→ Server B
Here are the packets captured by WireShark:
A) Server
1. 8013 > 6757 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55
2. 6757 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0
B) Client
// packets 3 and 4 are the same as packets 1 and 2, as seen on the client side
3. 8013 > 13000 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55
4. 13000 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0
5. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17
6. [TCP Retransmission] 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17
8013 is the server port, 6757 is the client's NAT port, and 13000 is the client's local port as seen in the client-side capture.
Why does the client's TCP stack keep sending [ACK] packets to tell the server that it received packet 1 (see packets 4, 5 and 6), even though the server has already received one [ACK] packet (see packet 2)? Neither side of the connection closes the socket when the problem happens.
After packet 6, the connection is lost, and we can no longer send anything to the server through that socket.
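The sequence numbers in the capture can be checked with a little arithmetic. The sketch below is only an illustration (the helper name `expected_ack` is made up for this post): a segment with sequence number Seq and Len bytes of payload is acknowledged with Ack = Seq + Len.

```c
#include <stdint.h>

/* Acknowledgment number that covers a data segment: seq + payload length.
 * TCP sequence numbers are 32-bit and wrap around, which uint32_t
 * arithmetic reproduces. SYN and FIN flags, which also consume one
 * sequence number each, are ignored in this sketch. */
uint32_t expected_ack(uint32_t seq, uint32_t len)
{
    return seq + len;
}
```

For packet 1 (Seq=56, Len=55) this gives 111, which matches the Ack=111 in packets 2 and 4. For packet 5 (Seq=132, Len=17) the server would have to answer with Ack=149. No such packet appears in the client-side capture, which is consistent with packet 6 being a retransmission of packet 5.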
Pseudocode:
// client
serverAddr.port = htons(8013);
serverAddr.ip = inet_addr(publicIPB);
connect(fdA, serverAddr, ...);
// server
listenfd = socket(..., SOCK_STREAM, ...);
localAddr.port = htons(8013);
localAddr.ip = htonl(INADDR_ANY);
bind(listenfd, localAddr, ...);
listen(listenfd, 100);
...
// using the select model; maxfd is the highest descriptor in fdSet
select(maxfd + 1, &fdSet, NULL, NULL, NULL);
for (...)
{
    if (FD_ISSET(listenfd, &fdSet))
    {
        ...
    }
    ...
}
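For reference, here is a compilable sketch of the server-side part of the pseudocode. It is not our actual server code: the function names `make_listener` and `accept_one` are made up for this post, the timeout on `select` is an assumption added so the sketch cannot block forever, and `SO_REUSEADDR` is set only for convenience when restarting.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket bound to INADDR_ANY:port.
 * Returns the descriptor, or -1 on error. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 100) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_sec seconds for a pending connection on listenfd
 * and accept it. Returns the new descriptor, or -1 on timeout or error. */
int accept_one(int listenfd, int timeout_sec)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listenfd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* The first argument is the highest descriptor plus one, not the set. */
    int ready = select(listenfd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0 || !FD_ISSET(listenfd, &readfds))
        return -1;

    return accept(listenfd, NULL, NULL);
}
```

In the real server the loop also watches every connected client descriptor, so the first argument to `select` has to track the highest descriptor across the whole set.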
UPDATE
UP1. Here are the concrete steps to reproduce the problem:
1. Given three computers, PC1, PC2 and PC3. All three are behind RouterA, while the Server is behind RouterB.
2. Given two users, U1 and U2. U1 logs in from PC1 and U2 logs in from PC3. Each of them establishes a TCP connection to the Server. U1 can now send data to the Server over its TCP connection, and the Server relays all of that data to U2. Everything works fine up to this point.
Denote the server-side socket descriptor of this TCP connection between U1 and the Server as U1-OldSocketFd.
3. Without logging out U1, unplug the network cable of PC1. U1 then logs in from PC2, which establishes a new TCP connection to the Server.
Denote the server-side socket descriptor of this new TCP connection between U1 and the Server as U1-NewSocketFd.
On the Server side, when it updates its session for U1, it calls close(U1-OldSocketFd).
4.1. About 30 seconds after step 3, we found that U1 is NOT able to send any data to the Server over its new TCP connection.
4.2. If, in step 3, the Server does not call close(U1-OldSocketFd) immediately (within the same second in which the new connection between U1 and the Server is established) but instead calls it more than 70-80 seconds later, then everything works fine.
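The server-side session update described above can be sketched roughly as follows. This is a simplified illustration, not our production code: `struct session` and `session_replace_fd` are hypothetical names, and the real session holds more state than a single descriptor.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-user session record. */
struct session {
    int fd; /* server-side descriptor of the user's current connection, -1 if none */
};

/* Point the session at new_fd (U1-NewSocketFd) and close the previous
 * descriptor (U1-OldSocketFd) immediately, as the server does today.
 * Returns the descriptor that was closed, or -1 if nothing was closed. */
int session_replace_fd(struct session *s, int new_fd)
{
    int old_fd = s->fd;

    s->fd = new_fd;
    if (old_fd < 0 || old_fd == new_fd)
        return -1;

    close(old_fd);
    return old_fd;
}
```

Case 4.2 corresponds to keeping the same update but deferring the `close` call by 70-80 seconds or more.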
UP2. Router B uses Port Forwarding on port 8013.
UP3. Some kernel parameters of the Linux OS that the Server runs on:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
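These values can also be read at runtime from the files under /proc/sys. A minimal sketch, assuming a Linux host (the helper name `read_sysctl_int` is made up for this post); it reports a missing file as an error, since tcp_tw_recycle was removed in Linux 4.12 and will not exist on newer kernels.

```c
#include <stdio.h>

/* Read an integer sysctl value from a /proc/sys file.
 * Returns 0 and stores the value in *out on success, or -1 if the file
 * cannot be opened or does not start with an integer. */
int read_sysctl_int(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, `read_sysctl_int("/proc/sys/net/ipv4/tcp_tw_reuse", &v)` reads the first of the two settings listed above.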