I open a UDP socket (SOCK_DGRAM) and bind it. After sending traffic, I close the socket. The same code then creates a new socket and binds it to the same address and port, and that bind fails with errno 98 (EADDRINUSE, address already in use). For TCP that would make sense to me, since the closed connection goes into the TIME_WAIT state, but for UDP? Why does this happen?

Note that this only happens when the server is busy sending at high rates (e.g. 10 Gb/s). The socket is wrapped in a C++ class providing RAII. Code below:

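Socket::Socket(uint32_t srcIp, uint16_t port)
{
   fd = socket(AF_INET, SOCK_DGRAM, 0);
   if (fd < 0) {
      // socket() failed: throw
   }

   // srcIp and port are assumed to already be in network byte order;
   // otherwise convert them with htonl()/htons().
   sockaddr_in addr = {};
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = srcIp;
   addr.sin_port = port;
   socklen_t len = sizeof(addr);

   if (bind(fd, reinterpret_cast<sockaddr*>(&addr), len) != 0) {
      // The destructor does not run when the constructor throws, so close here.
      close(fd);
      // throw error
   }
}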

Socket::~Socket()
{
   close(fd);
}

Interestingly, after the socket is destructed and recreated, the socket(AF_INET, SOCK_DGRAM, 0) call returns the same value for the file descriptor. I think that indicates that the OS has recycled the FD already and has processed the close. Yet, it doesn't like the bind. Weird to me that for UDP bind would behave that way since it is connectionless.

I don't want to use SO_REUSEADDR because I don't want to bind to a port already in use. I want to know if I already have a socket listening on that port. Or if that is the only way, then how can I know if the socket has been closed and the UDP is in whatever state it is in (can it even go in TIME_WAIT state?). The nature of this issue is a race condition that happens fast enough that I can't query netstat to see what state the zombie socket is in, because by then it will have disappeared.

I set a gdb breakpoint on the destructor and another on the '// throw error' line. I see the destructor called, so I know close was called; the next breakpoint hit is in the constructor, with the bind having failed.

Do you fork any child processes? They would inherit the socket and hold the binding. - Barmar
Pretty sure it doesn't. There are quite a few threads, but I don't think it forks. - Gabe
I can't think of any other reason why the port would still be in use. As you say, TIME_WAIT only exists for TCP, not UDP. - Barmar
Why close the socket at all? Keep it open. - user207421
@Barmar -- I was able to catch a glimpse where the socket was being owned by another process. So the application is indeed forking. I think you solved it for me. Thank you! This was super hard to track down. - Gabe

1 Answer

It's been a while and nobody posted an official answer, so here is the cause: the process was being forked elsewhere in the code. As Barmar pointed out, the forked child inherited the socket's file descriptor, so the port stayed bound after the parent closed its own copy. In this case Boost was being used to fork the process, and it wasn't closing the inherited file descriptors. Closing them in the child is good practice, precisely so you don't hit scenarios like this one.
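
For anyone who wants to see the effect in isolation, here is a minimal sketch that reproduces it on Linux. It is not the original application code: bindUdp and rebindAfterFork are hypothetical helper names made up for this illustration, and it binds to the loopback address on a kernel-chosen port. The parent binds a UDP socket, forks, closes its own descriptor, and then tries to bind the same port again. The rebind should fail with EADDRINUSE while the child still holds the inherited descriptor, and succeed when the child closes it right after the fork.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <csignal>
#include <cstdint>

// Hypothetical helper: bind a UDP socket to 127.0.0.1:port (host byte order).
// Returns the fd on success, or -errno on failure.
int bindUdp(uint16_t port, int extraFlags = 0)
{
    int fd = socket(AF_INET, SOCK_DGRAM | extraFlags, 0);
    if (fd < 0) return -errno;

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

// Hypothetical experiment: returns 0 if rebinding the port succeeds after the
// parent closes its socket, otherwise the errno reported by the failing call.
int rebindAfterFork(bool childClosesFd)
{
    int fd = bindUdp(0);  // port 0: let the kernel pick a free port
    if (fd < 0) return -fd;

    sockaddr_in addr = {};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        int err = errno;
        close(fd);
        return err;
    }
    uint16_t port = ntohs(addr.sin_port);

    int readyPipe[2];  // the child writes one byte once it is set up
    if (pipe(readyPipe) != 0) {
        int err = errno;
        close(fd);
        return err;
    }

    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        close(fd);
        close(readyPipe[0]);
        close(readyPipe[1]);
        return err;
    }
    if (pid == 0) {
        // Child: it owns a copy of fd. Closing it here is the fix.
        if (childClosesFd) close(fd);
        close(readyPipe[0]);
        if (write(readyPipe[1], "x", 1) != 1) _exit(1);
        for (;;) pause();  // stay alive until the parent kills us
    }

    close(readyPipe[1]);
    char ready = 0;
    bool childReady = (read(readyPipe[0], &ready, 1) == 1);
    close(readyPipe[0]);

    close(fd);  // what the parent's destructor does

    int result = ECHILD;  // reported only if the child never signalled
    if (childReady) {
        int fd2 = bindUdp(port);
        result = (fd2 < 0) ? -fd2 : 0;
        if (fd2 >= 0) close(fd2);
    }

    kill(pid, SIGKILL);
    waitpid(pid, nullptr, 0);
    return result;
}
```

Calling rebindAfterFork(false) should return EADDRINUSE (98 on Linux), and rebindAfterFork(true) should return 0. When the child goes on to exec another program rather than staying in your own code, an alternative is to open the socket with SOCK_CLOEXEC (Linux-specific, e.g. bindUdp(port, SOCK_CLOEXEC) in this sketch) or to set FD_CLOEXEC with fcntl, so the kernel closes the descriptor at exec time. Note that close-on-exec does nothing for a child that forks without exec'ing.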