This is summarized from Section 16.6 (Nonblocking `accept`) of the third edition of the late W. Richard Stevens' "Unix Network Programming", pages 461-463. UNP is probably still the best available textbook on writing networking code.
Although you might think that `accept` cannot block after `select` indicates that a listening socket is ready, Stevens describes a race condition in some network stack implementations which can cause `accept` to block indefinitely. (A footnote attributes the description to "A.Gierth".) The problem is described by means of an echo client which:
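1. Connects to the server;
2. Sets the `SO_LINGER` socket option on the connected socket, with linger enabled and a linger time of zero;
3. Immediately closes the socket. Because the `SO_LINGER` option has been set this way, closing the socket causes an `RST` (reset) to be sent instead of the normal connection teardown.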
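The three client steps above can be sketched in C roughly as follows. This is a minimal illustration, not Stevens' `nonblock/tcpcli03.c`; the function name `connect_and_reset` and its return convention are hypothetical, and it assumes an IPv4 server address given as a dotted-quad string.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from UNP): connect to ip:port, then abort the
 * connection right away. Returns 0 on success, -1 on any failure. */
int connect_and_reset(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    struct linger lg;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* Step 1: connect to the server. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* Step 2: enable SO_LINGER with a zero timeout, so that close()
     * aborts the connection (RST) instead of doing a normal shutdown. */
    lg.l_onoff = 1;
    lg.l_linger = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
        close(fd);
        return -1;
    }

    /* Step 3: close immediately. */
    return close(fd);
}
```

Calling something like `connect_and_reset("127.0.0.1", port)` against a server that delays its `accept` call should be enough to exercise the race on a system that still has it.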
Now, let's suppose the server is running but on a heavily-loaded machine. The modified echo client is run. The TCP connection causes the `select` call to return with an indication that there is a connection available. (Remember that the connection was actually accepted by the kernel and put into the accept queue; `accept` does not need to be executed for this to happen.)
However, the server code is interrupted by a process switch before the `accept` call is executed, and in the meantime the client manages to finish steps (2) and (3). The kernel then receives the reset from the client, and the connection is no longer valid. It might, therefore, remove the connection from the accept queue.
So by the time the server code gets around to calling `accept`, there is no connection to accept, and the `accept` call blocks until the next connection arrives, if there ever is one.
The behaviour described above might not actually happen. POSIX wants the `accept` call to fail with `ECONNABORTED` even if there is another available connection in the accept queue (which you also have to remember to deal with). According to Stevens:
In Section 5.11, we noted that when the client aborts the connection before the server calls `accept`, Berkeley-derived implementations do not return the aborted connection to the server, while other implementations should return `ECONNABORTED` but often return `EPROTO` instead.
Stevens' source code is available on the publisher's site; the modified client is `nonblock/tcpcli03.c`, and the modification to the server simply consists of sleeping for five seconds before calling `accept`. So you can try it on whatever systems you have available.
I don't believe that either FreeBSD or Linux exhibits the Berkeley-derived behaviour any more, although I'm pretty sure I remember it happening on FreeBSD (that could have been over a decade ago, and I no longer have a FreeBSD box handy to test it on). OpenBSD seems to have been patched in 1999 to fix the problem (see patch to 2.4); probably the other Berkeley derivatives made similar changes later. I have no idea about Mac OS X (although it's probably the same as FreeBSD) or Windows. It might well be that no modern system exhibits the behaviour, although it was surely observable when Stevens wrote UNP.
In any event, Stevens' advice is pretty simple, and it never hurts to be careful. What he suggests is:
1. Always set a listening socket to non-blocking when you use `select` on it;
2. If `accept` fails with `EWOULDBLOCK`, `ECONNABORTED`, `EPROTO`, or `EINTR`, ignore the error and return to the `select` loop.
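A rough C sketch of that advice might look like the following. The helper names (`set_nonblocking`, `accept_error_is_transient`, `try_accept`) and the -1/-2 return convention are hypothetical, chosen only for illustration. `EAGAIN` is also checked as my own addition, since it has the same value as `EWOULDBLOCK` on many systems but is not required to.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: nonzero if err is one of the accept errors that
 * should simply be ignored. EAGAIN is an addition to Stevens' list. */
int accept_error_is_transient(int err)
{
    return err == EWOULDBLOCK || err == EAGAIN ||
           err == ECONNABORTED || err == EPROTO || err == EINTR;
}

/* Call this after select reports the (non-blocking) listening socket as
 * readable. Returns the connected descriptor on success, -1 if there was
 * nothing usable to accept (go back to select), or -2 on a real error,
 * in which case errno is left as set by accept. */
int try_accept(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);

    if (fd >= 0)
        return fd;
    return accept_error_is_transient(errno) ? -1 : -2;
}
```

The intended use is to call `set_nonblocking` once on the listening socket after `listen`, and then, each time `select` marks that socket readable, call `try_accept` and simply continue the loop when it returns -1.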