The LuaSocket select function is supposed to tell when a socket can be read without blocking. It can apparently also be used to tell when a server socket is ready to accept a new connection; however, the documentation gives the following warning:

> Another important note: calling select with a server socket in the receive parameter before a call to accept does not guarantee accept will return immediately. Use the settimeout method or accept might block forever.

Under what circumstances can accept block even when select told it was safe to read? Is there a way to force this problem to occur, for testing purposes?

2 Answers

I don't know where they got that idea. Never seen it in over 20 years of network programming.

It can of course happen if several threads call select() on the same listening socket and more than one of them then calls accept(), but I would expect the documentation to say so if that was what was intended.

This is summarized from Section 16.6 (Nonblocking accept) of the third edition of the late W. Richard Stevens' "Unix Network Programming", pages 461-463. UNP is probably still the best available textbook on writing networking code.

Although you might think that accept cannot block after select indicates that a listening socket is ready, Stevens describes a race condition in some network stack implementations which can cause accept to block indefinitely. (A footnote attributes the description to "A. Gierth".) The problem is described by means of an echo client which:

  1. Connects to the server;

  2. Sets the SO_LINGER socket option on the connected socket, with a linger time of zero;

  3. Immediately closes the socket. Because SO_LINGER is enabled with a zero linger time, closing the socket causes an RST (reset) to be sent instead of the normal FIN.
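The client's three steps can be sketched in C with the POSIX sockets API. This is a minimal sketch, not Stevens' tcpcli03.c: the function name `connect_and_reset` and the use of the loopback address are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to 127.0.0.1:port, then close with a zero
   linger time so the kernel aborts the connection with an RST.
   Returns 0 on success, -1 on error. */
int connect_and_reset(unsigned short port)
{
    struct sockaddr_in addr;
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Step 1: connect to the server. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* Step 2: enable SO_LINGER with a linger time of zero. */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
        close(fd);
        return -1;
    }

    /* Step 3: close immediately; this sends RST instead of FIN. */
    return close(fd);
}
```

The zero linger time is what matters: with `l_onoff = 1` and a non-zero `l_linger`, close() would instead wait for pending data to be sent and shut the connection down normally.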

Now suppose the server is running, but on a heavily loaded machine. The modified echo client is run. The completed TCP connection causes the server's select call to return with an indication that a connection is available. (Remember that the connection was actually accepted by the kernel and put into the accept queue; accept does not need to be called for this to happen.)

However, the server code is interrupted by a process switch before the accept call is executed, and in the meantime the client manages to finish steps (2) and (3). The kernel then receives the reset from the client, and the connection is no longer valid, so the kernel might remove it from the accept queue.

So by the time the server code gets around to accepting the connection, there is no connection to accept, and the accept call blocks until the next connection, if there is one.

The behaviour described above might not actually happen on a given system. POSIX specifies that the accept call should instead fail with ECONNABORTED, even if there is another connection waiting in the accept queue (an error case you also have to remember to handle). According to Stevens:

> In Section 5.11, we noted that when the client aborts the connection before the server calls `accept`, Berkeley-derived implementations do not return the aborted connection to the server, while other implementations should return `ECONNABORTED` but often return `EPROTO` instead.

Stevens' source code is available on the publisher's site; the modified client is nonblock/tcpcli03.c, and the modification to the server simply consists of sleeping for five seconds before calling accept, so you can try it on whatever systems you have available.
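If you would rather not run a separate client, the race can also be staged inside a single process by forcing the ordering Stevens describes: select() reports the listener readable, then the client resets the connection, and only then does the server call accept(). This is a sketch under assumptions, not Stevens' code: the function name, the outcome names, and the 100 ms pause for the RST to be processed are all hypothetical choices. The listener is non-blocking so the probe reports a dropped connection instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names for what accept() does in this experiment. */
enum accept_outcome {
    OUTCOME_SETUP_FAILED = -1,
    OUTCOME_ACCEPTED = 0,     /* accept() returned the already-reset connection */
    OUTCOME_WOULD_BLOCK = 1,  /* connection dropped: a blocking accept() would hang */
    OUTCOME_ABORTED = 2,      /* ECONNABORTED or EPROTO */
    OUTCOME_OTHER_ERROR = 3
};

/* Stage the race: select() says "readable", then the client resets the
   connection before accept() is called. */
enum accept_outcome probe_select_accept_race(void)
{
    enum accept_outcome result = OUTCOME_SETUP_FAILED;
    int lfd = -1, cfd = -1, afd, flags;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    fd_set rset;
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000L };

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* kernel picks a free port */

    /* Non-blocking listener, so the probe itself can never hang. */
    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 5) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        (flags = fcntl(lfd, F_GETFL)) < 0 ||
        fcntl(lfd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;

    /* Step 1: the client connects; the kernel queues the connection. */
    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;

    /* The server's select() reports the listening socket as readable. */
    FD_ZERO(&rset);
    FD_SET(lfd, &rset);
    if (select(lfd + 1, &rset, NULL, NULL, &tv) != 1)
        goto out;

    /* Steps 2 and 3 happen before the server gets to accept(). */
    if (setsockopt(cfd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
        goto out;
    close(cfd);                              /* sends RST */
    cfd = -1;
    nanosleep(&delay, NULL);                 /* let the RST be processed */

    /* The server finally calls accept(). */
    afd = accept(lfd, NULL, NULL);
    if (afd >= 0) {
        close(afd);
        result = OUTCOME_ACCEPTED;
    } else if (errno == EWOULDBLOCK || errno == EAGAIN) {
        result = OUTCOME_WOULD_BLOCK;
    } else if (errno == ECONNABORTED || errno == EPROTO) {
        result = OUTCOME_ABORTED;
    } else {
        result = OUTCOME_OTHER_ERROR;
    }

out:
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

Which outcome you see depends on the platform. `OUTCOME_WOULD_BLOCK` corresponds to the Berkeley-derived behaviour: had the listener been blocking, that accept() call would have waited for the next connection.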

I don't believe that either FreeBSD or Linux exhibits the Berkeley-derived behaviour any more, although I'm pretty sure I remember it happening on FreeBSD (that could have been over a decade ago, and I no longer have a FreeBSD box handy to test it on). OpenBSD seems to have been patched in 1999 to fix the problem (see patch to 2.4); the other Berkeley derivatives probably made similar changes later. I have no idea about Mac OS X (although it's probably the same as FreeBSD) or Windows. It may well be that no modern system exhibits the behaviour, although it was surely observable when Stevens wrote UNP.

In any event, Stevens' advice is pretty simple, and it never hurts to be careful. What he suggests is:

  1. Always set a listening socket to non-blocking when you use select on it;

  2. If accept fails with EWOULDBLOCK, ECONNABORTED, EPROTO, or EINTR, ignore the error and return to the select loop.