
According to the Linux man pages, select() supports three kinds of events for waking up (a minimal usage sketch follows the list):

  • readfds will be watched to see if characters become available for reading
  • writefds will be watched to see if space is available for write
  • exceptfds will be watched for exceptional conditions
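
For concreteness, here is roughly what watching a single TCP socket in all three sets looks like, as a minimal sketch with an arbitrary 5-second timeout:

    /* Minimal sketch: one select() call watching a single TCP socket fd
       in all three sets, just to illustrate the API. */
    #include <stdio.h>
    #include <sys/select.h>

    int wait_on_socket(int fd)
    {
        fd_set readfds, writefds, exceptfds;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        FD_ZERO(&exceptfds);
        FD_SET(fd, &readfds);
        FD_SET(fd, &writefds);
        FD_SET(fd, &exceptfds);

        struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };  /* arbitrary timeout */
        int n = select(fd + 1, &readfds, &writefds, &exceptfds, &tv);
        if (n < 0) { perror("select"); return -1; }

        if (FD_ISSET(fd, &readfds))   printf("readable\n");
        if (FD_ISSET(fd, &writefds))  printf("writable\n");
        if (FD_ISSET(fd, &exceptfds)) printf("exceptional condition\n");
        return n;
    }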

In the practical TCP-socket examples I find online and in networking books, I mostly see only readfds being used, even when the code later writes to the socket.

But the socket might not be ready for writing, because select() may have returned it only in the readfds set and not in the writefds set. To avoid blocking on writes, I usually put the socket's fd into non-blocking mode. Then, if send() fails, I can just queue the data in some internal buffer and send it out later (that is, the next time select() wakes up for readfds). But this seems dangerous: what if the next readfds wakeup comes much later and the data to be written just sits in our buffer, theoretically, forever?
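
For reference, a minimal sketch of how I put the fd into non-blocking mode (the standard fcntl() call, error handling kept short):

    #include <fcntl.h>

    /* Sketch: switch a connected socket fd to non-blocking mode, so that
       send()/recv() return EWOULDBLOCK/EAGAIN instead of blocking. */
    static int set_nonblocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0)
            return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }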

Apple's documentation also recommends using writefds: see Using Sockets and Socket Streams, section "Handling Events with Pure POSIX Code", quoting:

Call select in a loop, passing two separate copies of that file descriptor set (created by calling FD_COPY) for the read and write descriptor sets.

So the questions are:

  1. Is Apple recommending writefds just because it's "the right official way", or are there other approaches to dealing with socket writes without writefds? Apple's recommendation also seems suspicious to me: if we put the socket into writefds from the very start and then don't write to it for some time, won't select() wake up immediately just because the socket is writable (and that's because we haven't written to it yet)?

  2. About exceptfds - I haven't yet seen any examples using it with TCP sockets. I have read that it is used for out-of-band data. Does that mean I can ignore exceptfds for TCP sockets if I deal only with mainstream Internet traffic, such as HTTP, audio/video streaming, game servers, etc.?

Regarding exceptfds, the Telnet protocol (also used in other protocols, e.g. in the FTP control connection) uses TCP URG (OOB) for some commands. IIRC Ctrl-C'ing a connection causes it to be used. -- ninjalj

1 Answer


Is Apple recommending writefds just because it's "the right official way", or are there other approaches to dealing with socket writes without writefds?

The other approach (the one you saw in the tutorials you looked at) is to assume that the socket's send buffer will always be large enough to immediately accept whatever data you want to send, and to just blindly call send() whenever you need to.

It simplifies the code, but it's not a very good approach -- maybe it's good enough for a toy/example program, but I wouldn't want to make that assumption in production-quality code, because it means something bad will happen if/when your program generates enough data at once to fill the socket's output buffer. Depending on how you (mis)handled the call to send(), either your program would go into a spin loop (calling send() and getting EWOULDBLOCK, over and over again, until there was finally enough room to place all the data), or error out (if you treated EWOULDBLOCK/short-send() as a fatal error condition), or drop some of the outgoing data bytes (if you were just ignoring send()'s return value completely). None of these is a graceful way to handle the full-output-buffer situation.
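
A more careful non-blocking send path checks send()'s return value and queues whatever could not be written. A rough sketch, where queue_unsent() is a hypothetical helper that appends data to your own outgoing FIFO:

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Hypothetical helper, provided elsewhere: append unsent bytes to the
       caller's outgoing FIFO so they can be retried later. */
    void queue_unsent(int fd, const char *data, size_t len);

    /* Sketch: try to send (buf, len) on a non-blocking socket; whatever does
       not fit into the kernel's send buffer is queued instead of dropped. */
    int send_or_queue(int fd, const char *buf, size_t len)
    {
        ssize_t sent = send(fd, buf, len, 0);
        if (sent < 0) {
            if (errno == EWOULDBLOCK || errno == EAGAIN)
                sent = 0;              /* nothing written; queue it all */
            else
                return -1;             /* a real error */
        }
        if ((size_t)sent < len)
            queue_unsent(fd, buf + sent, len - (size_t)sent);
        return 0;
    }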

If we put the socket into writefds from the very start and then don't write to it for some time, won't select() wake up immediately just because the socket is writable (and that's because we haven't written to it yet)?

Yes, absolutely -- which is why you would only place the socket into the writefds set if you currently have some data that you want to write to the socket. In the case where you currently have no data that you want to write to the socket, you'd leave the socket out of writefds so that select() wouldn't immediately return.
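
A sketch of what that looks like in the select() loop; out_bytes_pending(), flush_pending(), and handle_readable() are hypothetical placeholders for however you track and drain queued output:

    #include <stddef.h>
    #include <sys/select.h>

    /* Hypothetical helpers: how much queued output is waiting for this
       socket, a routine that drains as much of it as the socket will take,
       and the usual read handler. */
    size_t out_bytes_pending(int fd);
    void   flush_pending(int fd);
    void   handle_readable(int fd);

    void event_loop(int fd)
    {
        for (;;) {
            fd_set readfds, writefds;
            FD_ZERO(&readfds);
            FD_ZERO(&writefds);

            FD_SET(fd, &readfds);            /* always interested in reads */
            if (out_bytes_pending(fd) > 0)
                FD_SET(fd, &writefds);       /* only while data is queued  */

            if (select(fd + 1, &readfds, &writefds, NULL, NULL) < 0)
                break;

            if (FD_ISSET(fd, &readfds))
                handle_readable(fd);
            if (FD_ISSET(fd, &writefds))
                flush_pending(fd);           /* buffer space is free now   */
        }
    }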

About exceptfds - I haven't yet seen any examples using it with TCP sockets. I have read that it is used for out-of-band data.

Generally exceptfds isn't used for much (neither is TCP's out-of-band data feature, AFAIK). The only other time I've seen it used is when doing asynchronous/non-blocking TCP connects under Windows -- Windows uses the exceptfds to wake up select() when an asynchronous/non-blocking TCP connect attempt has failed.
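
As a sketch of that situation: the portable way to learn the outcome of a non-blocking connect() is to select() for writability (plus exceptfds, to cover the Windows behavior described above) and then ask the socket via SO_ERROR:

    #include <sys/select.h>
    #include <sys/socket.h>

    /* Sketch: after connect() on a non-blocking socket returned EINPROGRESS,
       wait for the result.  POSIX marks the socket writable on success or
       failure; Windows reports a failed connect through exceptfds, so we
       watch both and then read SO_ERROR to find out what actually happened. */
    int finish_connect(int fd)
    {
        fd_set writefds, exceptfds;
        FD_ZERO(&writefds);
        FD_ZERO(&exceptfds);
        FD_SET(fd, &writefds);
        FD_SET(fd, &exceptfds);

        if (select(fd + 1, NULL, &writefds, &exceptfds, NULL) <= 0)
            return -1;

        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
            return -1;
        return err;   /* 0 on success, otherwise the errno of the failed connect */
    }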

Then, if send() fails, I can just queue the data in some internal buffer and send it out later (that is, the next time select() wakes up for readfds). But this seems dangerous: what if the next readfds wakeup comes much later and the data to be written just sits in our buffer, theoretically, forever?

Since TCP automatically slows the sender down to roughly the rate at which the receiver is consuming the data, it certainly is possible that the receiving program could simply stop calling recv(), eventually reducing the sender's transmission rate to zero. Or, alternatively, the network between the sender and the receiver could start dropping so many packets that the transmission rate becomes effectively zero, even though the receiver is calling recv() like it is supposed to. In either case, your queued data could very well sit in your outgoing-data buffer for a long time -- probably not forever in the latter case, since a completely bogged-down TCP connection will eventually error out; and in the former case you need to debug the receiving side more than the sending side.

The real problem comes when your sender is generating data faster than your receiver can receive it (or, to put it another way, faster than the network can transport it) -- in that case, if you're queueing the "excess" data into a FIFO on the sender's side, that FIFO could grow without bound until eventually your sending process crashes due to memory exhaustion -- definitely not desirable behavior.

There are several ways to handle that; one way would be to simply monitor the number of bytes currently held in the FIFO, and when it reaches a certain threshold (e.g. one megabyte or something; what constitutes a "reasonable" threshold would depend on what your app is doing), the server could decide that the client simply can't perform well enough and close the sending socket in self-defense (and free up the associated FIFO queue of course). That works well in a lot of cases, although if your server ever generated/enqueued more than that amount of data instantaneously, it might suffer from false positives, and end up inappropriately disconnecting clients that were actually performing well.
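
A sketch of that self-defense check; the 1 MB cap and the helper names are only illustrative:

    #include <stddef.h>
    #include <unistd.h>

    #define MAX_QUEUED_BYTES (1024 * 1024)   /* illustrative 1 MB threshold */

    /* Hypothetical helpers owned by the server's per-connection bookkeeping. */
    size_t out_bytes_pending(int fd);
    void   free_outgoing_fifo(int fd);

    /* Call after enqueueing more data for a client: returns 0 if the client
       is still in good standing, -1 if we gave up on it. */
    int enforce_backlog_limit(int fd)
    {
        if (out_bytes_pending(fd) > MAX_QUEUED_BYTES) {
            free_outgoing_fifo(fd);   /* release the queued data ...        */
            close(fd);                /* ... and disconnect the slow client */
            return -1;
        }
        return 0;
    }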

Another approach (which I prefer, when possible) is to design the server so that it only generates more output data for a socket when there is currently no output data queued up for that socket. That is, when the socket selects as ready-for-write, drain as much existing data as you can from the FIFO queue into the socket. Only when the FIFO queue is empty, the socket is ready-for-write, and you still have data you want to generate outgoing bytes from, do you generate some more output bytes and place them into the FIFO queue. Repeat that forever, and your FIFO queue's size will never be greater than the amount of data you generated in one iteration of your generate-more-data-bytes step, no matter how slow the client is.
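
A sketch of that pattern, with hypothetical hooks standing in for the server's own FIFO and data-generation logic:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical hooks into the server's own logic. */
    size_t out_bytes_pending(int fd);        /* bytes still sitting in the FIFO  */
    void   drain_fifo_into_socket(int fd);   /* send() as much as currently fits */
    bool   have_more_to_generate(int fd);    /* is there more output to produce? */
    void   generate_more_output(int fd);     /* enqueue one chunk of fresh data  */

    /* Called whenever select() reports this socket as ready-for-write. */
    void on_socket_writable(int fd)
    {
        drain_fifo_into_socket(fd);

        /* Only generate fresh data once the FIFO is empty, so the queue never
           holds more than one generation step's worth of bytes. */
        if (out_bytes_pending(fd) == 0 && have_more_to_generate(fd))
            generate_more_output(fd);
    }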