6
votes

I have a question about a situation that I face quite often. From time to time I have to implement various TCP-based protocols. Most of them define variable-length data packets that begin with a common header ([packet ID, length, payload] or something very similar). There are two obvious approaches to reading these packets:

  1. Read header (since header length is usually fixed), extract the payload length, read the payload
  2. Read all available data and store it in a buffer; parse the buffer afterwards

Obviously, the first approach is simple, but it requires two calls to read() (or possibly more). The second one is slightly more complicated, but requires fewer calls.
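To make the first approach concrete, here is a minimal sketch; the 4-byte header (a 2-byte packet ID followed by a 2-byte big-endian length) is just an illustrative layout, not any real protocol:

    /* Approach 1 sketch: one recv() for the fixed header, one for the payload.
     * Header layout assumed for illustration: 2-byte ID, 2-byte big-endian length. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    int read_packet(int fd, uint16_t *id, unsigned char *payload, size_t max_payload)
    {
        unsigned char header[4];
        uint16_t len;

        /* First call: the fixed-size header. MSG_WAITALL asks the kernel to fill
         * the buffer completely, but a short read is still possible (signal,
         * peer closing), so the return value is checked. */
        if (recv(fd, header, sizeof header, MSG_WAITALL) != (ssize_t)sizeof header)
            return -1;

        memcpy(id, header, 2);
        memcpy(&len, header + 2, 2);
        len = ntohs(len);
        if (len > max_payload)
            return -1;

        /* Second call: the variable-length payload. */
        if (recv(fd, payload, len, MSG_WAITALL) != (ssize_t)len)
            return -1;
        return 0;
    }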

The question is: does the first approach affect the performance badly enough to worry about it?

5
Personally I wouldn't worry about the difference between one call per connection and two, because accepting a connection is likely much more expensive than the overhead of a system call. But I'm not a network guru so just a comment. Also perhaps consider that if the header indicates some kind of error or impossibility then in the first case, you don't have to read all the data and in the second you do. Consider for example an HTTP POST trying to upload 100MB to a non-existent URL. Ultimately recv() is called in a loop, so it's not really up to you how many calls you make...Steve Jessop
@Steve: Aside from a few extremely expensive syscalls (and of course all the ones that can be interrupted during filesystem access, etc.), most of the cost of a syscall is the fixed overhead from entering and exiting kernelspace. Making 2 recv calls where one would do will probably double the time spent there. If each is already accompanied by accept/close then it's only 33% more time (or less) you'll spend on the extra recv, but if this is a persistent connection, the increased cost could be high.R.. GitHub STOP HELPING ICE
@R..: possibly "expensive" was the wrong word, I suppose I actually meant "time-consuming". Proper analysis really requires finding out what Roman D means by "performance". I had unthinkingly assumed a smallish number of simultaneous connections on the machine, so that the cost of syscalls is pretty much irrelevant provided you can keep up with data arriving, and 1 vs. 2 likely won't affect that. Accepting a connection therefore is "expensive" in the sense that a TCP handshake takes essentially forever. But you're right, that might not be relevant in any given case.Steve Jessop
... if each header and each message-payload is 1 byte long, all on a single persistent connection operating at high bandwidth, and the server is CPU-bound, then I guess 1 vs 2 syscalls per message could double the throughput.Steve Jessop
@Steve: Even if not, it could affect your electric bill. I don't know if many hosting services bill that way, but I'd love to find a colo that charges based on your electricity usage... I suspect if more hosting services took that into consideration for billing, people might start writing more efficient code in a hurry....R.. GitHub STOP HELPING ICE

5 Answers

9
votes

Yes, system calls are generally expensive compared to memory copies. IMHO this is particularly true on the x86 architecture, and arguable on RISC machines (ARM, MIPS, ...).

To be honest, unless you must handle hundreds or thousands of requests per second, you will hardly notice the difference.

Depending on what exactly the protocol is, a hybrid approach could be the best. When the protocol uses a lot of small packets and fewer big ones, you can read the header plus a partial amount of data in a single call. When the packet is small, you win by avoiding a large memcpy; when the packet is big, you win by issuing a second syscall only for that case.
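A rough sketch of that hybrid, keeping the illustrative 4-byte header from the question; buf and *have persist per connection, so any bytes read past the end of the current packet stay in the buffer for the next call:

    /* Hybrid sketch: the first recv() grabs the header plus whatever payload has
     * already arrived; further recv() calls happen only when the payload did not
     * fit in that first read (large packets or unlucky timing).
     * Returns the payload length (payload starts at buf + 4), or -1 on error. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    ssize_t read_packet_hybrid(int fd, unsigned char *buf, size_t bufsize, size_t *have)
    {
        uint16_t len;

        /* One recv() is usually enough to get the header plus a small payload. */
        while (*have < 4) {
            ssize_t n = recv(fd, buf + *have, bufsize - *have, 0);
            if (n <= 0)
                return -1;
            *have += (size_t)n;
        }
        memcpy(&len, buf + 2, 2);
        len = ntohs(len);
        if (4 + (size_t)len > bufsize)
            return -1;   /* oversized packet; a real implementation would handle this */

        /* Only big packets (or unlucky timing) reach this second loop. */
        while (*have < 4 + (size_t)len) {
            ssize_t n = recv(fd, buf + *have, bufsize - *have, 0);
            if (n <= 0)
                return -1;
            *have += (size_t)n;
        }
        return (ssize_t)len;
    }

    /* After handling the payload, the caller shifts any surplus bytes to the
     * front of the buffer:
     *     size_t pkt = 4 + (size_t)len;
     *     memmove(buf, buf + pkt, *have - pkt);
     *     *have -= pkt;
     */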

4
votes

If your application is a server capable of handling multiple clients simultaneously, and non-blocking sockets are used to handle multiple clients in one thread, you have little choice but to issue only one recv() syscall when a socket becomes ready for reading.

The reason is that if you keep calling recv() in a loop and the client sends a large volume of data, the recv() loop may block the thread from doing anything else for a long time. E.g., recv() reads some amount of data from the socket, determines that there is now a complete message in the buffer and forwards that message to the callback. The callback processes the message somehow and returns. If you call recv() once more, there may be further messages that arrived while the callback was processing the previous one. This leads to a busy recv() loop on one socket, preventing the thread from processing any other pending events.

This issue is exacerbated if the socket read buffer in your application is smaller than the kernel socket receive buffer, i.e. the whole contents of the kernel receive buffer cannot be read in one recv() call. Anecdotally, I hit this issue on a busy production system where there was a 16 KB user-space buffer for a 2 MB kernel socket receive buffer: a client sending many messages in succession would block the thread in that recv() loop for minutes, because more messages kept arriving while the just-read messages were being processed, disrupting the service.

In such event-driven architectures it is best to make the user-space read buffer equal to the size of the kernel socket receive buffer (or to the maximum message size, whichever is bigger), so that all the data available in the kernel buffer can be read in one recv() call. This works by doing one recv() call, processing all complete messages in the user-space read buffer, and then returning control to the event loop. This way a connection with a lot of incoming data does not block the thread from processing other events and connections; rather, the thread round-robins over all connections with incoming data available.
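A sketch of that pattern; the parser callback is hypothetical and the buffer size is assumed to match a 2 MB kernel receive buffer:

    /* One recv() per readiness event: drain what the kernel has right now into a
     * user-space buffer at least as large as the kernel receive buffer, process
     * every complete message found, then return to the event loop. */
    #include <string.h>
    #include <sys/socket.h>

    #define READ_BUF_SIZE (2 * 1024 * 1024)   /* assumed >= the kernel SO_RCVBUF */

    struct conn {
        int fd;
        size_t used;                          /* bytes of a trailing partial message */
        unsigned char buf[READ_BUF_SIZE];
    };

    /* Hypothetical parser: consumes all complete messages at the front of buf
     * and returns the number of bytes it consumed. */
    size_t process_complete_messages(unsigned char *buf, size_t len);

    void on_readable(struct conn *c)
    {
        /* Exactly one recv() per readiness notification; do not loop here. */
        ssize_t n = recv(c->fd, c->buf + c->used, sizeof c->buf - c->used, 0);
        if (n <= 0)
            return;                            /* error/EOF handling omitted */
        c->used += (size_t)n;

        size_t consumed = process_complete_messages(c->buf, c->used);
        memmove(c->buf, c->buf + consumed, c->used - consumed);
        c->used -= consumed;
        /* Control returns to the event loop, so other connections get serviced. */
    }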

3
votes

The best way to get your answer is to measure. The strace program is decent for measuring system call times. Using it adds a lot of overhead in itself, but if you merely compare the cost of one recv against the cost of two, it should be reasonably meaningful. Use the -tt option to get timestamps, or the -c option to get a summary of time spent broken down by syscall.

A better way to measure, albeit with more of a learning curve, is oprofile.

Also note that if you do decide buffering is worthwhile, you may be able to use fdopen and the stdio functions to take care of it for you. This is extremely easy and will work well if you're only dealing with a single connection or if you have a thread/process per connection, but won't work at all if you want to use a select/poll-based model.
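For example (a sketch, reusing the question's illustrative 4-byte header), after FILE *f = fdopen(connfd, "rb") the stdio buffer absorbs most of the extra read() calls:

    /* Letting stdio buffer the socket: suitable only for blocking,
     * one-connection-per-thread designs, as noted above. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    int read_packet_stdio(FILE *f, uint16_t *id, unsigned char *payload, size_t max_payload)
    {
        unsigned char header[4];
        uint16_t len;

        if (fread(header, 1, sizeof header, f) != sizeof header)
            return -1;
        memcpy(id, header, 2);
        memcpy(&len, header + 2, 2);
        len = ntohs(len);
        if (len > max_payload || fread(payload, 1, len, f) != len)
            return -1;
        return 0;
    }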

1
votes

Note that you generally have to "read all the available data into a buffer and process it afterwards" anyway, to account for the (unlikely, but possible) scenario where a recv() call returns only part of your header, so you might as well go the whole hog and use option 2.
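In practice that boils down to a helper along these lines (a sketch): loop until the requested number of bytes has actually arrived, whether that is a header or a payload:

    /* recv() may return fewer bytes than requested, so loop until the exact
     * amount has arrived. Returns 0 on success, -1 on error or EOF. */
    #include <sys/socket.h>

    int recv_exact(int fd, void *buf, size_t len)
    {
        char *p = buf;
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0)
                return -1;   /* 0 = peer closed; -1 = error (check errno, e.g. EINTR) */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }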

-2
votes

Yes, depending upon the scenario the read/recv calls may be expensive. For example, if you are issuing a huge number of recv() calls to read very small amounts of data at short intervals, it would be a performance hit. In such a scenario you could issue a recv() with a reasonably large buffer, say 4 KB, and then parse that 4 KB buffer. It may contain multiple header+data combinations. By reading the header first you can find the data and its length, and to avoid copying the data into a new buffer, you can just use the offset at which the actual data starts and store that pointer.
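A sketch of that kind of in-place parsing, again assuming the illustrative 4-byte header from the question; the callback receives a pointer into the receive buffer, so no copy is made:

    /* Parse as many complete header+payload pairs as the buffer holds, without
     * copying the payloads. Returns the number of bytes consumed; the caller
     * keeps any trailing partial message for the next recv(). */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    typedef void (*msg_cb)(uint16_t id, const unsigned char *data, uint16_t len);

    size_t parse_buffer(const unsigned char *buf, size_t avail, msg_cb cb)
    {
        size_t off = 0;

        while (avail - off >= 4) {
            uint16_t id, len;
            memcpy(&id,  buf + off,     2);
            memcpy(&len, buf + off + 2, 2);
            len = ntohs(len);
            if (avail - off < 4 + (size_t)len)
                break;                          /* payload not fully received yet */
            cb(id, buf + off + 4, len);         /* pointer into buf: no extra copy */
            off += 4 + (size_t)len;
        }
        return off;
    }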