
My long-standing understanding of sockets was that a call to recv() cannot be relied upon to return the requested amount of data (whether the socket is blocking or not).

I was trying to prove this to a colleague by sending a large amount of data (1 MB) in one call to send(), from my home to his home over our work VPN (multiple routers, Wi-Fi, etc. involved).

The sockets are blocking, so I don't expect send() to return anything other than 1 MB, and to do it in one go, but I did expect the recv() calls to return less than 1 MB even though the socket is blocking.

The background for this test was to persuade him that we need a length and payload in the message protocol, so you know where messages start and end; you can't rely on one recv() call returning one message. I also wanted to show him that a simple recv() isn't enough even with this protocol: we need to call recv() in a loop, even for, say, the 4-byte message length field, in case recv() doesn't return the requested size.
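A minimal Python sketch of the framing described above, assuming a 4-byte big-endian length prefix (the prefix width and byte order are illustrative choices, not something the question specifies). recv_exact, send_msg and recv_msg are hypothetical helper names. The point is that recv_exact loops, so it works whether the header and payload arrive in one chunk or in many.

```python
import socket
import struct


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Call recv() in a loop until exactly n bytes arrive; raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))  # may return fewer bytes than requested
        if not chunk:  # empty result means the peer closed the connection
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Assumed framing: 4-byte big-endian length, then the payload.
    # sendall() itself loops over send() until everything is written.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_msg(sock: socket.socket) -> bytes:
    # Even the 4-byte header can arrive split, so it also goes through recv_exact().
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()  # connected stream sockets, standing in for a TCP connection
    send_msg(a, b"hello")
    send_msg(a, b"x" * 1000)
    print(recv_msg(b))       # b'hello'
    print(len(recv_msg(b)))  # 1000
    a.close()
    b.close()
```

The same helpers work unchanged on a real TCP socket; socketpair() is only used to keep the demo self-contained.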

Is my understanding of TCP communication wrong? Have I just been overdoing it all these years? If not, how can I force these recv() calls to return fragmented data?

I'm surprised you got all 1MB in a single receive. That's likely hundreds of packets buffered before returning from the recv() call. Try with 10MB, or even 100MB. - President James K. Polk
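Following the suggestion to try a bigger transfer, here is a sketch that sends one large buffer in a single sendall() over loopback TCP and records how many bytes each recv() returns. It assumes both ends run on one machine (not the VPN path from the question), and count_recv_calls is a hypothetical helper name. The number and sizes of the chunks depend on the OS and its socket buffer sizes, so no particular output is promised.

```python
import socket
import threading


def count_recv_calls(total: int, bufsize: int) -> list[int]:
    """Send `total` bytes in ONE sendall() over loopback TCP; return the size of each recv() result."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = listener.getsockname()[1]

    def sender() -> None:
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(b"\0" * total)  # a single application-level "send"

    t = threading.Thread(target=sender)
    t.start()
    conn, _ = listener.accept()
    sizes = []
    with conn:
        while True:
            chunk = conn.recv(bufsize)  # ask for up to bufsize bytes each time
            if not chunk:  # sender closed the connection: everything has been read
                break
            sizes.append(len(chunk))
    t.join()
    listener.close()
    return sizes


if __name__ == "__main__":
    one_mb = 1024 * 1024
    sizes = count_recv_calls(total=16 * one_mb, bufsize=one_mb)
    print(f"{len(sizes)} recv() calls, {sum(sizes)} bytes, first few sizes: {sizes[:5]}")
```

Asking for 1 MB per call mirrors the question; any call that returns less than 1 MB is exactly the short read the question is trying to provoke.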
That's what I thought! 1 MB without a split seems unlikely. I started to wonder if it's the VPN software: so that it's irrelevant how many types of network are between us, because the VPN software effectively allows arbitrarily large TCP packets to be reconstructed on the other side? Maybe I'll try with two laptops over Wi-Fi at home? - Jay Evans
Right, so I wrote a noddy app sending 16 KB heartbeats every second, and ran it purely over Wi-Fi at home. I walked outside the house to the point where the heartbeats stopped, and not once did I receive a partial recv(): it paused, but continued when I came back in range. - Jay Evans
Something I haven't mentioned: we are using SSL for the connection, but we did a lot of our tests on both SSL and non-SSL. My noddy 16 KB test above was done over SSL (C# SslStream wrapping TcpClient.GetStream). If my reader waits 5 seconds after the link comes up before issuing a read, and I make sure my sender sends 2 distinct messages in that time, the receiver gets 2 reads! But if I switch to non-SSL (using either Client.GetStream or Client.Client), the reader gets both messages in 1 read. I'm going to repeat the walking-outside test on non-SSL. - Jay Evans
Walking to the edge of the Wi-Fi range, I cannot force recv() to return a partial read on a non-SSL socket. Am I banging my head against a wall trying to "force" this split condition? - Jay Evans
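Rather than waiting for the network to split a message, one way to provoke a short read in a test is to make the sender pause in the middle of a message. A blocking recv() (without MSG_WAITALL) returns as soon as some data is available; it does not wait for the full requested size. This sketch assumes a 12-byte message written as 8 + 4 bytes over loopback TCP; the sizes, the 0.5 s pause and the name partial_read_demo are all illustrative. Because it relies on timing, the split is very likely but not strictly guaranteed on a heavily loaded machine.

```python
import socket
import threading
import time


def partial_read_demo(pause: float = 0.5) -> tuple[bytes, bytes]:
    """Return the results of two recv(12) calls when 12 bytes are written as 8 + 4 with a pause."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]

    def sender() -> None:
        with socket.create_connection(("127.0.0.1", port)) as s:
            # Disable Nagle so the small writes go out immediately.
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            s.sendall(b"HEADER01")  # first 8 bytes
            time.sleep(pause)       # receiver is blocked in recv() during this gap
            s.sendall(b"BODY")      # remaining 4 bytes

    t = threading.Thread(target=sender)
    t.start()
    conn, _ = listener.accept()
    with conn:
        first = conn.recv(12)  # asks for 12 bytes, but returns once the first 8 are available
        rest = conn.recv(12)   # picks up the remaining 4 after the pause
    t.join()
    listener.close()
    return first, rest


if __name__ == "__main__":
    first, rest = partial_read_demo()
    print(len(first), first, rest)
```

A receiver that uses a read-exactly loop handles this case correctly; one that assumes recv(12) returns 12 bytes does not.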

2 Answers

3
votes

persuade him that we need a length and payload in the message protocol so you know where messages start/end

You definitely need to, and if you have to persuade your coworker, have them explain why not. One send() does not correspond to one recv(), period. That the two line up under certain conditions, sure: the Internet is fast these days, and Nagle's algorithm is enabled by default.

Try the opposite: send 1000 small messages and have them pick out the separate messages from the result of one or more recv() calls; they can't without proper message framing.
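To make the "pick out the separate messages" exercise concrete, here is a sketch of a receiver-side decoder that buffers whatever recv() returns and extracts complete length-prefixed messages. It assumes a 4-byte big-endian length prefix (an illustrative choice), and FrameDecoder and frame are hypothetical names. The demo frames 1000 small messages and feeds the resulting stream in 7-byte pieces to stand in for arbitrary recv() boundaries.

```python
import struct


class FrameDecoder:
    """Reassemble length-prefixed messages from arbitrarily split recv() chunks."""

    HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian length prefix

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append one recv() result and return every message it completes (possibly none)."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break  # payload not complete yet; wait for more data
            messages.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]  # drop the consumed header and payload
        return messages


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver can find the boundary."""
    return FrameDecoder.HEADER.pack(len(payload)) + payload


if __name__ == "__main__":
    msgs = [f"msg {i}".encode() for i in range(1000)]
    stream = b"".join(frame(m) for m in msgs)
    dec = FrameDecoder()
    out = []
    for i in range(0, len(stream), 7):  # simulate recv() returning awkward 7-byte pieces
        out.extend(dec.feed(stream[i:i + 7]))
    print(out == msgs)  # True
```

Compared with a blocking read-exactly loop, a buffering decoder like this also suits non-blocking or asynchronous sockets, where you process whatever has arrived and come back when more data is ready.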

3
votes

Your colleague is dangerously wrong. You simply cannot get a guarantee from being unable to think of, or observe, a way something can go wrong. Either there is a guarantee that receives will correspond to sends or there is not.

Observations are not helpful here.

  1. If you don't observe receives not corresponding to sends, it could be a bug in the implementation in which you observe it.

  2. If you do observe receives corresponding to sends, it could be because of something unusual about the implementations you are testing on, and the very next version of the OS, compiler, standard library, or the like might not follow the assumptions.

Either your colleague is relying on guaranteed behavior or they are not. And the answer is that they are not.

True story from the late '90s: I once had to debug some code that "always worked just fine" because it expected the first 12 bytes of a TCP connection to stay stuck together. It broke when an attack was discovered that involved 8 specially crafted malicious bytes, and the defense deployed at a customer's site was a filter that intercepted and checked the first eight bytes of every TCP connection before passing them on to the application. As a result, the application always got just the first 8 bytes (from the filter) on its first call to recv(), violating the (broken, idiotic) assumption that a read for 12 bytes at the beginning of a TCP connection would always get all 12.