0
votes

When using the Berkeley socket API, what is the data type of the content that is sent over the send/write or read/recv calls? For example -

char *msg = "Our Message!";
int len, bytes_sent;
len = strlen(msg);
bytes_sent = send(sockfd, msg, len, 0);

In this code we are using the char type, but are we limited to just char, since send/write/sendto usually take a void * type? I've also seen arguments that if we send an int, it might be stored in little-endian or big-endian order, causing problems between source and destination if their endianness doesn't match. Then why doesn't the char type suffer from this problem too?

Also, different languages like C and C++ have different sizes of char, so why isn't this a problem? If the socket doesn't care about types and just sees the content as a buffer, why don't we see random corruption of data when TCP servers/clients written in different languages communicate with each other?

In short, what values (types) can I safely send through sockets?

2
C and C++ are different languages. Clarify your question and state your specific problem. Then pick a language and remove the unrelated tag. – too honest for this site
In C and C++, the literal 'a' must have a size of 1. – user2100815
The API is irrelevant. An object file has no source code, and using a library function with a C ABI does not justify the C tag! – too honest for this site
@Olaf This is obviously C code, unless you think the Berkeley socket library is written in C++, and if anything it's the C++ tag that should be removed. In fact, both tags are perfectly fine, so please stop removing the C tag. – user2100815
@NeilButterworth 1) Wrong: character constants have type int in C, thus they can have a different size than char. 2) OP stated he uses C++, so the code is C++, not C. 3) If the library you use were relevant, every C++ question would justify the C tag, because it eventually calls C code somewhere. 4) A library has no source code, but follows an ABI. Please stop adding irrelevant tags. – too honest for this site

2 Answers

5
votes

You cannot safely send anything through a raw socket and expect the receiver to make sense of it. For example, the sending process might be on a machine where the character encoding is EBCDIC, and the receiving process might be on a machine where the character encoding is ASCII. It's up to the processes either to negotiate a protocol to sort this out, or simply to state in their specifications "We are using ASCII (or whatever)".

Once you have the character encodings worked out, my advice is to transmit the data as text. This avoids all endianness problems, and is easier to debug and log.

4
votes

The simplest answer is that the data is an uninterpreted stream of octets, that is to say 8-bit bytes. Any interpretation of it is done by the sender and receiver, and they had better agree. You certainly need to take both the size and the endianness of integers into account, and compiler alignment and padding rules too. This is why, for example, you should not use C structs as network protocols.