2
votes

Some time ago I needed a function to compute a checksum to test data integrity while exchanging datagrams over the internet. Back then I found this function and just copied, pasted and used it.

typedef unsigned short word;   /* assumed 16-bit */
typedef unsigned char  byte;

word Checksum (void* buf, int bufsize) {
    register int            sum    = 0;
             word           answer = 0;
    register word           *w     = (word*) buf;
    register int            nleft  = bufsize;

    while (nleft > 1) {     /* sum the buffer one 16-bit word at a time */
        sum += *w++;
        nleft -= 2;
    }

    if (nleft == 1) {       /* odd size: pick up the trailing byte */
        *(byte*)(&answer) = *(byte*)w;
        sum += answer;
    }

    sum = (sum>>16) + (sum&0xFFFF);   /* fold the carries into the low word */
    sum += (sum>>16);
    answer = ~sum;                    /* one's complement */
    return answer;
}

Today I am working on another program that requires a checksum to ensure data integrity, but this time I wanted to see how it works, and I have a question.

The algorithm sums all the words of the data buffer, then folds the carries (the high word, if any) back into the sum, and finally negates the sum (one's complement); the result is the checksum value.

My concrete questions are:

  1. Are the words from the buffer expected to be in little or big endian? My intuition tells me that the word bytes should be in BIG ENDIAN (by convention) so that the checksum is the same on any machine, but this algorithm simply sums the values as-is (on an x86 platform, i.e. little endian). What if the other side is a big-endian platform? Would it still work if I summed the byte values swapped on little endian and directly on big endian?

  2. If the buffer size is odd, the last byte is added directly to the sum, but is it added just as a value OR as the LOW ORDER byte of a word (with the following byte taken as 0, i.e. the HIGH ORDER byte in little endian)?

It is for a simple communication protocol with checksum support, but it must work on any platform independently of its architecture.

"perform a checksum to test data integrity while exchanging datagrams over internet" - most internet protocols already do checksums, e.g. UDP. 1) There are different approaches - add a byte-order-independent checksum (e.g. exclusive-or or any byte-oriented one) plus the sender's byte order. Your example does not detect swapped words. A simple one that does is Fletcher's sum. 2) Welcome to padding. - greybeard
(Would you be surprised to learn there is a SO tag byte-order?) - greybeard
The use of register makes me think this code is from the 1980s. - melpomene
FYI addition-based algorithms are dinosaur things that wouldn't be used in real applications. They all use CRC. - Lundin

2 Answers

0
votes

Are the words from the buffer expected to be in little or big endian?

You can't know without a specific protocol in mind. The endianness of the protocol is sometimes called "network endianness". Traditionally most protocols use big endian, but there's no guarantee of that.

If the buffer size is odd, the last byte is added directly to the sum, but is it added just as a value OR as the LOW ORDER byte of a word (considering that the following byte is 0 - HIGH ORDER byte in little endian)?

Doesn't really matter; it will be bad for error detection either way - it will fail miserably for most double-bit errors. I would strongly suggest using a CRC instead: one of the standard CRC-16 or CRC-32, depending on data size.
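For illustration, here is a minimal bitwise CRC-16 sketch. The variant shown (CRC-16/CCITT-FALSE, polynomial 0x1021, initial value 0xFFFF) is an assumption for the example; use whichever standard CRC your protocol actually specifies:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). Works on
 * octets, so the result is independent of host byte order. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;   /* feed next octet into the high byte */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```

A handy self-test: for CRC-16/CCITT-FALSE, the standard check value over the ASCII string "123456789" is 0x29B1.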

0
votes
  1. Are the words from the buffer expected to be in little or big endian?

For networking (the "payload" of any packet sent or received over a network using any protocol); the buffer always contains octets (bytes), and the octets always arrive in the same order they were sent (there is no "byte order" issue).

Before software uses networking to send data over a network, it converts something (e.g. a structure in a high-level language) into a buffer of octets. This is called "serialization", and is necessary to avoid all kinds of portability disasters (not just "endianness", but things like compiler/implementation-specific structure padding, floating-point format differences, character set differences, etc.).
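As a concrete sketch of serialization, a fixed-width integer field can be written out as octets in a defined order (big-endian here, by convention) using shifts, which produce the same octets on any host:

```c
#include <stdint.h>

/* Serialize a 32-bit value into 4 octets, most significant first
 * ("network order"). Shifts are host-endianness independent. */
void put_u32_be(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)(v);
}

/* De-serialize: rebuild the value from the octets. */
uint32_t get_u32_be(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```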

After software receives data from a network, it converts the buffer of octets back into something (e.g. a structure in a high-level language). This is called "de-serialization". It serves three purposes:

  • sanity checks (trying to make sure that you received what you think you should have received)
  • authorization (trying to make sure that sender is correct, and isn't a malicious attacker)
  • converting the buffer of octets back into something (e.g. a structure in a high level language).

Neither serialization nor de-serialization is part of the networking itself - they happen "before networking" and "after networking", not "in the middle of networking".

Note that (based on your questions) you've assumed that the buffer is a high-level structure and failed to serialize/de-serialize; so your code (including your checksum) is broken and will fail (and probably not just because of "endianness").

  1. If the buffer size is odd, the last byte is added directly to the sum, but is it added just as a value OR as the LOW ORDER byte of a word (with the following byte taken as 0, i.e. the HIGH ORDER byte in little endian)?

Yes - it is added as one byte of a 16-bit word whose other byte is zero; which end of the word it lands on depends on the host's endianness.

Note that the main part of the checksum calculation (which incorrectly assumes that the "buffer of octets" is an array of words) will break due to endianness (if sender and receiver have different endianness, the checksum check "will" (see note) fail for all messages larger than 1 octet). The simple way to avoid that is to use "sum of bytes" (instead of "sum of words"). A better way is to use a more sophisticated checksum (e.g. CRC-32, which is used by Ethernet).
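A "sum of bytes" variant of the original function might look like this sketch; because it never interprets the buffer as host words, it gives the same result on any platform (the 16-bit carry fold and final one's complement are kept from the original):

```c
#include <stdint.h>
#include <stddef.h>

/* Endianness-independent variant: add octets, not host words, then
 * fold the carries and take the one's complement as before. */
uint16_t checksum_bytes(const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;
    uint32_t sum = 0;
    while (len--)
        sum += *p++;
    sum = (sum >> 16) + (sum & 0xFFFF);   /* fold carries back in */
    sum += (sum >> 16);
    return (uint16_t)~sum;
}
```

Note that, as the comments point out, it still cannot detect reordered octets, since addition is commutative.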

Note: with a 16-bit checksum there's at least 1 chance in 65536 that the checksum will pass accidentally through luck alone; and the "sum of anything" approach is susceptible to various failures (e.g. if the sender or receiver got the message size wrong and there are lots of zeros on the end that shouldn't be there, the "incorrect zeros" won't affect the checksum). This means that when sender and receiver use the same endianness there's a "worse than 1 in 65536" chance that the checksum will pass when it shouldn't; and it also means that when sender and receiver use different endianness and the message is larger than 1 octet there's a "better than 1 in 65536" chance that the checksum will pass when it "should".
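As the comments on the question note, any plain sum cannot detect transposed data. Fletcher's checksum (also mentioned there) fixes this by keeping a second running sum, which makes the result position-sensitive. A minimal Fletcher-16 sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Fletcher-16: sum1 is a plain byte sum; sum2 accumulates sum1 after
 * every octet, so the result depends on where each octet appears. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

Unlike a plain sum, swapping two octets changes the result: fletcher16 of "AB" and "BA" differ, because sum2 weights each octet by its position.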