C/C++ code to convert big endian to little endian

Question

I've seen several examples of code that converts big endian to little endian and vice versa, but I've come across one piece of code that seems to work, and I'm stumped as to why it does.

Basically, there's a char buffer that, at a certain position, contains a 4-byte int stored as big-endian. The code would extract the integer and store it as native little endian. Here's a brief example:

char test[8] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07};
char *ptr = test;
int32_t value = 0;
value =  ((*ptr) & 0xFF)       << 24;
value |= ((*(ptr + 1)) & 0xFF) << 16;
value |= ((*(ptr + 2)) & 0xFF) << 8;
value |= (*(ptr + 3)) & 0xFF;
printf("value: %d\n", value);

value: 66051

The above code takes the first four bytes, stores them as little endian, and prints the result. Can anyone explain step by step how this works? I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.

Because char values are promoted to int before arithmetic is done. Note: should be using unsigned char * and uint32_t. — Weather Vane
Your code is independent of endianness; it will print 66051 on both little-endian and big-endian machines. value is stored in the machine's native endianness, which is not always little endian. — mch
The & 0xFF is only necessary for signed values, to strip off the extra bits when a negative char value is sign-extended to int. That is one reason to use unsigned types; another is the dubious shift into the sign bit. — Weather Vane
@WeatherVane: Using signed char and signed integers is indeed not pretty, but it does not change how this swapping procedure behaves. — Tom Kuschel
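
The promotion and sign-extension points raised in the comments above can be checked with a small sketch. The helper names (promote_raw, promote_masked, shift16) are made up for illustration, and the sketch assumes a 32-bit, two's complement int, which is typical but not guaranteed by the C standard.

```c
#include <stdint.h>

/* Hypothetical helper: a signed char is promoted to int as-is,
   so a negative value stays negative (it is sign-extended). */
int promote_raw(signed char c)
{
    return c;
}

/* Hypothetical helper: the operand is promoted to int first, then
   & 0xFF keeps only the low 8 bits and discards the sign extension. */
int promote_masked(signed char c)
{
    return c & 0xFF;
}

/* Hypothetical helper: the shift is performed on the promoted int,
   not on an 8-bit char, so the bits are not shifted out.
   Assumes int is at least 32 bits wide. */
int shift16(char c)
{
    return (c & 0xFF) << 16;
}
```

Under those assumptions, promote_raw(-1) stays -1 while promote_masked(-1) gives 255, and shift16(1) gives 65536 rather than 0, which is why the expression in the question does not collapse to 0 for shifts of 8 or more.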

Edwin Buck · Accepted Answer · 2017-07-25T15:32:22

This code is constructing the value, one byte at a time.

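First it reads the byte at the lowest address (the first byte of the buffer, which in big-endian order is the most significant one) and masks it to 8 bits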

 (*ptr) & 0xFF

Then it shifts that byte into the highest byte position

 ((*ptr) & 0xFF) << 24

Then it assigns the result to value, which was previously initialized to 0.

 value =((*ptr) & 0xFF) << 24

Now the "magic" comes into play. Since ptr was declared as a char*, adding one to it advances the pointer by one character (one byte).

 (ptr + 1) /* the next character address */
 *(ptr + 1) /* the next character */

Once you see that the code uses pointer math to move the relative starting address, the rest of the operations are the same as the ones already described, except that, to preserve the bytes already shifted into place, each new byte is ORed into the existing value variable

 value |= ((*(ptr + 1)) & 0xFF) << 16
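
Putting the steps together, here is a minimal sketch of the same byte-by-byte construction using unsigned types, as suggested in the comments on the question. The function name read_be32 is hypothetical and is not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit value from four bytes stored
   in big-endian order, starting at p. Each byte is widened to
   uint32_t before shifting, so no sign extension can occur and
   nothing is shifted into a sign bit. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |   /* most significant byte  */
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];           /* least significant byte */
}
```

With the question's buffer, read_be32((const unsigned char *)test) returns 66051 (0x00010203) regardless of the host's byte order, because the shifts operate on values rather than on the in-memory layout.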

Note that pointer math is why you can do things like

 char* ptr = ... some value ...

 while (*ptr != 0) {
     ... do something ...
     ptr++;
 }

but it comes at the price of possibly corrupting your pointer addresses, which greatly increases the risk of a segmentation fault. Some languages considered this such a problem that they removed the ability to do pointer math. An almost-pointer that you cannot do pointer math on is typically called a reference.
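
One common way to limit that risk, sketched here under the assumption that the caller knows how large the buffer is, is to carry an explicit bound alongside the pointer. The name bounded_length is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters before the terminating 0,
   but never read more than max characters, so a missing terminator
   cannot walk the pointer past the end of the buffer. */
size_t bounded_length(const char *ptr, size_t max)
{
    size_t n = 0;
    while (n < max && *ptr != 0) {
        ptr++;   /* pointer math: advance by one char */
        n++;
    }
    return n;
}
```

The bound check comes first in the loop condition, so *ptr is never dereferenced once max characters have been examined.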
