1
votes

I've seen several different examples of code that converts big endian to little endian and vice versa, but I've come across a piece of code someone wrote that seems to work, but I'm stumped as to why it does.

Basically, there's a char buffer that, at a certain position, contains a 4-byte int stored as big-endian. The code would extract the integer and store it as native little endian. Here's a brief example:

char test[8] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07};
char *ptr = test;
int32_t value = 0;
value =  ((*ptr) & 0xFF)       << 24;
value |= ((*(ptr + 1)) & 0xFF) << 16;
value |= ((*(ptr + 2)) & 0xFF) << 8;
value |= (*(ptr + 3)) & 0xFF;
printf("value: %d\n", value);

value: 66051

The above code takes the first four bytes, stores them as little endian, and prints the result. Can anyone explain step by step how this works? I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.

6
Because char values are promoted to int before arithmetic is done. Note: should be using unsigned char * and uint32_t. - Weather Vane
Your code is independent of endianness; it will print 66051 on both little- and big-endian machines. value is stored in the endianness of the machine, not always in little endian. - mch
The & 0xFF is only necessary for signed values, to strip off the extra bits when a negative char value is sign-extended to int. That is one reason to use unsigned types; another is the dubious shift into the sign bit. - Weather Vane
On x86 you can use ntohl. - stark
@WeatherVane: Using signed char and signed integers is indeed not beautiful, but it does not change the functionality of this swapping procedure. - Tom Kuschel

6 Answers

2
votes

This code is constructing the value, one byte at a time.

First it reads the first byte in the buffer and masks it to its low 8 bits

 (*ptr) & 0xFF

And then shifts it into the most significant byte position

 ((*ptr) & 0xFF) << 24

And then assigns it to the previously 0 initialized value.

 value =((*ptr) & 0xFF) << 24

Now the "magic" comes into play. Since ptr was declared as a char *, adding one to it advances the pointer by one character (one byte).

 (ptr + 1) /* the next character address */
 *(ptr + 1) /* the next character */

Once you see that they are using pointer math to move along from the starting address, the rest of the operations are the same as the ones already described, except that, to preserve the bytes already shifted into place, they OR each new byte into the existing value variable

 value |= ((*(ptr + 1)) & 0xFF) << 16

Note that pointer math is why you can do things like

 char* ptr = ... some value ...

 while (*ptr != 0) {
     ... do something ...
     ptr++;
 }

but it comes at the price of possibly corrupting your pointer addresses, which greatly increases the risk of a segmentation fault. Some languages saw this as such a problem that they removed the ability to do pointer math. An almost-pointer that you cannot do pointer math on is typically called a reference.
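
The byte-by-byte construction described in this answer can also be written as a loop. The following is only a sketch: the helper name read_be32 is made up for illustration, and it uses unsigned char and uint32_t (as suggested in the comments on the question) rather than the question's char and int32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): assemble a 32-bit value
   from 4 bytes stored in big-endian order, starting at p. */
uint32_t read_be32(const unsigned char *p)
{
    uint32_t value = 0;
    size_t i;

    for (i = 0; i < 4; i++) {
        /* Move the bytes collected so far up by 8 bits,
           then OR the next byte into the low 8 bits. */
        value = (value << 8) | p[i];
    }
    return value;
}
```

For the question's buffer, read_be32() on the first four bytes (0x00, 0x01, 0x02, 0x03) returns 0x00010203, which is 66051. Because the value is built arithmetically, the result does not depend on the byte order of the machine running the code.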

1
votes

If you want to convert between little-endian and big-endian representation, you can use htonl, htons, ntohl, and ntohs. These functions convert values between host byte order and network byte order, which is big-endian. Big-endian is also used on some ARM-based platforms. See also: https://linux.die.net/man/3/endian

1
votes

One approach relies on the convention that numbers sent over the network are in big-endian order (network byte order).

The functions htonl() and htons() convert a 32-bit and a 16-bit integer, respectively, to big-endian. On a little-endian system they swap the bytes; on a big-endian system they return the number unchanged.

Here is the code:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>

int main(void)
{
    uint32_t x,y;
    uint16_t s,z;

    x=0xFF567890;

    y=htonl(x);

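    /* PRIX32 (from <inttypes.h>) is the matching format for uint32_t */
    printf("LE=%08" PRIX32 " BE=%08" PRIX32 "\n", x, y);

    s=0x7891;

    z=htons(s);

    /* uint16_t is promoted to int when passed to printf; cast for %X */
    printf("LE=%04X BE=%04X\n", (unsigned)s, (unsigned)z);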

    return 0;

}

This code converts from LE to BE when run on an LE machine; on a BE machine htonl() and htons() return their argument unchanged.

You can use the opposite functions, ntohl() and ntohs(), to convert from BE to LE. They swap the bytes on LE machines and do nothing on BE machines.
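
Applied to the buffer from the question, ntohl() could be used roughly as follows. This is a sketch, not the code from the question: the helper name be32_from_buffer is hypothetical, and it assumes a POSIX system where ntohl() is declared in <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* ntohl(): POSIX, not standard C */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value from a byte buffer. */
uint32_t be32_from_buffer(const void *buf)
{
    uint32_t raw;

    /* Copy the 4 bytes as they are; memcpy avoids the alignment and
       aliasing problems of casting the buffer pointer to uint32_t *. */
    memcpy(&raw, buf, sizeof raw);

    /* raw now holds the bytes in network (big-endian) order;
       ntohl() converts them to the host's byte order. */
    return ntohl(raw);
}
```

With the question's test buffer, be32_from_buffer(test) gives 66051 on both little-endian and big-endian hosts, because ntohl() only swaps the bytes when the host needs it.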

0
votes

I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.

I think you are misinterpreting how the shift works.

value = ((*ptr) & 0xFF) << 24;

means masking the value at ptr with 0xFF (keeping one byte) and then shifting it left by 24 bits, not bytes. That is a shift of 24/8 = 3 bytes, which moves it into the highest byte of a 32-bit value.
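
To make the bit counts concrete, here is a small sketch that computes the contribution of each byte separately. The helper name be32_term is invented for this illustration and unsigned types are used to keep the shifts well defined.

```c
#include <stdint.h>

/* Hypothetical helper: contribution of byte number `index`
   (0 = first byte in the buffer, 3 = last) to a big-endian 32-bit value. */
uint32_t be32_term(const unsigned char *p, int index)
{
    /* Shift counts are in bits: 24, 16, 8, 0 for index 0, 1, 2, 3. */
    int shift = 8 * (3 - index);

    return (uint32_t)p[index] << shift;
}
```

For the question's bytes 0x00, 0x01, 0x02, 0x03 the four terms are 0x00000000, 0x00010000 (65536), 0x00000200 (512) and 0x00000003 (3). ORing them together gives 0x00010203 = 65536 + 512 + 3 = 66051, the value printed in the question.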

0
votes

One of the key points in understanding the evaluation of ((*ptr) & 0xFF) << X is integer promotion. The char value *ptr is promoted to int before the & is applied, so (*ptr) & 0xFF is already an int (typically 32 bits wide) when it is shifted.
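
The promotion can be observed with sizeof, and the same mechanism explains why the & 0xFF mask matters for negative char values. The function names below are made up for this sketch, and the masked result for negative input assumes the usual two's-complement representation.

```c
#include <stddef.h>

/* Size of a plain char object: always 1 by definition. */
size_t size_before_promotion(void)
{
    char c = 0x01;
    return sizeof c;
}

/* Size of the masked expression: c is promoted to int before & is
   applied, so the expression has type int. */
size_t size_after_promotion(void)
{
    char c = 0x01;
    return sizeof(c & 0xFF);
}

/* A negative signed char is sign-extended when promoted to int ... */
int promote_without_mask(signed char c)
{
    return c;
}

/* ... and the mask strips the extra high bits again
   (assuming two's complement). */
int promote_with_mask(signed char c)
{
    return c & 0xFF;
}
```

On a typical platform with 32-bit int, size_after_promotion() returns 4 while size_before_promotion() returns 1. For an input of -1, promote_without_mask() returns -1 and promote_with_mask() returns 255, which is why the mask is needed when the buffer is declared as plain or signed char.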

0
votes

I've written the code below. It contains two functions, swapmem() and swap64().

  • swapmem() reverses the bytes of a memory area of arbitrary size.

  • swap64() reverses the bytes of a 64-bit integer.

At the end of this reply I suggest a way to solve your problem with the byte buffer.

Here is the code:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>   /* malloc() and free(); <malloc.h> is non-standard */

void * swapmem(void *x, size_t len, int retnew);
uint64_t swap64(uint64_t k);

/**
    brief swapmem

         This function reverses the order of the bytes in a memory buffer.

    param x
         pointer to the buffer to be swapped

    param len
         length of the buffer to be swapped, in bytes

    param retnew
         If this parameter is nonzero, the swapped bytes are written
         to a newly allocated buffer and x is left unchanged. The new
         buffer must be released with free() when it is no longer
         needed.

         If this parameter is 0, the buffer is swapped in place.

    return
        The pointer to the memory area where the bytes have been
        swapped, or NULL if an error occurs.
*/
void * swapmem(void *x, size_t len, int retnew)
{
    char *b = NULL, app;
    size_t i;

    if (x != NULL) {
        if (retnew) {
            b = malloc(len);
            if (b!=NULL) {
                for(i=0;i<len;i++) {
                    b[i]=*((char *)x+len-1-i);
                }
            }
        } else {
            b=(char *)x;
            for(i=0;i<len/2;i++) {
                app=b[i];
                b[i]=b[len-1-i];
                b[len-1-i]=app;
            }
        }
    }
    return b;
}

uint64_t swap64(uint64_t k)
{
    return ((k << 56) |
            ((k & 0x000000000000FF00) << 40) |
            ((k & 0x0000000000FF0000) << 24) |
            ((k & 0x00000000FF000000) << 8) |
            ((k & 0x000000FF00000000) >> 8) |
            ((k & 0x0000FF0000000000) >> 24)|
            ((k & 0x00FF000000000000) >> 40)|
            (k >> 56)
           );
}

int main(void)
{
    uint32_t x,*y;
    uint16_t s,z;
    uint64_t k,t;

    x=0xFF567890;

    /* Dynamic allocation is used to avoid changing the contents of x */
    y=(uint32_t *)swapmem(&x,sizeof(x),1);
    if (y!=NULL) {
        printf("LE=%08" PRIX32 " BE=%08" PRIX32 "\n",x,*y);
        free(y);
    }

    /* Dynamic allocation is not used. The contents of z and k will change */
    z=s=0x7891;
    swapmem(&z,sizeof(z),0);
    printf("LE=%04X BE=%04X\n",s,z);

    k=t=0x1120324351657389;
    swapmem(&k,sizeof(k),0);
    printf("LE=%16"PRIX64" BE=%16"PRIX64"\n",t,k);

    /* LE64 to BE64 (or vice versa) using shifts */
    k=swap64(t);
    printf("LE=%16"PRIX64" BE=%16"PRIX64"\n",t,k);

    return 0;
}

After compiling the program, I was curious to see the assembly code gcc generated. I found that swap64 is compiled to the following:

00000000004007a0 <swap64>:
  4007a0:       48 89 f8                mov    %rdi,%rax
  4007a3:       48 0f c8                bswap  %rax
  4007a6:       c3                      retq

This result is obtained when compiling the code on a PC with an Intel i3 CPU, using any of the gcc options -Ofast, -O3, -O2, or -Os.
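
If you prefer not to rely on the optimizer recognizing the shift-and-mask pattern, GCC and Clang also provide a builtin for the same operation. The sketch below is an assumption-laden illustration: __builtin_bswap64 is a compiler extension, not standard C, and the wrapper name swap64_builtin is made up here. A plain loop is used as a fallback for other compilers.

```c
#include <stdint.h>

/* Hypothetical wrapper: reverse the bytes of a 64-bit integer. */
uint64_t swap64_builtin(uint64_t k)
{
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang extension, not part of standard C. */
    return __builtin_bswap64(k);
#else
    /* Portable fallback: move the lowest byte of k into the result,
       one byte at a time. */
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (r << 8) | (k & 0xFF);
        k >>= 8;
    }
    return r;
#endif
}
```

For the value used in main() above, swap64_builtin(0x1120324351657389) returns 0x8973655143322011, the same result swap64() produces.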

You may solve your problem with a function similar to swap64(). Here is a 32-bit version that I've named swap32():

uint32_t swap32(uint32_t k)
{
    return ((k << 24) |
            ((k & 0x0000FF00) << 8) |
            ((k & 0x00FF0000) >> 8) |
            (k >> 24)
           );
}

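You may use it as follows. Copying the bytes with memcpy() (declared in <string.h>) avoids the alignment and strict-aliasing problems of casting ptr to uint32_t *. On a little-endian machine, j then holds the big-endian value from the buffer:

uint32_t j;
memcpy(&j, ptr, sizeof j);
j = swap32(j);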