confusion on little endian big endian

Question

I am a bit confused about little endian vs. big endian, and I think I am missing something simple, so some feedback would be appreciated. For example, say we have two macros that retrieve the least and most significant bytes of a 32-bit value:

#define LSB(x) ((x) & 0x000000FF)

#define MSB(x) ((x) & 0xFF000000)

My question is: do the two macros above return the correct result on both big-endian and little-endian machines?

Here is why I am confused. Imagine we are on a little-endian machine, where the integer 9 is stored in memory as (in hex) 09 00 00 00, least significant byte first. You might then think that applying LSB above amounts to evaluating 09 00 00 00 & 00 00 00 FF, which is 0. Of course that is not how LSB actually behaves, so I must be missing something. Any help is appreciated.

Also, if I write int y = 0x000000FF, this is 255 regardless of the endianness of the machine, right?

OT: Shouldn't it be #define MSB(x) (((x) & 0xFF000000) >> 24) or just #define MSB(x) ((x) >> 24) (assuming a 32bit value is passed)? — alk
You might want MSB(x) = ((x) >> 24), otherwise code like if (MSB(x) == 0xFF) ... won't work. — japreiss
ok I will look into that but for now I was not particularly concerned with best implementation of LSB and MSB functions — user2793162

Shahbaz · Accepted Answer · 2013-10-16T16:33:39

Regardless of endianness, x & 0xFF will give you the least significant byte.

First of all, you should understand the difference between endianness and significance. Endianness means in what order the bytes are written to memory; it's completely irrelevant to any computation in the CPU. Significance says which bits have a higher value; it's completely irrelevant to any system of storage.

Once you load a value from memory into the CPU, its endianness doesn't matter, since to the CPU (more accurately, the ALU) all that matters is the significance of the bits.

So, as far as C is concerned, 0x000000FF has 1s in its least significant byte, and ANDing it with a variable gives that variable's least significant byte.
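
To make the point about masking concrete, here is a minimal sketch. The function names low_byte and check_low_byte are made up for this illustration (the question uses the macro LSB), and uint32_t is assumed to be available from <stdint.h>. The checks should hold on both little-endian and big-endian machines, because the mask is applied to the value, not to its bytes in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, equivalent to the question's LSB macro. */
uint32_t low_byte(uint32_t x)
{
    /* The AND operates on bit significance, not on memory order. */
    return x & 0x000000FFu;
}

/* Checks that do not depend on the byte order of the machine. */
void check_low_byte(void)
{
    assert(low_byte(9u) == 9u);             /* not 0, even on little endian */
    assert(low_byte(0x12345678u) == 0x78u);
    assert(0x000000FF == 255);              /* a literal denotes a value */
}
```

Calling check_low_byte() from any program should pass silently on either kind of machine.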

In fact, in the whole C standard, you can't find the word "endian". C defines an "abstract machine" where only the significance of the bits matters. It's the responsibility of the compiler to compile the program in such a way that it behaves the same as the abstract machine, regardless of endianness. So unless you are expecting a certain layout of memory (for example through a union or a cast of pointers), you don't need to think about endianness at all.
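
As a rough illustration of the one place where layout does show up, the sketch below copies the object representation of a value into a byte array and looks at the lowest-addressed byte. The function names are hypothetical, and a 4-byte uint32_t with 8-bit bytes is assumed. Only first_byte_is_lsb depends on the machine; low_byte_of_nine does not.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a uint32_t holds its least
   significant byte (little endian), 0 otherwise. This inspects memory
   layout, so the result differs between machines. */
int first_byte_is_lsb(void)
{
    uint32_t value = 0x00000009u;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);  /* copy the object representation */
    return bytes[0] == 0x09;
}

/* Works on the value only, so the result is 9 on every machine. */
uint32_t low_byte_of_nine(void)
{
    uint32_t value = 0x00000009u;
    return value & 0xFFu;
}
```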

Shifting is another example that might interest you, and the same reasoning applies. As I said before, endianness doesn't matter to the ALU, so << always shifts towards the more significant bits. That is done by the CPU itself, not even by the compiler, regardless of endianness.
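
A small sketch of the shifting case, assuming a 32-bit unsigned value. The function names are invented for this example; most_significant_byte follows the x >> 24 form suggested in the comments on the question.

```c
#include <stdint.h>

/* Top byte of a 32-bit value, moved down to the low 8 bits. */
uint32_t most_significant_byte(uint32_t x)
{
    return (x >> 24) & 0xFFu;
}

/* Moves every byte one position towards the more significant end;
   the old top byte is discarded. */
uint32_t shift_left_one_byte(uint32_t x)
{
    return x << 8;
}
```

With this form, a comparison such as most_significant_byte(x) == 0xFF behaves as expected, which the question's MSB macro (mask without shift) would not.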

Let me put this in a diagram with two orthogonal directions, which may make it easier to understand. This is what a load operation looks like from the CPU's point of view.

On a little-endian machine you have:

         MEMORY            CPU Register

  LSB BYTE2 BYTE3 MSB  ---->   MSB
    \    \     \----------->  BYTE3
     \    \---------------->  BYTE2
      \-------------------->   LSB

On a big endian machine you have:

         MEMORY            CPU Register

      /-------------------->   MSB
     /    /---------------->  BYTE3
    /    /     /----------->  BYTE2
  MSB BYTE3 BYTE2 LSB  ---->   LSB

As you can see, in both cases, you have:

CPU Register

    MSB
   BYTE3
   BYTE2
    LSB

which means in both cases, the CPU ended up loading the exact same value.
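
The two diagrams can be imitated in software. This is only a sketch with made-up function names, not how a CPU is implemented: each function assembles a 32-bit value from four bytes of "memory", one assuming little-endian order and one assuming big-endian order.

```c
#include <stdint.h>

/* Memory holds the least significant byte at the lowest address. */
uint32_t load_little_endian(const unsigned char mem[4])
{
    return  (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}

/* Memory holds the most significant byte at the lowest address. */
uint32_t load_big_endian(const unsigned char mem[4])
{
    return ((uint32_t)mem[0] << 24)
         | ((uint32_t)mem[1] << 16)
         | ((uint32_t)mem[2] << 8)
         |  (uint32_t)mem[3];
}
```

Feeding 09 00 00 00 to load_little_endian and 00 00 00 09 to load_big_endian gives the value 9 in both cases, matching the register column in the diagrams.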
