8
votes

I am a bit confused about little endian vs. big endian; I seem to be missing something simple, and some feedback would be appreciated. For example, say we have two macros which retrieve the least and most significant bytes of a 32-bit value:

#define LSB(x) ((x) & 0x000000FF)

#define MSB(x) ((x) & 0xFF000000)

My question is: do the two macros above return the correct result on both big-endian and little-endian machines?

Let me explain where the confusion comes from. Imagine we are on a little-endian machine. There, the integer 9 is stored in memory like this (in hex): 09 00 00 00 (least significant byte first). So you might think that applying the LSB macro above would end up evaluating 09 00 00 00 & 00 00 00 FF, which is 0. Of course that's not how the LSB macro actually behaves, so it seems I am missing something. Any help appreciated.

Also, if I write int y = 0x000000FF; is that 255 regardless of the endianness of the machine?

OT: Shouldn't it be #define MSB(x) (((x) & 0xFF000000) >> 24) or just #define MSB(x) ((x) >> 24) (assuming a 32-bit value is passed)? - alk
You might want #define MSB(x) ((x) >> 24); otherwise code like if (MSB(x) == 0xFF) ... won't work. - japreiss
OK, I will look into that, but for now I was not particularly concerned with the best implementation of the LSB and MSB macros. - user2793162
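The difference the comments point out can be sketched with two hypothetical helpers (the names msb_masked and msb_shifted are illustrative, not from the question, and both assume a 32-bit unsigned input). Masking alone leaves the byte in bits 24..31, so comparing the result with 0xFF fails; shifting first yields a value from 0 to 255.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; they assume a 32-bit unsigned input. */

/* Masks only: the byte stays in bits 24..31, so the result looks like 0xXX000000. */
uint32_t msb_masked(uint32_t x)
{
    return x & 0xFF000000u;
}

/* Shifts the top byte down to bits 0..7, giving a value from 0 to 255. */
uint32_t msb_shifted(uint32_t x)
{
    return x >> 24;
}
```

For 0xFF123456, msb_masked returns 0xFF000000 while msb_shifted returns 0xFF, which is why a check like MSB(x) == 0xFF only works with the shifting version.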

6 Answers

12
votes

Regardless of endianness, x & 0xFF will give you the least significant byte.

First of all, you should understand the difference between endianness and significance. Endianness means in what order the bytes are written to memory; it's completely irrelevant to any computation in the CPU. Significance says which bits have a higher value; it's completely irrelevant to any system of storage.

Once you load a value from memory into the CPU, its endianness doesn't matter, since to the CPU (more accurately, the ALU) all that matters is the significance of the bits.

So, as far as C is concerned, 0x000000FF has 1s in its least significant byte, and ANDing it with a variable gives that variable's least significant byte.
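A minimal sketch of that point, reusing the question's LSB macro unchanged and wrapping it in a hypothetical function lsb_of (an illustrative name) that takes a uint32_t. The AND acts on the value, so nothing here depends on byte order.

```c
#include <stdint.h>

/* The question's macro, unchanged: ANDing with 0x000000FF keeps the
   eight least significant bits of the value. */
#define LSB(x) ((x) & 0x000000FF)

/* Hypothetical wrapper for illustration: because the AND acts on the value,
   not on its bytes in memory, the result is the same on little- and
   big-endian machines. */
uint32_t lsb_of(uint32_t x)
{
    return LSB(x);
}
```

So lsb_of(9) is 9 and lsb_of(0x12345678) is 0x78 on any machine.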


In fact, you can't find the word "endian" anywhere in the C standard (at least up to C17; C23 adds byte-order macros such as __STDC_ENDIAN_NATIVE__ in <stdbit.h>). C defines an "abstract machine" where only the significance of the bits matters. It's the responsibility of the compiler to compile the program in such a way that it behaves the same as the abstract machine, regardless of endianness. So unless you are expecting a certain layout of memory (for example through a union or a cast of pointers), you don't need to think about endianness at all.
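By contrast, code that looks at the memory layout does see endianness. The sketch below uses hypothetical helpers (first_byte_in_memory and is_little_endian are illustrative names) and copies the object representation with memcpy rather than casting pointers. It assumes 8-bit bytes and ignores exotic mixed-endian layouts.

```c
#include <stdint.h>
#include <string.h>

/* Reads the byte stored at the lowest address of x. Unlike x & 0xFF, the
   result depends on the machine's byte order: 0x78 for 0x12345678 on a
   little-endian machine, 0x12 on a big-endian one. */
unsigned char first_byte_in_memory(uint32_t x)
{
    unsigned char bytes[sizeof x];
    memcpy(bytes, &x, sizeof x);   /* copy the object representation */
    return bytes[0];
}

/* Detects byte order at run time: on a little-endian machine the 1 in the
   value 1 is stored at the lowest address. */
int is_little_endian(void)
{
    return first_byte_in_memory(1u) == 1;
}
```

This is exactly the kind of code the answer warns about: it inspects layout, so its result changes between machines, while 0x12345678 & 0xFF is 0x78 everywhere.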


Shifting is another example that might interest you, and the same thing applies. As I said before, endianness doesn't matter to the ALU, so << always translates to a shift toward the more significant bits, and this is guaranteed by the CPU itself, not merely by the compiler, regardless of endianness.
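As a small illustration, any byte can be extracted by significance with a shift and a mask. The helper byte_at below is hypothetical; n = 0 selects the least significant byte and n = 3 the most significant, and n is assumed to be in 0..3.

```c
#include <stdint.h>

/* Hypothetical helper: returns byte n of x by significance (n = 0 is the
   least significant byte, n = 3 the most significant). >> moves bits toward
   lower significance, so no byte-order check is needed. */
uint32_t byte_at(uint32_t x, unsigned n)
{
    return (x >> (8u * n)) & 0xFFu;   /* valid for n in 0..3 */
}
```

byte_at(0x12345678, 0) is 0x78 and byte_at(0x12345678, 3) is 0x12 on both little- and big-endian machines.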


Let me draw this as a diagram with two orthogonal directions to make it clearer. This is what a load operation looks like from the CPU's point of view (memory addresses increase from left to right):

On a little-endian machine you have:

         MEMORY            CPU Register

  LSB BYTE2 BYTE3 MSB  ---->   MSB
    \    \     \----------->  BYTE3
     \    \---------------->  BYTE2
      \-------------------->   LSB

On a big-endian machine you have:

         MEMORY            CPU Register

      /-------------------->   MSB
     /    /---------------->  BYTE3
    /    /     /----------->  BYTE2
  MSB BYTE3 BYTE2 LSB  ---->   LSB

As you can see, in both cases, you have:

CPU Register

    MSB
   BYTE3
   BYTE2
    LSB

which means in both cases, the CPU ended up loading the exact same value.
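The two diagrams can be mimicked in portable C by assembling a value from four bytes laid out in either order. The helpers load_le32 and load_be32 are hypothetical names for this sketch; each one plays the role of the hardware load for one byte order.

```c
#include <stdint.h>

/* Mimics the little-endian diagram: bytes[0] (lowest address) is the LSB. */
uint32_t load_le32(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}

/* Mimics the big-endian diagram: bytes[0] (lowest address) is the MSB. */
uint32_t load_be32(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0] << 24
         | (uint32_t)bytes[1] << 16
         | (uint32_t)bytes[2] << 8
         | (uint32_t)bytes[3];
}
```

Feeding 78 56 34 12 to load_le32 and 12 34 56 78 to load_be32 yields the same register value, 0x12345678, just as the diagrams show.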

3
votes

0x000000FF is always 255, regardless of endianness. It is stored as FF 00 00 00 on little-endian machines (and 00 00 00 FF on big-endian ones), and since 9 is stored in the same byte order, LSB(9) still works.
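A quick way to see both layouts is to copy the mask's bytes out of memory. The helper mask_bytes is hypothetical, and the sketch assumes 8-bit bytes so that a uint32_t occupies four of them.

```c
#include <stdint.h>
#include <string.h>

/* Copies the object representation of the mask 0x000000FF into out[].
   On a little-endian machine out[] is FF 00 00 00; on a big-endian one it
   is 00 00 00 FF. Either way, the value of the mask is 255. */
void mask_bytes(unsigned char out[4])
{
    uint32_t mask = 0x000000FFu;
    memcpy(out, &mask, sizeof mask);
}
```

Only the byte layout differs between machines; the arithmetic value, and therefore the result of 9 & 0x000000FF, does not.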

1
votes

Yes, these work correctly regardless of endianness.

Both the number you use as the mask and the number you pass in as input have the same endianness, so they give the same result either way.
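This also resolves the question's worry directly: on a little-endian machine, 9 is 09 00 00 00 and the mask is FF 00 00 00, so a byte-by-byte AND gives 09 00 00 00, not 0. The hypothetical helper and_via_bytes below performs the AND on the stored bytes and reassembles the result; it is only a demonstration, not something you would write in real code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical check: ANDs the in-memory bytes of x and mask position by
   position, then reinterprets the result as a uint32_t. Because x and mask
   are stored in the same byte order, this equals x & mask on any machine. */
uint32_t and_via_bytes(uint32_t x, uint32_t mask)
{
    unsigned char a[sizeof x], m[sizeof mask];
    uint32_t result;

    memcpy(a, &x, sizeof x);
    memcpy(m, &mask, sizeof mask);
    for (size_t i = 0; i < sizeof a; i++)
        a[i] &= m[i];                 /* byte-wise AND, same positions */
    memcpy(&result, a, sizeof result);
    return result;
}
```

and_via_bytes(9, 0x000000FF) returns 9, the same as 9 & 0x000000FF.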

Endianness becomes an issue mainly when you have, for example, an integer you've received over a network connection as an array of chars. In that case, you have to put those chars back together in the right order to get the original value.
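One common way to do that on POSIX systems is ntohl from <arpa/inet.h>, which converts a 32-bit value from network (big-endian) byte order to host order. The helper decode_network_u32 is a hypothetical name, and the sketch assumes the four bytes arrived in network order.

```c
#include <arpa/inet.h>  /* ntohl (POSIX; on Windows it is in winsock2.h) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decodes a 32-bit integer that arrived over the
   network as four chars in network (big-endian) byte order. */
uint32_t decode_network_u32(const unsigned char buf[4])
{
    uint32_t raw;
    memcpy(&raw, buf, sizeof raw);  /* bytes as received; safe for unaligned buf */
    return ntohl(raw);              /* reorder from network to host byte order */
}
```

The bytes 00 00 01 02 decode to 0x102 (258) on both little- and big-endian hosts; on a big-endian host ntohl simply does nothing.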

1
votes

My question is: do the two macros above return the correct result on both big-endian and little-endian machines?

Yes, they do. The problem comes when you want to form a scalar from a multi-byte array, which is not what you are doing.

0
votes

As long as you treat the integer value as a single entity and not as a sequence of raw bytes (in memory, on the wire etc), the issue of endianness will not feature in your code.

Thus, 0x000000FF is always 255 and your LSB and MSB macros are correct.

0
votes

Endianness is about how multi-byte values are laid out in memory. You mainly have to worry about it when serializing or deserializing bytes to memory, storage, or a stream of some kind.
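For the serializing direction, a minimal sketch that writes a value in big-endian order using only shifts, so the output bytes are the same on every host. The helper store_be32 is hypothetical, and the sketch assumes 8-bit bytes.

```c
#include <stdint.h>

/* Hypothetical helper: serializes v into out[] in big-endian order.
   The shifts operate on the value, so this produces the same bytes
   on any host; each cast keeps the low 8 bits. */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

store_be32(0x12345678, buf) fills buf with 12 34 56 78 whether the host is little- or big-endian.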

I believe your macros will work as expected in some uses and not in others. If x is an int (assuming 32-bit ints), you should be fine, since the compiler knows what an int is and how it is represented. When x is not a 32-bit number, you could run into problems.
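For instance, the question's MSB macro applied to a 64-bit value picks out bits 24..31, not the top byte. One way to pin the width is to use functions with fixed-width parameter types; lsb32, msb32 and msb64 below are hypothetical names for this sketch.

```c
#include <stdint.h>

/* The question's macro, unchanged. It silently assumes x is 32 bits wide. */
#define MSB(x) ((x) & 0xFF000000)

/* Hypothetical fixed-width alternatives: the parameter type pins the width,
   so a wider argument is converted (truncated) to 32 bits first, and the
   result is the byte value 0..255. */
static inline uint8_t lsb32(uint32_t x) { return (uint8_t)(x & 0xFFu); }
static inline uint8_t msb32(uint32_t x) { return (uint8_t)(x >> 24); }

/* Most significant byte of a 64-bit value, for comparison. */
static inline uint8_t msb64(uint64_t x) { return (uint8_t)(x >> 56); }
```

MSB(0x1122334455667788) evaluates to 0x55000000, whereas msb64 of the same value is 0x11. None of this depends on endianness; it is purely about the width of the operand.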