Big Endian and Little endian little confusion

Question

I was reading about little and big endian representations from this site http://www.geeksforgeeks.org/little-and-big-endian-mystery/.

Suppose we have the number 0x01234567. In little endian it is stored as (67)(45)(23)(01), and in big endian it is stored as (01)(23)(45)(67).
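To check which of these two layouts a given machine uses, you can copy the bytes of a 32-bit value into an unsigned char array and look at them in address order. This is a minimal sketch that assumes a C99 compiler with <stdint.h>. The helper names bytes_of_u32 and host_is_little_endian are hypothetical and do not come from the linked article:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of a 32-bit value into
   out[0..3], lowest address first. memcpy is used instead of a pointer
   cast, so there are no alignment or aliasing problems. */
static void bytes_of_u32(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 on a little endian host, where the
   lowest-addressed byte of 0x01234567 is the least significant one (0x67),
   and 0 otherwise (a big endian host stores 0x01 first). */
static int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_of_u32(0x01234567u, b);
    return b[0] == 0x67;
}
```

On a typical x86-64 machine, bytes_of_u32(0x01234567u, b) leaves b holding 0x67 0x45 0x23 0x01, matching the little endian layout above.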

char *s = "ABCDEF";
int *p = (int *)s;          // assumes 2-byte ints
printf("%d", *(p + 1));     // prints 17475 (value of "DC")

Judging by the value printed by the code above, it seems that the string is stored as (BA)(DC)(FE).

Why is it not stored as (EF)(CD)(AB), from LSB to MSB, as in the first example? I thought that endianness means the ordering of bytes within a multi-byte value. So shouldn't the ordering be with respect to the "whole 2 bytes", as in the second case, and not within those 2 bytes?

"After seeing the printed value here in the above code," — What printed value? On my little-endian machine, the given code prints 17989 (hex: 0x4645), which seems perfectly normal to me. — jwodder
This looks like UB to me. s points to 7 bytes (6 characters plus the NUL terminator). You set p to s, but when you print, you use p+1. Assuming you have 4-byte ints, p+1 points to 'E'. The next byte is 'F', then the '\0' terminator, and the byte after that is beyond your allocated space. But that aside, it looks good to me: my little endian printout is 0x25004645. 0x45 is 'E', 0x46 is 'F', 0x00 is the NUL terminator, and 0x25 is no man's land. — yano
@yano, my compiler treats CD as "DC". I have 2-byte ints. See my edit. — Sagar P
OK, with 2-byte ints (shorts on my machine) the out-of-bounds read goes away, and it still looks good. My printout now is 0x4443, where 0x43 is 'C' and 0x44 is 'D'. I suspect you're confusing the string's ASCII characters with hex values? Each character in your string corresponds to one byte, which can be represented as a 2-digit hex number. Use the printf format specifier "%x" to print in hex. 17475 is indeed 0x4443, which is what is expected on a little endian machine. — yano
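Even with 2-byte ints, casting a char pointer to int * is technically undefined behavior in C: the string may not be suitably aligned for an int, and reading char data through an int lvalue breaks the strict aliasing rule, even if it usually appears to work. Below is a minimal sketch of a well-defined alternative that copies the same two bytes with memcpy and then, following the "%x" suggestion, formats a value in hex. It assumes, as in the question, a 2-byte integer (uint16_t stands in for it). read_u16_at and format_hex_u16 are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read 2 bytes starting at s + offset in the host's
   native byte order. With 2-byte ints, offset 2 corresponds to *(p + 1)
   in the question, but memcpy avoids the undefined pointer cast. */
static uint16_t read_u16_at(const char *s, size_t offset)
{
    uint16_t value;
    memcpy(&value, s + offset, sizeof value);
    return value;
}

/* Hypothetical helper: write v as "0x" plus 4 lowercase hex digits
   into out, which needs 7 bytes including the terminating '\0'. */
static void format_hex_u16(uint16_t v, char out[7])
{
    snprintf(out, 7, "0x%04x", (unsigned)v);
}
```

read_u16_at("ABCDEF", 2) returns 0x4443 (17475) on a little endian machine and 0x4344 (17220) on a big endian one. format_hex_u16 turns 17475 into "0x4443": the most significant byte 0x44 is 'D' and the least significant byte 0x43 is 'C', which is why the value "reads" as "DC".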

yano · Accepted Answer · 2017-12-01T05:27:51

Working with 2-byte ints, this is what you have in memory:

memAddr  |  0  |  1  |  2  |  3  |  4  |  5  |  6   |
data     | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | '\0' |
            ^ s points here
                        ^ p+1 points here

Now, it looks like you're using ASCII encoding, so this is what you really have in memory:

memAddr  |  0   |  1   |  2   |  3   |  4   |  5   |  6   |
data     | 0x41 | 0x42 | 0x43 | 0x44 | 0x45 | 0x46 | 0x00 |
            ^ s points here
                          ^ p+1 points here
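One way to confirm this second table on your own machine is to dump the bytes of the string in address order. This is a minimal sketch. hex_dump is a hypothetical helper, and the expected bytes assume an ASCII execution character set:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format n bytes starting at p as space-separated
   two-digit hex values, lowest address first. out must hold at least
   3 * n + 1 bytes. Single bytes have no endianness, so the result is
   the same on little and big endian hosts. */
static void hex_dump(const void *p, size_t n, char *out)
{
    const unsigned char *bytes = p;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        sprintf(out + 3 * i, i + 1 < n ? "%02x " : "%02x", (unsigned)bytes[i]);
}
```

For const char s[] = "ABCDEF", hex_dump(s, sizeof s, buf) produces "41 42 43 44 45 46 00", including the NUL terminator.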

So on a little endian machine, the least significant byte of a multi-byte type comes first. There's no concept of endianness for a single-byte char. An ASCII string is just a sequence of chars, so it has no endianness. Your ints are 2 bytes, so for the int starting at memory location 2, the byte at address 2 is the least significant and the one at address 3 is the most significant. This means the number here, read the way people generally read numbers, is 0x4443 (17475 in base 10, "DC" as an ASCII string), since 0x44 in memory location 3 is more significant than 0x43 in memory location 2. On a big endian machine this would be reversed, and the number would be 0x4344 (17220 in base 10, "CD" as an ASCII string).
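The two interpretations described above can be written out explicitly by assembling the value from the bytes with shifts. Because shifts operate on values rather than on memory, this gives the same result on any host. This is a sketch. u16_from_le and u16_from_be are hypothetical helper names, and uint16_t stands in for the 2-byte int:

```c
#include <stdint.h>

/* Little endian: the byte at the lower address is the least significant. */
static uint16_t u16_from_le(const unsigned char *b)
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Big endian: the byte at the lower address is the most significant. */
static uint16_t u16_from_be(const unsigned char *b)
{
    return (uint16_t)((b[0] << 8) | b[1]);
}
```

If mem holds the bytes 0x41 through 0x00 from the table, u16_from_le(mem + 2) yields 0x4443 (17475) and u16_from_be(mem + 2) yields 0x4344 (17220).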

EDIT:

Addressing your comment... A C string is a NUL-terminated array of chars; that's absolutely correct. Endianness only applies to multi-byte primitive types: short, int, long, long long, etc. (the C standard's term is "scalar types", which also covers floating-point and pointer types). An array is simply a section of contiguous memory where one or more elements of the same type are stored sequentially, directly next to each other. There is no concept of endianness for the array as a whole; however, endianness does apply to each individual element whose type is a multi-byte primitive. Let's say you have the following (assume 2-byte ints):

unsigned int array[3];  // with 2-byte ints, this occupies 6 contiguous bytes in memory
array[0] = 0x1234;
array[1] = 0x5678;
array[2] = 0x9abc;      // unsigned, because 0x9abc does not fit in a 2-byte signed int

This is what memory looks like, whether on a big or a little endian machine:

memAddr   |    0-1   |    2-3   |    4-5   |
data      | array[0] | array[1] | array[2] |

Notice that endianness does not affect the order of the array elements. This is true no matter what the elements are: primitive types, structs, anything. The first element, array[0], is always at the lowest address.
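The element layout can be checked directly: the byte offset of element i is always i times the element size, whatever the host's byte order. This is a minimal sketch. It uses uint16_t as a stand-in for the 2-byte int, and element_offset is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of array[i] from the start of the
   array. Array elements are laid out in index order, so this is always
   i * sizeof array[0], independent of endianness. */
static size_t element_offset(const uint16_t *array, size_t i)
{
    return (size_t)((const unsigned char *)&array[i] - (const unsigned char *)array);
}
```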

But if we look at what's actually stored in each element, this is where endianness does come into play. On a little endian machine, memory will look like this:

memAddr   |  0   |  1   |  2   |  3   |  4   |  5   |
data      | 0x34 | 0x12 | 0x78 | 0x56 | 0xbc | 0x9a |
             ^______^      ^______^      ^______^
             array[0]      array[1]      array[2]

The least significant byte of each element comes first. A big endian machine would look like this:

memAddr   |  0   |  1   |  2   |  3   |  4   |  5   |
data      | 0x12 | 0x34 | 0x56 | 0x78 | 0x9a | 0xbc |
             ^______^      ^______^      ^______^
             array[0]      array[1]      array[2]
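Both tables can be reproduced on any host by writing each element out in an explicitly chosen byte order with shifts, which is also how a program would produce a fixed byte order for a file or network format. This is a sketch. The function names are hypothetical, and uint16_t stands in for the 2-byte int:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write n elements into out (2 bytes each),
   least significant byte first. Element order is preserved. */
static void store_u16_array_le(const uint16_t *a, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(a[i] & 0xFFu);
        out[2 * i + 1] = (unsigned char)(a[i] >> 8);
    }
}

/* Hypothetical helper: same, but most significant byte first. */
static void store_u16_array_be(const uint16_t *a, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(a[i] >> 8);
        out[2 * i + 1] = (unsigned char)(a[i] & 0xFFu);
    }
}
```

For {0x1234, 0x5678, 0x9abc}, the little endian version produces 34 12 78 56 bc 9a and the big endian version produces 12 34 56 78 9a bc. Only the bytes inside each element differ; the element order is the same.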

Notice that the contents of each element are subject to endianness, because this is an array of multi-byte primitive types. (If it were an array of structs, the struct members would not be reordered; endianness applies only to the bytes within each primitive member.) However, on both big and little endian machines, the array elements are still in the same order.
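The struct case can be illustrated the same way. C lays out struct members in declaration order, but the bytes inside each multi-byte member still follow the host's byte order. This is a sketch. struct pair and both helper functions are hypothetical names chosen for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct with two 2-byte members. */
struct pair {
    uint16_t first;
    uint16_t second;
};

/* Members are laid out in declaration order on any host,
   so first always comes before second in memory. */
static int members_in_declaration_order(void)
{
    return offsetof(struct pair, first) < offsetof(struct pair, second);
}

/* The bytes inside first still follow the host's endianness: this
   returns the byte at the struct's lowest address, which is 0x34 on
   a little endian host and 0x12 on a big endian one. */
static unsigned char lowest_byte_of_pair(void)
{
    struct pair p = { 0x1234, 0x5678 };
    unsigned char bytes[sizeof p];
    memcpy(bytes, &p, sizeof p);
    return bytes[0];
}
```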

Getting back to your string: a string is simply a NUL-terminated array of characters. A char is a single byte, so there's only one way to order its bytes. Consider the code:

char word[] = "hey";

This is what you have in memory:

memAddr   |    0    |    1    |    2    |    3    |
data      | word[0] | word[1] | word[2] | word[3] |
                  equals NUL terminator '\0' ^

In this case, each element of the word array is a single byte, and there's only one way to order a single byte, so whether on a little or big endian machine, this is what you'll have in memory (assuming ASCII):

memAddr   |  0   |  1   |  2   |  3   |
data      | 0x68 | 0x65 | 0x79 | 0x00 |

Endianness only applies to multi-byte primitive types. I highly recommend poking around in a debugger to see this in action. All the popular IDEs have memory view windows, and in gdb you can examine memory with the x command as bytes, halfwords (2 bytes), words (4 bytes), giant words (8 bytes), etc. On a little endian machine, if you print your string as bytes, you'll see the letters in order. Print it as halfwords and every 2 letters appear "reversed"; print it as words and every 4 letters appear "reversed", and so on. On a big endian machine, it would all print out in the same "readable" order.
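If a debugger isn't handy, the halfword and word views can be imitated in C by reading 2-byte and 4-byte groups of the string in the host's native order. This is a rough sketch that assumes ASCII. halfword_at and word_at are hypothetical helpers, and this is not how gdb itself is implemented:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 2-byte group number i of buf, in native byte order.
   On a little endian host, "AB" reads back as 0x4241 ("reversed"). */
static uint16_t halfword_at(const char *buf, size_t i)
{
    uint16_t h;
    memcpy(&h, buf + 2 * i, sizeof h);
    return h;
}

/* Hypothetical helper: 4-byte group number i of buf, in native byte order.
   On a little endian host, "ABCD" reads back as 0x44434241. */
static uint32_t word_at(const char *buf, size_t i)
{
    uint32_t w;
    memcpy(&w, buf + 4 * i, sizeof w);
    return w;
}
```

On a big endian host, the same calls return 0x4142 and 0x41424344, which is the "readable" order.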
