Working with 2-byte ints, this is what you have in memory:
memAddr | 0   | 1   | 2   | 3   | 4   | 5   | 6    |
data    | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | '\0' |
          ^ s points here
                      ^ p+1 points here
Now, it looks like you're using ASCII encoding, so this is what you really have in memory:
memAddr | 0    | 1    | 2    | 3    | 4    | 5    | 6    |
data    | 0x41 | 0x42 | 0x43 | 0x44 | 0x45 | 0x46 | 0x00 |
          ^ s points here
                        ^ p+1 points here
On a little-endian machine, the least significant byte of a multi-byte type comes first, at the lowest address. There's no concept of endianness for a single-byte char. An ASCII string is just a sequence of chars, so it has no endianness. Your ints are 2 bytes, so for an int starting at memory location 2, the byte at address 2 is the least significant and the byte at address 3 is the most significant. This means the number here, read the way people generally read numbers, is 0x4443 (17475 in base 10, "DC" as an ASCII string), since 0x44 in memory location 3 is more significant than 0x43 in memory location 2. On a big-endian machine, of course, this would be reversed, and the number would be 0x4344 (17220 in base 10, "CD" as an ASCII string).
EDIT:
Addressing your comment... A C string is a NUL-terminated array of chars, that's absolutely correct. Endianness only applies to the multi-byte primitive types: short, int, long, long long, etc. ("primitive types" may be incorrect nomenclature, someone who knows can correct me). An array is simply a section of contiguous memory where one or more elements of the same type are stored sequentially, directly next to each other. There is no concept of endianness for the array as a whole; however, endianness does apply to the individual elements of the array when they are multi-byte primitive types. Let's say you have the following, assuming 2-byte ints:
int array[3]; // with 2 byte ints, this occupies 6 contiguous bytes in memory
array[0] = 0x1234;
array[1] = 0x5678;
array[2] = 0x9abc;
This is what memory looks like, whether on a big- or little-endian machine:
memAddr | 0-1      | 2-3      | 4-5      |
data    | array[0] | array[1] | array[2] |
Notice there is no concept of endianness for the order of the array elements. This is true no matter what the elements are: primitive types, structs, anything. The first element of the array is always array[0], at the lowest address.
But now, if we look at what's actually inside each element, this is where endianness does come into play. On a little-endian machine, memory will look like this:
memAddr | 0    | 1    | 2    | 3    | 4    | 5    |
data    | 0x34 | 0x12 | 0x78 | 0x56 | 0xbc | 0x9a |
          ^______^      ^______^      ^______^
          array[0]      array[1]      array[2]
The least significant byte of each element comes first. A big-endian machine would look like this:
memAddr | 0    | 1    | 2    | 3    | 4    | 5    |
data    | 0x12 | 0x34 | 0x56 | 0x78 | 0x9a | 0xbc |
          ^______^      ^______^      ^______^
          array[0]      array[1]      array[2]
Notice that the contents of each element are subject to endianness, because it's an array of primitive types. If it were an array of structs, the struct members would not be subject to some kind of endianness reversal: they stay in declaration order, and only the bytes inside each multi-byte primitive member follow the machine's endianness. However, whether on a big- or little-endian machine, the array elements are still in the same order.
Getting back to your string: a string is simply a NUL-terminated array of characters. chars are single bytes, so there's only one way to order the bytes within each one. Consider the code:
char word[] = "hey";
This is what you have in memory:
memAddr | 0       | 1       | 2       | 3       |
data    | word[0] | word[1] | word[2] | word[3] |
                                        ^ equals NUL terminator '\0'
In this case, each element of the word array is a single byte, and there's only one way to order a single byte, so whether on a little- or big-endian machine, this is what you'll have in memory:
memAddr | 0    | 1    | 2    | 3    |
data    | 0x68 | 0x65 | 0x79 | 0x00 |
Endianness only applies to multi-byte primitive types. I highly recommend poking around in a debugger to see this in action. All the popular IDEs have memory view windows, and in gdb you can print memory with the x (examine) command as bytes, halfwords (2 bytes), words (4 bytes), giant words (8 bytes), etc. On a little-endian machine, if you print out your string as bytes, you'll see the letters in order. Print it as halfwords and you'll see every 2 letters "reversed"; print it as words and every 4 letters are "reversed", and so on. On a big-endian machine, it would all print out in the same "readable" order.
17989 (hex: 0x4645), which seems perfectly normal to me. – jwodders
points to 6 bytes. You equate p to s, but when you print you do p+1. Assuming you have 4-byte ints, this will make p+1 point to 'E'. The next byte is 'F', and then the next 2 bytes are beyond your allocated space. But that aside, it looks good to me; my little-endian printout is 0x25004645. 0x45 is 'E', 0x46 is 'F', and 0x00 and 0x25 are no man's land. – yano
ints (shorts on my machine), forget the UB, still looks good. My printout now is 0x4443, where 0x43 is 'C' and 0x44 is 'D'. I suspect you're confusing string ASCII characters with hex values? Each character in your string corresponds to a byte, which can be represented with a 2-digit hex number. Use the printf format specifier "%x" to print in hex. 17475 is indeed 0x4443, which is what is expected from a little-endian machine. – yano
"ABCDEF" and 0xABCDEF are very different... – Breaking not so bad