3
votes

I've heard that in x86 processors, bytes are stored in the memory in little-endian byte order.

Meaning that the least significant byte gets stored first.

I'm having trouble grasping the idea and its relationship with how bytes get stored in RAM.

For example,

#include <stdio.h>

int main(void) {
    char string[6];            /* 5 characters plus the terminating '\0' */
    scanf("%5s", string);
    printf("%s\n", string);
}

In the code above, if I input the word "Hello", "o" gets stored first(?)

From what I understand, in C (and in programming in general?), when you declare a variable, the variable gets stored in the stack portion of RAM. So the word "Hello" gets stored in the stack like this:


o    <Lower memory addresses>
l
l
e
H    <Higher memory addresses>

The stack grows from higher memory addresses towards the lower, and the processor starts reading the bytes starting from the first byte at the top of the stack (lower memory addresses).

Now if I print the value of the string, I should see "olleH".

But obviously it prints "Hello" instead.

Is this because of the little-endian byte order?

3
You are dealing with bytes here, so endianness doesn't change anything here. - Jabberwocky
String has no endianness. - user202729
string is not copied backwards either - Jean-François Fabre♦
(side note: You don't need the (in C) in the title, tagging your question with c automatically does that) - user202729
Just look up "endian". I am sure Wikipedia has a good description. - Paul Ogilvie

3 Answers

7
votes

For simplicity, let’s discuss a machine in which each byte in memory has an address. (There are machines where memory is organized only as words with several bytes, not individual bytes.) In this machine, memory is like a big array, so we can write memory[37] to talk about the byte at address 37.

How Characters Are Stored

To store characters, we simply put them at successive memory locations, in order. For example, to store the characters “Hello” starting at address 100, we put H at memory[100], e at memory[101], l at memory[102], l at memory[103], and o at memory[104]. In some languages, we also put a zero value at memory[105] to mark the end of the string.

There is no endian issue here. Characters are in order.

How Integers Are Stored

Consider an integer like 5678. This integer will not fit into one eight-bit byte. In binary, it is 10110 00101110 (space for readability). That requires at least two bytes to store, one byte containing 10110, and one byte containing 00101110.

When we store it in memory starting at location 100, which byte do we put first? This is the endian issue. Some machines put the high-value byte (10110) in memory[100] and the low-value byte (00101110) in memory[101]. Other machines do it in the other order. The high-value byte is the “big end” of the number, and the low-value byte is the “little end,” leading to the term “endianness.” (The term actually comes from Jonathan Swift’s Gulliver’s Travels.)

(This example uses only two bytes. Integers can also use four bytes, or more.)

The endian issue arises whenever you have one object made out of smaller objects. This is why it is not a problem with individual characters: each character goes into one byte. (There is no physical reason you could not store strings in reverse order in memory; we just do not.) It is a problem when an object has two or more bytes. You simply have to choose the order in which you put the bytes into memory.

How Stacks Are Organized

Common implementations of stacks start at a high address and “grow” downward when adding things to the stack. There is no particular reason for this; we can make stacks work the other way too. It is just how things developed historically.

Stack growth largely occurs in chunks. When a function is called, it adds some space to the stack to make room for its local data. So it decreases the stack pointer by some amount and then uses that space.

However, within that space, individual objects are stored normally. They do not need to be reversed because the stack grows down. If the stack pointer changed from 2400 to 2200, and we now want to put an object at 2300, we just write its bytes to memory starting at 2300.

So, endianness is not an issue affected by stack order.

4
votes

There is no endianness issue with char arrays here. Endianness does not change the sequence of the characters. If the object were an int, then yes, its bytes would be stored in memory according to the machine's endianness. But a char array is nothing but a collection of bytes arranged in a certain order, and that order is not changed.

Note that if you have an array of ints, the int elements themselves are still stored sequentially, in the order you specified. It is only the bytes within each int value that are stored in little-endian order (on x86).

Another point: the elements of a char array have increasing memory addresses. For example, the address of string[0] is lower than the address of string[1]. Endianness cannot apply here, because reversing the elements would violate this constraint.

0
votes

Wikipedia gives a good description of endianness, but I couldn't find the real origin.

The reason for having "endianness" (in particular little-endianness, as big-endianness is the more natural form) is the question of how the processor moves bytes between memory and its registers when the data bus is narrower than the native register width.

For example, with an 8-bit data bus (so only 1 byte at a time can be moved from memory to the processor and vice versa) and a 16-bit integer width: is the first byte moved from memory to the register the most significant byte or the least significant byte?