Why is memory alignment needed?

Question

I know that this question has been asked a thousand times and I have read through every single answer and I still don't get it. Probably there is some fundamental error in my model of RAM which makes me unable to comprehend any answers.

I get all these little pieces of information thrown at me from all around the internet, but I just can't connect them.

Here is what I think I know so far: Take the IA-32 architecture, for example. It has a word boundary of 32 bits (boundary = the maximum the CPU can read from memory?). It will always read in its word boundary.

1) So, whatever address I give it, it will always read 4 bytes? What if I have a simple char at address x. Will it read 4 bytes from that address and then do something weird to only get the one byte?

2) If that is so, then is a string (a sequence of char) n_chars * 4 Bytes big? I'm pretty sure it isn't that way, but how am I supposed to interpret "will always read its word boundary" then?

3) Memory alignment seems to only come up with Data structures. Why? Is Memory unaligned in the rest of the memory? And I mean for Physical, Virtual, Kernel Space etc?

4) Why can I store a 32 Bit value only at addresses divisible by 4? I mean I get that it will eventually read only 32 bits, but why can it not read 32 bits from an odd address? Like what is the restriction here?

I'm just so confused please help me

You seem to have some misconceptions. It will all be clear once you get an answer! — fuz
It might work better if it is aligned, but x86 processors will figure it out anyway, perhaps by loading two parts and sticking them together, or by chopping the value up and doing two stores. That might take longer. — Bo Persson
x86 is very forgiving and will take about anything you throw at it as long as it is not SIMD. Other processors were not, Itanium is notable as one that would generate a runtime error when forced to read from a misaligned address. It is however a violation of the memory model of a language, it matters when you use threading. If a 32-bit int is not aligned by 4 then there are non-zero odds that it crosses a cache line boundary. Which forces the processor to execute multiple bus cycles to glue the bytes together. Another processor can observe that, causing a mishap called tearing. — Hans Passant
1) This depends on the architecture. On modern x86, a single core operates on aligned 64-byte chunks of RAM, which means that whole 64-byte chunks are always read or written between the core and the cache level above it. This has a "funny" side effect: if you write concurrent code where several cores keep their thread variables in the same 64-byte block, performance suffers even if they don't share a single byte, because the cores have to sync the whole 64-byte block on every write to their own bytes. These and similar details often lead to unexpected performance results and a lot of head scratching... — Ped7g

fuz · Accepted Answer · 2017-10-28T20:15:09

In modern computers, memory is byte oriented. Each byte has its own address and can be fetched from RAM individually. For the sake of your program, you can assume that fetching a word behaves like fetching the bytes that make it up in an arbitrary order and then assembling them to a word in the register you load to.

Note that this is an abstraction. Memory chips are usually wired up in a way that 8 or more bytes are fetched at once. The CPU has some circuitry to abstract all of this away from the machine code. However, this abstraction is leaky, which causes a number of effects:

If a datum is not aligned to its alignment requirement, a memory access can take extra cycles because the datum spans more words than necessary. Aligning data sufficiently avoids this penalty.
Fetching or writing an aligned datum translates into a single fetch or store in the hardware. Such a fetch or store is atomic, which is an important property in concurrent code. Fetching or writing unaligned data can need more than one fetch or store, in which case the operation is no longer atomic.
Some CPUs do not support reading or writing unaligned memory at all, as this simplifies circuit design. This restriction has become increasingly rare in contemporary hardware.
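To make "spans more words than necessary" concrete, here is a minimal sketch in C that counts how many aligned hardware units an access touches. The function names `words_touched` and `is_aligned` are made up for this illustration, and the unit size (4 bytes, 64 bytes, ...) is an assumption you pass in; real hardware may use different sizes at different levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of aligned `word`-byte units touched by an access of `size`
   bytes starting at address `addr`. `word` must be a power of two. */
size_t words_touched(uintptr_t addr, size_t size, size_t word)
{
    assert(size > 0 && word > 0 && (word & (word - 1)) == 0);
    uintptr_t first = addr / word;              /* unit holding the first byte */
    uintptr_t last  = (addr + size - 1) / word; /* unit holding the last byte  */
    return (size_t)(last - first + 1);
}

/* Nonzero if `addr` is a multiple of `align` (a power of two). */
int is_aligned(uintptr_t addr, size_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

With 4-byte units, a 4-byte datum at address 0x1000 touches one unit, while the same datum at 0x1001 touches two. A single byte never straddles a unit, which is why alignment is not a concern for individual chars.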

So now, for your questions:

1) So, whatever address I give it, it will always read 4 bytes? What if I have a simple char at address x. Will it read 4 bytes from that address and then do something weird to only get the one byte?

Maybe. This depends on the hardware you use. But yes, you are going to get only one byte if you requested one byte. You shouldn't be concerned with how many bytes the hardware reads to give you that one byte.

2) If that is so, then is a string (a sequence of char) n_chars * 4 Bytes big? I'm pretty sure it isn't that way, but how am I supposed to interpret "will always read its word boundary" then?

A string is normally n_chars bytes big. When you read one char from the string, you get one byte. The hardware might read more bytes to fulfill your request, but that's not something you need to care about. Note that Windows sometimes uses UTF-16 strings, which occupy two bytes per code unit (and so at least two bytes per character), but this trend hasn't really caught on.
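As a small illustration in C, where `sizeof(char)` is 1 by definition and array elements are stored back to back, the following sketch shows that a string of chars takes one byte per character plus the terminator. The helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to store the C string `s`, including the terminating '\0'. */
size_t string_storage_bytes(const char *s)
{
    return strlen(s) + 1;
}

/* Distance in bytes between two neighbouring characters of a string. */
size_t char_stride(const char *s)
{
    return (size_t)(&s[1] - &s[0]);
}
```

For "hello" this gives 6 bytes of storage and a stride of 1, not 4, regardless of how many bytes the hardware fetches behind the scenes.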

3) Memory alignment seems to only come up with Data structures. Why? Is Memory unaligned in the rest of the memory? And I mean for Physical, Virtual, Kernel Space etc?

Memory alignment matters whenever you consider data in RAM. It doesn't matter if that memory is used inside the kernel or your user process. The MMU generally maps memory in a way that preserves alignment so it doesn't matter if you use physical or virtual memory. Data on disk doesn't have these alignment requirements but other performance characteristics might apply due to the sector size of the storage you use.

4) Why can I store a 32 Bit value only at addresses divisible by 4? I mean I get that it will eventually read only 32 bits, but why can it not read 32 bits from an odd address? Like what is the restriction here?

If you read 32 bits from an odd address, one of the following things happens depending on your CPU and operating system:

It just works
It works but is a little bit slower
The CPU silently ignores the low 2 bits and reads from the corresponding aligned address instead (this is rare nowadays)
The CPU throws an exception which crashes your program if you don't handle it
The CPU throws an exception which the operating system catches to emulate the memory access for you.

You generally shouldn't assume which of these happens. Avoid writing code that performs unaligned word-sized reads directly. If you need to read a datum that may be unaligned, consider reading each byte on its own and then manually reassembling the bytes into the datum you want.
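A minimal sketch of the byte-by-byte approach in C might look like the following. It assumes the value is stored in little-endian byte order; the function names are made up for this example. The second function uses `memcpy`, a common alternative not mentioned above, which copies the bytes in the machine's native order and is typically compiled to whatever load is safe on the target.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from any address, aligned or not,
   by fetching single bytes and reassembling them. */
uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Read a 32-bit value in native byte order from any address.
   memcpy works on bytes, so it has no alignment requirement. */
uint32_t load_u32_native(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Both functions only ever touch memory through byte-sized accesses at the source level, so they behave the same on CPUs that tolerate unaligned loads and on CPUs that fault on them.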
