
I was learning about structure padding, and read that the reason behind structure padding is that if the members of the struct are not aligned, the processor won't be able to read/write them in only one cycle. In general, the location of a data type that consists of N bytes should be at an address which is a multiple of N.

Suppose this struct for example:

struct X
{
    char c;
    // 3 bytes padding here so that i is aligned.
    int i;
};

Here the size of this struct should be 8 bytes. c is aligned by default because it only occupies 1 byte, but i is not, so we need to add 3 bytes of padding before it so that it's "aligned" and can be accessed in only one cycle. Tell me if I'm missing something.
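
Here is a small check I put together to see the actual layout on a given platform (the 0/4/8 values in the comment are only what I'd expect with a 4-byte int aligned to 4 bytes; the standard doesn't guarantee them):

#include <stdio.h>
#include <stddef.h>

struct X
{
    char c;
    int i;
};

int main(void)
{
    /* On a typical platform with a 4-byte int aligned to 4 bytes,
       this prints offsets 0 and 4 and a total size of 8, but the
       exact numbers are implementation-defined. */
    printf("offsetof(struct X, c) = %zu\n", offsetof(struct X, c));
    printf("offsetof(struct X, i) = %zu\n", offsetof(struct X, i));
    printf("sizeof(struct X)      = %zu\n", sizeof(struct X));
    return 0;
}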

1 - How does alignment work? What do the members get aligned to?

2 - Why is it better for the CPU to access an N-byte data type at an address that's a multiple of N? For example, in the struct above, if i is located at address XXX3 (ending in 3, in other words, not a multiple of 4), why not read a word starting from address XXX3? Why does it have to be a multiple of 4? Do most CPUs only access addresses that are multiples of the word size? I believe that CPUs can read a word from memory starting at any byte. Am I wrong?

3 - Why doesn't the compiler reorder the members so that the struct takes as little space as possible? Does the ordering matter? I'm not sure anybody uses actual offset numbers to access members. Meaning that if there is a struct X x, members are usually accessed like x.i, not *(int *)((char *)&x + 4). In the latter case the ordering would actually matter, but in the first case (which I believe everybody uses) it shouldn't. I should note that in this example it doesn't matter anyway, since if i came before c there would still be 3 bytes of padding, just at the end. I'm asking generally: why?

4 - I've read that this is not important anymore and that CPUs can now usually access non-aligned members in the same time as aligned ones. Is that true? If so, why?

Finally, if there is a good place to learn more about this, I would be thankful.

Alignment is a platform architecture constraint. Misaligned data access can be expensive (up to 16x as expensive as aligned access) on some architectures, can thwart atomic read/write (only relevant for multithreaded applications), or can be unsupported entirely (causing a process fault). Other architectures can handle it without issue, and others can handle it but at a performance penalty (so the compiler errs on the side of performance). - Eljay
@ssd As mentioned in question 2, if I have a double for example at address XXX3, why not read a whole 8 bytes starting from the address XXX3? Can't the CPU just access any location in the memory? Why should the address be a multiple of 8 in this example? - StackExchange123
@StackExchange123 : Yes, the CPU can access that piece of memory, and you can achieve this by writing your own assembly code. Compilers are just optimized to read in chunks. - ssd
@StackExchange123 : I've googled and found that some CPUs (ARM, for example) have compiler options (-munaligned-access) that let you control this aligned-access behaviour. - ssd

3 Answers

3 votes
  1. They get aligned to at least _Alignof(type). In principle an implementation is allowed to align further, but this is generally undesirable and no major implementation does. (A short sketch after this list shows _Alignof and the member addresses for the struct in the question.)

  2. As noted in a comment by Eljay:

    Alignment is a platform architecture constraint. Misaligned data access can be expensive (up to 16x as expensive as aligned access) on some architectures, can thwart atomic read/write (only relevant for multithreaded applications), or can be unsupported entirely (causing a process fault). Other architectures can handle it without issue, and others can handle it but at a performance penalty (so the compiler errs on the side of performance).

    The language standard is written to allow for such platform constraints.

  3. It's not allowed to, at least not when the structure's address is taken in a way that makes its representation visible to the application. The language specification requires members to be in declaration order. This is 6.7.2.1 Structure and union specifiers, ¶15:

    Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

  4. No, it's not true. High-end CPUs generally patch up misaligned accesses transparently, to allow certain types of sloppy code as well as operations that are necessarily misaligned (like memcpy or memmove through buffers with differing alignments), but that does not change the fact that these operations tend to be more expensive, and they're not available for some things like atomic operations.
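
To make points 1 and 3 concrete, here is a minimal C11 sketch of my own (illustrative only, not normative): it prints the alignment the implementation actually uses for each type and shows that a pointer to the structure points to its first member, with member addresses increasing in declaration order.

#include <stdio.h>

struct X
{
    char c;
    int i;
};

int main(void)
{
    struct X x = { 'a', 42 };

    /* Point 1: each member is aligned to the alignment of its type. */
    printf("_Alignof(char) = %zu, _Alignof(int) = %zu, _Alignof(struct X) = %zu\n",
           _Alignof(char), _Alignof(int), _Alignof(struct X));

    /* Point 3: a pointer to the structure, suitably converted, points to
       its first member, and member addresses increase in declaration order. */
    printf("&x   = %p\n", (void *)&x);
    printf("&x.c = %p\n", (void *)&x.c);
    printf("&x.i = %p\n", (void *)&x.i);
    return 0;
}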

3 votes

1 How does alignment work?

Memory for objects is allocated at locations where the alignment requirement of the type is satisfied. That is: for an alignment requirement of N, the address of the memory location will be divisible by N.
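
As a small illustrative check of this divisibility rule (my own sketch, not part of the original answer):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

int main(void)
{
    double d = 0.0;

    /* For an alignment requirement of N, the object's address is
       divisible by N, so this remainder is always 0. */
    printf("_Alignof(double) = %zu\n", _Alignof(double));
    printf("address %% alignment = %zu\n",
           (size_t)((uintptr_t)&d % _Alignof(double)));
    return 0;
}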

1 What do the members get aligned to?

Objects are aligned to whatever is the alignment of the type of that object on the target system. This is the same for all objects, including member objects.

2 Do most CPUs only access addresses that are multiples of the word size?

Some CPUs do indeed only access addresses that are aligned.

2 I believe that CPUs can read a word from the memory starting at any byte. Am I wrong?

In the case of some CPUs, you are not wrong. You would be wrong to believe this applies to all CPUs.

2 - Why is it better for the CPU to access an N-byte data type at an address that's a multiple of N?

On such CPUs as mentioned above, reading an address that isn't a multiple of N (i.e. isn't aligned) will result in a segfault. A segfault will cause the process to terminate. It is better for the process to not terminate until after it has finished whatever it was supposed to do.

On some other CPU, accessing memory from aligned address can be faster. Faster is better.

Probably on all CPUs, accessing misaligned memory will not be an atomic operation. Whether this is better or irrelevant depends on what you're doing.
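
If you ever do need to read from a possibly misaligned location, the usual portable approach in C is to go through memcpy and let the compiler pick whatever instruction sequence the target supports; dereferencing a cast, misaligned pointer is undefined behaviour and is exactly the case that can fault on the strict CPUs mentioned above. A minimal sketch (my example, with made-up buffer contents):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Read a 32-bit value from an arbitrary byte position in a buffer. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* well-defined even if p is misaligned */
    return v;
}

int main(void)
{
    unsigned char buf[8] = { 0x00, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00 };

    /* buf + 1 is typically not 4-byte aligned; the printed value
       depends on the platform's byte order. */
    printf("value = 0x%08X\n", (unsigned)load_u32(buf + 1));

    /* By contrast, this would be undefined behaviour and may fault on
       strict-alignment CPUs:
       uint32_t bad = *(const uint32_t *)(buf + 1);
    */
    return 0;
}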

3 - Why doesn't the compiler reorder the members so that the struct takes as little space as possible? I'm not sure anybody uses actual offset numbers to access members.

Because the language guarantees the order of the members, a programmer can rely on that guarantee, whether you think anyone would do so or not. There are some rare use cases for relying on it.

However, guarantees to the programmer are not the only issue with an arbitrary order of members. Another aspect is compatibility of libraries across separate compilers: all involved compilers have to agree on what the order of the members is, and the specified order is the order of declaration.
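
One way the order guarantee shows up in practice: every conforming compiler has to give the members increasing offsets in declaration order, and those offsets (plus any padding) are exactly what both sides of a library boundary have to agree on. A small illustrative check, using a made-up struct name:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical struct, as might appear in a library's public header. */
struct packet_header
{
    unsigned char  version;
    unsigned char  flags;
    unsigned short length;
    unsigned int   checksum;
};

int main(void)
{
    /* Offsets increase in declaration order on every conforming compiler;
       the exact values and padding are part of the library's ABI. */
    printf("version  at %zu\n", offsetof(struct packet_header, version));
    printf("flags    at %zu\n", offsetof(struct packet_header, flags));
    printf("length   at %zu\n", offsetof(struct packet_header, length));
    printf("checksum at %zu\n", offsetof(struct packet_header, checksum));
    printf("total size  %zu\n", sizeof(struct packet_header));
    return 0;
}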

4 - I've read that this is not important anymore and that CPUs can now usually access non-aligned members in the same time as aligned ones. Is that true? If so, why?

That is an overly generalised statement. It may be true for some CPUs and some use cases. I recommend not assuming such a general statement to be a universal truth.

If we were to assume this to be true for a specific CPU, one reason could be that such a new CPU can access unaligned memory, while an earlier CPU couldn't (such as the old ARMv4).

In another case, an earlier CPU could perhaps read and write unaligned memory, but such operations were slower. If those operations have equivalent speed on a newer CPU, then alignment becomes unimportant for performance on that CPU.

Older CPUs are still in use and have not disappeared.

3 votes

C and C++ are not compatible tags. Choose one.

It requires less logic for a processor to access a naturally aligned object than an unaligned object.

That might seem like a 1970s response, but to fast-forward a bit, imagine loading a 4-byte quantity from address 0x1ffffff.

What does the CPU do, exactly? Ask the memory system for the byte at 0x1ffffff, then the long at 0x2000000, then shift and mask them into a register?

That doesn't sound too bad, until you realize that it required two separate memory transactions to fulfill this. That is bad. Another CPU could have rewritten part of this in the intervening operation, so our load is invalid.
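
To see the split concretely, here is a small sketch of my own (nothing beyond arithmetic is assumed; the 64-byte figure is only the cache-line size that is common today) reporting whether an access starting at a given address stays within one aligned block or needs two:

#include <stdio.h>
#include <stdint.h>

/* Does an access of 'size' bytes starting at 'addr' cross a boundary
   between aligned blocks of 'block' bytes? 'block' must be a power of two.
   With block = 4 this models the word split described above; with
   block = 64 it models a typical cache line. */
static int crosses_boundary(uint64_t addr, uint64_t size, uint64_t block)
{
    return (addr & (block - 1)) + size > block;
}

int main(void)
{
    uint64_t addr = 0x1ffffff;   /* the address from the example above */

    printf("4-byte load at 0x%llx crosses a 4-byte word boundary:  %s\n",
           (unsigned long long)addr, crosses_boundary(addr, 4, 4) ? "yes" : "no");
    printf("4-byte load at 0x%llx crosses a 64-byte line boundary: %s\n",
           (unsigned long long)addr, crosses_boundary(addr, 4, 64) ? "yes" : "no");
    return 0;
}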

Extending the bus-lock protocol to handle multiple transactions is likely a non-starter: it was a lot of work to get the bus protocols to work as is.

In practice, modern systems use cache-line-sized accesses, so provided your unaligned access stays within a single cache line, it is probably OK; once it crosses one, you are at the mercy of unspecified bus controllers, etc.