I understand that data structure alignment restrictions exist to optimize memory accesses, because modern CPUs fetch memory in word-sized (or multiples of word-size) chunks. This would make me think that the optimal way to align data is to fixed word boundaries.
For example, consider the following structs on a 32-bit machine (compiled with gcc v6.2.0; CFLAGS: -Wall -g -std=c99 -pedantic):
#include <stdint.h>

struct layoutA {
    char a;     /* start: 0; end: 1; padding: 3 */
    uint32_t b; /* start: 4; end: 8; padding: 0 */
    uint64_t c; /* start: 8; end: 16; padding: 0 */
};
/* sizeof(struct layoutA) = 16 */
struct layoutB {
    uint32_t b; /* start: 0; end: 4; padding: 4 */
    uint64_t c; /* start: 8; end: 16; padding: 0 */
    char a;     /* start: 16; end: 17; padding: 7 */
};
/* sizeof(struct layoutB) = 24 */
Due to the self-alignment restriction, c has to start on an 8-byte boundary instead of the word (4-byte) boundary, which pads the second struct out to 24 bytes.
How does this reconcile with the original reason for alignment, namely optimizing memory access? It would appear that placing c at offset 4 would also let the CPU read it in 2 accesses (similar to the current case, where it needs to access 2 words, at 8 and 12, to get the entire doubleword).
How does self-alignment optimize memory access? In other words, what benefit do we gain in the second case to justify losing the space to self-alignment?