3
votes

I don't understand why the compiler aligns int on 4-byte boundaries, short on 2-byte boundaries and char on 1-byte boundaries. I understand that if the data bus width of the processor is 4 bytes, it takes 2 memory read cycles to read an int from an address that is not a multiple of 4.
So, why doesn't the compiler align all data on 4-byte boundaries? For example:

struct s {
 char c;
 short s;
};

Here, 1) why does the compiler align short on a 2 byte boundary? Assuming that the processor can fetch 4 bytes on a single memory read cycle, wouldn't it take only 1 memory read cycle to read short in the above case even if there is no padding between char and short?

2) Why doesn't the compiler align short on a 4 byte boundary?
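
For reference, the layout being asked about can be inspected directly. This is a minimal sketch, assuming a C11 compiler; `print_layout_s` is a hypothetical helper name, and the exact numbers it prints depend on the target ABI (on many common targets the short lands at offset 2 and the struct is 4 bytes, but that is not guaranteed by the language).

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct s {
    char  c;
    short s;
};

/* Hypothetical helper: prints where the compiler placed each member. */
void print_layout_s(void)
{
    printf("offsetof(struct s, c) = %zu\n", offsetof(struct s, c));
    printf("offsetof(struct s, s) = %zu\n", offsetof(struct s, s));
    printf("sizeof(struct s)      = %zu\n", sizeof(struct s));
    printf("_Alignof(short)       = %zu\n", (size_t)_Alignof(short));
}
```

Calling `print_layout_s()` from `main` shows whether a padding byte was placed between `c` and `s` on your platform.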

4
possible duplicate of Purpose of memory alignment - user694733
The purpose of structure padding for alignment is to fetch the data in one machine read. In your case, the size of the struct will be 4 bytes, not 8. You can still fetch the char OR the short in one cycle by using masking: while fetching the char, the processor will fetch 4 bytes and mask out 24 bits. However, if you had something like struct s { char c; int i; }; then the size would be 8 bytes, because the integer needs a full 4 bytes to be fetched in one read cycle. - Nikhil Vidhani
@NikhilVidhani: My question is not about the purpose of padding. My question is about why the byte is padded between char and short and not after short. Assuming the processor can fetch 4 bytes in a single cycle, no matter where the padding happens, the short can be fetched in 1 cycle, right? So, what savings do we get in the above case? I guess there is some hardware-level explanation for this. - linuxfreak
@linuxfreak Going by my instincts, I think it is easier to fetch (mask) the last 16 bits than bits 9-24 if the short were to occupy bytes 2 and 3. - Nikhil Vidhani
@NikhilVidhani - Yeah, I think so. To fetch bits 9-24, the processor has to do bit shifting in addition to masking. - linuxfreak

4 Answers

4
votes

These objects have to fit in arrays. An array is contiguous. Thus, if the first element is N byte aligned, and all objects are N bytes big, then necessarily all objects in the array are N byte aligned too.

So, if short were 2 bytes big but 4-byte aligned, there would be 2-byte holes between all the shorts in an array, which is forbidden.

You do see that your assumption is slightly flawed. I could make a struct with 26 chars, and it wouldn't be 26-byte aligned. It could start anywhere. An N-byte type will have an alignment equal to N or dividing N.
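
The array argument can be checked with a small sketch. The helper names `all_elements_aligned` and `element_stride` are made up for illustration, and the pointer-to-integer conversion is implementation-defined, although it behaves as expected on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every element of arr[0..n-1] sits at an address that is
   a multiple of the alignment of short, 0 otherwise. */
int all_elements_aligned(const short *arr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((uintptr_t)&arr[i] % _Alignof(short) != 0)
            return 0;
    }
    return 1;
}

/* Distance in bytes between two neighbouring elements. Because arrays
   have no holes, this is always sizeof(short). */
size_t element_stride(const short *arr)
{
    return (size_t)((const char *)&arr[1] - (const char *)&arr[0]);
}
```

Since the stride equals `sizeof(short)`, every element can only stay aligned if the alignment divides the size, which is the point made above.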

2
votes

First, your premise is incorrect. Every object is aligned at some fundamental alignment. For some scalar objects, the alignment may be the same as the data size of the object, but it might also be smaller or larger. For example, a classic 32-bit architecture (I'm thinking of i386 here) might include both 8-byte doubles and 10-byte long doubles, both with 4-byte alignment. Note that I said data size above; do not confuse this with sizeof.

The actual size of an object may be larger than the data size, because the size of an object must be a multiple of the object's alignment. The reason is that an object's alignment is always the same, regardless of context. In other words, the alignment of an object only depends on the type of the object.

Consequently, in the structures:

struct example1 {
  type1 a;
  type2 b;
};

struct example2 {
  type2 b;
  type1 a;
};

the alignment of both b's is the same. In order to be able to guarantee this alignment, it is necessary that the alignment of a composite type must be the maximum of the alignments of the member types. That means that struct example1 and struct example2 above have the same alignment.

The requirement that the alignment of an object be dependent only on its type implies that the size of a type must be a multiple of its alignment. (Any type can be the element type of an array, including an array of only one element. The size of the array is the product of the size of the element and the number of elements. So any padding necessary must be part of the size of the element.)
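
As a rough illustration, here is a sketch that fills in `type1` and `type2` with `char` and `int` (my choice, purely for the example) and checks the two properties. The size-is-a-multiple-of-alignment check holds for any conforming compiler; the equality of the two struct alignments is what mainstream ABIs give you. `print_alignments` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* type1 = char and type2 = int are assumptions made for this example. */
struct example1 { char a; int  b; };
struct example2 { int  b; char a; };

/* The size of a type is always a multiple of its alignment. */
_Static_assert(sizeof(struct example1) % _Alignof(struct example1) == 0,
               "size must be a multiple of alignment");
_Static_assert(sizeof(struct example2) % _Alignof(struct example2) == 0,
               "size must be a multiple of alignment");

/* Hypothetical helper: prints the alignments so they can be compared. */
void print_alignments(void)
{
    printf("_Alignof(char)            = %zu\n", (size_t)_Alignof(char));
    printf("_Alignof(int)             = %zu\n", (size_t)_Alignof(int));
    printf("_Alignof(struct example1) = %zu\n", (size_t)_Alignof(struct example1));
    printf("_Alignof(struct example2) = %zu\n", (size_t)_Alignof(struct example2));
}
```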

In general, rearranging members in a composite type might change the composite type's size but it cannot change the composite type's alignment. For example, both of the following structs have the same alignment -- which is the alignment of a double -- but the first one is almost certainly smaller:

struct compact {
  double d;   // Must be at offset 0
  char   c1;  // Will be at offset sizeof(double)
  char   c2;  // Will be at offset sizeof(double)+sizeof(char).
};

struct bloated {
  char   c1;  // Must be at offset 0
  double d;   // Will be at offset alignof(double)
  char   c2;  // Will be at offset (alignof(double) + sizeof(double))
};
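
The offsets in the comments above can be confirmed with `offsetof`. This is a sketch assuming a C11 compiler; `print_layouts` is a hypothetical helper, and the printed values are platform-dependent, so none are listed here.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct compact {
    double d;
    char   c1;
    char   c2;
};

struct bloated {
    char   c1;
    double d;
    char   c2;
};

/* Hypothetical helper: prints member offsets and total sizes. */
void print_layouts(void)
{
    printf("compact: d=%zu c1=%zu c2=%zu size=%zu\n",
           offsetof(struct compact, d), offsetof(struct compact, c1),
           offsetof(struct compact, c2), sizeof(struct compact));
    printf("bloated: c1=%zu d=%zu c2=%zu size=%zu\n",
           offsetof(struct bloated, c1), offsetof(struct bloated, d),
           offsetof(struct bloated, c2), sizeof(struct bloated));
}
```

The size difference comes from padding: `bloated` needs padding after `c1` so that `d` is aligned, and again after `c2` so that the total size is a multiple of the alignment.
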

0
votes

I think I found the answer to my question. There might be two reasons why the byte is padded between char and short and not after short.

1) Some architectures might have 2-byte load instructions that fetch only 2 bytes from memory. In that case, if the short is not 2-byte aligned, 2 memory read cycles may be required to fetch it.

2) Some architectures might not have 2-byte load instructions. Even in that case, the processor fetches the 4 bytes from memory into a register and masks off the unneeded bytes to get the short value. If the byte is not padded between char and short, the processor has to shift the bytes in order to get the short value.

Both of the above cases might result in slower performance. That is why the short is 2-byte aligned.
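
To make the mask-versus-shift point concrete, here is a sketch that models only the arithmetic on a 32-bit register value, not any particular CPU or byte order. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit field in the low half of the word: a mask alone is enough. */
uint16_t extract_low_half(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}

/* 16-bit field that starts at bit 8 (straddling the middle of the word):
   a shift is needed before the mask. */
uint16_t extract_from_bit8(uint32_t word)
{
    return (uint16_t)((word >> 8) & 0xFFFFu);
}
```

For the word `0xAABBCCDD`, `extract_low_half` yields `0xCCDD` and `extract_from_bit8` yields `0xBBCC`. Whether a real processor pays a measurable cost for the extra shift depends on the architecture.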

0
votes

The compiler aligns data as prescribed by the target processor (micro-)architecture and the ABI. See, for instance, the x86-64 ABI specification.

If your compiler aligned data differently from what the ABI specifies, you wouldn't be able to call functions from libraries that respect that ABI!

In your example, if (on x86-64) the short field s was not aligned on 2 bytes, the processor would have to work more (perhaps issuing two accesses) to get that field.

Also, on many x86-64 chips, the cache line size is a multiple of 16 bytes (typically 64). So it makes sense to align a call stack frame on 16 bytes. This is also needed for vector-like local variables (AVX, SSE3, etc.).

On some processors, badly aligned data would either cause a fault (e.g. an interrupt for a machine exception) or slow down processing significantly. In addition, it could render some accesses non-atomic (for multi-core processing). Hence some ABIs prescribe more alignment than what is strictly necessary. Also, some recent CPU features (like vectorization, e.g. through SIMD instructions like AVX or SSE3) do benefit from strongly aligned data (e.g. alignment to 16 bytes). Your compiler might optimize more, by using such instructions, if it knows about such a strong alignment.
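
If you want to request such a stronger alignment yourself, C11 provides `alignas`. Below is a minimal sketch; `struct vec4` and `is_aligned_to` are names invented for the example, and whether the compiler actually emits SIMD instructions for it depends on the compiler and its flags.

```c
#include <assert.h>
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-float vector type with a requested 16-byte alignment. */
struct vec4 {
    alignas(16) float lane[4];
};

/* Returns 1 if p is a multiple of n bytes, 0 otherwise. The
   pointer-to-integer conversion is implementation-defined. */
int is_aligned_to(const void *p, size_t n)
{
    return (uintptr_t)p % n == 0;
}
```

Any `struct vec4` object, including a local variable, then starts on a 16-byte boundary, which `is_aligned_to(&v, 16)` can confirm.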