8
votes

I will list exactly what I do not understand and quote the parts that confuse me.

First off,

The .Align Directive

  1. .align integer, pad. The .align directive causes the next data generated to be aligned modulo integer bytes

1. What is implied by "causes the next data generated to be aligned modulo integer bytes"? I can surmise that the next data generated is a memory-to-register transfer, no? Modulo would imply the remainder of a division. I do not understand "to be aligned modulo integer bytes".

What would be a remainder of a simple data declaration, and how would the next data generated being aligned by a remainder be useful? If the next data is aligned modulo, that is saying the next generated data, whatever that means exactly, is the remainder of an integer? That makes absolutely no sense.

What specifically would .align, say .align 8, do on x86 for a data byte compiled from a C char, e.g. char CHARACTER = 0;? And what would it do if I wrote the directive directly in hand-written Assembly rather than seeing it in compiler output? While debugging in Assembly I noticed that C/C++ data declarations (chars, ints, floats, etc.) each get an .align 8 directive, along with other directives such as .bss, .zero, .globl, .text, .Letext0, .Ltext0.

What are all of these directives for, or at least the one I am mainly asking about? I have learned a lot of the main x86 Assembly instructions, but was never introduced to or pointed at all of these strange directives. How do they affect the opcodes, and are all of them necessary?

4
It just means that the assembler will place the next byte at an address evenly divisible by integer, so if e.g. the last byte was placed at 0x0eda, then ordinarily the next byte would be placed at 0x0edb, but with an .align 8 directive in place, the next byte would be placed at 0x0ee0, the next address that is evenly divisible by 8 - microtherion
Note that .align is for anything the assembler outputs, such as machine code, not just for what you in C would call "data" - nos

4 Answers

7
votes

As mentioned in the comments, it means the assembler will add enough padding bytes so the next data lands on an aligned position (an address divisible by the alignment value). This is important because aligned memory access is generally faster than unaligned memory access. (Loading a doubleword from 0x10000 is better than loading a doubleword from 0x10001.) It might also be useful when you are interfacing with other components and need to send/receive structs of data with a given padding/alignment.
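To see the same padding from the C side, here is a minimal sketch. The struct name `record` and its members are made up for illustration, and it assumes a C11 compiler (for `_Alignof`). The exact offsets depend on the compiler and ABI, so the code measures them instead of hard-coding them.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int32_t */
#include <stdio.h>

/* Hypothetical record, for illustration only. */
struct record {
    char    tag;    /* 1 byte */
    int32_t value;  /* the compiler pads after 'tag' so this member is aligned */
};

/* Number of padding bytes the compiler inserted between 'tag' and 'value'. */
static size_t record_padding(void)
{
    return offsetof(struct record, value)
         - (offsetof(struct record, tag) + sizeof(char));
}

/* Print the layout and check that 'value' sits at a multiple of its alignment. */
static void print_record_layout(void)
{
    assert(offsetof(struct record, value) % _Alignof(int32_t) == 0);
    printf("value at offset %zu, %zu padding byte(s) after tag\n",
           offsetof(struct record, value), record_padding());
}
```

Calling `print_record_layout()` from `main` prints the layout your compiler chose. Compiling with `gcc -S` should show the matching .align directives in the generated assembly.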

7
votes

First, note that .align is not an x86-specific concept but a GNU GAS directive, documented in the GAS manual. It can also be used for other architectures. x86 does not specify directives, only instructions.

Now let's play with it to understand it:

a.S

.byte 1
.align 16
sym: .byte 2

Assemble and disassemble:

as -o a.o a.S
objdump -Sd a.o

Output:

0000000000000000 <a-0x10>:
   0:   01 0f                   add    %ecx,(%rdi)
   2:   1f                      (bad)  
   3:   44 00 00                add    %r8b,(%rax)
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00 

0000000000000010 <sym>:
  10:   02                      .byte 0x2

So sym was moved to byte 16, the first multiple of 16 after the first .byte 1 we've placed, to align it at 16 bytes.

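The bytes used to fill the gap between 01 and 02 are not arbitrary: in a code section, GAS pads with no-op instructions when no fill value is given. Here the padding is a 5-byte NOP (0f 1f 44 00 00) followed by a 10-byte NOP (66 2e 0f 1f 84 00 00 00 00 00). objdump only shows (bad) because the leading 01 byte throws off the instruction decoding.

Now let's try a different input: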

.skip 5
.align 4
sym: .byte 2

Gives:

0000000000000000 <sym-0x8>:
   0:   00 00                   add    %al,(%rax)
   2:   00 00                   add    %al,(%rax)
   4:   00 0f                   add    %cl,(%rdi)
   6:   1f                      (bad)  
    ...

0000000000000008 <sym>:
   8:   02                      .byte 0x2

So this time sym was moved to 8, which is the first multiple of 4 that comes after 5.
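The arithmetic behind both experiments can be sketched in C. This is only a model of how the location counter moves at an .align directive, not GAS's actual code. The helper names `align_up` and `padding_for` are my own, and the sketch assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'location' up to the next multiple of 'alignment'.
   Assumes 'alignment' is a nonzero power of two. */
static uint64_t align_up(uint64_t location, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (location + alignment - 1) & ~(alignment - 1);
}

/* Number of padding bytes needed to reach that aligned location. */
static uint64_t padding_for(uint64_t location, uint64_t alignment)
{
    return align_up(location, alignment) - location;
}
```

With these, `align_up(1, 16)` gives 16 with 15 padding bytes (the first experiment), and `align_up(5, 4)` gives 8 with 3 padding bytes (the second). A location that is already aligned stays where it is.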

3
votes

The main reason for the align directive is to speed up execution. If a call or jmp target is at an odd address, the CPU may need extra bus transfers and/or an advance to the exact byte. The same holds for data. The old 80386 manual lists penalties for certain opcodes when the target is misaligned.

I found it in the manual (from http://css.csail.mit.edu/6.858/2011/readings/i386.pdf) on page 24:

Such misaligned data transfers reduce performance by requiring extra memory
cycles. For maximum performance, data structures (including stacks) should
be designed in such a way that, whenever possible, word operands are aligned
at even addresses and doubleword operands are aligned at addresses evenly
divisible by four. Due to instruction prefetching and queuing within the
CPU, there is no requirement for instructions to be aligned on word or
doubleword boundaries. (However, a slight increase in speed results if the
target addresses of control transfers are evenly divisible by four.)

0
votes

Modulo refers to the modulo operation in arithmetic, i.e. the % operator in C, or in other words the "remainder".

"Aligned modulo n" means that the remainder of the address divided by n is 0. If you want to place something at an address "modulo 4", that means (address % 4) == 0, which is true for the following examples: 0, 4, 8, 0xC, 0x10, etc.

Hardware restrictions can require that some data be aligned to large values. For example, some DMA engines might require modulo 64.
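As a rough C illustration of both points, the sketch below checks the (address % n) == 0 condition and requests a 64-byte-aligned buffer. The names `is_aligned`, `alloc_dma_buffer` and `DMA_ALIGNMENT` are hypothetical, the value 64 is just the example figure from above, and `aligned_alloc` assumes a C11 standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGNMENT 64u  /* assumed requirement, taken from the example above */

/* True when 'address' is aligned modulo n, i.e. address % n == 0. */
static bool is_aligned(uintptr_t address, uintptr_t n)
{
    assert(n != 0);
    return address % n == 0;
}

/* Allocate a buffer whose start address is a multiple of DMA_ALIGNMENT.
   aligned_alloc wants the size to be a multiple of the alignment,
   so the requested size is rounded up first. Release it with free(). */
static void *alloc_dma_buffer(size_t size)
{
    size_t rounded = (size + DMA_ALIGNMENT - 1) / DMA_ALIGNMENT * DMA_ALIGNMENT;
    void *buffer = aligned_alloc(DMA_ALIGNMENT, rounded);
    assert(buffer == NULL || is_aligned((uintptr_t)buffer, DMA_ALIGNMENT));
    return buffer;
}
```

In assembly the equivalent request for statically allocated data would be an .align 64 in front of the buffer's label.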