9
votes

Intel's official optimization guide has a chapter on converting from MMX instructions to SSE, where they state the following:

Computation instructions which use a memory operand that may not be aligned to a 16-byte boundary must be replaced with an unaligned 128-bit load (MOVDQU) followed by the same computation operation that uses instead register operands.

(chapter 5.8 Converting from 64-bit to 128-bit SIMD Integers, pg. 5-43)

I can't understand what they mean by "may not be aligned to a 16-byte boundary". Could you please clarify it and give some examples?

3
When they say "may not be aligned", they mean the case where the code needs to work correctly with unaligned pointers, i.e. you can't assume that inputs are always aligned. (Jakob's answer covers what it means for an address to be aligned.) – Peter Cordes

3 Answers

15
votes

Certain SIMD instructions, which apply the same operation to multiple data elements at once, require the memory address of that data to be aligned to a certain byte boundary. In effect, the address of the memory your data resides in must be divisible by the number of bytes the instruction requires.

So in your case the alignment is 16 bytes (128 bits), which means the memory address of your data needs to be a multiple of 16. E.g. 0x00010 would be 16-byte aligned, while 0x00011 would not be.
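As a minimal sketch of that divisibility rule in C: the helper names `is_aligned` and `ptr_is_aligned` below are made up for illustration and are not part of any library. The check assumes the alignment is a nonzero power of two, which holds for 16.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of alignment.
   Assumes alignment is a nonzero power of two (16 for 128-bit SSE operands),
   so "multiple of alignment" is the same as "low bits are all zero". */
static inline bool is_aligned(uintptr_t addr, size_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

/* Same check for a pointer (hypothetical convenience wrapper). */
static inline bool ptr_is_aligned(const void *p, size_t alignment)
{
    return is_aligned((uintptr_t)p, alignment);
}
```

With these, `is_aligned(0x00010, 16)` is true and `is_aligned(0x00011, 16)` is false, matching the two addresses above.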

How to get your data to be aligned depends on the programming language (and sometimes compiler) you are using. Most languages that have the notion of a memory address will also provide you with means to specify the alignment.
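For example, in C11 this might look roughly like the sketch below. It assumes a C11 compiler and a C library that provides `aligned_alloc` (some platforms, such as MSVC, use a different allocation function instead). The names `table` and `alloc_aligned16` are invented for illustration.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Static storage: alignas asks the compiler to place the array
   on a 16-byte boundary. */
static alignas(16) int16_t table[8];

/* Heap storage: C11 aligned_alloc returns memory aligned to the requested
   boundary. The size is rounded up to a multiple of 16 because the standard
   expects the size to be a multiple of the alignment.
   Returns NULL on failure; release the memory with free(). */
static inline int16_t *alloc_aligned16(size_t count)
{
    size_t bytes = count * sizeof(int16_t);
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
}
```

Other languages and compilers spell this differently (attributes, pragmas, dedicated allocators), so treat the above as one option rather than the only way.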

0
votes

I'm guessing here, but could "may not be aligned to a 16-byte boundary" mean that this memory location was previously aligned to a smaller value (4 or 8 bytes) for some other purpose, so that to execute SSE instructions on it you now need to load it into a register explicitly?
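To make the quoted rule concrete, here is a hedged sketch using SSE2 intrinsics. The function name `add8_epi16` is invented for illustration. The unaligned load intrinsic `_mm_loadu_si128` typically compiles to MOVDQU, after which the addition works on register operands only, so the inputs may sit at any address. The scalar fallback is there only so the sketch still builds on non-x86 targets.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* dst[i] = a[i] + b[i] for eight 16-bit lanes.
   None of the pointers needs to be 16-byte aligned. */
static inline void add8_epi16(int16_t *dst, const int16_t *a, const int16_t *b)
{
    __m128i va  = _mm_loadu_si128((const __m128i *)a); /* unaligned load */
    __m128i vb  = _mm_loadu_si128((const __m128i *)b); /* unaligned load */
    __m128i sum = _mm_add_epi16(va, vb);               /* register operands */
    _mm_storeu_si128((__m128i *)dst, sum);             /* unaligned store */
}
#else
/* Portable fallback with the same wrapping 16-bit addition. */
static inline void add8_epi16(int16_t *dst, const int16_t *a, const int16_t *b)
{
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}
#endif
```

If the pointers were known to be 16-byte aligned, the aligned variants (`_mm_load_si128`, `_mm_store_si128`) could be used instead; those are the ones that fault on a misaligned address.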

-2
votes

Data that's aligned on a 16-byte boundary will have a memory address that's a multiple of 16, so the low four bits of the address are zero. Note that this is 16 bytes (128 bits), not 16 bits: alignment on a 16-bit (2-byte) boundary would only require an even address.

Similarly, memory aligned on a 32 bit (4 byte) boundary would have a memory address that's a multiple of four, because you group four bytes together to form a 32 bit word.