Intel's official optimization guide has a chapter on converting from MMX commands to SSE where they state the fallowing statment:
Computation instructions which use a memory operand that may not be aligned to a 16-byte boundary must be replaced with an unaligned 128-bit load (MOVDQU) followed by the same computation operation that uses instead register operands.
(chapter 5.8 Converting from 64-bit to 128-bit SIMD Integers, pg. 5-43)
I can't understand what they mean by "may not be aligned to a 16-byte boundary", could you please clarify it and give some examples?