How to determine if memory is aligned?

Question

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}

However, how do I correctly determine whether the memory ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot be sure that every pointer passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thank you in advance...

random-name, not sure but I think it might be more efficient to simply handle the first few 'unaligned' elements separately like you do with the last few. Then you can still use SSE for the 'middle' ones... — Rehno Lindeque
Better: use a scalar prologue to handle the misaligned elements up to the first alignment boundary. (gcc does this when auto-vectorizing with a pointer of unknown alignment.) Or if your algorithm is idempotent (like a[i] = foo(b[i])), do a potentially-unaligned first vector, then the main loop starting at the first alignment boundary after the first vector, then a final vector that ends at the last element. If the array was in fact misaligned and/or the count wasn't a multiple of the vector width, then some of those vectors will overlap, but that still beats scalar. — Peter Cordes
Best: supply an allocator that provides 16-byte aligned memory. Then operate on the 16-byte aligned buffer without the need to fixup leading or tail elements. This is what libraries like Botan and Crypto++ do for algorithms which use SSE, Altivec and friends. — jww
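The scalar-prologue idea from the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name sse_sum and the choice of a summation are assumptions, and it presumes an x86 target with SSE available. It handles leading elements one at a time until the pointer reaches a 16-byte boundary, runs the aligned SSE loop over the middle, and finishes the tail in scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical example: sums len floats starting at ptr. */
static float sse_sum(const float *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);

    size_t i = 0;
    float total = 0.0f;

    /* Scalar prologue: advance until ptr + i sits on a 16-byte boundary. */
    while (i < len && (uintptr_t)(const void *)(ptr + i) % 16 != 0) {
        total += ptr[i];
        ++i;
    }

    /* Main loop: aligned loads, four floats per iteration. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= len; i += 4) {
        acc = _mm_add_ps(acc, _mm_load_ps(ptr + i));
    }

    /* Fold the four partial sums into the scalar total. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total += lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar epilogue: the remaining 0 to 3 elements. */
    for (; i < len; ++i) {
        total += ptr[i];
    }
    return total;
}
```

Note that the summation order differs from a plain scalar loop, so results for general floating-point data may differ slightly in the last bits. If the pointer is not even aligned for float, the prologue never reaches a boundary and the whole array is simply processed in scalar code.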

Christoph · Accepted Answer · 2009-12-14T01:26:57

#include <stdint.h>  /* uintptr_t */

#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * (or, equivalently, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *.

If you want type safety, consider using an inline function:

static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count)
{ return (uintptr_t)pointer % byte_count == 0; }

and hope for compiler optimizations if byte_count is a compile-time constant.
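For the common case where the alignment is a power of two (16 for SSE), the modulo can also be written as a bit mask, which is typically what a compiler generates for a constant power-of-two byte_count anyway. The following is a minimal sketch under that assumption. The name is_aligned_pow2 is hypothetical and not part of the answer above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: valid only when byte_count is a power of two. */
static inline bool is_aligned_pow2(const void *pointer, size_t byte_count)
{
    /* A power of two has exactly one bit set. */
    assert(byte_count != 0 && (byte_count & (byte_count - 1)) == 0);

    /* The low bits of an aligned address are all zero. */
    return ((uintptr_t)pointer & (byte_count - 1)) == 0;
}
```

With a buffer declared as _Alignas(16) float buf[8], the check should hold for buf and buf + 4 with a byte count of 16, and fail for buf + 1, which is only 4-byte aligned.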

Why do we need to convert to void * ?

The C language allows different representations for different pointer types, e.g., you could have a 64-bit void * type (the whole address space) and a 32-bit foo * type (a segment).

The conversion foo * -> void * might involve an actual computation, e.g., adding an offset. The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a no-op.

For such an implementation, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't. The alignment computation would also not work reliably because you only check alignment relative to the segment offset, which might or might not be what you want.

In conclusion: Always use void * to get implementation-independent behaviour.
