best cross-platform method to get aligned memory

Question

Here is the code I normally use to get aligned memory with Visual Studio and GCC:

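#include <stdlib.h>   // posix_memalign, free
#ifdef _MSC_VER
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

inline void* aligned_malloc(size_t size, size_t align) {
    void *result;
#ifdef _MSC_VER
    result = _aligned_malloc(size, align);
#else
    // posix_memalign returns nonzero on failure; align must be a
    // power of two and a multiple of sizeof(void*)
    if (posix_memalign(&result, align, size)) result = 0;
#endif
    return result;
}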

inline void aligned_free(void *ptr) {
#ifdef _MSC_VER
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

Is this code fine in general? I have also seen people use _mm_malloc and _mm_free. In most cases where I want aligned memory, it's to use SSE/AVX. Can I use those functions in general? It would make my code a lot simpler.

Lastly, it's easy to create my own function to align memory (see below). Why then are there so many different common functions to get aligned memory (many of which only work on one platform)?

This code does 16 byte alignment.

float* array = (float*)malloc(SIZE*sizeof(float)+15);

// find the aligned position
// and use this pointer to read or write data into array
float* alignedArray = (float*)(((unsigned long)array + 15) & (~0x0F));

// deallocate the original "array" pointer, NOT alignedArray
free(array);
array = alignedArray = 0;

See: http://www.songho.ca/misc/alignment/dataalign.html and "How to allocate aligned memory only using the standard library?"

Edit: In case anyone cares, I got the idea for my aligned_malloc() function from Eigen (Eigen/src/Core/util/Memory.h)

Edit: I just discovered that posix_memalign is undefined for MinGW. However, _mm_malloc works for Visual Studio 2012, GCC, MinGW, and the Intel C++ compiler so it seems to be the most convenient solution in general. It also requires using its own _mm_free function, although on some implementations you can pass pointers from _mm_malloc to the standard free / delete.

While the unsigned long cast of the address might work in practice, it may not be portable between ILP32 / LP64 / LLP64 (win64) data models. — Brett Hale
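To illustrate the portability point in the comment above, here is a minimal sketch of the same rounding done through uintptr_t instead of unsigned long. The helper names align_up and align_ptr are hypothetical (they are not from the question), and the sketch assumes the platform provides uintptr_t in <stdint.h>, which is optional in the C standard but available on the common ILP32, LP64, and LLP64 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of align.
   align must be a nonzero power of two. */
static uintptr_t align_up(uintptr_t addr, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Build the mask in uintptr_t so the upper address bits are kept
       even when pointers are wider than unsigned long (LLP64). */
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Hypothetical helper: the same rounding applied to a pointer. */
static void *align_ptr(void *p, size_t align) {
    return (void *)align_up((uintptr_t)p, align);
}
```

With this, the cast in the question would become something like `float* alignedArray = (float*)align_ptr(array, 16);`, and the original pointer still has to be the one passed to free.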

Mats Petersson · Accepted Answer · 2013-05-04T17:43:11

The first function you propose would indeed work fine.

Your "homebrew" function also works, but has the drawback that it always allocates 15 extra bytes, which are simply wasted if the pointer returned by malloc is already aligned. That may not matter sometimes, but the OS may well be able to provide memory that is correctly aligned without any waste (and if it needs to be aligned to 256 or 4096 bytes, you risk wasting a lot of memory by adding "alignment-1" bytes).
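For reference, a minimal sketch of how the padding approach is often generalized, so the cost is visible. This is an assumption-laden illustration, not code from the question or from any library: the names padded_aligned_malloc and padded_aligned_free are hypothetical, align is assumed to be a power of two, and the original malloc pointer is stashed just below the aligned block so the caller does not have to keep it around.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name: over-allocate, round up, and remember the raw pointer.
   Every allocation pays (align - 1) + sizeof(void *) extra bytes. */
static void *padded_aligned_malloc(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t overhead = (align - 1) + sizeof(void *);
    if (size > SIZE_MAX - overhead) return NULL;   /* avoid size overflow */
    void *raw = malloc(size + overhead);
    if (raw == NULL) return NULL;
    /* Leave room for the stashed pointer, then round up to align. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;                  /* slot just below the block */
    return (void *)aligned;
}

/* Hypothetical name: recover the raw pointer and release it. */
static void padded_aligned_free(void *ptr) {
    if (ptr != NULL) free(((void **)ptr)[-1]);
}
```

For a 4096-byte alignment this sketch requests over 4 KiB of padding per allocation no matter how small the payload is, which is the waste described above. Platform functions such as posix_memalign or _aligned_malloc are free to satisfy the request more cheaply.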

