Why use _mm_malloc? (as opposed to _aligned_malloc, aligned_alloc, or posix_memalign)

Question

There are a few options for acquiring an aligned block of memory, but they're very similar, and the choice mostly boils down to which language standard and platforms you're targeting.

C11

void * aligned_alloc (size_t alignment, size_t size)

POSIX

int posix_memalign (void **memptr, size_t alignment, size_t size)

Windows

void * _aligned_malloc(size_t size, size_t alignment);

And of course it's also always an option to align by hand.
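For reference, here is a minimal sketch of what aligning by hand can look like. The names manual_aligned_malloc and manual_aligned_free are hypothetical, and the approach (over-allocate, round up, stash the original pointer just below the aligned block) is only one common way to do it, assuming the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper names; not part of any library. */

/* Allocate `size` bytes aligned to `alignment` (assumed to be a power
   of two). Over-allocates with malloc, then stores the pointer malloc
   returned just below the aligned address so it can be recovered. */
void *manual_aligned_malloc(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Room for the stashed pointer plus worst-case padding. */
    size_t overhead = sizeof(void *) + alignment - 1;
    if (size > SIZE_MAX - overhead)
        return NULL;

    void *raw = malloc(size + overhead);
    if (raw == NULL)
        return NULL;

    /* Skip the stash slot, then round up to a multiple of alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block from manual_aligned_malloc by freeing the stashed
   original pointer. Passing NULL is a no-op. */
void manual_aligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Note that this version has the same drawback as the platform-specific routines: the block cannot be passed to plain free().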

Intel offers another option.

Intel

void* _mm_malloc (int size, int align)
void _mm_free (void *p)

Based on source code released by Intel, this seems to be the method of allocating aligned memory that their engineers prefer, but I can't find any documentation comparing it to other methods. The closest I found simply acknowledges that other aligned memory allocation routines exist.

https://software.intel.com/en-us/articles/memory-management-for-optimal-performance-on-intel-xeon-phi-coprocessor-alignment-and

To dynamically allocate a piece of aligned memory, use posix_memalign, which is supported by GCC as well as the Intel Compiler. The benefit of using it is that you don’t have to change the memory disposal API. You can use free() as you always do. But pay attention to the parameter profile:

int posix_memalign (void **memptr, size_t align, size_t size);

The Intel Compiler also provides another set of memory allocation APIs. C/C++ programmers can use _mm_malloc and _mm_free to allocate and free aligned blocks of memory. For example, the following statement requests a 64-byte aligned memory block for 8 floating point elements.

farray = (float *)_mm_malloc(8*sizeof(float), 64);

Memory that is allocated using _mm_malloc must be freed using _mm_free. Calling free on memory allocated with _mm_malloc or calling _mm_free on memory allocated with malloc will result in unpredictable behavior.
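The pairing rule quoted above can be sketched as follows. This assumes an x86 target and a compiler whose <xmmintrin.h> exposes _mm_malloc and _mm_free; the wrapper names alloc_floats_64 and free_floats_64 are hypothetical and exist only to keep the allocation and its matching deallocation side by side.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* assumed to declare _mm_malloc / _mm_free on x86 */

/* Hypothetical wrapper: allocate `count` floats on a 64-byte boundary. */
float *alloc_floats_64(size_t count)
{
    /* Refuse requests whose byte size would overflow size_t. */
    if (count > SIZE_MAX / sizeof(float))
        return NULL;
    return (float *)_mm_malloc(count * sizeof(float), 64);
}

/* Hypothetical wrapper: memory from _mm_malloc goes back through
   _mm_free, never through free(). */
void free_floats_64(float *p)
{
    _mm_free(p);
}
```

Hiding both calls behind one pair of functions makes it harder to mix up free and _mm_free by accident.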

The clear differences from a user perspective are that _mm_malloc requires direct CPU and compiler support, and that memory allocated with _mm_malloc must be freed with _mm_free. Given these drawbacks, what is the reason for ever using _mm_malloc? Can it have a slight performance advantage? Is it a historical accident?

@alk There's no reason to be rude. If the answer is obvious to you then please explain. — Praxeolitic
It might sound rude, it isn't meant this way. It is a question, probably a bit sarcastic. — alk
Perhaps I should have better asked why you think the document does not answer your question.... ;-) — alk
@alk Hmmm... I'm just not seeing an answer in the linked doc... if it's there either my eyes or brain have fallen out of my head today (or both). Wait, did you read this whole question? Especially the last paragraph? — Praxeolitic
But you are right, the document does not answer your question. Please excuse my imputation. However the interesting part is the one about the functions accessing the "scalable" memory pools, which use the same signature as the _mm_*() functions. — alk

Jeff Hammond · Accepted Answer · 2015-09-20T03:00:18

Intel compilers support POSIX (Linux) and non-POSIX (Windows) operating systems, hence cannot rely upon either the POSIX or the Windows function. Thus, a compiler-specific but OS-agnostic solution was chosen.

C11 is a great solution but Microsoft doesn't even support C99 yet, so who knows if they will ever support C11.

Update: Unlike the C11/POSIX/Windows allocation functions, the ICC intrinsics include a deallocation function. This allows this API to use a separate heap manager from the default one. I don't know if/when it actually does that, but it can be useful to support this model.
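To illustrate why an OS-agnostic API needs its own deallocation function, here is a rough sketch of a wrapper over the OS routines. The names portable_aligned_malloc and portable_aligned_free are hypothetical, and this is not a claim about how _mm_malloc is actually implemented; it only shows that the Windows path must release memory with _aligned_free while the POSIX path uses free, so the caller cannot be told to "just call free".

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign under strict -std */
#endif

#include <stddef.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>  /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper: same argument order as _mm_malloc. */
void *portable_aligned_malloc(size_t size, size_t alignment)
{
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void *p = NULL;
    /* posix_memalign requires a power-of-two multiple of sizeof(void *). */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    return posix_memalign(&p, alignment, size) == 0 ? p : NULL;
#endif
}

/* Hypothetical wrapper: the matching release differs per platform,
   which is why the API has to provide it. */
void portable_aligned_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Because callers only ever see the paired functions, the implementation behind them could also be swapped for a separate heap manager without changing user code.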

Disclaimer: I work for Intel but have no special knowledge of these decisions, which happened long before I joined the company.
