
The documentation for _alloca() says here:

The _alloca routine returns a void pointer to the allocated space, which is guaranteed to be suitably aligned for storage of any type of object.

However, here it says:

_alloca is required to be 16-byte aligned and additionally required to use a frame pointer.

So it seems that in the first reference they forgot about 32-byte aligned AVX/AVX2 types like __m256d.

Another thing that confuses me is that the first page says _alloca() is deprecated and suggests instead _malloca(), a function that may allocate memory from the heap rather than the stack (which is unacceptable in my multi-threaded application).

So can someone tell me whether there is a modern way (perhaps in a new C/C++ standard) to allocate aligned memory on the stack?

Clarification 1: Please don't provide solutions that require the array size to be a compile-time constant. My function allocates a variable number of array items depending on a run-time parameter value.

First, decide whether you are asking about C or C++, though _alloca is not part of either of them. - user2100815
alloca aligns allocations on 16 bytes. If you need a different alignment, allocate more memory and align it yourself. - RbMm
Will std::aligned_storage work for your needs? You can specify the alignment as the second template parameter, and it comes from the stack, given the example implementation, which uses alignas. en.cppreference.com/w/cpp/types/aligned_storage - Joe
What is alignof(__m256d), for the benefit of people who don't have your platform extensions? - Kerrek SB
@KerrekSB, it was in the question: 32 bytes. - Serge Rogatch

4 Answers

4
votes

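Over-allocate with _alloca(), then align the pointer by hand, like this (n is the number of __m256d elements):

#include <malloc.h>    // _alloca (MSVC)
#include <stdint.h>    // uintptr_t
#include <immintrin.h> // __m256d
const size_t align = 32;
void *p = _alloca(n * sizeof(__m256d) + align - 1); // align - 1 spare bytes
__m256d *pm = (__m256d *)((((uintptr_t)p + align - 1) / align) * align);

Replace const with #define if necessary (for example, in C, where a const variable is not a constant expression).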
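A self-contained sketch of the same over-allocate-and-align technique, assuming MSVC's _alloca on Windows and the alloca from <alloca.h> elsewhere (GCC/Clang). STACK_ALLOC, align_up and sum_aligned are hypothetical names, and double stands in for __m256d so the example runs without AVX; the 32-byte alignment is the same. STACK_ALLOC must be expanded inside the function that uses the memory, because alloca'd memory is released when that function returns.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef _MSC_VER
#include <malloc.h>
#define STACK_ALLOC(n) _alloca(n)   // MSVC spelling
#else
#include <alloca.h>
#define STACK_ALLOC(n) alloca(n)    // GCC/Clang spelling
#endif

// Round an address up to the next multiple of `align` (must be a power of two).
inline void* align_up(void* p, std::size_t align) {
    auto u = reinterpret_cast<std::uintptr_t>(p);
    u = (u + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    return reinterpret_cast<void*>(u);
}

// Fills a 32-byte-aligned stack buffer of n doubles with 0..n-1 and sums it.
double sum_aligned(std::size_t n) {
    constexpr std::size_t align = 32;
    // Over-allocate by align - 1 bytes so an aligned start always fits.
    void* raw = STACK_ALLOC(n * sizeof(double) + align - 1);
    double* buf = static_cast<double*>(align_up(raw, align));
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<double>(i);
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += buf[i];
    return s;
}
```

The size passed to STACK_ALLOC can be a run-time value, which is what the question asks for; the usual caveat that large alloca requests can overflow the stack still applies.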

2
votes

_alloca() is certainly not a standard or portable way of handling alignment on the stack. Luckily, C++11 gave us alignas and std::aligned_storage. Neither of these forces you to put anything on the heap, so they should work for your use case, provided the size is a compile-time constant. For example, to align an array of structs to a 32-byte boundary:

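#include <type_traits>

struct bar { int member; /*...*/ };
void fun() {
  // One 32-byte-aligned block large enough for 16 bars. Using
  // aligned_storage<sizeof(bar), 32>::type array[16] instead would pad
  // every element to 32 bytes, so indexing through bar* would not match.
  // (std::aligned_storage is deprecated in C++23; alignas also works.)
  std::aligned_storage<sizeof(bar) * 16, 32>::type storage;
  auto bar_array = reinterpret_cast<bar*>(&storage);
}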

Or if you just want to align a single variable on the stack to a boundary:

void bun() {
  alignas(32) bar b;
}

You can also use the alignof operator to get the alignment requirements for a given type.
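A small C++11 sketch that uses alignof to check what alignas does, both on a type and on a local array. wide_bar and local_array_is_aligned are hypothetical names; like the examples above, the array size here is still a compile-time constant.

```cpp
#include <cstdint>

struct bar { int member; /*...*/ };

// A type whose alignment requirement is raised to 32 bytes.
struct alignas(32) wide_bar { int member; };

// alignof reports the requirement; sizeof is padded up to a multiple of it.
static_assert(alignof(wide_bar) == 32, "alignas(32) raises the requirement");
static_assert(sizeof(wide_bar) % alignof(wide_bar) == 0, "size is a multiple of alignment");

// Returns true if a local alignas(32) array really starts on a 32-byte boundary.
// Only the array start is aligned; elements still stride sizeof(bar) bytes.
bool local_array_is_aligned() {
    alignas(32) bar bar_array[16] = {};
    return reinterpret_cast<std::uintptr_t>(bar_array) % alignof(wide_bar) == 0;
}
```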

1
votes

C++11 introduced the alignof operator:

An alignof expression yields the alignment requirement of its operand type.

You can use it as follows:

#include <iostream>

struct s {};
// __attribute__((aligned)) is a GCC/Clang extension, not standard C++.
typedef s __attribute__ ((aligned (64))) aligned_s;

std::cout << alignof(aligned_s); // Outputs: 64

Note: If your type's alignment is greater than its size, the compiler won't let you declare arrays of that type (see more here):

error: alignment of array elements is greater than element size

But if your type's alignment does not exceed its size, you can safely allocate arrays:

aligned_s arr[32];
-- OR --
constexpr size_t arr_size = 32;
aligned_s arr[arr_size];

Compilers that support VLAs will allow them for the newly defined type as well.
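For comparison, a sketch using standard C++11 alignas in place of the GCC attribute typedef (a swapped-in technique, not what the answer above uses). Because alignas on the struct definition also pads sizeof up to the alignment, the alignment never exceeds the size and arrays are allowed without the error shown above. elements_aligned is a hypothetical helper name.

```cpp
#include <cstdint>

// Standard C++11 alternative to typedef s __attribute__((aligned(64))) aligned_s.
struct alignas(64) aligned_s {};

static_assert(alignof(aligned_s) == 64, "alignment requirement is 64");
// The size of a type is always a multiple of its alignment, so this
// empty struct is padded from 1 byte to 64 bytes.
static_assert(sizeof(aligned_s) == 64, "size padded to 64");

// Checks that every element of a local array lands on a 64-byte boundary.
bool elements_aligned() {
    aligned_s arr[4];
    for (const aligned_s& e : arr)
        if (reinterpret_cast<std::uintptr_t>(&e) % 64 != 0) return false;
    return true;
}
```

The trade-off is memory: each element now occupies 64 bytes, whereas the attribute typedef keeps the original size but cannot be used as an array element type.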

1
votes

The "modern" way is:

Don't make variable-length allocation on the stack.

In the context of your question (avoiding heap allocation), I'm assuming you may be allocating more than some small compile-time constant amount of memory. In that case, you're simply going to smash your stack with that alloca() call. Instead, use a thread-safe memory allocator. I'm sure there are libraries for this on GitHub, and at worst you could protect allocation calls with a global mutex, although that's slow if you need lots of them.
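As a baseline before reaching for a third-party allocator: the standard allocation functions are already required to be thread-safe in C11/C++11, so an aligned heap allocation needs no extra locking (contention inside the allocator is a separate performance question). A minimal sketch using C++17 std::aligned_alloc, which requires the size to be a multiple of the alignment; it is not provided by MSVC, where _aligned_malloc/_aligned_free fill the same role. FreeDeleter and make_aligned_doubles are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter so std::unique_ptr releases aligned_alloc memory with std::free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Allocates n doubles on the heap with 32-byte alignment.
std::unique_ptr<double[], FreeDeleter> make_aligned_doubles(std::size_t n) {
    constexpr std::size_t align = 32;
    // aligned_alloc requires a size that is a multiple of the alignment.
    std::size_t bytes = (n * sizeof(double) + align - 1) / align * align;
    if (bytes == 0) bytes = align;  // avoid a zero-size request
    void* p = std::aligned_alloc(align, bytes);
    if (!p) throw std::bad_alloc();
    return std::unique_ptr<double[], FreeDeleter>(static_cast<double*>(p));
}
```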

On the other hand, if you know the maximum allocation size in advance, just pre-allocate that much memory in thread-local storage, or use a fixed-size local array (which will be allocated on the stack).
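A minimal sketch of the thread-local pre-allocation idea, assuming a known cap on the number of elements. kMaxDoubles, Scratch, tls_scratch and scratch are hypothetical names, and 4096 is an arbitrary placeholder for your own limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical upper bound on elements per call; pick it from your own limits.
constexpr std::size_t kMaxDoubles = 4096;

// A 32-byte-aligned block of doubles (enough for aligned __m256d loads).
struct alignas(32) Scratch {
    double data[kMaxDoubles];
};

// One buffer per thread: no per-call allocation and no locking.
thread_local Scratch tls_scratch;

// Returns this thread's 32-byte-aligned scratch space for n doubles.
double* scratch(std::size_t n) {
    if (n > kMaxDoubles) throw std::length_error("request exceeds kMaxDoubles");
    return tls_scratch.data;
}
```

Each thread gets its own buffer, so concurrent callers never share memory. The catch is re-entrancy: every call on the same thread returns the same buffer, so a function must not call scratch() again (directly or through a callee) while it is still using an earlier result.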