26
votes

In my program I have a function that does a simple vector addition c[0:15] = a[0:15] + b[0:15]. The function prototype is:

void vecadd(float * restrict a, float * restrict b, float * restrict c);

On our 32-bit embedded architecture there is a load/store option of loading/storing double words, like:

r16 = 0x4000  ;
strd r0,[r16] ; stores r0 in [0x4000] and r1 in [0x4004]

The GCC optimizer recognizes the vector nature of the loop and generates two branches of the code - one for the case where the 3 arrays are double word aligned (so it uses the double load/store instructions) and the other for the case that the arrays are word-aligned (where it uses the single load/store option).

The problem is that the address alignment check is costly relative to the addition part and I want to eliminate it by hinting the compiler that a, b and c are always 8-aligned. Is there a modifier to add to the pointer declaration to tell this to the compiler?

The arrays passed to this function are declared with the aligned(8) attribute, but that is not reflected in the function code itself. Is it possible to add this attribute to the function parameters?

6
Even if my code below can't help you (because it is C++), you might want to printf("%p") &array[0] and &array[1] in your code just to make sure that the alignment is being obeyed, and per element, not just on the array start address. - Joe
@Joe - it is actually required that it DOES NOT align per array element. It really has to be a contiguous array of floats, whose origin is 8-aligned. - ysap

6 Answers

13
votes

If the attributes don't work, or aren't an option ....

I'm not sure, but try this:

void vecadd (float * restrict a, float * restrict b, float * restrict c)
{
   a = __builtin_assume_aligned (a, 8);
   b = __builtin_assume_aligned (b, 8);
   c = __builtin_assume_aligned (c, 8);

   for ....

That should tell GCC that the pointers are aligned. From that whether it does what you want depends on whether the compiler can use that information effectively; it might not be smart enough: these optimizations aren't easy.
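For reference, here is a minimal complete sketch of that idea, assuming a GCC recent enough to provide __builtin_assume_aligned and the fixed length of 16 from the question. The local names pa, pb and pc and the debug-only assert are my additions, not part of the original snippet:

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 16-element float arrays into c.
   The caller promises that a, b and c are all 8-byte aligned;
   breaking that promise is undefined behaviour. */
void vecadd(float *restrict a, float *restrict b, float *restrict c)
{
    /* Debug-only sanity check; compiled out when NDEBUG is defined. */
    assert((uintptr_t)a % 8 == 0 && (uintptr_t)b % 8 == 0 && (uintptr_t)c % 8 == 0);

    /* The builtin returns void *, which converts implicitly in C. */
    float *pa = __builtin_assume_aligned(a, 8);
    float *pb = __builtin_assume_aligned(b, 8);
    float *pc = __builtin_assume_aligned(c, 8);

    for (int i = 0; i < 16; i++)
        pc[i] = pa[i] + pb[i];
}
```

Whether the alignment branch actually disappears still has to be checked in the generated assembly for your target.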

Another option might be to wrap the float inside a union that must be 8-byte aligned:

typedef union {
  float f;
  long long dummy;
} aligned_float;

void vecadd (aligned_float * a, ......

I think that should enforce 8-byte alignment, but again, I don't know if the compiler is smart enough to use it.
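One caveat worth checking before adopting the union: every element is padded to at least the size of long long, so an array of aligned_float is no longer a contiguous array of floats. A small sketch that makes the stride visible (aligned_float_stride and vecadd_union are hypothetical names I made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

typedef union {
    float f;
    long long dummy;   /* only here to raise the size/alignment of the union */
} aligned_float;

/* A union is at least as large as its largest member. */
static_assert(sizeof(aligned_float) >= sizeof(long long),
              "union must be able to hold the long long member");

/* Distance in bytes between consecutive elements of an aligned_float array. */
size_t aligned_float_stride(void)
{
    return sizeof(aligned_float);
}

/* Element-wise add over n union elements. */
void vecadd_union(aligned_float *restrict a, aligned_float *restrict b,
                  aligned_float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i].f = a[i].f + b[i].f;
}
```

If the data really has to stay a packed float array, as the question's comments suggest, this layout change probably rules the union out.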

10
votes

Following a piece of example code I found on my system, I tried the following solution, which incorporates ideas from a few of the answers given earlier. Basically, create a union of a small array of floats with a 64-bit type (in this case a SIMD vector of floats) and call the function with a cast of the operand float arrays:

typedef float f2 __attribute__((vector_size(8)));
typedef union { f2 v; float f[2]; } simdfu;

void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c);

float a[16] __attribute__((aligned(8)));
float b[16] __attribute__((aligned(8)));
float c[16] __attribute__((aligned(8)));

int main()
{
    vecadd((f2 *) a, (f2 *) b, (f2 *) c);
    return 0;
}

Now the compiler does not generate the 4-aligned branch.
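The snippet above only shows the prototype and the call. A possible body, as a sketch that assumes GCC's vector extension and the 16-float length from the question (so 8 two-float vectors), could look like this:

```c
#include <assert.h>

/* Two floats packed into one 8-byte GCC vector type. */
typedef float f2 __attribute__((vector_size(8)));

static_assert(sizeof(f2) == 2 * sizeof(float), "f2 holds exactly two floats");

/* 16 floats viewed as 8 two-float vectors. */
void vecadd(f2 *restrict a, f2 *restrict b, f2 *restrict c)
{
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];   /* element-wise add provided by the vector extension */
}
```

It is called exactly as in main() above, i.e. vecadd((f2 *) a, (f2 *) b, (f2 *) c) on arrays declared with aligned(8).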

However, the __builtin_assume_aligned() would be the preferable solution, preventing the cast and possible side effects, if it only worked...

EDIT: I noticed that the builtin function is actually buggy on our implementation (i.e., not only does it not work, it also causes calculation errors later in the code).

6
votes

How to tell GCC that a pointer argument is always double-word-aligned?

It looks like newer versions of GCC have __builtin_assume_aligned:

Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)

This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:

void *x = __builtin_assume_aligned (arg, 16);

means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:

void *x = __builtin_assume_aligned (arg, 32, 8);

means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.
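To make the three-argument form concrete, here is a small sketch of my own (not from the GCC manual). The function sum_offset is a hypothetical helper, and the assert is only a debug-time check of the caller's promise:

```c
#include <assert.h>
#include <stdint.h>

/* Sums n floats starting at p.
   The caller promises that p sits exactly 8 bytes past a 32-byte
   boundary, i.e. (char *)p - 8 is 32-byte aligned. */
float sum_offset(const float *p, int n)
{
    /* Debug-only check of that promise; compiled out with NDEBUG. */
    assert(((uintptr_t)p - 8) % 32 == 0);

    const float *q = __builtin_assume_aligned(p, 32, 8);

    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += q[i];
    return s;
}
```

With a buffer declared aligned(32), passing buf + 2 satisfies the promise, because two floats are 8 bytes.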

Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.

2
votes

gcc versions have been dodgy about align() on simple typedefs and arrays. Typically to do what you want, you would have to wrap the float in a struct, and have the contained float have the alignment restriction.

With operator overloading you can almost make this painless, but it does assume you can use C++ syntax.

#include <stdio.h>
#include <string.h>

#define restrict __restrict__

typedef float oldfloat8 __attribute__ ((aligned(8)));

struct float8
{
    float f __attribute__ ((aligned(8)));

    float8 &operator=(float _f) { f = _f; return *this; }
    float8 &operator=(double _f) { f = _f; return *this; }
    float8 &operator=(int _f) { f = _f; return *this; }

    operator float() { return f; }
};

int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c);

int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c)
{
    return *c = *a* *b;
}

int main(int argc, char **argv)
{
    float8 a, b, c;

    float8 p[4];

    printf("sizeof(oldfloat8) == %d\n", (int)sizeof(oldfloat8));
    printf("sizeof(float8) == %d\n", (int)sizeof(float8));

    printf("addr p[0] == %p\n", (void *)&p[0] );
    printf("addr p[1] == %p\n", (void *)&p[1] );

    a = 2.0;
    b = 7.0;
    MyFunc( &a, &b, &c );
    return 0;
}
1
votes

Alignment specifications usually only work for alignments that are smaller than the base type of a pointer, not larger.

I think easiest is to declare your whole array with an alignment specification, something like

typedef float myvector[16];
typedef myvector alignedVector __attribute__((aligned (8)));

(The syntax might not be correct, I always have difficulties to know where to put these __attribute__s)

And use that type throughout your code. For your function definition I'd try

void vecadd(alignedVector * restrict a, alignedVector * restrict b, alignedVector * restrict c);

This gives you an additional indirection but this is only syntax. Something like *a is just a noop and only reinterprets the pointer as a pointer to the first element.
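A sketch of what the function body might look like with this typedef, assuming GCC accepts the aligned attribute on the array typedef as written above (I have not checked whether the optimizer then drops the alignment branch):

```c
#include <assert.h>

typedef float myvector[16];
typedef myvector alignedVector __attribute__((aligned(8)));

/* The aligned typedef still describes 16 contiguous floats. */
static_assert(sizeof(alignedVector) == 16 * sizeof(float),
              "alignedVector is a packed array of 16 floats");

void vecadd(alignedVector *restrict a, alignedVector *restrict b,
            alignedVector *restrict c)
{
    /* (*a)[i] is the extra indirection mentioned above; it only
       reinterprets the pointer and costs nothing at run time. */
    for (int i = 0; i < 16; i++)
        (*c)[i] = (*a)[i] + (*b)[i];
}
```

The caller declares its arrays as alignedVector x; and passes &x.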

-1
votes

I have never used it, but there is __attribute__((aligned (8)))

If I read the documentation right, then it is used this way:

void vecadd(float * restrict a __attribute__((aligned (8))), 
            float * restrict b __attribute__((aligned (8))), 
            float * restrict c __attribute__((aligned (8))));

see http://ohse.de/uwe/articles/gcc-attributes.html#type-aligned