
I am trying to get rid of unaligned loads and stores for the SSE instructions in my application by replacing

_mm_loadu_ps()

by

_mm_load_ps()

and allocating memory with:

float *ptr = (float *) _mm_malloc(h*w*sizeof(float), 16);

instead of:

float *ptr = (float *) malloc(h*w*sizeof(float));

However, when I print the pointer addresses using:

printf("%p\n", &ptr);

I get output:

0x2521d20
0x2521d28
0x2521d30
0x2521d38
0x2521d40
0x2521d48
...

Why is this not 16-byte aligned, even though I used the _mm_malloc function? And when I use the aligned load/store operations for the SSE instructions, I get a segmentation fault, since the data isn't 16-byte aligned.

Any ideas why it isn't aligned properly, or any other ideas on how to fix this?

Thanks in advance!


Update

Using

printf("%p\n", ptr);

instead showed that there was no alignment problem after all: the data is indeed properly aligned.

However, I still get a segmentation fault when trying to do an aligned load/store on this data, and I suspect it's a pointer issue.

When allocating the memory:

contents* instance;
instance.values = (float *) _mm_malloc(h*w*sizeof(float),16);    

I have a struct with:

typedef struct{
  ...
  float** values;
  ...
}contents;

Then, in another function that receives a pointer to contents as an argument, I execute:

__m128 tmp = _mm_load_ps(&contents.values);

Do you guys see anything I am missing? Thanks for all the help so far :)

Are you sure that h * w * sizeof(float) is a multiple of 16? - Christopher
I'm pretty sure it is, but h and w are variable. Is it required that this is a multiple of 16? - Ricky
Yes, of course. This loads 4 floats, so you have to have at least 4 floats available. - Christopher
I already handle this with a switch statement when loading the variables, but that isn't required for the memory allocation and isn't the problem. I think it's a problem with pointers, but I'm not sure. - Ricky
Then you have to post the crashing code. - Christopher

1 Answer


Change:

printf("%p\n", &ptr);

to:

printf("%p\n", ptr);

It's the memory that ptr points to that needs to be 16-byte aligned, not the pointer variable itself.