3
votes

We have a vector class (three floats) that I would love to align to 16 bytes in order to allow SIMD operations. Using __declspec to 16-byte align it causes a slew of C2719 errors ('parameter': formal parameter with __declspec(align('#')) won't be aligned). If I can't pass around a vector aligned, what's the point? Even using a const reference to the vector is causing the compiler error, which really annoys me.

Is there a way to do what I want here - get 16-byte class alignment while allowing struct passing without having to do some silly trickery to __m128 types?

5
You want to align the whole class? Why not just align the backing store and manage that inside the class? - Carl Norum
This is a backend edit of a large engine. Just seeing if there was a quick and dirty way to get a large amount of code SSE enabled without having to do a lot of rewrites outside of the vector class. - Michael Dorgan
But even if you align the class object, what makes the data it contains aligned? Maybe I'm missing something, but I'd guess you really want to do SIMD on the stuff in the vectors, right? - Carl Norum
The only data in the class is 3 floats. Right now, any sort of operator= or whatever cannot assume alignment. Just trying to force 16-byte alignment of those values. - Michael Dorgan
I get it - different kind of vector. - Carl Norum

5 Answers

6
votes

You're not likely to get much of a benefit from using SIMD unless you're operating on a bunch of these 3-dimensional vector structures at a time, in which case you would probably pass them in an array, which you could align as you need to. The other case where you might obtain some benefit from SIMD is if you're doing a lot of computations on each vector and you can parallelize the operations on the three channels. In that case, then doing some manual manipulation at the beginning of a function to coax it into a __m128 type might still afford you some benefit.
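As a rough sketch of that "manual manipulation at the beginning of a function" idea: the Vec3 struct and the helper names load_vec3, store_vec3 and mul_add below are hypothetical, not taken from the asker's engine. The code assumes an x86/x64 target with SSE and a C++11 compiler, and it makes no assumption about the alignment of the incoming vectors.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical unaligned 3-float vector, standing in for the engine's class.
struct Vec3 {
    float x, y, z;
};

// Build an SSE register from three floats without assuming any alignment.
// _mm_set_ps takes its arguments from the highest lane to the lowest,
// so the unused fourth lane is set to zero here.
inline __m128 load_vec3(const Vec3& v) {
    return _mm_set_ps(0.0f, v.z, v.y, v.x);
}

// Copy the low three lanes back into a Vec3.
inline Vec3 store_vec3(__m128 m) {
    alignas(16) float tmp[4];  // aligned scratch buffer, so the aligned store is safe
    _mm_store_ps(tmp, m);
    return Vec3{tmp[0], tmp[1], tmp[2]};
}

// Computes a * s + b on all channels at once, converting only at the
// start and end of the function.
inline Vec3 mul_add(const Vec3& a, float s, const Vec3& b) {
    __m128 va = load_vec3(a);
    __m128 vb = load_vec3(b);
    __m128 vs = _mm_set1_ps(s);  // broadcast the scalar to every lane
    return store_vec3(_mm_add_ps(_mm_mul_ps(va, vs), vb));
}
```

The conversion cost is paid once per call, so this only tends to help when the function does a fair amount of arithmetic between the load and the store.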

3
votes

If I can't pass around a vector aligned, what's the point?

__declspec(align(#)) does seem rather useless. C++11 has support for what you want; alignas appears to work in all the ways that __declspec(align(#)) is broken. For example, using alignas to declare your type will cause parameters of that type to be aligned.
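A minimal sketch of what that looks like, assuming a C++11 compiler that implements alignas. The Vec3 type and the helpers is_aligned16 and sum are made up for illustration. Whether a by-value parameter really ends up aligned depends on the compiler and calling convention, so treat that part as something to verify on your own toolchain.

```cpp
#include <cstdint>

// Hypothetical 3-float vector, aligned to 16 bytes with the standard specifier.
struct alignas(16) Vec3 {
    float x, y, z;
};

// The alignment becomes part of the type, and the size is padded up to match.
static_assert(alignof(Vec3) == 16, "Vec3 should be 16-byte aligned");
static_assert(sizeof(Vec3) == 16, "size is rounded up to a multiple of the alignment");

// Returns true when the address is a multiple of 16.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Takes the vector by value, which is the case the question is about.
inline float sum(Vec3 v) {
    return v.x + v.y + v.z;
}
```

A local or static Vec3 declared this way is guaranteed to satisfy is_aligned16; calling the same check on a parameter inside a function is a quick way to see what your compiler does for arguments.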

Unfortunately Microsoft's compiler doesn't support standard alignment specifiers yet, and the only compiler I know of that does is Clang, which has limited support for Windows.

Anyway, I just wanted to point out that C++ has this feature and it will probably be available to you eventually. Unless you can move to another platform, for now you're probably best off not passing parameters by value, as others have mentioned.

1
votes

Surely you don't need to pass the array by value? Pass a pointer to the 16-byte-aligned array instead. Or have I misunderstood something?

1
votes

There is a __declspec(passinreg) that's supported on Xbox360, but not in Visual Studio for Windows at the moment.

You can vote for the request to support the feature here: http://connect.microsoft.com/VisualStudio/feedback/details/381542/supporting-declspec-passinreg-in-windows

For vector arguments in our engine we use a VectorParameter typedef'ed to either const Vector or const Vector& depending on whether the platform supports passing by register.
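Roughly, that typedef could look like the sketch below. The macro name PLATFORM_PASSES_VECTORS_IN_REGISTERS, the Vector layout and the dot function are all invented for illustration; the real engine code is not shown here.

```cpp
// Hypothetical stand-in for the engine's vector class.
struct Vector {
    float x, y, z, w;
};

// PLATFORM_PASSES_VECTORS_IN_REGISTERS is an assumed macro that the build
// would define on platforms where passing a vector by value is cheap.
#if defined(PLATFORM_PASSES_VECTORS_IN_REGISTERS)
typedef const Vector VectorParameter;   // pass by value
#else
typedef const Vector& VectorParameter;  // pass by const reference
#endif

// Function signatures use the typedef, so the same source compiles
// with either passing convention.
inline float dot(VectorParameter a, VectorParameter b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

Call sites look the same either way, e.g. dot(a, b); only the typedef changes per platform.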

1
votes

While the question is old, the situation with the VC++ compiler hasn't changed much, so perhaps these notes will be of value to someone.

1) The simple fix to allow classes or structs with __declspec(align(X)) to be passed to functions is to pass them by reference. Use const as needed.

2) There is definitely a reason to use SIMD for vector algebra. I was able to speed up the animation and skinning pass in our engine by 20% by switching just the quat multiply and quat rotate functions to SIMD. No alignment, no arrays. Just two functions that took float[4] params. For code that wasn't poorly written to begin with, a measurable FPS improvement like this is nothing to sneeze at. And since these are the sort of things that can be hard to optimize later, there is really no such thing as premature optimization for vector algebra.

3) If you make your vectors into a class, all of the excessive _mm_store_ps and _mm_load_ps instructions on the stack optimize out under /O2. So while the gain from a single add via SIMD might be negligible, if you have cases where you run several operations back to back, the resulting code is blazing fast.
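To make this concrete, here is a small sketch of such a class. The Vec4 type and the mul_add helper are hypothetical, and alignas(16) is used so the example also builds outside MSVC; in VC++ the __declspec(align(16)) form would play the same role. It assumes an x86/x64 target with SSE, and whether the intermediate loads and stores really disappear depends on the compiler and optimization level.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical 16-byte aligned 4-float vector.
struct alignas(16) Vec4 {
    float f[4];
};

// Operands are taken by const reference, which sidesteps error C2719.
// The aligned load/store intrinsics are valid because Vec4 is 16-byte aligned.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r;
    _mm_store_ps(r.f, _mm_add_ps(_mm_load_ps(a.f), _mm_load_ps(b.f)));
    return r;
}

inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r;
    _mm_store_ps(r.f, _mm_mul_ps(_mm_load_ps(a.f), _mm_load_ps(b.f)));
    return r;
}

// Two operations back to back. Once the operators are inlined, an optimizing
// build has the chance to keep the intermediate product in a register
// instead of storing and reloading it.
inline Vec4 mul_add(const Vec4& a, const Vec4& b, const Vec4& c) {
    return a * b + c;
}
```

Inspecting the generated assembly for mul_add in an optimized build is the easiest way to confirm what your compiler actually does with the temporaries.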