I am attempting to rewrite a raytracer using Streaming SIMD Extensions (SSE). My original raytracer used inline assembly and movups instructions to load data into the xmm registers. I have read that compiler intrinsics are not significantly slower than inline assembly (I suspect I may even gain speed by avoiding unaligned memory accesses) and are much more portable, so I am migrating my SSE code to the intrinsics in xmmintrin.h. The primary type affected is vector, which looks something like this:
    #include <xmmintrin.h>

    union vector {
        __m128 simd;
        float raw[4];
        //some constructors
        //a bunch of functions and operators
    } __attribute__ ((aligned (16)));
I have read that g++ automatically aligns a struct to the strictest alignment required by any of its members, but this does not seem to be happening, and the aligned attribute isn't helping. My research indicates that this is likely because I am allocating a lot of function-local vectors on the stack, and that 16-byte stack alignment is not guaranteed on x86. Is there any way to force this alignment? I should mention that this is running under native x86 Linux on a 32-bit machine, not Cygwin. I intend to add multithreading to this application further down the line, so declaring the offending vector instances static isn't an option. I'm willing to increase the size of my vector data structure if needed.
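To make the symptom concrete, this is roughly how the alignment of a stack-local instance can be checked. It is only a minimal sketch: the `vec4` union is a cut-down stand-in for my real class, and the helper names `is_aligned16` and `require_aligned16` are hypothetical, not part of the actual code.

```cpp
#include <xmmintrin.h>
#include <stdint.h>
#include <cassert>

// Hypothetical cut-down stand-in for the real vector class.
union vec4 {
    __m128 simd;
    float raw[4];
} __attribute__ ((aligned (16)));

// Returns true when the address is a multiple of 16 bytes,
// which is what aligned SSE loads and stores (movaps) require.
static bool is_aligned16(const void *p) {
    return (reinterpret_cast<uintptr_t>(p) & 15u) == 0;
}

// Debug-time guard (assumed helper): call it on a function-local
// instance before touching its __m128 member.
static void require_aligned16(const void *p) {
    assert(is_aligned16(p));
    (void)p; // avoids an unused-parameter warning when NDEBUG is set
}
```

Calling `require_aligned16(&v)` on a local `vec4 v` at the top of each suspect function should show which stack frames, if any, end up misaligned.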
std::aligned_storage, you can get aligned storage in a way that is portable and will work on other compilers too - James McNellis

std::aligned_storage; that said, my machine has g++ 4.4.3, but there's a second machine I was hoping to be able to run it on that's locked to 3.4.6. - Octavianus

(float*)my_m128 is not safe. The may-alias-anything property of Intel vector types only goes one way, just how you can access anything with char*, but it's not guaranteed safe to access a char[] with an int*. (It's safe in MSVC, which is like gcc -fno-strict-aliasing, but in other compilers you should use unions or shuffle intrinsics to access elements of vectors.) See "print a __m128i variable" for an example. - Peter Cordes
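
As a sketch of the alternatives mentioned in the last comment, element access can go through a store intrinsic, a union, or a shuffle instead of a pointer cast. The names `lanes4`, `lane_via_store`, `lane_via_union`, and `lane2_via_shuffle` are hypothetical and chosen only for illustration; the intrinsics themselves come from xmmintrin.h.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Copy the register into a float array with an unaligned store,
// then index the array. No pointer cast of the __m128 is involved.
static inline float lane_via_store(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}

// Union-based access. Reading a member other than the one last
// written is type punning, which GCC documents as supported.
union lanes4 {
    __m128 simd;
    float raw[4];
};

static inline float lane_via_union(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    lanes4 u;
    u.simd = v;
    return u.raw[i];
}

// Shuffle-based access: broadcast lane 2 to every position,
// then extract the low float. The lane index must be a constant.
static inline float lane2_via_shuffle(__m128 v) {
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)));
}
```

The store variant is the most portable of the three; the shuffle variant stays in registers but needs a compile-time lane index.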