Assume I have an array of 8 elements of type unsigned int:
unsigned int t[8];
Now I want to load the first 16 bits of each element into a 128-bit register:
__m128i to_fill;
Is there a fast way to do this, instead of using a loop and masking out the bits for each element?
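To make the question concrete, here is a minimal sketch of the loop-and-mask baseline next to one possible SSE2 approach. It assumes that unsigned int is 32 bits wide and that "first 16 bits" means the low 16 bits of each element; the function names pack_low16_scalar and pack_low16_sse2 are hypothetical and only used for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar baseline: loop over the elements and mask out the low 16 bits.
   Assumption: "first 16 bits" means the low 16 bits of each element. */
static void pack_low16_scalar(const unsigned int t[8], uint16_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)(t[i] & 0xFFFFu);
}

/* SSE2 sketch: two unaligned loads, then narrow 32-bit lanes to 16-bit lanes.
   Assumption: unsigned int is 32 bits wide. */
static __m128i pack_low16_sse2(const unsigned int t[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)&t[0]); /* t[0..3] */
    __m128i hi = _mm_loadu_si128((const __m128i *)&t[4]); /* t[4..7] */

    /* Sign-extend the low 16 bits of every 32-bit lane. Each lane then
       fits in a signed 16-bit value, so the signed saturation performed
       by the pack below leaves the low 16 bits unchanged. */
    lo = _mm_srai_epi32(_mm_slli_epi32(lo, 16), 16);
    hi = _mm_srai_epi32(_mm_slli_epi32(hi, 16), 16);

    /* t[0..3] end up in the low 64 bits, t[4..7] in the high 64 bits. */
    return _mm_packs_epi32(lo, hi);
}
```

The shift pair is needed because _mm_packs_epi32 saturates instead of truncating. If newer instruction sets can be assumed, a byte shuffle with _mm_shuffle_epi8 (SSSE3) is another option worth comparing.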