Assume I have an array of size 8, filled with unsigned int.

unsigned int t[8];

Now I want to load the low 16 bits of each element into a 128-bit register:

__m128i to_fill;

Is there a fast way to do this, instead of looping over the array and masking out the bits of each element?

1 Answer

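You would need to load two vectors of 4 x 32-bit ints, mask out the high 16 bits of each element, and then pack them into a single vector of 8 x 16-bit ints. The pack has to use unsigned saturation (_mm_packus_epi32, which requires SSE4.1): the signed _mm_packs_epi32 would clamp any masked value above 0x7fff to 0x7fff and so corrupt elements that have bit 15 set.

#include <smmintrin.h>                                   // SSE4.1, for _mm_packus_epi32

const __m128i mask = _mm_set1_epi32(0xffff);             // keep the low 16 bits of each 32-bit lane
__m128i v_lo = _mm_loadu_si128((const __m128i *)&t[0]);  // t[0..3]
__m128i v_hi = _mm_loadu_si128((const __m128i *)&t[4]);  // t[4..7]
v_lo = _mm_and_si128(v_lo, mask);
v_hi = _mm_and_si128(v_hi, mask);
__m128i v = _mm_packus_epi32(v_lo, v_hi);                // 8 x 16 bits; values <= 0xffff pass through unchanged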
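If only SSE2 is available, one possible workaround is to sign-extend the low 16 bits of each element before packing. The signed pack _mm_packs_epi32 saturates to the range -32768..32767, so once every lane holds a sign-extended 16-bit value, nothing gets clamped and the low 16 bits come through bit for bit. The following is a minimal sketch, assuming unsigned int is 32 bits wide; the helper name pack_low16_sse2 and the out buffer are made up for illustration and are not part of the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: writes the low 16 bits of t[0..7] to out[0..7].
   Assumes unsigned int is 32 bits and unsigned short is 16 bits. */
static void pack_low16_sse2(const unsigned int t[8], unsigned short out[8])
{
    __m128i v_lo = _mm_loadu_si128((const __m128i *)&t[0]);  /* t[0..3] */
    __m128i v_hi = _mm_loadu_si128((const __m128i *)&t[4]);  /* t[4..7] */

    /* Shift left by 16, then arithmetic shift right by 16:
       this discards the high half and sign-extends the low half. */
    v_lo = _mm_srai_epi32(_mm_slli_epi32(v_lo, 16), 16);
    v_hi = _mm_srai_epi32(_mm_slli_epi32(v_hi, 16), 16);

    /* Every lane now fits in a signed 16-bit value, so the signed
       saturating pack cannot clamp anything. */
    __m128i v = _mm_packs_epi32(v_lo, v_hi);

    _mm_storeu_si128((__m128i *)out, v);
}
```

In the question's setting you would keep v in a register (it plays the role of to_fill) rather than storing it; the store is only there to make the result easy to inspect.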