In SSE, if I have a 128-bit register containing 4 floats i.e.

A = a b c d ('a','b','c','d' are floats and 'A' is a 128-bit SSE register)

and

B = e f g h

then if I want

C = a e b f

I can simply do:

C = _mm_unpacklo_ps(A,B);

Similarly if I want

D = c g d h

I can do:

D = _mm_unpackhi_ps(A,B);

If I have 256-bit AVX registers containing 4 doubles each, is it possible to do the same with a single instruction?

Based on how these intrinsics work, I know that none of _mm256_unpacklo_pd(), _mm256_shuffle_pd(), _mm256_permute2f128_pd() or _mm256_blend_pd() does this on its own. Is there another instruction that I can use, or do I have to combine the ones above?

1 Answer

One way that I can think of is the following:

A1 = _mm256_unpacklo_pd(A,B);
A2 = _mm256_unpackhi_pd(A,B);

C = _mm256_permute2f128_pd(A1,A2,0x20);
D = _mm256_permute2f128_pd(A1,A2,0x31);
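
For anyone who wants to try this out, here is a minimal sketch of the four instructions above wrapped in a function. The function name interleave4_pd and its pointer-based signature are made up for illustration. The target attribute is a GCC/Clang extension that I am assuming here so the snippet compiles without -mavx; with another compiler, drop it and enable AVX for the whole build. The comments show which elements end up where, with | marking the boundary between the two 128-bit lanes.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: name and signature are illustrative only.
   The target attribute (GCC/Clang extension, an assumption here)
   enables AVX code generation for this one function. */
__attribute__((target("avx")))
void interleave4_pd(const double *a, const double *b, double *c, double *d)
{
    assert(a && b && c && d);

    __m256d A = _mm256_loadu_pd(a);   /* a0 a1 | a2 a3 */
    __m256d B = _mm256_loadu_pd(b);   /* b0 b1 | b2 b3 */

    /* The unpacks operate on each 128-bit lane separately. */
    __m256d A1 = _mm256_unpacklo_pd(A, B);   /* a0 b0 | a2 b2 */
    __m256d A2 = _mm256_unpackhi_pd(A, B);   /* a1 b1 | a3 b3 */

    /* 0x20 selects the low lane of A1, then the low lane of A2. */
    __m256d C = _mm256_permute2f128_pd(A1, A2, 0x20);   /* a0 b0 | a1 b1 */
    /* 0x31 selects the high lane of A1, then the high lane of A2. */
    __m256d D = _mm256_permute2f128_pd(A1, A2, 0x31);   /* a2 b2 | a3 b3 */

    _mm256_storeu_pd(c, C);
    _mm256_storeu_pd(d, D);
}
```

Calling it with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} should leave c = {1, 5, 2, 6} and d = {3, 7, 4, 8}, which matches the C and D layouts asked for in the question.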

If anyone has a better solution, please do post below.