I am trying to learn what _mm256_permute2f128_ps()
does, but I can't fully understand Intel's pseudocode for it:
DEFINE SELECT4(src1, src2, control) {
    CASE(control[1:0]) OF
    0: tmp[127:0] := src1[127:0]
    1: tmp[127:0] := src1[255:128]
    2: tmp[127:0] := src2[127:0]
    3: tmp[127:0] := src2[255:128]
    ESAC
    IF control[3]
        tmp[127:0] := 0
    FI
    RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
Specifically, I don't understand:
- the imm8[3:0] notation. Are they using it as a 4-byte mask? But I've seen people invoke _mm256_permute2f128_pd(myVec, myVec, 5), where imm8 is used as a number (number 5).
- Inside the SELECT4 function, what does control[1:0] mean? Is control a byte-mask, or used as a number? How many bytes is it made of?
- why IF control[3] is used in Intel's example. Doesn't it undo the choice 3: inside CASE? Why would we ever want to set tmp[127:0] to zero, if we've been outputting into it?