I would like to create an SSE register with values that I can store in an array of integers, from another SSE register which contains flags 0xFFFF and zeros. For example:
__m128i regComp = _mm_cmpgt_epi16(regA, regB);
For the sake of argument, let's assume that regComp was loaded with { 0, 0xFFFF, 0, 0xFFFF }. I would like to convert this into, say, { 0, 80, 0, 80 }.
What I had in mind was to create an array of integers initialized to 80 and load it into a register regC, then do a _mm_and_si128 between regC and regComp and store the result in regD. However, this does not do the trick, which leads me to think that I do not understand how the positive flags in SSE registers work. Could someone answer the question with a brief explanation of why my solution does not work?
short valA[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
short valB[16] = { 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10 };
short ones[16] = { 1 };
short final[16];
__m128i vA, vB, vOnes, vRes, vRes2;
vOnes = _mm_load_si128((__m128i *)&(ones)[0] );
for (i = 0; i < 16; i += 8) {
    vA = _mm_load_si128((__m128i *)&(valA)[i]);
    vB = _mm_load_si128((__m128i *)&(valB)[i]);
    vRes = _mm_cmpgt_epi16(vA, vB);
    vRes2 = _mm_and_si128(vRes, vOnes);
    _mm_storeu_si128((__m128i *)&(final)[i], vRes2);
}
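One likely reason the snippet above produces zeros: in C, `short ones[16] = { 1 };` sets only the first element to 1 and zero-initializes the rest, so the AND clears every lane except possibly the first. The masking idea itself is sound. Below is a minimal sketch of it, assuming SSE2 is available; the helper name `select_where_greater` and its parameters are hypothetical, not part of the original code. It broadcasts the constant with `_mm_set1_epi16` instead of loading it from an array, and uses unaligned loads because plain `short` arrays are not guaranteed to be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out[i] = (a[i] > b[i]) ? value : 0.
   Assumes n is a multiple of 8 (eight 16-bit lanes per register). */
static void select_where_greater(const short *a, const short *b,
                                 short *out, int n, short value)
{
    /* Broadcast value into all eight 16-bit lanes. */
    const __m128i vValue = _mm_set1_epi16(value);

    for (int i = 0; i < n; i += 8) {
        /* Unaligned loads: the arrays may not be 16-byte aligned. */
        __m128i vA = _mm_loadu_si128((const __m128i *)&a[i]);
        __m128i vB = _mm_loadu_si128((const __m128i *)&b[i]);

        /* Each lane becomes 0xFFFF where a > b, and 0 otherwise. */
        __m128i vMask = _mm_cmpgt_epi16(vA, vB);

        /* 0xFFFF & value == value, 0 & value == 0. */
        __m128i vRes = _mm_and_si128(vMask, vValue);

        _mm_storeu_si128((__m128i *)&out[i], vRes);
    }
}
```

Calling `select_where_greater(valA, valB, final, 16, 80)` with the arrays from the question should leave 80 in every position where valA exceeds valB and 0 elsewhere.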
__m128* type, you can just use them directly. No need to take their address and do a _mm_load. The load intrinsics are mostly there to avoid ugly casts from pointers to scalar types. (Also, if you aren't using AVX and need to load unaligned data, then you need the loadu intrinsics.) - Peter Cordes