
I would like to build an SSE register whose values I can store into an array of integers, starting from another SSE register that contains 0xFFFF flags and zeros. For example:

__m128i regComp = _mm_cmpgt_epi16(regA, regB);

For the sake of argument, let's assume that regComp was loaded with { 0, 0xFFFF, 0, 0xFFFF }. I would like to convert this into, say, { 0, 80, 0, 80 }.

What I had in mind was to create an array of integers initialized to 80 and load it into a register regC, then do a _mm_and_si128 between regC and regComp and store the result in regD. However, this does not do the trick, which led me to think that I do not understand the TRUE (0xFFFF) flags in SSE registers. Could someone answer the question with a brief explanation of why my solution does not work?

#include <emmintrin.h>   // SSE2 intrinsics

short valA[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
short valB[16] = { 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10 };
short ones[16] = { 1 };
short final[16];

__m128i vA, vB, vOnes, vRes, vRes2;
int i;

vOnes = _mm_load_si128((__m128i *)&(ones)[0] );

for( i=0 ; i < 16 ;i+=8){
   vA = _mm_load_si128((__m128i *)&(valA)[i] );
   vB = _mm_load_si128((__m128i *)&(valB)[i] );

   vRes = _mm_cmpgt_epi16(vA,vB);

   vRes2 = _mm_and_si128(vRes,vOnes);
   _mm_storeu_si128((__m128i *)&(final)[i], vRes2);
}
Actually, this works. Could you post the complete code? - galinette
If your variables / arrays already have __m128* type, you can just use them directly. No need to take their address and do a _mm_load. The load intrinsics are mostly there to avoid ugly casts from pointers to scalar types. (Also if you aren't using AVX, and need to load unaligned data, then you need the loadu intrinsics.) - Peter Cordes

2 Answers


You only set the first element of array ones to 1 (the rest of the array is initialised to 0).
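A short C illustration of that initialization rule, as a sketch: the array here has 8 elements (one SSE register's worth of shorts) rather than the question's 16, and print_ones is a hypothetical helper added only to show the contents.

```c
#include <stdio.h>

/* The initializer list is shorter than the array, so C (and C++)
   zero-fill every remaining element: only ones[0] is 1. */
static short ones[8] = { 1 };

/* Print all eight elements so the zero-filled tail is visible. */
static void print_ones(void)
{
    for (int i = 0; i < 8; i++)
        printf("%d ", ones[i]);
    printf("\n");
}
```

Calling print_ones() prints a 1 followed by seven 0s, which is exactly the mask that was ANDed with the comparison result.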

I suggest you get rid of the array ones altogether and then change this line:

vOnes = _mm_load_si128((__m128i *)&(ones)[0] );

to:

vOnes = _mm_set1_epi16(1);
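The same idea also gives the { 0, 80, 0, 80 }-style result the question asks for: each TRUE lane from _mm_cmpgt_epi16 has all 16 bits set, so an AND keeps the broadcast constant there and clears the FALSE lanes. A minimal sketch, assuming SSE2; select_where_greater is a hypothetical helper name, n is assumed to be a multiple of 8, and unaligned loads/stores are used so the arrays need no special alignment:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* For n shorts (n a multiple of 8), write `value` to out[i] where
   a[i] > b[i] and 0 elsewhere. */
static void select_where_greater(const short *a, const short *b,
                                 short *out, int n, short value)
{
    const __m128i vValue = _mm_set1_epi16(value);  /* value in all 8 lanes */
    for (int i = 0; i < n; i += 8) {
        __m128i vA = _mm_loadu_si128((const __m128i *)&a[i]);
        __m128i vB = _mm_loadu_si128((const __m128i *)&b[i]);
        __m128i mask = _mm_cmpgt_epi16(vA, vB);    /* 0xFFFF or 0x0000 per lane */
        _mm_storeu_si128((__m128i *)&out[i], _mm_and_si128(mask, vValue));
    }
}
```

With the question's valA and valB and a value of 80, elements 5 to 8 and 10 to 15 of the output become 80 and the rest stay 0.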

Probably a better solution though, if you just want to convert SIMD TRUE (0xffff) results to 1, would be to use a shift:

for (i = 0; i < 16; i += 8) {
    vA = _mm_loadu_si128((__m128i *)&valA[i]);
    vB = _mm_loadu_si128((__m128i *)&valB[i]);

    vRes = _mm_cmpgt_epi16(vA, vB);    // generate 0xffff/0x0000 results

    vRes = _mm_srli_epi16(vRes, 15);   // convert to 1/0 results

    _mm_storeu_si128((__m128i *)&final[i], vRes);
}
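A self-contained sketch of the shift approach, with a plain scalar loop to check it against. The function names are hypothetical, n is assumed to be a multiple of 8, and the logical right shift by 15 works because 0xFFFF >> 15 is 1 while 0x0000 stays 0:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Write 1 where a[i] > b[i] and 0 elsewhere (n a multiple of 8). */
static void greater_as_01(const short *a, const short *b, short *out, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m128i vA = _mm_loadu_si128((const __m128i *)&a[i]);
        __m128i vB = _mm_loadu_si128((const __m128i *)&b[i]);
        /* signed compare gives 0xFFFF/0x0000; shifting by 15 keeps only the top bit */
        __m128i vRes = _mm_srli_epi16(_mm_cmpgt_epi16(vA, vB), 15);
        _mm_storeu_si128((__m128i *)&out[i], vRes);
    }
}

/* Scalar reference: count elements where the SIMD result differs from
   the ordinary comparison a[i] > b[i] (0 means they agree everywhere). */
static int count_mismatches(const short *a, const short *b,
                            const short *out, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (out[i] != (a[i] > b[i]))
            bad++;
    return bad;
}
```

Because _mm_cmpgt_epi16 is a signed comparison, it matches the scalar > on short, so count_mismatches should return 0 for any input.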

Try this for loading 1:

vOnes = _mm_set1_epi16(1);

This is shorter than creating a constant array.

Be careful: providing fewer initializers than the array size (in C as well as C++) initializes the remaining elements to zero. That was your error; the SSE part was fine.

Don't forget the debugger: modern ones display the contents of SSE variables lane by lane.
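When a debugger is not at hand, a similar view can be had by storing the register to memory and printing it. A minimal sketch with hypothetical helper names; note that a TRUE lane shows up as -1 when read as a signed short, which is the same bit pattern as 0xFFFF:

```c
#include <stdio.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Copy the eight 16-bit lanes of v into out[0..7], lane 0 first. */
static void lanes_epi16(__m128i v, short out[8])
{
    _mm_storeu_si128((__m128i *)out, v);
}

/* Print the lanes of v as signed 16-bit integers, lane 0 first. */
static void print_epi16(const char *label, __m128i v)
{
    short lanes[8];
    lanes_epi16(v, lanes);
    printf("%s:", label);
    for (int i = 0; i < 8; i++)
        printf(" %d", lanes[i]);
    printf("\n");
}
```

For example, print_epi16("regComp", _mm_cmpgt_epi16(regA, regB)) would show each lane as 0 or -1.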