4
votes

I have to extract non-zero values of an __m128i register. For example I have a vector with eight unsigned shorts.

__m128i vector {40, 0, 22, 0, 0, 0, 0, 8}

I want to extract the 40, 22 and 8 with a minimal number of SSE instructions. The non-zero values will then be stored in an array of non-zero values.

{40, 22, 8, more values from different vectors ... }

Is it possible to shuffle them or is there a good intrinsic to extract and store?

2
Can we assume SSE 4 ? - Paul R
yes we can but I would prefer SSSE3. - martin s
Does the order of the non-zero values need to be preserved ? - Paul R
yes it's important to keep the order of the non-zero values. - martin s

2 Answers

2
votes

If you look at this paper, the authors describe how to use the _mm_cmpestrm instruction to do basically what you want. The core of their algorithm is this (which I've modified slightly to do what you want, instead of what they want):

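// The zero vector is the "set" (first operand) and vector is the string being
// tested (second operand), so bit i of the mask refers to element i of vector.
__m128i res_v = _mm_cmpestrm(
    _mm_setzero_si128(),
    8,
    vector,
    8,
    _SIDD_UWORD_OPS|_SIDD_CMP_EQUAL_ANY|_SIDD_BIT_MASK|_SIDD_NEGATIVE_POLARITY);
int r = _mm_extract_epi32(res_v, 0);  // bit i set => element i is non-zero

__m128i p = _mm_shuffle_epi8(vector, sh_mask[r]);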

If you build the look-up table sh_mask as described in the paper, then p should have the non-zero elements (without any re-ordering) followed by the zero elements. The number of bits set in r will tell you the number of non-zero elements.
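For illustration, here is a minimal sketch of how such a table could be built. It is not taken from the paper: the byte-array layout of sh_mask and the helper names build_sh_mask, shuffle_bytes and compact_words are assumptions of mine, and the shuffle is modelled in plain C so the table can be checked without SSE hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: sh_mask[r] is the 16-byte pshufb control for mask r,
   where bit i of r is set when 16-bit element i is non-zero. */
static uint8_t sh_mask[256][16];

static void build_sh_mask(void)
{
    for (int r = 0; r < 256; ++r) {
        int out = 0;                                   /* next output byte */
        for (int i = 0; i < 8; ++i) {
            if (r & (1 << i)) {                        /* keep element i */
                sh_mask[r][out++] = (uint8_t)(2 * i);      /* first byte */
                sh_mask[r][out++] = (uint8_t)(2 * i + 1);  /* second byte */
            }
        }
        while (out < 16)
            sh_mask[r][out++] = 0x80;  /* high bit set: pshufb writes zero */
    }
}

/* Scalar model of pshufb (_mm_shuffle_epi8), used here only to check the table. */
static void shuffle_bytes(const uint8_t src[16], const uint8_t ctl[16], uint8_t dst[16])
{
    for (int i = 0; i < 16; ++i)
        dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
}

/* Scalar model of the whole sequence: compute r, shuffle, and return the
   number of non-zero elements (the number of bits set in r). */
static int compact_words(const uint16_t in[8], uint16_t out[8])
{
    unsigned r = 0;
    int count = 0;
    uint8_t src[16], dst[16];

    for (int i = 0; i < 8; ++i) {
        if (in[i] != 0) {
            r |= 1u << i;
            ++count;
        }
    }
    assert(r < 256);
    memcpy(src, in, sizeof src);
    shuffle_bytes(src, sh_mask[r], dst);
    memcpy(out, dst, sizeof dst);
    return count;
}
```

With the example vector {40, 0, 22, 0, 0, 0, 0, 8}, r is 0x85 and the model yields 40, 22, 8 followed by zeros. In the real code the control could be loaded with _mm_loadu_si128((const __m128i *)sh_mask[r]), and after storing p the output pointer would advance by the number of bits set in r.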

Unfortunately, _mm_cmpestrm requires SSE4.2, so this won't work on an SSSE3-only target.

2
votes

Based on anjruu's answer, here's an SSSE3 version that has not been tested in any way:

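; xmm0 = input
pxor xmm1, xmm1              ; xmm1 = all zeros
pcmpeqb xmm1, xmm0           ; each byte = 0xFF where the input byte is zero
pmovmskb eax, xmm1           ; eax = 16-bit mask, one bit per byte
shl eax, 4                   ; scale by 16, the size of one table entry
pshufb xmm0, [table + eax]   ; move the non-zero elements to the front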

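The table is different, of course, but not that hard to work out. Keep in mind that pcmpeqb and pmovmskb work on bytes, so the index has 16 bits (one per byte, not one per 16-bit element), and that it is "inverted": a set bit marks a zero byte, so index 0 corresponds to having no zeros and 0xFFFF corresponds to all zeros.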
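As a rough sketch of how that table could be worked out, the following plain C builds one pshufb control per 16-bit index. The names table, zero_byte_mask, build_entry and build_table are hypothetical, and the rule that a 16-bit element is dropped only when both of its bytes are zero is my assumption about the intended behaviour, since the comparison is done per byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table for the SSSE3 sequence: 65536 entries of 16 bytes (1 MiB),
   indexed by the pmovmskb result, where a set bit marks a zero byte. */
static uint8_t table[65536][16];

/* Scalar model of pxor/pcmpeqb/pmovmskb: bit b is set when byte b is zero. */
static unsigned zero_byte_mask(const uint8_t bytes[16])
{
    unsigned mask = 0;
    for (int b = 0; b < 16; ++b)
        if (bytes[b] == 0)
            mask |= 1u << b;
    return mask;
}

/* Build the pshufb control for one index. A 16-bit element is dropped only
   when both of its bytes are zero, so a value such as 256 is kept. */
static void build_entry(unsigned mask, uint8_t ctl[16])
{
    int out = 0;                          /* next output byte */
    assert(mask <= 0xFFFF);
    for (int i = 0; i < 8; ++i) {
        unsigned both = 3u << (2 * i);    /* the two bits of element i */
        if ((mask & both) != both) {      /* at least one byte is non-zero */
            ctl[out++] = (uint8_t)(2 * i);
            ctl[out++] = (uint8_t)(2 * i + 1);
        }
    }
    while (out < 16)
        ctl[out++] = 0x80;                /* high bit set: pshufb writes zero */
}

static void build_table(void)
{
    for (unsigned mask = 0; mask < 65536; ++mask)
        build_entry(mask, table[mask]);
}
```

Entry 0 comes out as the identity shuffle and entry 0xFFFF zeroes everything, which matches the "inverted" indexing. A table used directly as the pshufb memory operand would also need to be 16-byte aligned, which this sketch does not enforce.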