I want to find the element-wise maximum of two vectors, each containing 8 unsigned 16-bit integer elements.
__m128i vi_A= _mm_loadu_si128(reinterpret_cast<const __m128i*>(&pSrc[0])); // 8 16-Bit Elements
__m128i vi_B= _mm_loadu_si128(reinterpret_cast<const __m128i*>(&pSrc1[0])); // 8 16-Bit Elements
__m128i vi_Max = _mm_max_epi16(vi_A, vi_B); // <-- wrong result when an element is >= 0x8000
But _mm_max_epi16 is a signed comparison, so any element of 0x8000 or above is treated as negative and the result is wrong.
So I tried the unsigned version instead, which is an SSE4.1 intrinsic:
vi_Max = _mm_max_epu16(vi_A,vi_B);
but I'm not allowed to use SSE4.1 intrinsics. So what is an efficient way to find the unsigned max of these elements without SSE4.1?
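
For reference, here is a minimal sketch of two SSE2-only ways this could be done, assuming only <emmintrin.h> is available. The function names max_epu16_sse2, max_epu16_sse2_xor and max_u16x8 are hypothetical helpers made up for illustration, not existing intrinsics. The first relies on unsigned saturating subtraction: _mm_subs_epu16(a, b) gives a - b when a > b and 0 otherwise, so adding b back yields the larger value. The second flips the sign bit of both inputs so that unsigned order maps onto signed order, applies _mm_max_epi16, then flips the bit back.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics only
#include <cstdint>

// Hypothetical helper: unsigned 16-bit max via saturating subtraction.
// subs_epu16(a, b) == (a > b ? a - b : 0), so adding b gives max(a, b).
// The final add cannot wrap, because the result is either a or b.
static inline __m128i max_epu16_sse2(__m128i a, __m128i b)
{
    return _mm_add_epi16(_mm_subs_epu16(a, b), b);
}

// Hypothetical alternative: XOR with 0x8000 maps unsigned order onto
// signed order, so the signed max picks the right element; XOR again
// restores the original bit pattern.
static inline __m128i max_epu16_sse2_xor(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi16(-32768);  // 0x8000 in every lane
    __m128i m = _mm_max_epi16(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
    return _mm_xor_si128(m, bias);
}

// Hypothetical wrapper mirroring the loads in the question:
// reads 8 elements from each source and stores the 8 maxima to pDst.
static inline void max_u16x8(const uint16_t* pSrc, const uint16_t* pSrc1, uint16_t* pDst)
{
    __m128i vi_A = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pSrc));
    __m128i vi_B = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pSrc1));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(pDst), max_epu16_sse2(vi_A, vi_B));
}
```

The saturating-subtract version needs two instructions and no constant, while the XOR version needs three instructions plus a constant register, so the first is likely the cheaper one, though that is worth measuring on the target CPU.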