I'm just starting out with intrinsics and hit something that exposed my ignorance. Here's an artificial version of what I'm seeing (VS2015):
#include <emmintrin.h>   // SSE2: _mm_packus_epi16

__m128i test;
//test.m128i_u16[0] = 127;
//test.m128i_u16[1] = 128;
//test.m128i_u16[2] = 129;
//test.m128i_u16[3] = 130;
//test.m128i_u16[4] = 131;
//test.m128i_u16[5] = 132;
//test.m128i_u16[6] = 133;
//test.m128i_u16[7] = 134;
test.m128i_u16[0] = 50;
test.m128i_u16[1] = 70;
test.m128i_u16[2] = 90;
test.m128i_u16[3] = 110;
test.m128i_u16[4] = 50;
test.m128i_u16[5] = 70;
test.m128i_u16[6] = 90;
test.m128i_u16[7] = 110;
__m128i result = _mm_packus_epi16 (test, test);
So that last intrinsic "converts packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst". If I run it as shown, I get what I expect:
- m128i_i8 char[16]
[0] 50 char
[1] 70 char
[2] 90 char
[3] 110 char
[4] 50 char
[5] 70 char
[6] 90 char
[7] 110 char
[8] 50 char
[9] 70 char
[10] 90 char
[11] 110 char
[12] 50 char
[13] 70 char
[14] 90 char
[15] 110 char
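That matches the scalar model I have in mind for "unsigned saturation" (my own sketch of how I read the description, not taken from the docs):

// Scalar sketch of how I read "unsigned saturation": clamp a signed
// 16-bit input into the 0..255 range before narrowing it to one byte.
static unsigned char pack_u8_sat(short x)
{
    if (x < 0)   return 0;     // negative inputs clamp to 0
    if (x > 255) return 255;   // anything above 255 clamps to 255
    return (unsigned char)x;   // in-range values pass through unchanged
}

With the 50/70/90/110 inputs nothing is out of range, so every lane should just pass through, and that's what the output above shows.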
But if I swap in the commented-out value set above, I get what look like signed-saturated results:
- m128i_i8 char[16]
[0] 127 char
[1] -128 char
[2] -127 char
[3] -126 char
[4] -125 char
[5] -124 char
[6] -123 char
[7] -122 char
[8] 127 char
[9] -128 char
[10] -127 char
[11] -126 char
[12] -125 char
[13] -124 char
[14] -123 char
[15] -122 char
What am I missing here? Is it my interpretation, or am I using the wrong intrinsic?
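In case it helps, here's a self-contained version of the same experiment (a sketch assuming SSE2 and <emmintrin.h>; it builds the vector with _mm_set_epi16 and stores the result with _mm_storeu_si128 instead of the MSVC-specific m128i_u16 / m128i_i8 fields, and prints the packed bytes as unsigned):

#include <emmintrin.h>   // SSE2: _mm_set_epi16, _mm_packus_epi16, _mm_storeu_si128
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // Same values as the commented-out set above; _mm_set_epi16 takes
    // the highest lane first, so lane 0 ends up holding 127.
    __m128i test = _mm_set_epi16(134, 133, 132, 131, 130, 129, 128, 127);
    __m128i result = _mm_packus_epi16(test, test);

    uint8_t bytes[16];
    _mm_storeu_si128((__m128i *)bytes, result);   // spill the 16 packed bytes

    for (int i = 0; i < 16; ++i)
        printf("[%2d] %u\n", i, bytes[i]);        // print each byte as unsigned
    return 0;
}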