Intel Intrinsics pack commands misunderstanding

Question

Just starting on intrinsics, and hit something that exposed my ignorance. Here's an artificial version of what I'm seeing (VS2015):

__m128i test;

//test.m128i_u16[0] = 127;
//test.m128i_u16[1] = 128;
//test.m128i_u16[2] = 129;
//test.m128i_u16[3] = 130;
//test.m128i_u16[4] = 131;
//test.m128i_u16[5] = 132;
//test.m128i_u16[6] = 133;
//test.m128i_u16[7] = 134;

test.m128i_u16[0] = 50;
test.m128i_u16[1] = 70;
test.m128i_u16[2] = 90;
test.m128i_u16[3] = 110;
test.m128i_u16[4] = 50;
test.m128i_u16[5] = 70;
test.m128i_u16[6] = 90;
test.m128i_u16[7] = 110;

__m128i result = _mm_packus_epi16 (test, test);

So that last command "converts packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst". If I run as shown, I get what I expect:

-       m128i_i8    char[16]
        [0] 50      char
        [1] 70      char
        [2] 90      char
        [3] 110     char
        [4] 50      char
        [5] 70      char
        [6] 90      char
        [7] 110     char
        [8] 50      char
        [9] 70      char
        [10] 90     char
        [11] 110    char
        [12] 50     char
        [13] 70     char
        [14] 90     char
        [15] 110    char

But if I swap the inputs above (use the commented-out value set instead), I get what look like signed saturated results:

    m128i_i8        char[16]
        [0]     127     char
        [1]     -128    char
        [2]     -127    char
        [3]     -126    char
        [4]     -125    char
        [5]     -124    char
        [6]     -123    char
        [7]     -122    char
        [8]     127     char
        [9]     -128    char
        [10]    -127    char
        [11]    -126    char
        [12]    -125    char
        [13]    -124    char
        [14]    -123    char
        [15]    -122    char

What am I missing here? Interpretation, wrong command?

Your question would be a LOT shorter and easier to read if you'd make a table where the corresponding inputs and outputs lined up (horizontally or vertically). — Peter Cordes

Peter Cordes · Accepted Answer · 2016-10-05T04:57:32

You appear to be printing your result vector as holding int8_t, not uint8_t elements, even though you did an unsigned saturation. So every value above 127 is printed as a negative number.

So everything that saturated to 0xFF will print as -1. (Everything that saturated to 0 will print as 0, but none of your int16_t inputs were negative).
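To make the signed-versus-unsigned view concrete, here is a minimal sketch that packs the question's values and prints every result byte both ways. The helper names `pack_words` and `print_both_views` are made up for illustration, and it stores the vector to a plain array instead of using the MSVC-only `.m128i_u16` members, so it should also build with GCC or Clang on x86.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_packus_epi16, _mm_loadu_si128, ... */
#include <stdint.h>
#include <stdio.h>

/* Pack the eight 16-bit values in `in` with unsigned saturation.
   The same vector is used for both operands, as in the question,
   so out[8..15] repeats out[0..7]. */
void pack_words(const uint16_t in[8], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i packed = _mm_packus_epi16(v, v);
    _mm_storeu_si128((__m128i *)out, packed);
}

/* Print the same 16 bytes through both element types. */
void print_both_views(const uint8_t bytes[16])
{
    for (int i = 0; i < 16; ++i) {
        /* Conversion of values above 127 is implementation-defined in C;
           x86 compilers keep the bit pattern (two's complement). */
        int8_t as_signed = (int8_t)bytes[i];
        assert((uint8_t)as_signed == bytes[i]);  /* same bits, different view */
        printf("[%2d] as uint8_t: %3u   as int8_t: %4d\n",
               i, (unsigned)bytes[i], (int)as_signed);
    }
}
```

Calling `pack_words` on 127 through 134 and then `print_both_views` on the result should show 128 next to -128, 129 next to -127, and so on: the packed bytes are what the question expected, and only the debugger's `m128i_i8` view makes them look wrong. In Visual Studio, expanding `m128i_u8` instead should show the unsigned values.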

Also note that PACKUSWB treats its input as signed, in case that wasn't clear.
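The signed-input point is easy to trip over when the source data is really `uint16_t`, so here is a small sketch of it. `saturate_i16_to_u8` is a hypothetical scalar model of one lane, written from the documented behaviour of the instruction, and `packus_checked` (also a made-up name) compares the intrinsic against that model.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Scalar model of one lane of _mm_packus_epi16:
   signed 16-bit input, unsigned 8-bit output, clamped to [0, 255]. */
uint8_t saturate_i16_to_u8(int16_t x)
{
    if (x < 0)   return 0;
    if (x > 255) return 255;
    return (uint8_t)x;
}

/* Pack eight signed 16-bit lanes, write the low eight result bytes
   to `out`, and check each one against the scalar model. */
void packus_checked(const int16_t in[8], uint8_t out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i packed = _mm_packus_epi16(v, v);
    _mm_storel_epi64((__m128i *)out, packed);  /* low 64 bits only */
    for (int i = 0; i < 8; ++i)
        assert(out[i] == saturate_i16_to_u8(in[i]));
}
```

For example, the `uint16_t` value 40000 has the same bit pattern as the `int16_t` value -25536, so it packs to 0 rather than 255. Inputs between 0 and 32767, which covers everything in the question, are unaffected.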
