Intrinsics SIMD instruction to replace values

Question

How can I replace byte values in a Vector128<byte>?

Assume the code below, which leaves me with a resultvector holding these values: <0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0>

From this I would like to create a new vector in which every "0" is replaced with "2" and every "1" is replaced with "0", like this: <2,2,2,2,0,0,0,0,2,2,2,2,2,2,2,2>

Is there an intrinsic for this, or some other way to achieve it?

Thank you!

        //Create array
        byte[] array = new byte[16];
        for (int i = 0; i < 4; i++) { array[i] = 0; }
        for (int i = 4; i < 8; i++) { array[i] = 1; }
        for (int i = 8; i < 16; i++) { array[i] = 0; }


        fixed (byte* ptr = array)
        {
            // ptr already points at element 0, so it can be passed to the load directly
            System.Runtime.Intrinsics.Vector128<byte> resultvector = System.Runtime.Intrinsics.X86.Sse2.LoadVector128(ptr);

            //<0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0>
            //resultvector
        }

Maybe something like 2-(resultvector<<1) (if shifts are not possible, just add to itself). But how did you actually compute resultvector? (If you are able to modify that calculation, there could be more efficient ways). — chtz
I edited my post. I think the example matches my real code, where I am left with this resultvector. I am not sure what 2-(resultvector<<1) means. I use C#. — Andreas
<< is the shift operator. I'm not very familiar with C# so I don't know if it allows that on SIMD types (or if C# overloads operators for SIMD types ...). Essentially you need to calculate 2-(a[i]+a[i]) for each element (for which there must be a way with just two SIMD instructions). — chtz
You probably did not understand what I meant by "how you actually compute resultvector". If your code was how you actually calculate it, just replace array[i]=0; by array[i]=2; and array[i]=1; by array[i]=0 (I'm sure this is not what you need ...). — chtz
Yes, I am not sure if << is possible to use here. I have never seen it used in C#, so I think it is not possible. I believe there must be some kind of SIMD instruction in C# that can do this, I hope. — Andreas
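
The arithmetic suggested in the comments, 2-(a[i]+a[i]), could be sketched in C# roughly as follows. This is only an illustration: the class and method names are hypothetical, and it assumes every input byte is 0 or 1.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ArithmeticReplace
{
    // Hypothetical helper: computes 2 - (a + a) for each byte,
    // so 0 becomes 2 and 1 becomes 0. Other input values are not handled.
    public static Vector128<byte> Replace(Vector128<byte> a)
    {
        if (!Sse2.IsSupported)
            throw new PlatformNotSupportedException("SSE2 is required");

        Vector128<byte> doubled = Sse2.Add(a, a);                  // a + a per byte
        return Sse2.Subtract(Vector128.Create((byte)2), doubled);  // 2 - (a + a) per byte
    }
}
```

The addition stands in for the shift by one, since adding a byte to itself doubles it.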

Soonts · Accepted Answer · 2020-08-01T15:09:28

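The instruction for that is pshufb, available in modern .NET as Ssse3.Shuffle for the 16-byte version and Avx2.Shuffle for the 32-byte version. Both are really fast, with 1 cycle of latency on modern CPUs.

Pass your source data as the shuffle control mask (the second argument), and pass a special lookup vector as the first argument, which holds the bytes being shuffled. pshufb then uses each source byte as an index into that vector, so index 0 yields 2 and index 1 yields 0. The lookup vector can be built like this: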

// Create AVX vector with all zeros except the first byte in each 16-byte lane which is 2
static Vector256<byte> makeShufflingVector()
{
    Vector128<byte> res = Vector128<byte>.Zero;
    res = Sse2.Insert( res.AsInt16(), 2, 0 ).AsByte();
    return Vector256.Create( res, res );
}

See _mm_shuffle_epi8 section on page 18 of this article for details.
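
For the 16-byte case from the question, the shuffle call itself might look like the sketch below. The class and method names are made up for illustration, and it assumes the input holds only 0 and 1. pshufb indexes with the low 4 bits of each byte, so other values would give different results (16 would map to 2, for example).

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ShuffleReplace
{
    // Lookup table: entry 0 is 2, all other entries are 0.
    static readonly Vector128<byte> Table =
        Vector128.Create((byte)2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);

    // Hypothetical helper: maps 0 -> 2 and 1 -> 0 for each byte of src.
    public static Vector128<byte> Replace(Vector128<byte> src)
    {
        if (!Ssse3.IsSupported)
            throw new PlatformNotSupportedException("SSSE3 is required");

        // pshufb: result[i] = Table[src[i] & 0x0F], or 0 when the high bit of src[i] is set.
        return Ssse3.Shuffle(Table, src);
    }
}
```

Calling ShuffleReplace.Replace on <0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0> gives <2,2,2,2,0,0,0,0,2,2,2,2,2,2,2,2>.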

Update: if you don’t have SSSE3, you can do the same in SSE2, in 2 instructions instead of 1:

static Vector128<byte> replaceZeros( Vector128<byte> src )
{
    src = Sse2.CompareEqual( src, Vector128<byte>.Zero );
    return Sse2.And( src, Vector128.Create( (byte)2 ) );
}

By the way, there’s a performance problem in .NET that prevents the compiler from hoisting constant loads out of loops. If you are going to call that method in a loop and want to maximize performance, consider passing both constant vectors, the zero and the 2, as method parameters.
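
A rough sketch of that idea: the two constants are created once before the loop and passed in. ReplaceAll and its in-place buffer handling are hypothetical, added only for illustration, and the code needs unsafe blocks enabled in the project.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ReplaceLoop
{
    // Same logic as replaceZeros, but both constants come from the caller.
    static Vector128<byte> ReplaceZeros(Vector128<byte> src, Vector128<byte> zero, Vector128<byte> two)
    {
        return Sse2.And(Sse2.CompareEqual(src, zero), two);
    }

    // Hypothetical driver: rewrites the buffer in place, 16 bytes at a time.
    public static unsafe void ReplaceAll(byte[] data)
    {
        if (!Sse2.IsSupported)
            throw new PlatformNotSupportedException("SSE2 is required");

        // Constants are built once, outside the loop.
        Vector128<byte> zero = Vector128<byte>.Zero;
        Vector128<byte> two = Vector128.Create((byte)2);

        fixed (byte* p = data)
        {
            int i = 0;
            for (; i + 16 <= data.Length; i += 16)
            {
                Vector128<byte> v = Sse2.LoadVector128(p + i);
                Sse2.Store(p + i, ReplaceZeros(v, zero, two));
            }

            // Scalar tail for lengths that are not a multiple of 16.
            for (; i < data.Length; i++)
                p[i] = (byte)(p[i] == 0 ? 2 : 0);
        }
    }
}
```

Like the SSE2 version above, this turns every non-zero byte into 0, not just bytes equal to 1.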
