
I don't understand how to choose between vbit, vbsl and vbif when using NEON intrinsics. I need the vbit operation, but when I use the vbslq intrinsic I don't get the result I want.

For example, I have a source vector like this:

uint8x16_t source = 39 62 9b 52 34 5b 47 48 47 35 0 0 0 0 0 0

The destination vector is:

uint8x16_t destination = 0 0 0 0 0 0 0 0 0 0 0 0 c3 c8 c5 d5

I would like this as the output:

39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5

That is, I want to copy the first ten bytes from the source and leave the other six bytes of the destination unchanged. I'm using this mask:

{0,0,0,0,0,0,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};

What is the correct way to use vbslq_u8?

Which intrinsic are you using, and what do you want it to do? Maybe you could post the relevant section of your code as it is now and explain what you need to happen? - Paul R
I need to do exactly the same thing as in this question: stackoverflow.com/questions/18312814/… I tried to do what the answer there says, but the result is not what I want. The intrinsic I use is vbslq_u8, but I don't understand exactly what it does. - user1926328
OK - so post the code, also post an example of the input data you are passing to the intrinsic, what you expect the output data to be, and what the actual data is. - Paul R
@PaulR I edited my question with all the info. - user1926328

Answer

The ARM documentation is not very clear, but it looks like you would need to use the intrinsic like this:

#include <arm_neon.h>

uint8x16_t src =  {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                   0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
uint8x16_t dest = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                   0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
/* 0xff lanes take the byte from src, 0x00 lanes keep the byte from dest */
uint8x16_t mask = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                   0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};

dest = vbslq_u8(mask, src, dest);
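If you want to check the expected result without ARM hardware, here is a minimal portable C sketch that models what vbslq_u8 computes lane by lane, using the same data as the snippet above. The helper name bsl_model_u8x16 is made up for illustration and is not part of arm_neon.h; it only mirrors the bitwise-select behavior of the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Portable scalar model of vbslq_u8(mask, a, b). For every bit position,
   a 1 in mask takes the bit from a and a 0 takes the bit from b.
   bsl_model_u8x16 is a hypothetical helper, not part of arm_neon.h. */
static void bsl_model_u8x16(uint8_t out[16], const uint8_t mask[16],
                            const uint8_t a[16], const uint8_t b[16])
{
    assert(out && mask && a && b);
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)((mask[i] & a[i]) | (~mask[i] & b[i]));
    }
}

/* Same data as the answer, lane 0 first. */
static const uint8_t src[16]  = {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                                 0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
static const uint8_t dest[16] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                                 0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
static const uint8_t mask[16] = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                                 0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};
```

Calling bsl_model_u8x16(out, mask, src, dest) fills out with 39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5, which is the output asked for in the question.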

Note that the order of bytes in the mask must match the order in the source/dest registers (it seems to be reversed in your question?).
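Lane 0 is the first byte in memory and the first value in the initializer, so a mask for "copy the first n bytes" has its 0xff lanes at the start. A small sketch that builds such a mask in plain C, assuming a hypothetical helper called make_leading_mask (on NEON itself the same mask could probably be built by comparing a lane-index vector against vdupq_n_u8(n) with vcltq_u8, but that is not shown here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lanes 0..n-1 become 0xff, lanes n..15 become 0x00.
   Passed as the first argument of vbslq_u8(mask, src, dest), this mask
   copies the first n bytes of src and keeps the remaining bytes of dest. */
static void make_leading_mask(uint8_t mask[16], size_t n)
{
    assert(n <= 16);
    for (size_t i = 0; i < 16; i++) {
        mask[i] = (i < n) ? 0xff : 0x00;
    }
}
```

make_leading_mask(m, 10) reproduces the mask used in this answer. The mask from the question (six 0x00 lanes followed by ten 0xff lanes) would instead keep dest for lanes 0 to 5 and take src for lanes 6 to 15.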

Also note that the first parameter of the intrinsic is the selection mask: a 1 bit selects the corresponding bit from the second parameter, and a 0 bit selects the corresponding bit from the third parameter.
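As for vbit, vbsl and vbif: as far as I can tell from the ARM reference descriptions, all three compute the same bitwise select and differ only in which operand holds the mask and which register gets overwritten. There is only the vbsl family of intrinsics, and the compiler is typically free to emit any of the three depending on which input register it can clobber. The scalar sketch below models each instruction on one byte (the *_model names and forms_agree are hypothetical, for illustration only) and checks that, with the operands placed in the right roles, they agree with vbslq_u8(mask, a, b).

```c
#include <assert.h>
#include <stdint.h>

/* Each model returns the new value of the destination register d. */

/* VBSL d, n, m: d is the mask; 1 bits take n, 0 bits take m. */
static uint8_t vbsl_model(uint8_t d, uint8_t n, uint8_t m)
{
    return (uint8_t)((d & n) | (~d & m));
}

/* VBIT d, n, m: m is the mask; insert bits of n where m is 1, keep d elsewhere. */
static uint8_t vbit_model(uint8_t d, uint8_t n, uint8_t m)
{
    return (uint8_t)((n & m) | (d & ~m));
}

/* VBIF d, n, m: m is the mask; insert bits of n where m is 0, keep d elsewhere. */
static uint8_t vbif_model(uint8_t d, uint8_t n, uint8_t m)
{
    return (uint8_t)((n & ~m) | (d & m));
}

/* Returns 1 if all three forms give what vbslq_u8(mask, a, b) gives for
   these bytes, with each operand placed in the role that form needs. */
static int forms_agree(uint8_t mask, uint8_t a, uint8_t b)
{
    uint8_t bsl = vbsl_model(mask, a, b); /* mask is overwritten */
    uint8_t bit = vbit_model(b, a, mask); /* b is overwritten */
    uint8_t bif = vbif_model(a, b, mask); /* a is overwritten */
    return bsl == bit && bit == bif;
}
```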