I want to do the following: I have 8 values (8 x 1 byte) in a NEON D register (64 bits). I need to shift every value left by 3, but I don't want to lose any bits. Afterwards I need to add the same 32-bit value to every element of the vector.
As I understand it, I can use the VQSHL instruction to put the result in two D registers if it overflows? How do I know whether an overflow occurred, and how do I guarantee/force that all of my data ends up in the new registers?
Also, could you help me with some code for the shift and add part?
Example code:
out0 = CONSTANT_32BIT + ( input0 << 3)
out1 = CONSTANT_32BIT + ( input1 << 3)
out_n = CONSTANT_32BIT + ( input_n << 3)
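To make the intent concrete, here is a minimal scalar sketch of the result I am after. The function name `shift_add_scalar` and the value of `CONSTANT_32BIT` are made up for illustration; the point is only that each byte is widened to 32 bits before the shift so that no bits are lost.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value, for illustration only; the real constant is application-specific. */
#define CONSTANT_32BIT 0x00010000u

/* Scalar reference: widen each byte to 32 bits, shift left by 3, add the constant. */
static void shift_add_scalar(const uint8_t *in, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* The cast happens before the shift, so the top 3 bits are kept. */
        out[i] = CONSTANT_32BIT + ((uint32_t)in[i] << 3);
    }
}
```

With 8 input bytes this produces 8 x 32-bit outputs, i.e. 256 bits of result for 64 bits of input.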
So in theory I could do 8 or 16 of these operations in parallel using NEON registers?
Target is an ARM Cortex-A9, if that matters.
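Below is a rough sketch of how I imagine the NEON version might look, written with intrinsics from `arm_neon.h`. It rests on an assumption I am not sure about: that a widening shift (VSHLL, intrinsic `vshll_n_u8`) is the right tool here rather than VQSHL, which as far as I can tell saturates within the same element size instead of widening. The function name `shift_add_8` is made up, and the scalar fallback is only there so the code also builds without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Processes 8 input bytes and writes 8 x 32-bit results. */
static void shift_add_8(const uint8_t in[8], uint32_t out[8], uint32_t constant)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8_t  bytes = vld1_u8(in);             /* 8 x u8 in one D register */
    uint16x8_t wide  = vshll_n_u8(bytes, 3);    /* widen to 8 x u16 while shifting left by 3 */
    uint32x4_t c     = vdupq_n_u32(constant);   /* constant broadcast to 4 x u32 */

    /* Widening add: each u16 lane is extended to u32 and added to the constant. */
    uint32x4_t lo = vaddw_u16(c, vget_low_u16(wide));
    uint32x4_t hi = vaddw_u16(c, vget_high_u16(wide));

    vst1q_u32(out, lo);       /* results 0..3 */
    vst1q_u32(out + 4, hi);   /* results 4..7 */
#else
    /* Scalar fallback with the same semantics. */
    for (int i = 0; i < 8; ++i) {
        out[i] = constant + ((uint32_t)in[i] << 3);
    }
#endif
}
```

Since 255 << 3 = 2040 fits in 16 bits, the shifted values cannot overflow the intermediate 16-bit lanes, so if this approach is valid there would be no overflow to detect. The 8 results occupy two Q registers (four D registers), not two D registers.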