I am new to assembler and NEON programming. My task is to convert part of an algorithm from C to ARM Assembler using NEON instructions. The algorithm takes an int32 array, loads different values from this array, does some bitshifting and Xor and writes the result in another array. Later I will use an array with 64bit values, but for now i just try to rewrite the code.
C Pseudo code:
out_array[index] = shiftSome( in_array[index] ) ^ shiftSome( in_array[index] );
So here are my questions regarding NEON Instructions:
1.) If i load a register like this:
vld1.32 d0, [r1]
will it load only 32Bit from the memory or 2x32Bit to fill the 64Bit Neon D-Register?
2.) How can I access the 2/4/8 (i32, i16, i8) parts of the D-Register?
3.) I am trying to load different values from the array with an offset, but it doesn't seem to work...what am I doing wrong... here is my code: (it is an integer array so I´m trying to load for example the 3-element, which should have an offset of 64Bit = 8 Byte)
asm volatile(
"vld1.32 d0, [%0], #8 \n"
"vst1.32 d0, [%1]" : : "r" (a), "r" (out): "d0", "r5");
where "a" is the array and "out" is an pointer to an integer (for debugging).
4.) After I load a value from the array I need to shift it to the right but it doesn't seem to work:
vshr.u32 d0, d0, #24 // C code: x >> 24;
5.) Is it possible to only load 1 Byte in a Neon register so that I don't have to shift/mask something to get only the one Byte i need?
6.) I need to use Inline assembler, but I am not sure what the last line is for:
input list : output list : what is this for?
7.) Do you know any good NEON References with code examples?
The Program should run on an Samsung Galaxy S2, cortex-A9 Processor if that makes any difference. Thanks for the help.
----------------edit-------------------
That is what i found out:
- It will always load the full Register (64Bit)
- You can use the "vmov" instruction to transfer part of a neon register to an arm register.
- The offset should be in an arm register and will be added to the base address after the memory access.
- It is the "clobbered reg list". Every Register that is used and neither in the input or output list, should be written here.