I am trying to add two Advanced SIMD vectors in my assembly code. I have two vectors, v0 and v1, and I want to add the upper half of v0 to the lower half of v1 and put the result in the upper half of v0. Performance is critical in my code, so I am looking for a way to do this with a single addition instruction. I know that I can move the upper half into another register and then simply use the UADDL instruction.
In the AArch32 NEON instruction set, this can be done by using Dn registers instead of Qn. In my case, for example:
vqadd.u64 d1, d1, d2
Is there any way to do this with AArch64 Advanced SIMD instructions?
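The AArch32 trick works because of register aliasing: each Qn overlaps two D registers, with Q0 made of D0 (low) and D1 (high) and Q1 made of D2 (low) and D3 (high). D1 is therefore the upper half of Q0 and D2 the lower half of Q1. The following is a minimal Python model of that aliasing, for illustration only; the flat list `d_regs` and the helper names are hypothetical stand-ins for the D register file, not real NEON code.

```python
U64_MAX = (1 << 64) - 1

def q_view(d_regs: list[int], n: int) -> tuple[int, int]:
    """Return Qn as (low, high) = (D(2n), D(2n+1)) under AArch32 aliasing."""
    return d_regs[2 * n], d_regs[2 * n + 1]

def vqadd_u64(d_regs: list[int], d: int, n: int, m: int) -> None:
    """Model `vqadd.u64 Dd, Dn, Dm`: unsigned saturating 64-bit add, in place."""
    d_regs[d] = min(d_regs[n] + d_regs[m], U64_MAX)

# Hypothetical contents of D0..D3, i.e. the halves of Q0 and Q1.
d_regs = [1, 10, 5, 99]
vqadd_u64(d_regs, 1, 1, 2)   # vqadd.u64 d1, d1, d2
print(q_view(d_regs, 0))     # → (1, 15): only the upper half of Q0 changed
```

Because D1 and D2 are separately addressable, one instruction can pair the high half of one Q register with the low half of another; AArch64 removed this aliasing, which is why the question arises there.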
You'll have to rearrange your code to avoid this situation. Can you post a code fragment showing how you got to the point of needing to do this?
- sh1
As indicated by @sh1, you will need to rearrange some things.
The AArch64 equivalent of VQADD is SQADD (signed) or UQADD (unsigned). However, these instructions operate lane by lane on corresponding elements: with the .8b arrangement, for example, they add bytes 0-7 of v0 to bytes 0-7 of v1, which is not quite what you want. But if you can rearrange how one of the vectors, say v1, is loaded, so that the data you need ends up in matching lanes, you can achieve the intended goal.
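For comparison, if the data layout cannot be changed, UQADD has no form that reads the upper half of one source and the lower half of the other, so the high half of v0 has to be moved down first, as the question already notes. The Python sketch below models one such sequence for the 64-bit case, using the scalar saturating UQADD to mirror the AArch32 VQADD.U64. It is an illustration under stated assumptions, not a tuned implementation: registers are modelled as `[d[0], d[1]]` lists, the scratch register v2 is an arbitrary choice, and the instruction sequence in the docstring is one plausible option, not necessarily the fastest.

```python
U64_MAX = (1 << 64) - 1

def uqadd64(a: int, b: int) -> int:
    # Unsigned saturating add of one 64-bit lane.
    return min(a + b, U64_MAX)

def add_high_v0_low_v1(v0: list[int], v1: list[int]) -> list[int]:
    """Model v0.d[1] = sat(v0.d[1] + v1.d[0]) on AArch64.

    Mirrors (v2 is a hypothetical scratch register):
        mov   d2, v0.d[1]       // copy the high half of v0 into lane 0 of v2
        uqadd d2, d2, d1        // scalar saturating add; d1 is the low half of v1
        mov   v0.d[1], v2.d[0]  // insert the sum back into the high half of v0
    """
    v2 = [v0[1], 0]                    # a scalar write to d2 zeroes v2.d[1]
    v2 = [uqadd64(v2[0], v1[0]), 0]    # lane 0 only
    return [v0[0], v2[0]]              # v0.d[0] is left untouched

print(add_high_v0_low_v1([1, 10], [5, 99]))  # → [1, 15]
```

That is three instructions instead of one; rearranging the loads, as in the code that follows, is what lets a single UQADD do the work instead.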
.data
array: .ascii "73167176531330624919225119674426574742355349194934"
...
ldr x20,=array // x20 = address of array
ld1 {v0.16b, v1.16b}, [x20] // load 32 bytes: bytes 0-15 into v0, bytes 16-31 into v1
uqadd v0.8b,v1.8b,v0.8b // saturating add of bytes 0-7 of v1 and v0; bytes 8-15 of v0 are zeroed
...
(gdb) p $v0.b.s
$14 = {7, 3, 1, 6, 7, 1, 7, 6, 5, 3, 1, 3, 3, 0, 6, 2}
(gdb) p $v1.b.s
$15 = {4, 9, 1, 9, 2, 2, 5, 1, 1, 9, 6, 7, 4, 4, 2, 6}
(gdb)
(gdb) p $v0.b.s
$26 = {11, 12, 2, 15, 9, 3, 12, 7, 0, 0, 0, 0, 0, 0, 0, 0}
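As a cross-check of the gdb output above, the following Python sketch models the two-register load and the `.8b` UQADD on the first 32 digits. It is an illustration only, not NEON code, and it assumes that the elided `...` parts of the answer convert the ASCII digits to their numeric values, since gdb shows 7, 3, 1, ... rather than character codes. The helper name `uqadd_8b` is hypothetical.

```python
def uqadd_8b(vn: list[int], vm: list[int]) -> list[int]:
    """Model `uqadd vd.8b, vn.8b, vm.8b` on 16-byte registers.

    Bytes 0-7 are added lane by lane with unsigned saturation at 255;
    writing the 64-bit (.8b) form zeroes bytes 8-15 of the destination.
    """
    low = [min(a + b, 255) for a, b in zip(vn[:8], vm[:8])]
    return low + [0] * 8

digits = "73167176531330624919225119674426574742355349194934"
# Assumption: '0' has already been subtracted from each byte (not shown in the answer).
values = [ord(c) - ord("0") for c in digits]
v0, v1 = values[0:16], values[16:32]   # ld1 {v0.16b, v1.16b}, [x20]
v0 = uqadd_8b(v1, v0)                  # uqadd v0.8b, v1.8b, v0.8b
print(v0)  # → [11, 12, 2, 15, 9, 3, 12, 7, 0, 0, 0, 0, 0, 0, 0, 0]
```

The model reproduces the final gdb dump, including the zeroed upper half of v0, which confirms that the `.8b` form only pairs the low halves of its sources.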