UADDL vs UADDL2 in Aarch64 NEON

Question

NEON Assembly

I am trying to understand ARMv8 NEON. Here is an example of what I am trying to do.

I load 16 bytes (pixels as uchar) from array A. Now I want to do a "lengthening add" (widening add) to ushort. From the documentation, I see that UADDL and UADDL2 do a lengthening add on the lower half and the upper half of the source registers, respectively. I could write the following code to get it working:

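ld1    {V10.16B}, [x0]              // load 16 bytes (pixels) from A

uaddl  V11.8H, V10.8B, V10.8B       // widen + add lanes 0..7 (low half)
uaddl2 V12.8H, V10.16B, V10.16B     // widen + add lanes 8..15 (high half)

st1    {V11.8H}, [x1], #16          // store 8 ushorts, post-increment x1
st1    {V12.8H}, [x1], #16          // store the next 8 ushorts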
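To make the lane layout concrete, here is a plain-C scalar model of what the instructions above compute. It is a sketch for illustration, not NEON code, and the name uaddl_pair_model is hypothetical. UADDL widens and adds lanes 0 to 7 (V10.8B is the low 64 bits of V10.16B), UADDL2 does the same for lanes 8 to 15, and the two stores write the results back to back.

```c
#include <stdint.h>

/* Hypothetical scalar model of the asm above, for illustration only.
 * a and b play the role of the two source registers (both are V10 in
 * the question), and dst receives what the two st1 stores write:
 *   dst[0..7]  <- uaddl  (lanes 0..7, the .8B low half)
 *   dst[8..15] <- uaddl2 (lanes 8..15, the upper half of .16B) */
void uaddl_pair_model(const uint8_t a[16], const uint8_t b[16],
                      uint16_t dst[16])
{
    /* UADDL: zero-extend each byte to 16 bits, then add. */
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);

    /* UADDL2: the same operation on the upper 8 lanes. */
    for (int i = 8; i < 16; i++)
        dst[i] = (uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}
```

Because the sum is formed in 16 bits, a pixel value of 200 doubles to 400 instead of wrapping to 144 as an 8-bit add would.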

NEON Intrinsics

Coming to NEON intrinsics, the syntax is as follows (refer to page 8):

uint16x8_t vaddl_u8 (uint8x8_t a, uint8x8_t b)
uint16x8_t vaddl_high_u8 (uint8x16_t a, uint8x16_t b)

Here, the inputs to the two functions are of different types.

So once I load a uint8x16_t variable, how am I supposed to pass it to vaddl_u8? Is there a cast I can do? Or do I have to copy the lower half to another variable (which would mean an extra cost)?

So my question is, how can I implement this piece of assembly code with NEON intrinsics?

UPDATE

I am using aarch64-linux-gnu-g++ (gcc version 5.4.0) on Ubuntu 16.04.

You can cast uint8x16_t to uint8x8_t for free, right? With a cast intrinsic, I think. Do that for the low half, and it should compile to the asm you'd hope for. — Peter Cordes
@PeterCordes: That's what I am looking for, but I couldn't find it. — Abid Rahman K

Jake 'Alquimista' LEE · Accepted Answer · 2017-12-11T17:56:42

You should know that uint8x16_t and uint8x8_t are different data types.

Below is what I would do:

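#include <arm_neon.h>
#include <stdint.h>

void widen_add(const uint8_t *pSrc, uint16_t *pDst)
{
    uint8x16_t a = vld1q_u8(pSrc);        // 16 input pixels

    uint8x8_t low  = vget_low_u8(a);      // lanes 0..7
    uint8x8_t high = vget_high_u8(a);     // lanes 8..15

    uint16x8_t b = vaddl_u8(low, low);    // widening add -> 8 x uint16
    uint16x8_t c = vaddl_u8(high, high);  // same for the upper half

    vst1q_u16(pDst, b);                   // results 0..7
    vst1q_u16(pDst + 8, c);               // results 8..15
}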

BTW, may I ask where you got vaddl_high_u8 from?

The auto-completion in Android Studio 3.0.1 doesn't show it as a viable option.
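For reference, vaddl_high_u8 is an AArch64-only intrinsic from the Arm C Language Extensions (ACLE); the _high variants are not available when compiling for 32-bit ARM. Below is a minimal sketch of the question's assembly written with it. It assumes an AArch64 toolchain whose arm_neon.h provides vaddl_high_u8; the #else branch is a plain-C fallback so the same file also builds on other targets, and the name widen_double16 is hypothetical.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = src[i] + src[i], widened to 16 bits,
 * for 16 input bytes (mirrors the ld1 / uaddl / uaddl2 / st1 sequence). */
void widen_double16(const uint8_t *src, uint16_t *dst)
{
#if defined(__aarch64__)
    uint8x16_t a    = vld1q_u8(src);        /* ld1 {v.16b}                  */
    uint8x8_t  a_lo = vget_low_u8(a);       /* low 64 bits, usually free    */
    uint16x8_t lo   = vaddl_u8(a_lo, a_lo); /* uaddl  .8h, .8b, .8b         */
    uint16x8_t hi   = vaddl_high_u8(a, a);  /* uaddl2 .8h, .16b, .16b       */
    vst1q_u16(dst, lo);                     /* first 8 results              */
    vst1q_u16(dst + 8, hi);                 /* last 8 results               */
#else
    /* Portable scalar fallback producing the same result. */
    for (int i = 0; i < 16; i++)
        dst[i] = (uint16_t)(src[i] + src[i]);
#endif
}
```

Whether a compiler fuses vaddl_u8(vget_high_u8(a), vget_high_u8(a)) into a single UADDL2 on its own depends on the compiler and version, so the _high form is the more direct way to ask for UADDL2.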
