I'm trying to reverse the order of the elements in a 128-bit vector (uint16x8_t).
For example if I have
a b c d e f g h
I would like to obtain
h g f e d c b a
Is there a simple way to do that with NEON intrinsics? I tried VREV, but it doesn't work.
You want the vrev64.16 instruction; however, it only reverses the elements within each double (D) register and doesn't swap the two double registers of a single quad (Q) register. You need an additional vswp to achieve that.
For intrinsics, q = vrev64q_u16(q) should do the trick for reversing the elements inside each double word; then you need to swap the two double words in the quad register. That gets cumbersome, since there is no direct vswp intrinsic, which forces you to use something like q = vcombine_u16(vget_high_u16(q), vget_low_u16(q)), which actually ends up as a vswp instruction.
See below for an example.
#include <stdio.h>
#include <stdlib.h>
#include <arm_neon.h>

int main() {
    uint16_t s[] = {0x101, 0x102, 0x103, 0x104, 0x105, 0x106, 0x107, 0x108};
    uint16_t *t = malloc(sizeof(uint16_t) * 8);
    if (t == NULL) {
        return 1;
    }
    for (int i = 0; i < 8; i++) {
        t[i] = 0;
    }
    uint16x8_t a = vld1q_u16(s);
    /* reverse the four 16-bit lanes inside each 64-bit half */
    a = vrev64q_u16(a);
    /* swap the low and high 64-bit halves */
    a = vcombine_u16(vget_high_u16(a), vget_low_u16(a));
    vst1q_u16(t, a);
    for (int i = 0; i < 8; i++) {
        printf("0x%3x ", t[i]);
    }
    printf("\n");
    free(t);
    return 0;
}
which generates assembly like the following
vld1.16 {d16-d17}, [sp:64]
movs r4, #0
vrev64.16 q8, q8
vswp d16, d17
vst1.16 {d16-d17}, [r5]
and outputs
$ rev
0x108 0x107 0x106 0x105 0x104 0x103 0x102 0x101
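If it helps to see what each of the two steps does to the lanes, here is a rough sketch of the same idea wrapped in a helper. The name reverse_u16x8 is made up for illustration, and the non-NEON branch is only a scalar model of vrev64.16 followed by vswp, added on the assumption that you may want to build and check the snippet on a host without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of any library): stores the 8 lanes of
 * src into dst in reverse order. dst may be the same array as src. */
static void reverse_u16x8(const uint16_t src[8], uint16_t dst[8])
{
    assert(src != NULL && dst != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t q = vld1q_u16(src);
    /* a b c d e f g h -> d c b a h g f e */
    q = vrev64q_u16(q);
    /* d c b a h g f e -> h g f e d c b a */
    q = vcombine_u16(vget_high_u16(q), vget_low_u16(q));
    vst1q_u16(dst, q);
#else
    /* Scalar model of the same two steps, for hosts without NEON. */
    uint16_t tmp[8];
    /* Step 1 (models vrev64.16): reverse the four lanes in each 64-bit half. */
    for (int i = 0; i < 8; i++) {
        tmp[i] = src[(i & 4) | (3 - (i & 3))];
    }
    /* Step 2 (models vswp): exchange the low and high halves. */
    for (int i = 0; i < 8; i++) {
        dst[i] = tmp[i ^ 4];
    }
#endif
}
```

Calling reverse_u16x8(s, t) with the s and t arrays from the example above should leave t holding the same reversed sequence that the program prints.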
vrev? – auselen