I'm trying to load an array of char values into NEON registers, and then treat them as 16-bit or 32-bit integer values. So something like this...
void SubVector(short* c, const unsigned char* a, const unsigned char* b, int n)
{
for(int i = 0; i < n; i++)
{
c[i] = (short)a[i] - (short)b[i];
}
}
I'm not sure how to load the data. Should I load the 8-bit data into lanes, and then reinterpret the registers as shorts? Or load and convert? What would be the fastest way?
Does anyone have a example on how they would do this with NEON intrinsics?
Thanks!