I'm a little stuck! I would like to optimize the following code with ARM NEON, but I'm not sure how to do it:
uint8_t* srcPtr = src->get();
uint8_t* dstPtr = dst->get();
int i;
for (i = 0; i < SIZE; i++) {
    /* write the current 2-byte pair three times */
    *dstPtr++ = srcPtr[0];
    *dstPtr++ = srcPtr[1];
    *dstPtr++ = srcPtr[0];
    *dstPtr++ = srcPtr[1];
    *dstPtr++ = srcPtr[0];
    *dstPtr++ = srcPtr[1];
    srcPtr += 2;  /* advance to the next pair */
}
For example, if srcPtr points to the uint8_t values
0 1 2 3
then dstPtr should end up containing
0 1 0 1 0 1 2 3 2 3 2 3
Can someone please help me?
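
One possible direction, offered as a rough sketch rather than a tuned solution: treat each 2-byte pair as a single 16-bit lane, load eight pairs at once, and let the 3-way interleaving store vst3q_u16 write the same vector three times, which should triple every pair. The function name triple_pairs and its parameters are hypothetical (not from the code above), the sketch assumes a little-endian target and a dst buffer of at least 6 * pairs bytes, and it falls back to the plain loop when NEON is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: reads `pairs` 2-byte pairs from src and writes each
 * pair three times to dst. dst must have room for 6 * pairs bytes. */
static void triple_pairs(const uint8_t *src, uint8_t *dst, size_t pairs)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Vector path: 8 pairs (16 input bytes) become 48 output bytes. */
    for (; i + 8 <= pairs; i += 8) {
        /* View the 16 loaded bytes as eight 16-bit lanes, one per pair
         * (assumes little-endian lane layout). */
        uint16x8_t v = vreinterpretq_u16_u8(vld1q_u8(src));

        /* vst3q_u16 interleaves val[0], val[1], val[2] lane by lane, so
         * using the same vector three times emits p0 p0 p0 p1 p1 p1 ... */
        uint16x8x3_t out;
        out.val[0] = v;
        out.val[1] = v;
        out.val[2] = v;
        vst3q_u16((uint16_t *)dst, out);

        src += 16;
        dst += 48;
    }
#endif

    /* Scalar path: handles the tail, or everything when NEON is absent. */
    for (; i < pairs; i++) {
        dst[0] = src[0]; dst[1] = src[1];
        dst[2] = src[0]; dst[3] = src[1];
        dst[4] = src[0]; dst[5] = src[1];
        src += 2;
        dst += 6;
    }
}
```

With the loop above this would be called as triple_pairs(srcPtr, dstPtr, SIZE). Whether it actually beats the scalar version depends on the core and on memory bandwidth, so it is worth measuring before committing to it.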