I need to reproduce the result of the following SSE operation with NEON:
__m128i result1 = _mm_avg_epu8(upper, lower);
With NEON I do the following:
uint8x16_t result1=vhaddq_u8(upper, lower);
I expected the two results to be identical, but with the SSE instruction I obtain:
91cb c895 aaa3 b0d4 cfc0 c1b0 aac7 b9b9
whereas with the NEON instruction I obtain:
91ca c894 a9a2 b0d3 cec0 c1af aac7 b8b8
I don't understand why the two results are different. Can you help me?
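
For reference, here is a minimal scalar sketch of what each intrinsic computes per 8-bit lane, as I understand the documented semantics: `_mm_avg_epu8` rounds the average up, `(a + b + 1) >> 1`, while `vhaddq_u8` truncates it, `(a + b) >> 1`. That would account for the outputs above differing by exactly 1 in some bytes (the lanes where `a + b` is odd) and matching in the others. The helper names `avg_round_u8` and `avg_trunc_u8` are hypothetical and only serve to illustrate the arithmetic.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of _mm_avg_epu8:
 * rounding average, computed in a wider type so the sum cannot overflow. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical scalar model of one lane of vhaddq_u8:
 * halving add that discards the low bit of the sum (truncation). */
static uint8_t avg_trunc_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}
```

The two models agree whenever `a + b` is even and differ by one whenever it is odd. If this is indeed the cause, the NEON rounding halving add, `vrhaddq_u8(upper, lower)`, should be the closer match to `_mm_avg_epu8`.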