
I need to compute the same operation as this SSE intrinsic:

__m128i result1=_mm_avg_epu8 (upper, lower);

With NEON I do the following:

uint8x16_t result1=vhaddq_u8(upper, lower);

I expected the results to be the same, but with the SSE instruction I obtain:

91cb c895 aaa3 b0d4 cfc0 c1b0 aac7 b9b9

whereas with the NEON instruction I obtain:

91ca c894 a9a2 b0d3 cec0 c1af aac7 b8b8 

I don't understand why the two results are different. Can you help me?


2 Answers


The Neon "halving add" operation vhadd works like this:

A = (B + C) >> 1

whereas the SSE average intrinsic _mm_avg_epu8 does this:

A = (B + C + 1) >> 1

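In other words, Neon's "halving add" truncates (it rounds the average down), whereas SSE rounds halves up. In both cases the intermediate sum B + C is formed without overflow, so the two results differ by exactly 1 in every lane where B + C is odd. That matches your output, where every differing byte of the SSE result is one higher than the Neon one.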
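To make the difference concrete, here is a minimal scalar sketch of one 8-bit lane of each operation. The function names `hadd_u8` and `rhadd_u8` are hypothetical; they only model the per-lane formulas above, with the sum computed in `unsigned` so that it cannot overflow.

```c
#include <stdint.h>

/* One lane of Neon vhadd: (B + C) >> 1, truncating.
   The sum is formed in unsigned int, so 255 + 255 does not wrap. */
static uint8_t hadd_u8(uint8_t b, uint8_t c) {
    return (uint8_t)(((unsigned)b + (unsigned)c) >> 1);
}

/* One lane of SSE _mm_avg_epu8: (B + C + 1) >> 1, rounding halves up. */
static uint8_t rhadd_u8(uint8_t b, uint8_t c) {
    return (uint8_t)(((unsigned)b + (unsigned)c + 1u) >> 1);
}
```

For example, `hadd_u8(0x91, 0x92)` gives `0x91` while `rhadd_u8(0x91, 0x92)` gives `0x92`: the sum `0x123` is odd, so the two differ by one. When the sum is even they agree.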

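Fortunately, Neon has an instruction that rounds the same way as SSE's _mm_avg_epu8: vrhadd (Vector Rounding Halving Add). For uint8x16_t operands the intrinsic is vrhaddq_u8, so vrhaddq_u8(upper, lower) is the drop-in replacement for your vhaddq_u8 call.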
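A sketch of a small portable helper that gives the same rounded average of 16 bytes on both architectures. The name `avg_u8x16` and the unaligned load/store choice are assumptions for illustration; the intrinsics themselves (`vld1q_u8`, `vrhaddq_u8`, `vst1q_u8`, `_mm_loadu_si128`, `_mm_avg_epu8`, `_mm_storeu_si128`) are the standard Neon and SSE2 ones. On targets with neither, it falls back to a scalar loop with the same rounding.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = (a[i] + b[i] + 1) >> 1 for 16 bytes.
   On Neon it uses vrhaddq_u8 (rounding), not vhaddq_u8 (truncating),
   so it matches SSE2's _mm_avg_epu8 bit for bit. */
static void avg_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *out) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t va = vld1q_u8(a);
    uint8x16_t vb = vld1q_u8(b);
    vst1q_u8(out, vrhaddq_u8(va, vb));
#elif defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_avg_epu8(va, vb));
#else
    /* Scalar fallback with the same round-half-up behaviour. */
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(((unsigned)a[i] + (unsigned)b[i] + 1u) >> 1);
#endif
}
```

Because every branch implements the same formula, code built on x86 and on ARM should produce identical bytes for the same inputs.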


You could use vrhadd [1] [2].

Vector rounding halving add: vrhadd -> Vr[i]:=(Va[i]+Vb[i]+1)>>1
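As a side note on why the rounded result always fits back into an 8-bit lane: the same formula can be evaluated with intermediate values that never leave 0..255, using the standard identities a + b = 2(a & b) + (a ^ b) and a | b = (a & b) + (a ^ b). This is a general bit identity, not a description of how either instruction is implemented in hardware; the function names are hypothetical.

```c
#include <stdint.h>

/* Rounding average (a + b + 1) >> 1, the vrhadd / _mm_avg_epu8 result.
   a | b >= (a ^ b) >> 1, so the subtraction never goes negative. */
static uint8_t rhadd_u8_bits(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Truncating average (a + b) >> 1, the vhadd result. */
static uint8_t hadd_u8_bits(uint8_t a, uint8_t b) {
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}
```

Subtracting the two gives `(a ^ b) & 1`: the results differ only when the lowest bits of the inputs differ, i.e. when the sum is odd.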