Converting a short array to floating point using ARM neon

Question

I've just started trying to optimise some Android code using NEON. I'm having a few issues, however. The main one is that I can't work out how to do a quick 16-bit integer to float conversion.

I see it's possible to convert multiple 32-bit ints to float in one SIMD instruction using vcvt.f32.s32. However, how do I convert a set of 4 S16s to 4 S32s? I assume it has something to do with the VUZP instruction, but I cannot figure out how...

Equally, I see that it's possible to use VCVT.f32.s16 to convert one 16-bit value to a float at a time, but while this is helpful, it seems very wasteful not to be able to do it using SIMD.

I've written assembler on many different platforms over the years but I find the ARM documentation completely unfathomable for some reason.

As such any help would be HUGELY appreciated.

Also is there any way to get the throughput and latency figures for the NEON unit?

Thanks in advance!

Not that familiar with NEON, but can't you "widen" the 4 shorts to 4 ints and then convert? Looking at GCC's intrinsics, I think maybe vaddl.s16 with a zero second operand might do it. — user786653
Yup, that seems to work. Can't believe I didn't notice that instruction... — Goz

Anoop K. Prabhu · Accepted Answer · 2011-10-18T12:45:59

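If no other computation is needed alongside the widening from 16-bit to 32-bit integers, you can use uint32x4_t = vmovl_u16 (uint16x4_t), or int32x4_t = vmovl_s16 (int16x4_t) for signed shorts. The widened vector can then be converted to float with vcvtq_f32_u32 or vcvtq_f32_s32 respectively.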
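As a rough sketch of the widen-then-convert approach for the signed shorts in the question: the function name s16_to_f32 and the HAVE_NEON macro are made up for illustration, and the scalar loop is only there to handle leftover elements and builds without NEON. It assumes the array does not need any scaling, just a plain integer to float conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: converts n signed 16-bit values to float. */
void s16_to_f32(const int16_t *src, float *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;
#if HAVE_NEON
    /* Four elements per iteration: load, sign-extend, convert, store. */
    for (; i + 4 <= n; i += 4) {
        int16x4_t   s16 = vld1_s16(src + i);   /* 4 x int16 in a D register */
        int32x4_t   s32 = vmovl_s16(s16);      /* widen to 4 x int32 (VMOVL.S16) */
        float32x4_t f32 = vcvtq_f32_s32(s32);  /* convert to 4 x float (VCVT.F32.S32) */
        vst1q_f32(dst + i, f32);
    }
#endif
    /* Scalar tail, and the whole array when NEON is unavailable. */
    for (; i < n; ++i)
        dst[i] = (float)src[i];
}
```

Processing 8 shorts per iteration (vld1q_s16, then vmovl_s16 on vget_low_s16 and vget_high_s16) may hide more latency, but that is worth measuring on the target core rather than assuming.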

If a simple addition, multiplication, etc. is performed before the conversion, you can fold the widening into that operation with a single instruction such as int32x4_t = vmull_s16 (int16x4_t, int16x4_t) or int32x4_t = vaddl_s16 (int16x4_t, int16x4_t) (use the _u16 variants with the unsigned vector types), which saves a few cycles.
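To illustrate combining the arithmetic with the widening, here is a sketch that multiplies two signed 16-bit arrays element by element and stores the products as floats. The function name mul_s16_to_f32 and the HAVE_NEON macro are invented for this example, and the element-wise product is just an assumed workload; the point is that the widening multiply produces 32-bit results directly, so no separate widening move is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = (float)(a[i] * b[i]) for n elements. */
void mul_s16_to_f32(const int16_t *a, const int16_t *b, float *dst, size_t n)
{
    assert(a != NULL && b != NULL && dst != NULL);
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        int16x4_t va = vld1_s16(a + i);
        int16x4_t vb = vld1_s16(b + i);
        /* Widening multiply: 16-bit inputs, 32-bit products (VMULL.S16). */
        int32x4_t prod = vmull_s16(va, vb);
        vst1q_f32(dst + i, vcvtq_f32_s32(prod));
    }
#endif
    /* Scalar tail, and the whole array when NEON is unavailable. */
    for (; i < n; ++i)
        dst[i] = (float)((int32_t)a[i] * (int32_t)b[i]);
}
```

The same pattern should apply to vaddl_s16 when the preceding operation is an addition.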
