
I disassembled an ARM binary that had been compiled with these NEON flags:

-mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize

The dump shows a vdiv.f64 instruction generated by the compiler. According to the ARM manual for ARMv7 (Cortex-A9), the NEON SIMD ISA does not support a vdiv instruction, but the floating-point (VFP) unit does. Why is this instruction generated? Is it a floating-point instruction that will be executed by the VFP? Both NEON and VFP support floating-point addition and multiplication, so how can I tell them apart?

Yes, this is a VFP instruction. You can easily see this, because AArch32 NEON doesn't work on 64-bit floating-point at all. – EOF
Thank you for your answer, but if I see a "vadd" generated by the compiler, how can I know whether it is a NEON or a VFP instruction, since both engines implement it? I am working with an ARM Cortex-A9 processor and the arm-none-linux-gnueabi* toolchain. – raul garcia
NEON uses the register names D[n] and Q[n] and the instruction postfixes F32 (and I[n] for integer instructions); VFP uses S[n] and D[n] and the instruction postfixes F64 or F32. It turns out that the combination is unambiguous. – EOF
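To make that rule concrete, here is a small Python sketch that applies it to one line of disassembly. It assumes GNU objdump-style text (lower-case mnemonic, data-type suffixes such as .f32, operands like d16 or d2[0]); the function name classify is hypothetical, integer and polynomial types (.i32, .s16, .u8, .p8) are treated as NEON on d registers, and instructions with no data type (loads, stores, plain moves) are reported as ambiguous rather than guessed.

```python
import re

# Register operands in disassembly: s0-s31, d0-d31, q0-q15.
# A by-element scalar such as d2[0] still matches as a d register.
_REG = re.compile(r"\b([sdq])\d+\b")

# Advanced SIMD element types other than float: integer (i, s, u) and polynomial (p).
_SIMD_INT = re.compile(r"^[isup](8|16|32|64)$")


def classify(insn: str) -> str:
    """Heuristically label one disassembled instruction as 'VFP', 'NEON' or 'ambiguous'."""
    parts = insn.strip().lower().split(None, 1)  # mnemonic, then operands (space or tab)
    if not parts:
        return "ambiguous"
    mnemonic = parts[0]
    operands = parts[1] if len(parts) > 1 else ""
    types = mnemonic.split(".")[1:]        # "vcvt.s32.f32" -> ["s32", "f32"]
    regs = set(_REG.findall(operands))     # e.g. {"d", "q"}

    if "f64" in types:
        return "VFP"    # Advanced SIMD has no double-precision arithmetic
    if "s" in regs:
        return "VFP"    # 32-bit s registers: scalar single precision or a VFP conversion
    if "q" in regs:
        return "NEON"   # 128-bit q registers are only used by Advanced SIMD
    if "d" in regs and any(t == "f32" or _SIMD_INT.match(t) for t in types):
        return "NEON"   # F32 or integer lanes packed into a 64-bit d register
    return "ambiguous"  # no data type (vldr, vstr, vmov d0, d1, ...): the rule does not decide


for line in ["vdiv.f64 d16, d16, d17", "vadd.f32 s0, s1, s2", "vadd.f32 q8, q8, q9"]:
    print(line, "->", classify(line))
```

The checks run in a fixed order on purpose: .f64 and s registers settle the VFP cases first, so a scalar conversion such as vcvt.s32.f32 s0, s0 is not mistaken for NEON because of its integer type suffix.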

1 Answer


In the case of Cortex-A9, the NEON FPU option also implements VFP; it is a superset of the cut-down 16-register VFP-only FPU option.
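One way to picture the "superset" is to list the register views each Cortex-A9 option exposes. The sketch below assumes the standard AArch32 aliasing (s0-s31 overlay d0-d15 in pairs, q0-q15 overlay d0-d31 in pairs) and that the VFP-only option has 16 D registers while the NEON option has 32; register_file is a hypothetical helper, not part of any toolchain.

```python
def register_file(has_neon: bool) -> dict:
    """Map each D register to the S and Q registers that overlay it.

    Assumption: on Cortex-A9 the NEON option provides 32 D registers and
    the VFP-only option provides 16, as described above.
    """
    d_count = 32 if has_neon else 16
    view = {}
    for n in range(d_count):
        aliases = []
        if n < 16:
            # s0-s31 only cover d0-d15: d<n> = s<2n> (low half) + s<2n+1> (high half)
            aliases += [f"s{2 * n}", f"s{2 * n + 1}"]
        if has_neon:
            # q registers are an Advanced SIMD view: q<m> = d<2m> + d<2m+1>
            aliases.append(f"q{n // 2}")
        view[f"d{n}"] = aliases
    return view


neon, vfp_only = register_file(True), register_file(False)
print(len(vfp_only), "D registers without NEON,", len(neon), "with NEON")
print("d0 overlays:", neon["d0"])
```

Every register name in the VFP-only view also exists in the NEON view, which is the sense in which code built for the smaller FPU runs unchanged on the NEON option.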

More generally, the architecture does not allow implementing floating-point Advanced SIMD without also implementing at least single-precision VFP, therefore GCC's -mfpu=neon implies VFPv3 as well. It is permissible to implement integer-only Advanced SIMD without any floating-point capability at all, but I'm not sure GCC can support that (or that anyone's ever built such a thing).

The actual VFP and Advanced SIMD variants of instructions are unambiguous from the syntax - anything operating on double-precision data (i.e. <op>.F64) is obviously VFP, as Advanced SIMD doesn't support double-precision. Single precision operations (i.e. <op>.F32) operating on 32-bit s registers are scalar, thus VFP; if they're operating on larger 64-bit d or 128-bit q registers, then they are handling multiple 32-bit values at once, thus are vectorised Advanced SIMD instructions.
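The register width is what separates scalar from vector work here: dividing the register size by the element size gives the number of floating-point values one instruction processes. A minimal sketch with a hypothetical fp_lanes helper, covering only the F32/F64 cases discussed above and rejecting F64 anywhere except a d register, since Advanced SIMD has no double-precision vectors:

```python
# Register widths in bits for the three AArch32 views of the FP/SIMD register file.
REG_BITS = {"s": 32, "d": 64, "q": 128}
ELEM_BITS = {"f32": 32, "f64": 64}


def fp_lanes(reg_kind: str, data_type: str) -> int:
    """Number of floating-point values one VFP/NEON instruction processes."""
    if data_type == "f64" and reg_kind != "d":
        # F64 only appears on d registers, and only as a VFP scalar.
        raise ValueError("double precision exists only as a scalar on d registers")
    return REG_BITS[reg_kind] // ELEM_BITS[data_type]


for reg, dt in [("s", "f32"), ("d", "f64"), ("d", "f32"), ("q", "f32")]:
    n = fp_lanes(reg, dt)
    kind = "scalar, VFP" if n == 1 else "vector, Advanced SIMD"
    print(f".{dt} on a {reg} register: {n} value(s) ({kind})")
```

So vdiv.f64 on d registers handles a single value (VFP), while vadd.f32 on d or q registers handles two or four values at once (Advanced SIMD).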