I want to hand-optimize some code using NEON instructions in inline ASM inside C++ functions. The target is an ARM Cortex-A9 (i.MX6Q).
When choosing the compiler flags, I got a bit confused by -mfpu. My goal is to use the hardware FPU for ordinary floating-point operations and to use NEON only in the ASM code.
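For context, here is a minimal sketch of the kind of function in question. The name `add4` and the scalar fallback are illustrative assumptions, not code from the actual project. The NEON path is compiled only for 32-bit ARM when the compiler reports NEON support through the `__ARM_NEON` (or older `__ARM_NEON__`) macro.

```cpp
// Hypothetical example: element-wise add of four floats, out[i] = a[i] + b[i].
void add4(const float* a, const float* b, float* out) {
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    // NEON path: load both inputs into q0 (d0, d1) and q1 (d2, d3),
    // add the four lanes at once, then store q0 to out.
    asm volatile(
        "vld1.32 {d0, d1}, [%[a]]\n\t"
        "vld1.32 {d2, d3}, [%[b]]\n\t"
        "vadd.f32 q0, q0, q1\n\t"
        "vst1.32 {d0, d1}, [%[out]]\n\t"
        :
        : [a] "r"(a), [b] "r"(b), [out] "r"(out)
        : "q0", "q1", "memory");
#else
    // Scalar fallback: plain C++ that the compiler is free to map to the FPU.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

GCC defines `__ARM_NEON` only when the selected -mfpu includes NEON, so which of the two paths gets compiled depends on exactly the flag choice asked about here.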
Is it safe to assume that with -mfpu=vfpv3, the NEON unit is still accessible through NEON instructions in inline ASM?
If I set -mfpu=neon-fp16 instead, will the FPU go unused?
Will the FPU outperform NEON for non-vectorized (scalar) floating-point operations?