Floating point support on ARM Linux distributions is not trivial. Because of that, you should use a toolchain that matches your system (both operating system and hardware) and pass the right compile switches.
The first thing you need to understand is ARM's calling convention, which answers the question "how are arguments passed when you call a function?". ARM, being a RISC (load/store) architecture, can only do its work on registers. There are no data-processing instructions that manipulate memory directly. If you need to change a value in memory, you first load it into a register, modify it, and then store it back to memory.
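To make the load/modify/store point concrete, here is a minimal C sketch. The function name increment is made up for illustration, and the exact instructions a compiler emits depend on the compiler and its options.

```c
/* Hypothetical example: bump a counter that lives in memory. */
void increment(int *counter)
{
    /* On a load/store architecture this single statement becomes three
       steps: load *counter into a register, add 1 in that register,
       then store the register back to memory. */
    *counter += 1;
}
```

On ARM you would typically see something like an ldr, an add and a str for the body of this function.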
When you call a function you may need to pass arguments to it. You could put the arguments on the stack (memory), but since ARM can only work with registers, the first thing your function would probably do is load them back into registers. To avoid this waste, the ARM calling convention uses registers to pass arguments. However, since ARM has a limited number of registers, the calling convention only uses the first four registers (r0-r3) for the first four (word-sized) arguments; any remaining arguments must still be passed on the stack.
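As a rough illustration of that rule, consider a function with six integer arguments. The name sum6 is hypothetical and only serves to show where each argument would travel.

```c
/* Hypothetical example with six integer arguments.
   Under the ARM calling convention described above, a, b, c and d
   arrive in r0, r1, r2 and r3, while e and f are passed on the stack.
   The integer result is returned in r0. */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}
```

Compiling this for ARM and dumping it should show the function reading e and f relative to the stack pointer, while a to d are used straight from registers.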
The second thing is that early ARM cores didn't have any floating point support; floating point operations were implemented in software. (This is what is still supported via gcc's -mfloat-abi=soft.)
We can easily demonstrate what this means with the following snippet.
float pi2(float a) {
    return a * 3.14f;
}
Compiling this with -c -O3 -mfloat-abi=soft and running objdump on the result gives us
00000000 <pi2>:
0: f24f 51c3 movw r1, #62915 ; 0xf5c3
4: b508 push {r3, lr}
6: f2c4 0148 movt r1, #16456 ; 0x4048
a: f7ff fffe bl 0 <__aeabi_fmul>
e: bd08 pop {r3, pc}
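As you can see (actually, it is not directly visible :) ), pi2 receives its parameter in r0, loads the constant 3.14f into r1 (movw and movt build its bit pattern 0x4048f5c3) and calls __aeabi_fmul to multiply the two; the result comes back in r0. Since __aeabi_fmul follows the same calling convention, r0 never appears in the listing: the argument is already where the callee expects it, and the result is already where our caller expects it. All our function does is populate r1 and delegate to __aeabi_fmul.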
When hardware floating point support was added to ARM, it came (again because of the architecture style) with its own set of registers (s0, s1, ...).
If we compile the same snippet with -c -O3 -mfloat-abi=softfp and dump it, we get
00000000 <pi2>:
0: eddf 7a04 vldr s15, [pc, #16] ; 14 <pi2+0x14>
4: ee07 0a10 vmov s14, r0
8: ee27 7a27 vmul.f32 s14, s14, s15
c: ee17 0a10 vmov r0, s14
10: 4770 bx lr
12: bf00 nop
14: 4048f5c3 .word 0x4048f5c3
As you can see, the compiler no longer generates a call to __aeabi_fmul. Instead it emits a vmul.f32 instruction, after moving the argument from r0 to s14 and loading 3.14 into s15. After the multiplication it moves the result from s14 back to r0, since any caller of this function expects it there because of the calling convention.
Now, if you think of pi2 as a library provided to you by some third party, you can see that the soft and softfp implementations do the same thing for you and can be used interchangeably. If the system provides the library, your app doesn't need to care whether it runs on a system with hardware floating point support or not. This was quite good for keeping old software running on new hardware.
However, while it keeps compatibility, this approach introduces the overhead of moving values between ARM registers and FP registers. This obviously affects performance, and it is addressed by a newer calling convention, which gcc calls hard. This convention states that floating point arguments are passed in floating point registers, interleaved with the integer arguments in the normal registers, and that floating point values are returned in floating point register s0.
Again, if we compile our snippet with -c -O3 -mfloat-abi=hard and dump it, we get
00000000 <pi2>:
0: eddf 7a02 vldr s15, [pc, #8] ; c <pi2+0xc>
4: ee20 0a27 vmul.f32 s0, s0, s15
8: 4770 bx lr
a: bf00 nop
c: 4048f5c3 .word 0x4048f5c3
You can see that no registers are moved around. The argument to pi2 is passed in s0, the compiler generates code to load 3.14 into s15, and vmul.f32 s0, s0, s15 leaves the result we want in s0.
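The pi2 example has a single float argument, so it doesn't show the "interleaved" part. A small sketch with mixed arguments may help; the function scale_add is made up for illustration, and the register assignments in the comments are what the two conventions prescribe as I understand them, not output I have dumped here.

```c
/* Hypothetical example mixing integer and floating point arguments.
 *
 * With -mfloat-abi=hard:
 *   n -> r0, a -> s0, m -> r1, b -> s1, result returned in s0
 * With -mfloat-abi=soft or softfp:
 *   n -> r0, a -> r1, m -> r2, b -> r3, result returned in r0
 */
float scale_add(int n, float a, int m, float b)
{
    /* Integers are converted to float before multiplying. */
    return (float)n * a + (float)m * b;
}
```

Compiling this with each of the three switches and dumping the object file is a quick way to check the register usage on your own toolchain.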
The big problem with this new convention is that, while you improve the code produced by the compiler, you completely kill compatibility. You can't expect an application built with the hard convention to work with libraries built for soft/softfp, and an application built for softfp won't work with libraries built for hard.
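Because mixing conventions breaks things, it can be handy to check which one a translation unit was compiled with. The sketch below assumes the predefined macros that GCC and Clang provide on 32-bit ARM (__ARM_PCS_VFP for the hard convention, __SOFTFP__ for software floating point); the function name float_abi_name is hypothetical, and other compilers may not define these macros.

```c
/* Hypothetical helper: report the float ABI this file was built with.
   Assumes GCC/Clang-style predefined macros; treat the "softfp" branch
   as an inference (32-bit ARM, base calling convention, FP hardware). */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";            /* FP arguments travel in s0, s1, ... */
#elif defined(__arm__) && defined(__SOFTFP__)
    return "soft";            /* FP done by library calls */
#elif defined(__arm__)
    return "softfp";          /* FP hardware used, arguments in r0-r3 */
#else
    return "not 32-bit ARM";  /* e.g. when built on a desktop host */
#endif
}
```

You can also inspect the macros directly by running your cross compiler's preprocessor with -dM -E on an empty input and the -mfloat-abi switch you care about.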
For more information on calling conventions you should check ARM's website, in particular the Procedure Call Standard for the Arm Architecture (AAPCS).