ARM, VFP, floating-point, lazy context switching

Question

I am writing an operating system for an ARM processor (Cortex-A9).

I am trying to implement lazy context switching of the floating-point registers. The idea behind this is that the floating-point extension is initially disabled for a thread and so there is no need to save floating-point context on a task-switch.

When a thread attempts to use a floating-point instruction, it triggers an exception. The operating system then enables the floating-point extension and records that floating-point context must be saved for this thread on subsequent context switches. Then the floating-point instruction is re-executed.
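The scheme described above can be modelled in portable C. This is only a sketch under stated assumptions: `struct thread`, `context_switch`, `fp_trap` and the `fpu_enabled` flag are hypothetical names, and the actual register save/restore and the enable bit (FPEXC.EN on the hardware) are represented by comments and a plain variable rather than real instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; the names are illustrative only. */
struct thread {
    bool uses_fpu;       /* set the first time the thread traps on an FP instruction */
    unsigned fpu_saves;  /* counts how often FP context had to be saved */
};

/* Stand-in for the hardware enable bit (FPEXC.EN on a real Cortex-A9). */
static bool fpu_enabled;

/* Task switch from prev (may be NULL at boot) to next. */
void context_switch(struct thread *prev, struct thread *next)
{
    if (prev != NULL && prev->uses_fpu)
        prev->fpu_saves++;  /* real code: store the VFP registers and FPSCR */

    /* real code: reload next's VFP registers if next->uses_fpu */
    fpu_enabled = next->uses_fpu;  /* threads that never used FP run with it disabled */
}

/* Called from the undefined-instruction handler when an FP instruction traps. */
void fp_trap(struct thread *current)
{
    assert(!fpu_enabled);      /* the trap can only occur while FP is disabled */
    current->uses_fpu = true;  /* from now on this thread pays for FP save/restore */
    fpu_enabled = true;
    /* real code: return so that the faulting instruction is re-executed */
}
```

In this model a single `vpush {d8}` in a function prologue is enough to reach `fp_trap`, after which the thread pays the full save/restore cost on every switch.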

My problem is that the compiler generates floating-point instructions even when no floating-point operations are used in the C code. This is an example of a disassembly of a function that uses no floating point in C:

10002f5c <rmtcpy_from>:
10002f5c:   e1a0c00d    mov ip, sp
10002f60:   e92ddff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
10002f64:   e24cb004    sub fp, ip, #4
10002f68:   ed2d8b02    vpush   {d8}
...
10002f80:   ee082a10    vmov    s16, r2
...
10002fe0:   ee180a10    vmov    r0, s16
...
1000308c:   ecbc8b02    vldmia  ip!, {d8}
...

When I have many such functions, lazy context switching makes no sense.

Does anybody know how to tell the compiler that floating-point instructions should only be generated when there is a floating-point operation in the C code?

I use gcc 9.2.0. The floating point options are: -mhard-float -mfloat-abi=hard -mfpu=vfp

Here is an example C function (not usable, only a demo):

void func(char *a1, char *a2, char *a3);
int bar_1[1], foo_1, foo_2;

void fpu_test() {
    int oldest_idx = -1;
    while (1) {
        int *oldest = (int *)0;
        int idx = oldest_idx;
        for (int i = 0; i < 3; i++) {
            if (++idx >= 3)
                idx = 0;
            int *lec = &bar_1[idx];
            if (*lec) {
                if (*lec - *oldest < 0) {
                    oldest = lec;
                    oldest_idx = idx;
                }
            }
        }
        if (oldest) {
            foo_1++;
            if (foo_2)
                func("1", "2", "3");
        }
    }
}

gcc command line:

$HOME/devel/opt/cross-musl/bin/arm-linux-musleabihf-gcc  -O2 -march=armv7-a -mtune=cortex-a9 -mhard-float -mfloat-abi=hard -mfpu=vfp -Wa,-ahlms=fpu_test.lst -mapcs-frame -c fpu_test.c -o fpu_test.o

Assembler listing:

...
  35 0000 0DC0A0E1      mov ip, sp
  36 0004 003000E3      movw    r3, #:lower16:foo_2
  37 0008 F0DF2DE9      push    {r4, r5, r6, r7, r8, r9, r10, fp, ip, lr, pc}
  38 000c 006000E3      movw    r6, #:lower16:foo_1
  39 0010 003040E3      movt    r3, #:upper16:foo_2
  40 0014 04B04CE2      sub fp, ip, #4
  41 0018 006040E3      movt    r6, #:upper16:foo_1
  42 001c 004000E3      movw    r4, #:lower16:bar_1
  43 0020 028B2DED      vpush.64    {d8}                <=== this is the problem
...

All of those options tell the compiler to build for floating point. Do you have an example/minimal C function that demonstrates the problem and a full gcc command line (or enough to demonstrate the problem)? — old_timer
I updated my post with an example C function, the gcc command line and part of the assembler listing. — ErwinP
-mapcs-frame Generate a stack frame that is compliant with the ARM Procedure Call Standard for all functions, even if this is not strictly necessary for correct execution of the code. Specifying -fomit-frame-pointer with this option causes the stack frames not to be generated for leaf functions. The default is -mno-apcs-frame. This option is deprecated. — old_timer
If I remove that, then this push goes away (as does the stack frame, which is a waste of a register). — old_timer
Removing -mapcs-frame works for simple functions. In my project the number of vpush instructions was reduced, but they are still there. — ErwinP

Eric Postpischil · Accepted Answer · 2020-05-27T05:29:35

GCC has a command-line switch for this, -mgeneral-regs-only. When using the command-line switch, you may need to move code that deliberately uses floating-point registers or operations into separate source files so that it can be compiled without that switch.

As of GCC 9.3 (perhaps 9?), for ARM targets, this is available as a function attribute:

void MyFunction(char *MyParameter) __attribute__ ((general-regs-only));

Putting the attribute after the declaration is an older syntax, and it required a non-definition declaration. Testing suggests GCC now accepts the attribute before the declarator, where it may be used with a definition:

void __attribute__ ((general-regs-only)) MyFunction(char *MyParameter)
{...}

You may also be able to negate the attribute with __attribute__ ((nogeneral-regs-only)), although I do not see this documented.

This can also be controlled with a pragma.
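A minimal sketch of the pragma form, assuming the pragma in question is `#pragma GCC target` and that it accepts the same option string as the command-line switch (check this against your GCC version). The functions `checksum` and `scale` are hypothetical and only serve to show the scope of the pragma:

```c
#pragma GCC push_options
#pragma GCC target ("general-regs-only")

/* Integer-only helper: compiled with floating-point registers forbidden. */
unsigned checksum(const unsigned char *buf, unsigned len)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

#pragma GCC pop_options

/* After pop_options the command-line floating-point settings apply again. */
double scale(double x)
{
    return x * 0.5;
}
```

Bracketing the region with push_options and pop_options keeps the restriction local, so code that deliberately uses floating point can stay in the same source file.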

There are also +nofp options within the -march and -mcpu switches, but I think -mgeneral-regs-only is what you want.
