Bare-metal ARM Raspberry Pi + qemu strange behavior with floating point division

Question

I'm teaching myself bare-metal ARM kernel development and have settled on the Raspberry Pi 2 as a target platform because it is well documented. I'm currently emulating the device using qemu. In a function called by my kernel, I need to divide a numerical constant by a function argument and store the result as a floating point number for future calculations. Calling this function causes qemu to go off the rails. Here's the function itself (it sets the PL011 baud rate):

void pl011_set_baud_rate(pl011_uart_t *uart, uint32_t baud_rate) {
    float divider = PL011_UART_CLOCK / (16.0f * baud_rate);
    uint16_t integer_divider = (uint16_t)divider;
    uint8_t fractional_divider = ((divider - integer_divider) * 64) + 0.5;
    mmio_write(uart->IBRD, integer_divider);        // Integer baud rate divider
    mmio_write(uart->FBRD, fractional_divider);     // Fractional baud rate divider
}

I'd post a minimal verifiable example, but just about anything will trigger the issue. If you even use:

void test(uint32_t test_var) {
    float test_div = test_var / 16;
    (void)test_div;    // squash [-Wunused-variable] warnings
    // goes off the rails here
}

You'll get the same result.

Stepping through the function in gdb, stepping past the float divider... line causes qemu to jump out of the function and head straight to the halt loop in my bootloader code (the one reached when the kernel main returns).

Checking info args in gdb shows the correct arguments. Checking info locals shows the correct value for float divider. Checking info stack shows the correct stack trace and arguments. Initially I suspected sp might be in the wrong place, but that doesn't check out, since the stack trace looks normal (for bare-metal).

(gdb) info stack
#0  pl011_set_baud_rate (uart=0x3f201000, baud_rate=115200) at kernel/uart/pl011.c:23
#1  0x0000837c in pl011_init (uart=0x3f201000) at kernel/uart/pl011.c:49
#2  0x0000806c in uart_init () at kernel/uart/uart.c:12
#3  0x00008030 in kernel_init (r0=0, r1=0, atags=0) at kernel/boot/start.c:10
#4  0x00008008 in _start () at kernel/boot/boot.S:6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Here's the register dump from right before the line that causes the unpredictable behavior:

r0             0x3f201000       1059065856
r1             0x1c200  115200
r2             0x7ff    2047
r3             0x0      0
r4             0x0      0
r5             0x0      0
r6             0x0      0
r7             0x0      0
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x7fcc   32716
r12            0x0      0
sp             0x7fb0   0x7fb0
lr             0x837c   33660
pc             0x8248   0x8248 <pl011_set_baud_rate+20>
cpsr           0x600001d3       1610613203

My Makefile is:

INCLUDES=include
INCLUDE_PARAMS=$(foreach d, $(INCLUDES), -I$d)

CC=arm-none-eabi-gcc

C_SOURCES:=kernel/boot/start.c kernel/uart/uart.c kernel/uart/pl011.c
AS_SOURCES:=kernel/boot/boot.S

SOURCES=$(C_SOURCES)
SOURCES+=$(AS_SOURCES)

OBJECTS=
OBJECTS+=$(C_SOURCES:.c=.o)
OBJECTS+=$(AS_SOURCES:.S=.o)


CFLAGS=-std=gnu99 -Wall -Wextra -fpic -ffreestanding -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard
LDFLAGS=-ffreestanding -nostdlib

LIBS=-lgcc

DEBUG_FLAGS=

BINARY=kernel.bin

.PHONY: all clean debug

all: $(BINARY)

debug: DEBUG_FLAGS += -ggdb
debug: $(BINARY)

$(BINARY): $(OBJECTS)
    $(CC) -T linker.ld $(LDFLAGS) $(LIBS) $(OBJECTS) -o $(BINARY)

%.o: %.c
    $(CC) $(INCLUDE_PARAMS) $(CFLAGS) $(DEBUG_FLAGS) -c $< -o $@

%.o: %.S
    $(CC) $(INCLUDE_PARAMS) $(CFLAGS) $(DEBUG_FLAGS) -c $< -o $@

clean:
    rm $(BINARY) $(OBJECTS)

As you can see, I'm linking against libgcc (-lgcc) and using -mfpu=neon-vfpv4 -mfloat-abi=hard, so at the very least gcc should supply its own floating point division functions from libgcc.

Can anyone point me in the right direction for debugging this issue? I suspect I'm either using the incorrect compiler arguments and not loading the correct function for floating-point division, or there's some issue with the stack.

Can anyone shed any insight here?

I haven't touched the floating point unit in ARM in kernel mode, but, in x86 it is such a nuisance due to lazy context switching and all that you'd rather go through the extra mile to calculate using integers instead. — Antti Haapala
It'd require some pretty complex gymnastics to avoid using division here! You need to perform that arbitrary calculation on the fractional remainder of the division and then store it. I could get around all of this by just hardcoding the values, but I'm trying to avoid that too. — ajxs
Did you enable the floating point unit before you did this? This is bare metal, yes? You have to enable the floating point coprocessors first; I think there is a bit for single and a bit for double... — old_timer
mrc p15, 0, r0, c1, c0, 2
orr r0,r0,#0x300000 ;@ single precision
orr r0,r0,#0xC00000 ;@ double precision
mcr p15, 0, r0, c1, c0, 2
— old_timer
That is from the original Pi; the Pi 2 has a completely different core, so it may not work at all there. I'm trying to find if/where I enabled it on an ARMv7. There is a really good bare-metal forum at the Pi site, by the way, where many/most have been through this. — old_timer
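The integer-only route raised in the comments above can be sketched as follows. This is a minimal illustration, not the asker's code: the struct pl011_divisors_t and the function pl011_compute_divisors are hypothetical names, and the clock values in the usage note are assumptions, since the question never states what PL011_UART_CLOCK is. The idea is that the divider scaled by 64 equals clock * 4 / baud, so a single 32-bit integer division yields both register values.

```c
#include <stdint.h>

/* Hypothetical helper type: the two PL011 divisor register values. */
typedef struct {
    uint16_t ibrd;  /* integer part of the baud rate divider  */
    uint8_t  fbrd;  /* fractional part, in units of 1/64      */
} pl011_divisors_t;

/*
 * Compute the divisors without floating point.
 *
 *   divider      = clock / (16 * baud)
 *   divider * 64 = clock * 4 / baud
 *
 * Adding baud / 2 before dividing rounds to the nearest 1/64.
 * Note: 4 * clock_hz must fit in 32 bits, so this sketch is only
 * valid for clocks below roughly 1 GHz.
 */
static pl011_divisors_t pl011_compute_divisors(uint32_t clock_hz,
                                               uint32_t baud_rate)
{
    uint32_t scaled = (4u * clock_hz + baud_rate / 2u) / baud_rate;
    pl011_divisors_t d;

    d.ibrd = (uint16_t)(scaled >> 6);     /* upper bits: integer part    */
    d.fbrd = (uint8_t)(scaled & 0x3Fu);   /* low 6 bits: fractional part */
    return d;
}
```

Assuming a 3 MHz UART clock and 115200 baud, this gives an integer divisor of 1 and a fractional divisor of 40, the same values the floating-point version in the question computes. It sidesteps the FPU entirely, although it does not explain why the floating-point code fails.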

old_timer · Accepted Answer · 2018-01-08T15:59:01

Did you check to see that the fpu coprocessor(s) were enabled?

On the original Pi 1 / Pi Zero I use this:

;@ enable fpu
mrc p15, 0, r0, c1, c0, 2
orr r0,r0,#0x300000 ;@ single precision
orr r0,r0,#0xC00000 ;@ double precision
mcr p15, 0, r0, c1, c0, 2
mov r0,#0x40000000
fmxr fpexc,r0

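The last two lines set the EN bit (bit 30) in FPEXC, which is what actually switches the VFP on. They also make the code fault right there if the coprocessor access enable above didn't work.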

Unfortunately, the Pi 2 comes in two variations, so you may have an ARMv7 or an ARMv8 core. Either way, I suspect the specific registers and instructions may differ from those above, which are for the ARMv6-based Raspberry Pi.
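For the Cortex-A7 that the question's Makefile targets (-mcpu=cortex-a7), here is a minimal sketch of the same sequence wrapped in C. It assumes the ARMv7-A layout matches the ARMv6 one used above: CPACR at p15, c1, c0, 2 with the cp10/cp11 access fields in bits 20-23, and FPEXC.EN at bit 30. Check the ARM documentation for your core before relying on it. The names cpacr_grant_vfp, FPEXC_EN and fpu_enable are made up for illustration. The inline assembly is only compiled for 32-bit ARM targets, so the bit arithmetic can be checked on a host machine.

```c
#include <stdint.h>

/* Assumed layout: cp10 access in bits 21:20, cp11 access in bits 23:22. */
#define CPACR_CP10_FULL  (0x3u << 20)
#define CPACR_CP11_FULL  (0x3u << 22)

/* Assumed layout: FPEXC.EN is bit 30. */
#define FPEXC_EN         (1u << 30)

/* Return the CPACR value with full access granted to cp10 and cp11. */
static inline uint32_t cpacr_grant_vfp(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_FULL | CPACR_CP11_FULL;
}

#if defined(__arm__)
/*
 * Hypothetical helper: enable the VFP/NEON unit.
 * Must be built with an -mfpu option so the assembler accepts vmsr.
 */
static inline void fpu_enable(void)
{
    uint32_t cpacr;

    /* Read CPACR, grant cp10/cp11 access, write it back. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));
    cpacr = cpacr_grant_vfp(cpacr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2\n\t"
                     "isb"
                     : : "r"(cpacr) : "memory");

    /* Set FPEXC.EN to switch the unit on. */
    __asm__ volatile("vmsr fpexc, %0" : : "r"(FPEXC_EN));
}
#endif
```

With -mfloat-abi=hard the compiler is free to emit VFP instructions wherever it likes, so this has to run before any C code that touches floating point. Doing the equivalent in boot.S, before branching to kernel_init, is probably the safer place.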
