

I have been writing my own bare-metal code for a Raspberry Pi to build up my bare-metal skills and learn about kernel-mode operation. Because of the complexity, the number of documentation errors, and the missing or scattered information, bringing up a custom kernel on the Raspberry Pi has been extremely difficult, but I finally got that working.

A very broad overview of what is happening in the bootstrap process

My kernel loads at 0x80000, sends all cores except core 0 into an infinite loop, sets up the stack pointer, and calls a C function. I can set up the GPIO pins and turn them on and off. Using some additional circuitry, I can drive LEDs and confirm that my code is executing.
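For readers following along, the GPIO mode setup mentioned above comes down to a little arithmetic on the function-select registers: each GPFSELn register covers ten pins with a 3-bit field per pin. The sketch below is only an illustration of that arithmetic. The helper names are hypothetical, and the 0x3f200000 base is the one assumed throughout this post.

```c
#include <stdint.h>

/* Assumed GPIO base as seen from the ARM cores; it matches the
   0x3f200000 addresses used elsewhere in this post. */
#define GPIO_BASE 0x3f200000u

/* Function-select codes from the BCM2835 peripherals manual. */
enum { GPIO_FSEL_INPUT = 0, GPIO_FSEL_OUTPUT = 1, GPIO_FSEL_ALT0 = 4 };

/* Hypothetical helper: address of the GPFSELn register controlling `pin`. */
static inline uint32_t gpfsel_addr(unsigned pin) {
    return GPIO_BASE + 4u * (pin / 10u);
}

/* Hypothetical helper: bit position of the 3-bit mode field for `pin`. */
static inline unsigned gpfsel_shift(unsigned pin) {
    return (pin % 10u) * 3u;
}

/* Hypothetical helper: register value with only `pin`'s field replaced,
   so the other nine pins in the same register keep their modes. */
static inline uint32_t gpfsel_update(uint32_t old, unsigned pin, uint32_t mode) {
    unsigned shift = gpfsel_shift(pin);
    return (old & ~(7u << shift)) | ((mode & 7u) << shift);
}
```

With these, pin 14 lands in the register at offset 0x04 with a shift of 12, and pin 23 in the register at offset 0x08 with a shift of 9, which is the same arithmetic as the `(14 - 10) * 3` and `(23 - 20) * 3` expressions in the C code further down.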

The problem

However, when it comes to the UART, I have hit a wall. I am using UART0 (PL011). As far as I can tell, the UART is not outputting anything, although I could be missing it on my scope since I only have an analog oscilloscope. The code gets stuck when outputting the string. After hours of reflashing my SD card and asking YES/NO questions through my LEDs, I have determined that it is stuck in an infinite loop waiting for the UART Transmit FIFO Full flag to clear. The UART accepts only 1 byte before becoming full. I cannot figure out why it is not transmitting the data. I am also not sure whether I have set my baud rate correctly, but I don't think that would cause the TX FIFO to stay full.
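On the baud-rate doubt: the PL011 divides its reference clock by 16 times the baud rate, with the integer part going into IBRD and the fraction, scaled by 64 and rounded, into FBRD. Here is a minimal sketch of that calculation. The function and struct names are hypothetical, and the 48 MHz reference clock used in the note below is an assumption, since the real value depends on the firmware configuration.

```c
#include <stdint.h>

/* Hypothetical container for the two PL011 divisor register values. */
struct pl011_baud {
    uint32_t ibrd; /* integer part, written to offset 0x24 */
    uint32_t fbrd; /* 6-bit fractional part, written to offset 0x28 */
};

/* Compute the divisors for a given UART reference clock and baud rate.
   64 * clk / (16 * baud) simplifies to 4 * clk / baud; adding baud / 2
   before dividing rounds to the nearest 1/64th. */
static struct pl011_baud pl011_baud_divisors(uint32_t uart_clk_hz, uint32_t baud) {
    uint64_t div_x64 = (4ull * uart_clk_hz + baud / 2u) / baud;
    struct pl011_baud d;
    d.ibrd = (uint32_t)(div_x64 >> 6);
    d.fbrd = (uint32_t)(div_x64 & 0x3fu);
    return d;
}
```

Under the 48 MHz assumption, 115200 baud gives an integer part of 26 and a fractional part of 3. Writing 26 and 0, as my code does, should therefore be off by only a fraction of a percent, which supports the guess that the baud rate is not what keeps the FIFO full.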

Getting a foothold in the code

Here is my code. Execution begins at the very beginning of the binary, which the linker script arranges by linking the symbol "my_entry_pt" from the assembly source "entry.s" first. That is where you will find the entry code. However, you probably only need to look at the last file, which is the C code in "base.c". The rest is just bootstrapping up to that. Please disregard any comments or names that don't make sense. This is a port (primarily of the build infrastructure) from an earlier bare-metal project of mine. That project used a RISC-V development board with a memory-mapped SPI flash that stores the program's binary code.


TUPLE   := aarch64-unknown-linux-gnu
CC      := $(TUPLE)-gcc
OBJCPY  := $(TUPLE)-objcopy
STRIP   := $(TUPLE)-strip
CFLAGS  := -Wall -Wextra -std=c99 -O2 -march=armv8-a -mtune=cortex-a53 -mlittle-endian -ffreestanding -nostdlib -nostartfiles -Wno-unused-parameter -fno-stack-check -fno-stack-protector
LDFLAGS := -static

# Global Library

# Kernel
#  - Core (Entry/System Setup/Globals)
KFILES  := $(KFILES) ./src/kernel/base.o
KFILES  := $(KFILES) ./src/kernel/entry.o

# Programs
#  - Init

export TUPLE
export CC
export OBJCPY
export STRIP
export CFLAGS
export LDFLAGS
export GFILES
export KFILES
export UFILES

.PHONY: all rebuild clean

all: prog-metal.elf prog-metal.elf.strip prog-metal.elf.bin prog-metal.elf.hex prog-metal.elf.strip.bin prog-metal.elf.strip.hex

rebuild: clean
    $(MAKE) all

clean:
    rm -f *.elf *.strip *.bin *.hex $(GFILES) $(KFILES) $(UFILES)

%.o: %.c
    $(CC) $(CFLAGS) $^ -c -o $@

%.o: %.s
    $(CC) $(CFLAGS) $^ -c -o $@

prog-metal.elf: $(GFILES) $(KFILES) $(UFILES)
    $(CC) $(CFLAGS) $^ -T ./bare_metal.ld $(LDFLAGS) -o $@

prog-%.elf.strip: prog-%.elf
    $(STRIP) -s -x -R .comment -R .text.startup -R .riscv.attributes $^ -o $@

%.elf.bin: %.elf
    $(OBJCPY) -O binary $^ $@

%.elf.hex: %.elf
    $(OBJCPY) -O ihex $^ $@

%.strip.bin: %.strip
    $(OBJCPY) -O binary $^ $@

%.strip.hex: %.strip
    $(OBJCPY) -O ihex $^ $@

emu: prog-metal.elf.strip.bin
    qemu-system-aarch64 -kernel ./prog-metal.elf.strip.bin -m 1G -cpu cortex-a53 -M raspi3 -serial stdio -display none

emu-debug: prog-metal.elf.strip.bin
    qemu-system-aarch64 -kernel ./prog-metal.elf.strip.bin -m 1G -cpu cortex-a53 -M raspi3 -serial stdio -display none -gdb tcp::1234 -S

    $(TUPLE)-gdb -ex "target remote localhost:1234" -ex "layout asm" -ex "tui reg general" -ex "break *0x00080000" -ex "break *0x00000000" -ex "set scheduler-locking step"


/*
This is not actually needed (at least not on actual hardware), but
it explicitly sets the entry point in the .elf file to be the same
as the true entry point to the program. The global symbol my_entry_pt
is located at the start of src/kernel/entry.s.  More on this below.
*/
ENTRY(my_entry_pt)

MEMORY {
    /*
    This is the memory address where this program will reside.
    It is the reset vector.
    */
    ram (rwx)  : ORIGIN = 0x00080000, LENGTH = 0x0000FFFF
}

SECTIONS {
    /*
    Force the linker to start at the start of memory section: ram
    */
    . = 0x00080000;
    .text : {
        /*
        Make sure the .text section from src/kernel/entry.o is
        linked first.  The .text section of src/kernel/entry.s
        is the actual entry machine code for the kernel and is
        first in the file.  This way, at reset, execution starts
        by jumping to this machine code.
        */
        src/kernel/entry.o (.text);
        /* Link the rest of the kernel's .text sections. */
        *.o (.text);
    } > ram
    /* Put the .rodata in the flash after the actual machine code. */
    .rodata : {
        *.o (.rodata);
        *.o (.rodata.*);
    } > ram
    /* END: Read Only Data */
    /* START: Writable Data */
    .sbss : {
        *.o (.sbss);
    } > ram
    .bss : {
        *.o (.bss);
    } > ram
    section_KHEAP_START (NOLOAD) : ALIGN(0x10) {
        /*
        At the very end of the space reserved for global variables
        in the ram, link in this custom section.  This is used to
        add a symbol called KHEAP_START to the program that will
        inform the C code where the heap can start.  This allows the
        heap to start right after the global variables.
        */
        src/kernel/entry.o (section_KHEAP_START);
    } > ram
    /*
    Discard everything that hasn't been explicitly linked.  I don't
    want the linker to guess where to put stuff.  If it doesn't know,
    don't include it.  If this causes a linking error, good.  I want
    to know that I need to fix something, rather than a silent failure
    that could cause hard-to-debug issues later.  For instance,
    without explicitly setting the .sbss and .bss sections above,
    the linker attempted to put my global variables after the
    machine code in the flash.  This would mean that every access to
    those variables would mean a read or a write to the external SPI
    flash IC on real hardware.  I do not believe that initialized
    globals are possible since there is nothing to initialize them.
    So I don't want to, for instance, include the .data section.
    */
    /DISCARD/ : {
        * (.*);
    }
}


.section .text

.globl my_entry_pt

my_entry_pt:
// This is the Arm64 Kernel Header (64 bytes total)
  b end_of_header // Executable code (64 bits)
  .align 3, 0, 7
  .quad my_entry_pt // text_offset (64 bits)
  .quad 0x0000000000000000 // image_size (64 bits)
  .quad 0x000000000000000A // flags (1010: Anywhere, 4K Pages, LE) (64 bits)
  .quad 0x0000000000000000 // reserved 2 (64 bits)
  .quad 0x0000000000000000 // reserved 3 (64 bits)
  .quad 0x0000000000000000 // reserved 4 (64 bits)
  .int 0x644d5241 // magic (32 bits)
  .int 0x00000000 // reserved 5 (32 bits)

end_of_header:
  // Check What Core This Is
  mrs x0, VMPIDR_EL2
  and x0, x0, #0x3
  cmp x0, #0x0
  // If this is not core 0, go into an infinite loop
  bne loop

  // Setup the Stack Pointer
  mov x2, #0x00030000
  mov sp, x2
  // Get the address of the C main function
  ldr x1, =kmain
  // Call the C main function
  blr x1

loop:
  b loop

.section section_KHEAP_START

.globl KHEAP_START
KHEAP_START:




void pstr(char* str) {
    // Despite the names, these point at the PL011 (UART0) data register
    // (offset 0x00) and flag register (offset 0x18).
    volatile unsigned int* AUX_MU_IO_REG = (unsigned int*)(0x3f201000 + 0x00);
    volatile unsigned int* AUX_MU_LSR_REG = (unsigned int*)(0x3f201000 + 0x18);
    while (*str != 0) {
        while (*AUX_MU_LSR_REG & 0x00000020) {
            // TX FIFO Full
        }
        *AUX_MU_IO_REG = (unsigned int)((unsigned char)*str);
        str++;
    }
}

signed int kmain(unsigned int argc, char* argv[], char* envp[]) {
    char* text = "Test Output String\n";
    volatile unsigned int* AUXENB = 0;
    //AUXENB = (unsigned int*)(0x20200000 + 0x00);
    //*AUXENB |= 0x00024000;
    //AUXENB = (unsigned int*)(0x20200000 + 0x08);
    //*AUXENB |= 0x00000480;

    // Set Baud Rate to 115200
    AUXENB = (unsigned int*)(0x3f201000 + 0x24);
    *AUXENB = 26;
    AUXENB = (unsigned int*)(0x3f201000 + 0x28);
    *AUXENB = 0;

    AUXENB = (unsigned int*)(0x3f200000 + 0x04);
    *AUXENB = 0;
    // Set GPIO Pin 14 to Mode: ALT0 (UART0)
    *AUXENB |= (04u << ((14 - 10) * 3));
    // Set GPIO Pin 15 to Mode: ALT0 (UART0)
    *AUXENB |= (04u << ((15 - 10) * 3));

    AUXENB = (unsigned int*)(0x3f200000 + 0x08);
    *AUXENB = 0;
    // Set GPIO Pin 23 to Mode: Output
    *AUXENB |= (01u << ((23 - 20) * 3));
    // Set GPIO Pin 24 to Mode: Output
    *AUXENB |= (01u << ((24 - 20) * 3));

    // Turn ON Pin 23
    AUXENB = (unsigned int*)(0x3f200000 + 0x1C);
    *AUXENB = (1u << 23);

    // Turn OFF Pin 24
    AUXENB = (unsigned int*)(0x3f200000 + 0x28);
    *AUXENB = (1u << 24);

    // Enable TX on UART0
    AUXENB = (unsigned int*)(0x3f201000 + 0x30);
    *AUXENB = 0x00000101;

    pstr(text);

    // Turn ON Pin 24
    AUXENB = (unsigned int*)(0x3f200000 + 0x1C);
    *AUXENB = (1u << 24);

    return 0;
}

Have you set the ALT mode of the GPIO pins you use for serial communication (should be pins 0 and 1, I guess)? I have seen some weird behavior if the multiplexer for these pins is not set correctly. - PMF

GPIO Pins 0 and 1 are for the EEPROM Hat interface. I have set the GPIO Pin modes for the UART on the UART TX and RX pins. Those are GPIO Pins 14 and 15. You can see that where I have the comment "// Set GPIO Pin 14 to Mode: ALT0 (UART0)" in my C code. Do you mean that I should also enable the other UART pins like the RTS and CTS? I am also starting to wonder if this problem has something to do with the implementation on the RPI. It has a bunch of settings in the boot-loader that would seem to indicate some kind of strange implementation. - Echelon X-Ray

My bad, I confused the numbers (on the Pi4, the GPIO pins 0 and 1 can also be used as UART if not used for a Hat). Comparing your code to the documentation, I would also say that it should work. - PMF

I appreciate your help. I need to sleep. I did just have an idea, but I'll need to look into it further. I'll post an update tomorrow. - Echelon X-Ray

2 Answers


Debugging up to this point

So it turns out that all of us were right. My initial troubleshooting in response to @Xiaoyi Chen was wrong. I rebooted back into Raspberry Pi OS to check on a hunch. I was connected to the Pi using a 3.3V UART adapter connected to header pins 8 (GPIO 14, UART0 TX), 10 (GPIO 15, UART0 RX), and GND (for a common ground, of course). I could see the boot messages and a getty login prompt, which I could log into. I figured that meant that the PL011 was working, but when I actually checked the process list in htop, I found that getty was running on /dev/ttyS0, not /dev/ttyAMA0. /dev/ttyAMA0 was tied to the Bluetooth module by an hciattach command that showed up as another process.

According to the documentation here: https://www.raspberrypi.org/documentation/configuration/uart.md , /dev/ttyS0 is the mini UART while /dev/ttyAMA0 is the PL011, but it also says that UART0 is the PL011 and UART1 is the mini UART. Furthermore, the GPIO pinouts and the BCM2835 documentation say that GPIO Pins 14 and 15 are the UART0 TX and RX. So something did not add up: I could see the login prompt on pins 14 and 15 while Linux was using the mini UART, yet I was supposedly physically connected to the PL011. If I SSHed in and opened /dev/ttyAMA0 with minicom, I could see nothing happening. However, if I did the same with /dev/ttyS0, it conflicted with the login terminal. This confirmed to me that /dev/ttyS0 was in fact in use for the boot console.

The Answer

If I set "dtoverlay=disable-bt" in config.txt, the above behavior changed to match expectations. Rebooting the Pi made it once again come up with a console on header pins 8 and 10, but checking the process listing showed that this time getty was using /dev/ttyAMA0. When I then set "dtoverlay=disable-bt" in config.txt with my custom kernel, the program executed as expected, printing out my string and turning on the second LED. Because the outputs of the PL011 were never actually set up (they had been redirected by some magic), it makes sense that it was not working, as @PMF suggested. This whole deal has just reaffirmed my assertion that the documentation for this so-called "learning computer" is atrocious.

For those who are curious, here are the last few lines from my config.txt:


Remaining Questions

A few things still bother me. I could have sworn that I already tried setting "dtoverlay=disable-bt".

Secondly, this does seem to perform some kind of magic under the hood that is not documented (I know of no documentation for it) and that I do not understand. I can find nothing in the published schematics that redirects the output of GPIO 14 and 15 from the SoC. So either the schematics are incomplete, or there is some proprietary magic taking place inside the SoC which redirects the pins, contradicting the documentation.

I also have questions about precedence when it comes to the config.txt options and setting things up elsewhere.

Anyway, thank you for the help everyone.


My suggestion:

  • flash your SD card to a rpi distribution to make sure the hardware is still working
  • if the hardware is good, check the difference of your code with the in-kernel serial driver
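
As a starting point for that comparison, here is a minimal sketch of the bring-up order the ARM PL011 manual describes: disable the UART, program the divisors, write the line control register, then enable. It is not the fix reported in this thread (there the program worked once "dtoverlay=disable-bt" was set), and the function names are hypothetical. The register base is passed in as a pointer so the same code can be exercised against a plain array on a host machine.

```c
#include <stdint.h>

/* PL011 register offsets, expressed as 32-bit word indices. */
enum {
    PL011_DR   = 0x00 / 4, /* data register */
    PL011_FR   = 0x18 / 4, /* flag register */
    PL011_IBRD = 0x24 / 4, /* integer baud divisor */
    PL011_FBRD = 0x28 / 4, /* fractional baud divisor */
    PL011_LCRH = 0x2C / 4, /* line control */
    PL011_CR   = 0x30 / 4  /* control */
};

#define PL011_FR_BUSY   (1u << 3)
#define PL011_FR_TXFF   (1u << 5)
#define PL011_LCRH_FEN  (1u << 4) /* enable FIFOs */
#define PL011_LCRH_8BIT (3u << 5) /* 8-bit words */
#define PL011_CR_UARTEN (1u << 0)
#define PL011_CR_TXE    (1u << 8)
#define PL011_CR_RXE    (1u << 9)

/* Hypothetical init routine; ibrd and fbrd are supplied by the caller. */
static void pl011_init(volatile uint32_t *uart, uint32_t ibrd, uint32_t fbrd) {
    uart[PL011_CR] = 0;                        /* disable while reconfiguring */
    while (uart[PL011_FR] & PL011_FR_BUSY) { } /* let any character in flight finish */
    uart[PL011_IBRD] = ibrd;
    uart[PL011_FBRD] = fbrd;
    /* The divisor registers only take effect after a write to LCRH. */
    uart[PL011_LCRH] = PL011_LCRH_8BIT | PL011_LCRH_FEN; /* 8N1, FIFOs on */
    uart[PL011_CR] = PL011_CR_UARTEN | PL011_CR_TXE | PL011_CR_RXE;
}

/* Hypothetical blocking transmit of one character. */
static void pl011_putc(volatile uint32_t *uart, char c) {
    while (uart[PL011_FR] & PL011_FR_TXFF) { } /* wait for room in the TX FIFO */
    uart[PL011_DR] = (uint32_t)(unsigned char)c;
}
```

On the hardware in the question this would presumably be called as `pl011_init((volatile uint32_t *)0x3f201000, 26, 3)`, with the base address and divisors taken as assumptions from the post. Note that the question's code never writes the line control register, which leaves the FIFOs disabled and may be why the UART appeared to accept only one byte.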