Introduction
I have been working on writing my own bare metal code for a Raspberry PI as I build up my bare metal skills and learn about kernel mode operations. However, due to the complexity, amount of documentation errors, and missing/scattered info, it has been extremely difficult to finally bring up a custom kernel on the Raspberry PI. However, I finally got that working.
A very broad overview of what is happening in the bootstrap process
My kernel loads into 0x80000, sends all cores except core 0 into an infinite loop, sets up the Stack Pointer, and calls a C function. I can setup the GPIO pins and turn them on and off. Using some additional circuitry, I can drive LEDs and confirm that my code is executing.
The problem
However when it comes to the UART, I have hit a wall. I am using UART0 (PL011). As far as I can tell, the UART is not outputting, although I could be missing it on my scope since I only have an analog oscilloscope. The code gets stuck when outputting the string. I have determined through hours of reflashing my SD card with different YES/NO questions to my LEDs that it is stuck in an infinite loop waiting for the UART Transmit FIFO Full flag to clear. The UART only accepts 1 byte before becoming full. I can not figure out why it is not transmitting the data out. I am also not sure if I have correctly set my baud-rate, but I don't think that would cause the TX FIFO to stay filled.
Getting a foothold in the code
Here is my code. The execution begins at the very beginning of the binary. It is constructed by being linked with symbol "my_entry_pt" from assembly source "entry.s" in the linker script. That is where you will find the entry code. However, you probably only need to look at the last file which is the C code in "base.c". The rest is just bootstrapping up to that. Please disregard some comments/names which don't make sense. This is a port (of primarily the build infrastructure) from an earlier bare-metal project of mine. That project used a RISC-V development board which uses a memory mapped SPI flash to store the binary code of the program.:
[Makefile]
TUPLE := aarch64-unknown-linux-gnu
CC := $(TUPLE)-gcc
OBJCPY := $(TUPLE)-objcopy
STRIP := $(TUPLE)-strip
CFLAGS := -Wall -Wextra -std=c99 -O2 -march=armv8-a -mtune=cortex-a53 -mlittle-endian -ffreestanding -nostdlib -nostartfiles -Wno-unused-parameter -fno-stack-check -fno-stack-protector
LDFLAGS := -static
GFILES :=
KFILES :=
UFILES :=
# Global Library
#GFILES := $(GFILES)
# Kernel
# - Core (Entry/System Setup/Globals)
KFILES := $(KFILES) ./src/kernel/base.o
KFILES := $(KFILES) ./src/kernel/entry.o
# Programs
# - Init
#UFILES := $(UFILES)
export TUPLE
export CC
export OBJCPY
export STRIP
export CFLAGS
export LDFLAGS
export GFILES
export KFILES
export UFILES
.PHONY: all rebuild clean
all: prog-metal.elf prog-metal.elf.strip prog-metal.elf.bin prog-metal.elf.hex prog-metal.elf.strip.bin prog-metal.elf.strip.hex
rebuild: clean
$(MAKE) all
clean:
rm -f *.elf *.strip *.bin *.hex $(GFILES) $(KFILES) $(UFILES)
%.o: %.c
$(CC) $(CFLAGS) $^ -c -o $@
%.o: %.s
$(CC) $(CFLAGS) $^ -c -o $@
prog-metal.elf: $(GFILES) $(KFILES) $(UFILES)
$(CC) $(CFLAGS) $^ -T ./bare_metal.ld $(LDFLAGS) -o $@
prog-%.elf.strip: prog-%.elf
$(STRIP) -s -x -R .comment -R .text.startup -R .riscv.attributes $^ -o $@
%.elf.bin: %.elf
$(OBJCPY) -O binary $^ $@
%.elf.hex: %.elf
$(OBJCPY) -O ihex $^ $@
%.strip.bin: %.strip
$(OBJCPY) -O binary $^ $@
%.strip.hex: %.strip
$(OBJCPY) -O ihex $^ $@
emu: prog-metal.elf.strip.bin
qemu-system-aarch64 -kernel ./prog-metal.elf.strip.bin -m 1G -cpu cortex-a53 -M raspi3 -serial stdio -display none
emu-debug: prog-metal.elf.strip.bin
qemu-system-aarch64 -kernel ./prog-metal.elf.strip.bin -m 1G -cpu cortex-a53 -M raspi3 -serial stdio -display none -gdb tcp::1234 -S
debug:
$(TUPLE)-gdb -ex "target remote localhost:1234" -ex "layout asm" -ex "tui reg general" -ex "break *0x00080000" -ex "break *0x00000000" -ex "set scheduler-locking step"
[bare_metal.ld]
/*
This is not actually needed (At least not on actual hardware.), but
it explicitly sets the entry point in the .elf file to be the same
as the true entry point to the program. The global symbol my_entry_pt
is located at the start of src/kernel/entry.s. More on this below.
*/
ENTRY(my_entry_pt)
MEMORY
{
/*
This is the memory address where this program will reside.
It is the reset vector.
*/
ram (rwx) : ORIGIN = 0x00080000, LENGTH = 0x0000FFFF
}
SECTIONS
{
/*
Force the linker to starting at the start of memory section: ram
*/
. = 0x00080000;
.text : {
/*
Make sure the .text section from src/kernel/entry.o is
linked first. The .text section of src/kernel/entry.s
is the actual entry machine code for the kernel and is
first in the file. This way, at reset, exection starts
by jumping to this machine code.
*/
src/kernel/entry.o (.text);
/*
Link the rest of the kernel's .text sections.
*/
*.o (.text);
} > ram
/*
Put in the .rodata in the flash after the actual machine code.
*/
.rodata : {
*.o (.rodata);
*.o (.rodata.*);
} > ram
/*
END: Read Only Data
START: Writable Data
*/
.sbss : {
*.o (.sbss);
} > ram
.bss : {
*.o (.bss);
} > ram
section_KHEAP_START (NOLOAD) : ALIGN(0x10) {
/*
At the very end of the space reserved for global variables
in the ram, link in this custom section. This is used to
add a symbol called KHEAP_START to the program that will
inform the C code where the heap can start. This allows the
heap to start right after the global variables.
*/
src/kernel/entry.o (section_KHEAP_START);
} > ram
/*
Discard everything that hasn't be explictly linked. I don't
want the linker to guess where to put stuff. If it doesn't know,
don't include it. If this casues a linking error, good. I want
to know that I need to fix something, rather than a silent failure
that could cause hard to debug issues later. For instance,
without explicitly setting the .sbss and .bss sections above,
the linker attempted to put my global variables after the
machine code in the flash. This would mean that ever access to
those variables would mean read a write to the external SPI flash
IC on real hardware. I do not believe that initialized globals
are possible since there is nothing to initialize them. So I don't
want to, for instance, include the .data section.
*/
/DISCARD/ : {
* (.*);
}
}
[src/kernel/entry.s]
.section .text
.globl my_entry_pt
// This is the Arm64 Kernel Header (64 bytes total)
my_entry_pt:
b end_of_header // Executable code (64 bits)
.align 3, 0, 7
.quad my_entry_pt // text_offset (64 bits)
.quad 0x0000000000000000 // image_size (64 bits)
.quad 0x000000000000000A // flags (1010: Anywhere, 4K Pages, LE) (64 bits)
.quad 0x0000000000000000 // reserved 2 (64 bits)
.quad 0x0000000000000000 // reserved 3 (64 bits)
.quad 0x0000000000000000 // reserved 4 (64 bits)
.int 0x644d5241 // magic (32 bits)
.int 0x00000000 // reserved 5 (32 bits)
end_of_header:
// Check What Core This Is
mrs x0, VMPIDR_EL2
and x0, x0, #0x3
cmp x0, #0x0
// If this is not core 0, go into an infinite loop
bne loop
// Setup the Stack Pointer
mov x2, #0x00030000
mov sp, x2
// Get the address of the C main function
ldr x1, =kmain
// Call the C main function
blr x1
loop:
nop
b loop
.section section_KHEAP_START
.globl KHEAP_START
KHEAP_START:
[src/kernel/base.c]
void pstr(char* str) {
volatile unsigned int* AUX_MU_IO_REG = (unsigned int*)(0x3f201000 + 0x00);
volatile unsigned int* AUX_MU_LSR_REG = (unsigned int*)(0x3f201000 + 0x18);
while (*str != 0) {
while (*AUX_MU_LSR_REG & 0x00000020) {
// TX FIFO Full
}
*AUX_MU_IO_REG = (unsigned int)((unsigned char)*str);
str++;
}
return;
}
signed int kmain(unsigned int argc, char* argv[], char* envp[]) {
char* text = "Test Output String\n";
volatile unsigned int* AUXENB = 0;
//AUXENB = (unsigned int*)(0x20200000 + 0x00);
//*AUXENB |= 0x00024000;
//AUXENB = (unsigned int*)(0x20200000 + 0x08);
//*AUXENB |= 0x00000480;
// Set Baud Rate to 115200
AUXENB = (unsigned int*)(0x3f201000 + 0x24);
*AUXENB = 26;
AUXENB = (unsigned int*)(0x3f201000 + 0x28);
*AUXENB = 0;
AUXENB = (unsigned int*)(0x3f200000 + 0x04);
*AUXENB = 0;
// Set GPIO Pin 14 to Mode: ALT0 (UART0)
*AUXENB |= (04u << ((14 - 10) * 3));
// Set GPIO Pin 15 to Mode: ALT0 (UART0)
*AUXENB |= (04u << ((15 - 10) * 3));
AUXENB = (unsigned int*)(0x3f200000 + 0x08);
*AUXENB = 0;
// Set GPIO Pin 23 to Mode: Output
*AUXENB |= (01u << ((23 - 20) * 3));
// Set GPIO Pin 24 to Mode: Output
*AUXENB |= (01u << ((24 - 20) * 3));
// Turn ON Pin 23
AUXENB = (unsigned int*)(0x3f200000 + 0x1C);
*AUXENB = (1u << 23);
// Turn OFF Pin 24
AUXENB = (unsigned int*)(0x3f200000 + 0x28);
*AUXENB = (1u << 24);
// Enable TX on UART0
AUXENB = (unsigned int*)(0x3f201000 + 0x30);
*AUXENB = 0x00000101;
pstr(text);
// Turn ON Pin 24
AUXENB = (unsigned int*)(0x3f200000 + 0x1C);
*AUXENB = (1u << 24);
return 0;
}