Can someone help me out please! I do not know if the answer is general, or specific to the board and software versions I am working with. I am out of my previous areas here, and do not even know the right question to ask.

EDITs added at the bottom

What I currently want is to create a program that runs standalone (bare metal; no OS) on an A20-OLinuXino-Micro-4GB board and uses (at least) some standard math library calls. Eventually, I want to load it into NAND and run it on power-up, but for now I am loading it manually (loady) from the U-Boot (github.com/linux-sunxi/u-boot-sunxi/wiki) serial 'console' after booting from an SD card. Standalone is needed because Linux distro-level access to the hardware GPIO ports is not very flexible when working with more than one bit (port in a port group) at a time, and it is quite slow: too slow for the target application. I did not really want to try modifying or adding a kernel module just to see if that would be fast enough.

Are there some standard gcc / ld flags needed to create a bare metal standalone program, and include some library routines? Beyond -ffreestanding and -static? Is there some special glue code needed? Is there something else I have not even thought of?

I found and looked over Beagleboard bare metal programming (stackoverflow.com/questions/6870712/beagleboard-bare-metal-programming). The answer there is good info, but it is in assembler and does not reference any library. "Application hangs when calling printf to uart with bare metal raspberry pi" might show a cause for the problem. The (currently) bottom answer there points to problems with VFP, and I already ran across problems with the soft/hard floating point options. It shows some assembler code, but I am missing details about how to add a wrapper/glue to combine it with C code. My assembler coding is rusty, but would adding equivalent code at the start of hello_world (at least before the reference to the sin() function) likely get things working? Maybe by adding it into the libstubs code.

I am using another A20 board for the main development environment.

$ gcc --version
gcc (Debian 4.6.3-14) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.22
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

$ uname -a
Linux a20-OLinuXino 3.4.67+ #6 SMP PREEMPT Fri Nov 1 17:32:40 EET 2013 armv7l GNU/Linux

I have been able to create bootable U-Boot images for the board on SD cards from the repo, either by building directly on the linux-sunxi distro that was supplied with the board, or by cross-compiling from a Fedora 21 machine. The same goes for the standalone hello_world program that came in the examples for U-Boot, which can be loaded and run from the U-Boot console.

However, reducing the sample program to the bare minimum and then adding code that needs math.h, -lm and -lc fails (in various iterations) with 'software interrupt' or 'undefined operation' type errors. The original sample program was being linked with -lgcc, but a little checking showed that nothing was actually being included from the library: the identical binary was created without it. So the question might be 'what does it take to use any library with a bare metal program?'

sun7i# go 0x48000000
## Starting application at 0x48000000 ...
Hello math World
undefined instruction
pc : [<48000010>]          lr : [<4800000c>]
sp : 7fb66da0  ip : 7fb672c0     fp : 00000000
r10: 00000002  r9 : 7fb66f0c     r8 : 7fb67778
r7 : 7ffbbaf8  r6 : 00000001     r5 : 7fb6777c  r4 : 48000000
r3 : 00000083  r2 : 7ffbc7fc     r1 : 0000000a  r0 : 00000011
Flags: nZCv  IRQs off  FIQs off  Mode SVC_32
Resetting CPU ...

To get that far, I had to tweak build options, to specify hardware floating point, since that is how the base libraries were compiled.

Here are the corresponding source and build script files

hello_world.c

#include <common.h>
#include <math.h>

int hello_world (void)
{
    double tst;
    tst = 0.33333333333;
    printf ("Hello math World\n");
    tst = sin(0.5);
//  printf ("sin test %d : %d\n", (int)tst, (int)(1000 * tst));
    return (0);
}

build script

#! /bin/bash
UBOOT="/home/olimex/u-boot-sunxi"
SRC="$UBOOT/examples/standalone"
#INCLS="-nostdinc -isystem /usr/lib/gcc/arm-linux-gnueabihf/4.6/include -I$UBOOT/include -I$UBOOT/arch/arm/include"
INCLS="-I$UBOOT/include -I$UBOOT/arch/arm/include"
#-v
GCCOPTS="\
 -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x4a000000\
 -Wall -Wstrict-prototypes -Wno-format-security\
 -fno-builtin -ffreestanding -Os -fno-stack-protector\
 -g -fstack-usage -Wno-format-nonliteral -fno-toplevel-reorder\
 -DCONFIG_ARM -D__ARM__ -marm -mno-thumb-interwork\
 -mabi=aapcs-linux -mword-relocations -march=armv7-a\
 -ffunction-sections -fdata-sections -fno-common -ffixed-r9\
 -mhard-float -pipe"
# -msoft-float -pipe
OBJS="hello_world.o libstubs.o"

LDOPTS="--verbose -g -Ttext 0x48000000"
#--verbose
#LIBS="-static -L/usr/lib/gcc/arm-linux-gnueabihf/4.6 -lm -lc"
LIBS="-static -lm -lc"
#-lgcc

gcc -Wp,-MD,stubs.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
 -D"KBUILD_BASENAME=KBUILD_STR(stubs)"\
 -D"KBUILD_MODNAME=KBUILD_STR(stubs)"\
 -c -o stubs.o $SRC/stubs.c

ld.bfd -r -o libstubs.o stubs.o

gcc -Wp,-MD,hello_world.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
 -D"KBUILD_BASENAME=KBUILD_STR(hello_world)"\
 -D"KBUILD_MODNAME=KBUILD_STR(hello_world)"\
 -c -o hello_world.o hello_world.c

ld.bfd $LDOPTS -o hello_world -e hello_world $OBJS $LIBS

objcopy -O binary hello_world hello_world.bin

EDITS added:

The application that this is to be part of needs both some fairly high speed GPIO and some math functions. It should only need sin() and maybe sqrt(). My previous testing for the GPIO got the toggling of a single pin (port in a port group) up to 8MHz. The constraints for the application need to get the full cycle time into the 10µs (100kHz) range, which includes reading all pins from a single port and writing a few pins on other ports, synchronized with the timing limitations of the attached ADC chip (3 ADC reads). I have bare metal code that is doing (simulating) that process in about 2.1µs. Now I need to add in the math to process the values, the output of which will set some more outputs. Future planned improvements include using SIMD for the math, and dedicating the second core to the math while the first does the GPIO and 'feeds' the calculations.

The needed math code / logic has already been written into a simulation program using very standard (C99) code. I just need to port it into the bare metal program, but I need to get 'math' to work first.

I would have gone with a dedicated Android GPIO driver on the OS already installed, so bypassing the horrendous nighm.. hard work of trying to get bootable image to run with no hardware debugger. ''software interrupt' or 'undefined operation' type errors' - yes, that is normal for a long time when trying to get anything at all to work from nothing on a development board :( - Martin James
yes this is all doable. what math are you after, just soft float or something else? a lot of libraries (printf being an extremely heavy one for example) have at least some system dependencies, even the math libraries (divide by zero handler). so if you want to use the stock unmodified libraries you need some sort of replacement for the system calls, newlib for example, but it is unlikely to completely replace everything you need. I think you need to start with a non-printf hello world, either blink an led (typical) or shove some bytes out the uart. - old_timer
get that compiling, linking, bootstrapping, and running and you are 80% of the way there. then either find clean bare metal libraries (little demand so unlikely) or take existing and extract the parts you want and make them clean, or add the calls they need to allow them to compile. newlib for example makes the backend easy, and you can go through and change all the functions to return the appropriate pass fail and not actually do anything, add a few more lines and then the very heavy printf actually works to spit stuff out the uart for example. - old_timer
I wouldn't really recommend diving into the deep end with bare-metal on a mobile application SoC. For starters, the problem presented here actually has very little to do with the title question, and you've already proven that you have everything to learn about how things work. You're lucky enough to have the bootloader's exception handler telling you exactly what's wrong: disassemble your binary and see what the instruction at that address is (hint: it's going to be a floating point instruction). Huh, how's that undefined? Well, you don't have an OS enabling the FPU for you... - Notlikethat
basic math be it fixed point or float is to some extent in libgcc, if you use gcc to link then it can find libgcc, if you use ld (to make controlling the bootstrap code and memory map easier) then you have to explicitly add libgcc on the linker command line (well path to it), at least that has been my experience. you might have to create a dummy function for the divide by zero and anything else it wants so that it links without error. - old_timer

1 Answer

First, I suggest reading this excellent paper on bare metal programming with ARM and GNU: http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf.

Then, I would make sure you avoid any syscall to the Linux kernel (which you don't have, but which your toolchain's libraries will try to make), e.g. by not returning a value from main(); it should never return anyway.

Finally, I would either use newlib or, if you only need a small subset of what the libraries have to offer, write a custom implementation.
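If you go the newlib route, the "backend" it expects is a handful of low-level functions that you supply yourself. The following is only a rough sketch of two of them, under stated assumptions: the exact names and signatures (`_write`, `_sbrk`) depend on how your newlib was configured, `console_putc` and `stubs_set_console` are hypothetical hooks for whatever UART routine you have, and the fixed 64 KiB heap array is an arbitrary stand-in for a region you would normally take from your linker script.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook: point this at your own UART "put one byte" routine. */
static void (*console_putc)(char) = NULL;

void stubs_set_console(void (*fn)(char))
{
    console_putc = fn;
}

/* Sink for stdio output; every descriptor goes to the same console here. */
int _write(int fd, const char *buf, int len)
{
    (void)fd;
    for (int i = 0; i < len; i++) {
        if (console_putc)
            console_putc(buf[i]);   /* bytes are dropped if no hook is set */
    }
    return len;
}

/* Assumption: a fixed static array serves as the heap for malloc(). */
static char heap[64 * 1024];
static size_t heap_used;

/* Grow or shrink the heap by incr bytes; return the previous break. */
void *_sbrk(ptrdiff_t incr)
{
    if (incr < 0 ? (size_t)(-incr) > heap_used
                 : (size_t)incr > sizeof heap - heap_used) {
        errno = ENOMEM;
        return (void *)-1;
    }
    void *prev = &heap[heap_used];
    heap_used = (size_t)((ptrdiff_t)heap_used + incr);
    return prev;
}
```

The remaining stubs (`_read`, `_close`, `_fstat`, `_isatty`, `_lseek` and so on) can usually start out as functions that simply return an error, and be filled in only if something you link actually needs them.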

Keep in mind that you are using an Allwinner SoC, which is not the best documented for bare metal work, but you can find the TRM here: http://www.soselectronic.com/a_info/resource/c/20_UM-V1.020130322.pdf. I would check whether the libraries (if you decide to use them) or your own code need some special hardware to be initialized first (interconnect fabric, clock and power domains, etc.).
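One concrete piece of initialization, which a comment on the question already hints at, is the floating point unit: hard-float code executes VFP instructions, and those are undefined until something enables the unit. Below is a minimal sketch of the usual ARMv7-A sequence, not a tested drop-in for this board. It assumes the code runs in a privileged mode, that the secure configuration left by the bootloader permits access to coprocessors 10 and 11, and that the file is built with an FPU selected so the assembler accepts `vmsr`. The names `enable_vfp` and `cpacr_with_vfp_access` are made up for illustration.

```c
#include <stdint.h>

#define CPACR_CP10_CP11_FULL (0xFu << 20) /* CPACR[23:20]: full access to cp10/cp11 */
#define FPEXC_EN             (1u << 30)   /* FPEXC.EN: enable VFP/Advanced SIMD */

/* Pure helper so the bit arithmetic can be checked on any host. */
static inline uint32_t cpacr_with_vfp_access(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* Hypothetical helper: call once, before the first floating point instruction. */
void enable_vfp(void)
{
#if defined(__arm__)
    uint32_t cpacr;

    /* Read the Coprocessor Access Control Register. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));
    cpacr = cpacr_with_vfp_access(cpacr);
    /* Write it back and synchronize before touching the FPU. */
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2\n\tisb" : : "r"(cpacr) : "memory");
    /* Set the enable bit in FPEXC. */
    __asm__ volatile("vmsr fpexc, %0" : : "r"(FPEXC_EN));
#endif
    /* On non-ARM hosts this function is intentionally a no-op. */
}
```

In the hello_world example from the question, the call would go at the very top of hello_world(), before the sin() call. Whether that alone is sufficient on the A20 is something to verify by disassembling the binary at the faulting address.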

If you just need sin() and similar, I strongly suggest simply writing your own.
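As an illustration of how little code that can be, here is one possible sketch of a self-contained sine. It is not what libm does: it uses naive range reduction followed by a Taylor polynomial up to x^15, which keeps the truncation error around 1e-11 on the reduced range but loses accuracy for very large arguments. The name `bm_sin` and the `BM_*` constants are made up for this example.

```c
/* Hypothetical names; nothing here comes from libm or U-Boot. */
#define BM_PI      3.14159265358979323846
#define BM_TWO_PI  6.28318530717958647692
#define BM_HALF_PI 1.57079632679489661923

double bm_sin(double x)
{
    /* Reduce to [-pi, pi] by removing whole turns (moderate |x| only). */
    double turns = x / BM_TWO_PI;
    long long n = (long long)(turns >= 0.0 ? turns + 0.5 : turns - 0.5);
    x -= (double)n * BM_TWO_PI;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > BM_HALF_PI)
        x = BM_PI - x;
    else if (x < -BM_HALF_PI)
        x = -BM_PI - x;

    /* Odd Taylor series through x^15, evaluated with Horner's rule in x^2. */
    double x2 = x * x;
    double p = -1.0 / 1307674368000.0;   /* -1/15! */
    p = p * x2 + 1.0 / 6227020800.0;     /* +1/13! */
    p = p * x2 - 1.0 / 39916800.0;       /* -1/11! */
    p = p * x2 + 1.0 / 362880.0;         /* +1/9!  */
    p = p * x2 - 1.0 / 5040.0;           /* -1/7!  */
    p = p * x2 + 1.0 / 120.0;            /* +1/5!  */
    p = p * x2 - 1.0 / 6.0;              /* -1/3!  */
    return x + x * x2 * p;
}
```

Note that this still uses double arithmetic, so with hard-float build options the FPU has to be enabled before it is called. For a fixed-rate control loop, a lookup table with interpolation is another common choice.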