Can someone help me out please! I do not know if the answer is general, or specific to the board and software versions I am working with. I am out of my previous areas here, and do not even know the right question to ask.
EDITs added at the bottom
What I currently want, is to create a program that will run standalone (bare metal; no OS) on a A20-OLinuXino-Micro-4GB board, that needs to use (at least) some standard math library calls. Eventually, I will want to load it into NAND, and run it on powerup, but for now I am trying to manually load it (loady) from the U-Boot (github.com/linux-sunxi/u-boot-sunxi/wiki) serial 'console', after booting from an SD card. Standalone is needed, because the linux distro level access to the hardware GPIO ports is not very flexible, when working with more than one bit (port in a port group) at a time, and quite slow. Too slow for the target application, and I did not really want to try modifying / adding a kernel module just to see if that would be fast enough.
Are there some standard gcc / ld flags needed to create a bare metal standalone program, and include some library routines? Beyond -ffreestanding and -static? Is there some special glue code needed? Is there something else I have not even thought of?
If found and looked over Beagleboard bare metal programming (stackoverflow.com/questions/6870712/beagleboard-bare-metal-programming). The answer there is good info, but is assembler, and does not reference any library. Application hangs when calling printf to uart with bare metal raspberry pi might show a cause for the problem. The (currently) bottom answer points to problems with VFP, and I already ran across problems with soft/hard floating point options. That shows some assembler code, but I am missing details about how to add a wrapper/glue to combine with c code. My assembler coding is rusty, but would adding equivalent code at the start of hello_world (at least before the reference to the sin() function (likely) get things working? Maybe adding it into the libstubs code.
I am using another A20 board for the main development environment.
$ gcc --version gcc (Debian 4.6.3-14) 4.6.3 Copyright (C) 2011 Free
Software Foundation, Inc. This is free software; see the source for
copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ld.bfd --version GNU ld (GNU Binutils for Debian) 2.22 Copyright
2011 Free Software Foundation, Inc. This program is free software; you
may redistribute it under the terms of the GNU General Public License
version 3 or (at your option) a later version. This program has
absolutely no warranty.
$ uname -a Linux a20-OLinuXino 3.4.67+ #6 SMP PREEMPT Fri Nov 1
17:32:40 EET 2013 armv7l GNU/Linux
I have been able to create bootable U-Boot images for the board on SD cards from the repo, either building directly from the linux-sunxi distro that was supplied with the board, or by cross-compiling from a Fedora 21 machine. Same for the standalone hello_world program that came in the examples for U-boot, which can be loaded and run from the U-Boot console.
However, reducing the sample program to bare minimum, then adding code that needs math.h, -lm and -lc fails (in various iterations) with 'software interrupt' or 'undefined operation' type errors. The original sample program was being linked with -lgcc, but a little checking showed that nothing was actually being included from the library. The identical binary was created without the library, so the question might be 'what does it take to use any library with a bare metal program?'
sun7i# go 0x48000000
## Starting application at 0x48000000 ...
Hello math World
undefined instruction
pc : [<48000010>] lr : [<4800000c>]
sp : 7fb66da0 ip : 7fb672c0 fp : 00000000
r10: 00000002 r9 : 7fb66f0c r8 : 7fb67778
r7 : 7ffbbaf8 r6 : 00000001 r5 : 7fb6777c r4 : 48000000
r3 : 00000083 r2 : 7ffbc7fc r1 : 0000000a r0 : 00000011
Flags: nZCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
To get that far, I had to tweak build options, to specify hardware floating point, since that is how the base libraries were compiled.
Here are the corresponding source and build script files
hello_world.c
#include <common.h>
#include <math.h>
int hello_world (void)
{
double tst;
tst = 0.33333333333;
printf ("Hello math World\n");
tst = sin(0.5);
// printf ("sin test %d : %d\n", (int)tst, (int)(1000 * tst));
return (0);
}
build script
#! /bin/bash
UBOOT="/home/olimex/u-boot-sunxi"
SRC="$UBOOT/examples/standalone"
#INCLS="-nostdinc -isystem /usr/lib/gcc/arm-linux-gnueabihf/4.6/include -I$UBOOT/include -I$UBOOT/arch/arm/include"
INCLS="-I$UBOOT/include -I$UBOOT/arch/arm/include"
#-v
GCCOPTS="\
-D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x4a000000\
-Wall -Wstrict-prototypes -Wno-format-security\
-fno-builtin -ffreestanding -Os -fno-stack-protector\
-g -fstack-usage -Wno-format-nonliteral -fno-toplevel-reorder\
-DCONFIG_ARM -D__ARM__ -marm -mno-thumb-interwork\
-mabi=aapcs-linux -mword-relocations -march=armv7-a\
-ffunction-sections -fdata-sections -fno-common -ffixed-r9\
-mhard-float -pipe"
# -msoft-float -pipe
OBJS="hello_world.o libstubs.o"
LDOPTS="--verbose -g -Ttext 0x48000000"
#--verbose
#LIBS="-static -L/usr/lib/gcc/arm-linux-gnueabihf/4.6 -lm -lc"
LIBS="-static -lm -lc"
#-lgcc
gcc -Wp,-MD,stubs.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(stubs)"\
-D"KBUILD_MODNAME=KBUILD_STR(stubs)"\
-c -o stubs.o $SRC/stubs.c
ld.bfd -r -o libstubs.o stubs.o
gcc -Wp,-MD,hello_world.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(hello_world)"\
-D"KBUILD_MODNAME=KBUILD_STR(hello_world)"\
-c -o hello_world.o hello_world.c
ld.bfd $LDOPTS -o hello_world -e hello_world $OBJS $LIBS
objcopy -O binary hello_world hello_world.bin
EDITS added:
The application that this is to be part of needs both some fairly high speed GPIO and some math functions. Should only need sin() and maybe sqrt(). My previous testing for the GPIO got the toggling of single pin (port in a port group) up to 8MHz. The constraints for the application need to get the full cycle time in the 10µs (100Hhz) range, which includes reading all pins from a single port, and writing a few pins on other ports, synchronized with the timing limitations of the attached ADC chip (3 ADC reads). I have bare metal code that is doing (simulating) that process in about 2.1µs. Now I need to add in the math to process the values, the output of which will set some more outputs. Future planned improvements including using SIMD for the math, and dedicating the second core to the math, while the first does the GPIO and 'feeds' the calculations.
The needed math code / logic has already been written into a simulation program using very standard (c99) code. I just need to port it into the bare metal program. Need to get 'math' to work first.