Your linker script
ENTRY(Reset_Handler)
MEMORY
{
FLASH(rx):ORIGIN =0x08000000,LENGTH =1024K
SRAM(rwx):ORIGIN =0x20000000,LENGTH =128K
}
SECTIONS
{
.text :
{
*(.isr_vector)
*(.text)
*(.rodata)
. = ALIGN(4);
_etext = .;
}> FLASH
_la_data = LOADADDR(.data);
.data :
{
_sdata = .;
*(.data)
*(.data.*)
. = ALIGN(4);
_edata = .;
}> SRAM AT> FLASH
.bss :
{
_sbss = .;
__bss_start__ = _sbss;
*(.bss)
*(.bss.*)
*(COMMON)
. = ALIGN(4);
_ebss = .;
__bss_end__ = _ebss;
. = ALIGN(4);
end = .;
__end__ = .;
}> SRAM
}
Since you have read the ARM and ST documents, you know that the vector table starts with the initial stack pointer value, then the reset vector, then the other vectors. There can be hundreds of them, depending on the chip. The chip vendor maps the application flash at 0x08000000. With certain boot options it is mirrored to 0x00000000, which is where it needs to be for the ARM core to boot from it. RAM starts at 0x20000000, and its size depends on the chip.
.cpu cortex-m4
.word 0x20001000
.word Reset_Handler
.word loop
.word loop
.globl Reset_Handler
.thumb_func
Reset_Handler:
b loop
.thumb_func
loop:
b .
.align
.word 0x11223344
.word _edata
.word _sdata
.word _la_data
.word _ebss
.word _sbss
.word 0x55667788
That is not a bad starting point. As you know from reading up on it, the linker can generate variables, if you will. You can use them in your code, as seen in the C code, and they are just as available here.
Build it:
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m4 so.s -o so.o
arm-none-eabi-ld -nostdlib -nostartfiles -T so.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy -O binary so.elf so.bin
arm-none-eabi-objcopy -O srec --srec-forceS3 so.elf so.srec
Examine the dump:
Disassembly of section .text:
08000000 <Reset_Handler-0x10>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000013 stmdaeq r0, {r0, r1, r4}
800000c: 08000013 stmdaeq r0, {r0, r1, r4}
08000010 <Reset_Handler>:
8000010: e7ff b.n 8000012 <loop>
08000012 <loop>:
8000012: e7fe b.n 8000012 <loop>
8000014: 11223344 ; <UNDEFINED> instruction: 0x11223344
8000018: 20000000 andcs r0, r0, r0
800001c: 20000000 andcs r0, r0, r0
8000020: 08000030 stmdaeq r0, {r4, r5}
8000024: 20000000 andcs r0, r0, r0
8000028: 20000000 andcs r0, r0, r0
800002c: 55667788 strbpl r7, [r6, #-1928]! ; 0xfffff878
objdump -D disassembles everything, including the data words, so ignore the bogus instruction mnemonics. Read it like this instead:
08000000 <Reset_Handler-0x10>:
8000000: 20001000 sp initialization value
8000004: 08000011 reset handler address ORed with one (the Thumb bit, see the docs)
8000008: 08000013 some other handler
800000c: 08000013 some other handler
8000014: 11223344 .word 0x11223344
8000018: 20000000 .word _edata
800001c: 20000000 .word _sdata
8000020: 08000030 .word _la_data
8000024: 20000000 .word _ebss
8000028: 20000000 .word _sbss
800002c: 55667788 .word 0x55667788
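If you would rather check the table than squint at objdump's fake instructions, a small host-side C sketch can decode the first four words of so.bin. This is an illustration only, not part of the build. The bytes are copied from the first S3 record of so.srec further down, and the names read_le32 and decode_vectors are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First 16 bytes of so.bin (the S3 record at 0x08000000). */
static const uint8_t image[16] = {
    0x00, 0x10, 0x00, 0x20,   /* initial stack pointer */
    0x11, 0x00, 0x00, 0x08,   /* reset vector          */
    0x13, 0x00, 0x00, 0x08,   /* some other handler    */
    0x13, 0x00, 0x00, 0x08,   /* some other handler    */
};

typedef struct {
    uint32_t initial_sp;
    uint32_t reset_vector;   /* raw value as stored in the table */
    uint32_t reset_addr;     /* address of the code, Thumb bit cleared */
    int thumb_bit;           /* must be 1 for a Cortex-M vector */
} vector_info;

static vector_info decode_vectors(const uint8_t *img)
{
    vector_info v;
    v.initial_sp = read_le32(img + 0);
    v.reset_vector = read_le32(img + 4);
    v.reset_addr = v.reset_vector & ~1u;
    v.thumb_bit = (int)(v.reset_vector & 1u);
    return v;
}
```

The reset vector decodes to 0x08000011. The code itself is at 0x08000010 (Reset_Handler), and the low bit is the Thumb bit.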
There is no .data, so _edata and _sdata are at the same place. _la_data is a bit of a strange one: it is the load address of .data in flash, 0x08000030, right after the end of .text. There is no .bss either, so its start and end are at the same place. So add some:
.cpu cortex-m4
.word 0x20001000
.word Reset_Handler
.word loop
.word loop
.globl Reset_Handler
.thumb_func
Reset_Handler:
b loop
.thumb_func
loop:
b .
.align
.word 0x11223344
.word _edata
.word _sdata
.word _la_data
.word _ebss
.word _sbss
.word 0x55667788
.section .bss
.byte 0
.section .data
.byte 0x66
Disassembly of section .text:
08000000 <Reset_Handler-0x10>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000013 stmdaeq r0, {r0, r1, r4}
800000c: 08000013 stmdaeq r0, {r0, r1, r4}
08000010 <Reset_Handler>:
8000010: e7ff b.n 8000012 <loop>
08000012 <loop>:
8000012: e7fe b.n 8000012 <loop>
8000014: 11223344 ; <UNDEFINED> instruction: 0x11223344
8000018: 20000004 andcs r0, r0, r4
800001c: 20000000 andcs r0, r0, r0
8000020: 08000030 stmdaeq r0, {r4, r5}
8000024: 20000008 andcs r0, r0, r8
8000028: 20000004 andcs r0, r0, r4
800002c: 55667788 strbpl r7, [r6, #-1928]! ; 0xfffff878
Disassembly of section .data:
20000000 <_sdata>:
20000000: 00000066 andeq r0, r0, r6, rrx
Disassembly of section .bss:
20000004 <__bss_start__>:
20000004: 00000000 andeq r0, r0, r0
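So .data runs from 0x20000000 to 0x20000003 (_edata is 0x20000004, one past the end), and .bss runs from 0x20000004 to 0x20000007 (_ebss is 0x20000008). Each one-byte section was padded to four bytes by the ALIGN(4) in the linker script. In so.srec: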
S00A0000736F2E7372656338
S315080000000010002011000008130000081300000863
S31508000010FFE7FEE744332211040000200000002019
S315080000203000000808000020040000208877665584
S309080000306600000058
S70508000011E1
At address 0x08000030, which is _la_data, we can see the .data value: the S309 record holds 66 00 00 00, which is the 0x66 byte padded out to a word.
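To double-check those records by hand, here is a host-side C sketch of an S3 record parser. parse_s3 and its helpers are hypothetical names, not part of any toolchain. It assumes the usual Motorola S-record layout:

- a byte count covering the address, data and checksum bytes;
- a 4-byte address for S3 records;
- a checksum that is the ones' complement of the low byte of the sum of the count, address and data bytes.

The trailing S7 record carries the entry point, 0x08000011, which is Reset_Handler with the Thumb bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if it is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Two hex digits at s[0..1] as a byte, or -1 on bad input. */
static int hexbyte(const char *s)
{
    int hi = hexval(s[0]), lo = hexval(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/*
 * Parse one S3 record. On success store the address and payload and
 * return the payload length; return -1 if the record is malformed or
 * the checksum does not match.
 */
static int parse_s3(const char *rec, uint32_t *addr, uint8_t *data, size_t max_data)
{
    size_t len = strlen(rec);
    if (len < 4 || rec[0] != 'S' || rec[1] != '3') return -1;
    int count = hexbyte(rec + 2);          /* address + data + checksum */
    if (count < 5 || len != 4 + 2 * (size_t)count) return -1;

    unsigned sum = (unsigned)count;
    uint32_t a = 0;
    for (int i = 0; i < 4; i++) {          /* 32-bit address, big-endian */
        int b = hexbyte(rec + 4 + 2 * i);
        if (b < 0) return -1;
        a = (a << 8) | (uint32_t)b;
        sum += (unsigned)b;
    }
    int ndata = count - 5;                 /* minus address and checksum */
    if ((size_t)ndata > max_data) return -1;
    for (int i = 0; i < ndata; i++) {
        int b = hexbyte(rec + 12 + 2 * i);
        if (b < 0) return -1;
        data[i] = (uint8_t)b;
        sum += (unsigned)b;
    }
    int csum = hexbyte(rec + 12 + 2 * ndata);
    if (csum < 0 || (uint8_t)~sum != (uint8_t)csum) return -1;
    *addr = a;
    return ndata;
}
```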
So you can simply rewrite the C code in assembly language. You did not need to do this analysis, but it is good to. If you do not put alignment into the linker script, then you have to do a byte-by-byte copy like the C code. If you are lucky and want to put the code in for it, you can try to implement something faster, but both ends need to be unaligned in the same way.
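For the "something faster" case, here is a minimal host-side C sketch. It assumes only that the source and destination share the same offset within a 32-bit word. copy_section is a hypothetical name. It copies bytes up to the first word boundary, then whole words, then the leftover bytes. If the offsets differ, it falls back to the plain byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes, using word accesses when both ends are misaligned alike. */
static void copy_section(uint8_t *dst, const uint8_t *src, size_t n)
{
    if ((((uintptr_t)dst ^ (uintptr_t)src) & 3u) == 0) {
        /* Same offset within a word: bytes until dst is word aligned. */
        while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
            *dst++ = *src++;
            n--;
        }
        /* Both pointers are now word aligned: copy whole words. */
        uint32_t *wd = (uint32_t *)(void *)dst;
        const uint32_t *ws = (const uint32_t *)(const void *)src;
        while (n >= 4) {
            *wd++ = *ws++;
            n -= 4;
        }
        dst = (uint8_t *)wd;
        src = (const uint8_t *)ws;
    }
    /* Tail bytes, or the whole copy if the offsets differ. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```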
At a minimum, the bootstrap for an MCU like this needs to handle:
1) stack pointer
2) .data
3) .bss
4) call/branch to C entry point
5) infinite loop
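Here are steps 2 and 3 above, sketched in C so the logic can be run on a host. On the target, the reset handler would pass &_sdata, &_edata, &_la_data, &_sbss and &_ebss (the symbols from the linker script), then call main() and fall into an infinite loop. In this sketch the boundaries are plain parameters, and init_data_bss is a hypothetical name. The test mirrors the layout from the dump above: four bytes of .data loaded from flash (0x66, 0, 0, 0) followed by four bytes of .bss.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Steps 2 and 3, byte at a time like the C in the question. On the target
 * the arguments would be the linker-script symbols; here they are
 * parameters so the logic can be exercised on a host.
 */
static void init_data_bss(uint8_t *sdata, const uint8_t *edata,
                          const uint8_t *la_data,
                          uint8_t *sbss, const uint8_t *ebss)
{
    /* 2) .data: copy the initial values from their load address (flash) */
    while (sdata < edata)
        *sdata++ = *la_data++;

    /* 3) .bss: zero it */
    while (sbss < ebss)
        *sbss++ = 0;
}
```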
Many folks will say you should never return from main(), but:
1) you can protect them from it anyway, and they will thank you later;
2) they perhaps have not created a purely event-driven solution.
It does not hurt. As you read in the ARM documentation, the core has a mechanism for loading the stack pointer from the first word of the vector table. If you use that, it checks the first box.
Not intended to be lean and mean, wholly untested, maybe buggy:
.cpu cortex-m4
.syntax unified
.word 0x20001000
.word Reset_Handler
.word loop
.word loop
.globl Reset_Handler
.thumb_func
Reset_Handler:
/*copy .data section to SRAM */
/*uint32_t size = (uint32_t)&_edata - (uint32_t)&_sdata;*/
ldr r0,=_edata
ldr r1,=_sdata
subs r0,r0,r1
beq data_loop_done      /* size is zero, nothing to copy */
/*uint8_t *pDst = (uint8_t*)&_sdata; //sram*/
/*uint8_t *pSrc = (uint8_t*)&_la_data; //flash*/
ldr r2,=_la_data
/*
for(uint32_t i =0 ; i < size ; i++)
{
*pDst++ = *pSrc++;
}
*/
data_loop:
ldrb r3,[r2]
adds r2,#1
strb r3,[r1]
adds r1,#1
subs r0,r0,#1
bne data_loop
data_loop_done:
/*
Init. the .bss section to zero in SRAM
size = (uint32_t)&_ebss - (uint32_t)&_sbss;
pDst = (uint8_t*)&_sbss;
for(uint32_t i =0 ; i < size ; i++)
{
*pDst++ = 0;
}
*/
ldr r0,=_ebss
ldr r1,=_sbss
movs r2,#0
subs r0,r0,r1
beq bss_loop_done       /* size is zero, nothing to clear */
bss_loop:
strb r2,[r1]
adds r1,#1
subs r0,r0,#1           /* count down the remaining bytes */
bne bss_loop
bss_loop_done:
/*__libc_init_array();*/
bl __libc_init_array
/*main();*/
bl main
b loop
.thumb_func
loop:
b .
/* stand-ins so this example links on its own */
__libc_init_array:
bx lr
main:
bx lr
.align
.word 0x11223344
.word _edata
.word _sdata
.word _la_data
.word _ebss
.word _sbss
.word 0x55667788
.section .bss
.byte 0
.section .data
.byte 0x66
But functional:
08000010 <Reset_Handler>:
8000010: 4814 ldr r0, [pc, #80] ; (8000064 <main+0x1e>)
8000012: 4915 ldr r1, [pc, #84] ; (8000068 <main+0x22>)
8000014: 1a40 subs r0, r0, r1
8000016: d006 beq.n 8000026 <data_loop_done>
8000018: 4a14 ldr r2, [pc, #80] ; (800006c <main+0x26>)
0800001a <data_loop>:
800001a: 7813 ldrb r3, [r2, #0]
800001c: 3201 adds r2, #1
800001e: 700b strb r3, [r1, #0]
8000020: 3101 adds r1, #1
8000022: 3801 subs r0, #1
8000024: d1f9 bne.n 800001a <data_loop>
08000026 <data_loop_done>:
...
8000064: 20000004 andcs r0, r0, r4
8000068: 20000000 andcs r0, r0, r0
800006c: 08000078 stmdaeq r0, {r3, r4, r5, r6}
If you are careful, you can avoid forcing Thumb-2 (32-bit) instructions where they are not necessary. You may be able to improve this with Thumb-2 instructions. But if the linker script does its job of aligning the section boundaries, you can use ldr/str and move a word at a time, comparing against the end address instead of counting down a size. Whichever you prefer...
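In the byte loops above, the zero-size check has to branch past the loop when the size is zero (beq, not bne), and the .bss loop needs its own subs r0,r0,#1 to count down. Comparing against the end address avoids the counter altogether. Here is the .bss clear done a word at a time: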
ldr r0,=_ebss
ldr r1,=_sbss
mov r2,#0
cmp r0,r1
beq bss_loop_done
bss_loop:
str r2,[r1]
adds r1,#4
cmp r0,r1
bne bss_loop
bss_loop_done:
That should be four or more times faster, depending on the system (chip). BUT you have to ensure that the start and end addresses are word aligned. You can go further by increasing the alignment to a double-word (8-byte) boundary:
ldr r0,=_ebss
ldr r1,=_sbss
mov r2,#0
mov r3,#0
cmp r0,r1
beq bss_loop_done
bss_loop:
stm r1!,{r2,r3}
cmp r0,r1
bne bss_loop
bss_loop_done:
You could have used stm in the word-at-a-time loop too and saved an instruction. You might see a gain with four words at a time, but you might not on a Cortex-M; going up to two words is a nice balance. You can do the same optimizations with the .data copy.
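In C terms, the two-words-at-a-time version of both loops might look like the sketch below. copy_data_dwords and zero_bss_dwords are hypothetical names. The loops compare against the end pointer instead of counting down a size, so they only terminate if the section length is a multiple of 8 bytes. Aligning both ends to a double-word boundary in the linker script guarantees that; for example, ALIGN(8) in place of ALIGN(4) is an assumed change, not part of the script above. Whether the compiler turns each pair of loads and stores into ldm/stm or ldrd/strd is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Copy .data from its flash load address, two words per iteration. */
static void copy_data_dwords(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    /* The length must be a multiple of 8 bytes, or dst never equals dst_end. */
    assert(((const char *)dst_end - (const char *)dst) % 8 == 0);
    while (dst != dst_end) {
        uint32_t a = src[0];   /* roughly ldm src!,{r2,r3} */
        uint32_t b = src[1];
        dst[0] = a;            /* roughly stm dst!,{r2,r3} */
        dst[1] = b;
        src += 2;
        dst += 2;
    }
}

/* Zero .bss two words per iteration, comparing against the end address. */
static void zero_bss_dwords(uint32_t *p, const uint32_t *end)
{
    assert(((const char *)end - (const char *)p) % 8 == 0);
    while (p != end) {
        p[0] = 0;
        p[1] = 0;
        p += 2;
    }
}
```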
I hope this was not a homework assignment; if it was, you still get to test and debug it. But it is a simple matter of reading and porting the code, and there is an endless supply of examples out there to look at.
Looking at the linker script again, it was designed for the vector table to live in its own .isr_vector section:
.cpu cortex-m4
.syntax unified
.section .isr_vector
.word 0x20001000
.word Reset_Handler
.word loop
.word loop
.section .text
.globl Reset_Handler
.thumb_func
Reset_Handler:
b loop
.thumb_func
loop:
b .
Disassembly of section .text:
08000000 <Reset_Handler-0x10>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000013 stmdaeq r0, {r0, r1, r4}
800000c: 08000013 stmdaeq r0, {r0, r1, r4}
08000010 <Reset_Handler>:
8000010: e7ff b.n 8000012 <loop>
08000012 <loop>:
8000012: e7fe b.n 8000012 <loop>
That way you do not have to put the objects on the command line in a particular order. *(.isr_vector) comes first in .text, so the vector table always lands at 0x08000000.
There is an intimate relationship between the linker script and the bootstrap code: you can't really have one without the other, they are a pair. You cannot, or at least should not, mix and match linker scripts and bootstrap code from various projects willy-nilly. Keep them together as designed.
Linker scripts are not portable, and assembly language is not assumed to be portable. So IMO you should make each as simple and lean as possible: less is more, less to port, less to maintain, less toolchain-specific stuff. That is not the general view; developers love to make grossly overcomplicated linker scripts. The C library can play a role here too. With the GNU model, the C library is really a separate part, and you can insert whichever one you want (it comes with its own bootstrap and linker script). But that depends on how that library works, the target, etc.
A microcontroller without an RTOS is not really C library friendly, so ask yourself: do I really need a C library? How much simpler, smaller (and cheaper), more readable and more maintainable can I make this project?
Mine tend to look like this:
.thumb_func
reset:
bl main
b .
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Each of us reading this who has this experience will show you a different style, different opinions, etc. That is another feature of bare metal: the freedom to do it your own way, truly bound only by the rules of the hardware. No one's solution is really wrong; it just reflects their style.