
I am studying ARM assembly, and in the part that explains how to read from and write to a file I do not understand where the code branches. The code is this:

@ fopen input argv[1]
PUSH {R1}
LDR R0, [R1,#0x04]
LDR R1, =r
BL fopen
LDR R1, =fin
STR R0, [R1]

Where does that BL fopen branch to? The only other reference to fopen is this:

.global fopen

Later in the program. I am thinking that maybe I did not understand how the lines starting with a dot work, but the only thing I found online is that they are called directives. Can anyone clarify this?

.global fopen tells the assembler that the label "fopen" is defined somewhere else and that the linker will insert the matching address. This can be, for example, an external library, or your own code from another assembler file that you finished at some other time. - Tommylee2k

.global fopen tells the assembler that this is a global label defined here; .extern fopen, for example, says that the label is defined elsewhere (like extern in C, it is not enforced in gas). In this case the linker resolves the branch-and-link to the C library function fopen when linking that library. - old_timer

1 Answer


Here is an example that covers what you are asking and perhaps more...

one.s

.globl _start
_start:
    bl notmain
    bl hello
    b .

two.s

.extern notmain

.globl hello
hello:
    bl notmain
    bl there
    bx lr

there:
    bx lr

three.c

unsigned int x;
void notmain ( void )
{
    x=5;
}

so.ld

MEMORY
{
    bob : ORIGIN = 0x08000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > bob
    .rodata : { *(.rodata*) } > bob
    .bss : { *(.bss*) } > ted
}

arm-none-eabi-as --warn --fatal-warnings one.s -o one.o
arm-none-eabi-as --warn --fatal-warnings two.s -o two.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c three.c -o three.o
arm-none-eabi-ld -T so.ld one.o two.o three.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list

For as simple as these files look, there is a lot to cover. First and foremost, an assembly language is defined by the assembler, the program that parses it. Some assemblers try to be compatible with another assembler (nasm and masm, for example), but some do not, in particular when it comes to directives, which are the things in the source that are not instructions. The above is GNU based: binutils for the assembler and linker, gcc for the compiler, with a generic full-sized ARM target.

The GNU linker wants a label called _start somewhere to use as the entry point. Note that it does not care about main; main is usually dragged in by the "bootstrap" code, which is often prepared for you by whoever prepared the toolchain. In this case I am making my own and using the toolchain as a set of tools...

You can use .globl or .global; they are the same. There are other ways to do this, and in other assembly languages you might have a FUNCTION or PROCEDURE or other directive to declare that a label is something more than just a label. The GNU assembler (gas) considers labels (_start: in this case) to be local unless told otherwise, like putting static on a function in a C file. gas assumes labels are local; C assumes names are global (with natural exceptions such as those inside a function, etc.).
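For comparison, here is a small sketch of the same visibility rule seen from the C side. The function names mirror the labels in two.s, but the bodies and return values are made up purely for illustration.

```c
/* linkage.c: C analogue of the labels in two.s (bodies are illustrative). */

/* 'static' gives internal linkage: the symbol stays private to this file,
   much like the unmarked label 'there:' in gas. */
static int there(void)
{
    return 2;
}

/* No 'static' means external linkage: other object files can call this,
   much like a label marked with .globl hello. */
int hello(void)
{
    return there() + 1;
}
```

If you compile this with gcc -c at -O0 and run nm on the object, hello should normally show up as a global symbol and there as a local one, which is the same distinction .globl makes in gas.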

Just as when you call a function in another file of a C program, the linker resolves the reference later. (Even if you run gcc hello.c -o hello, gcc is driving the whole chain: separate programs compile to assembly language, assemble, and link, and then the temporary files are cleaned up unless you tell it not to.) Unlike gcc, gas does not complain when we use a label that is neither defined in the file nor declared external.

Note that you can do these experiments yourself with just a toolchain; no processor or board is needed. With my code at least, arm-none-linux-gnueabi or arm-linux-gnueabi will work as well; you do not necessarily need arm-none-eabi.

So, looking at the disassembly of one.o:

Disassembly of section .text:

00000000 <_start>:
   0:   ebfffffe    bl  0 <notmain>
   4:   ebfffffe    bl  0 <hello>
   8:   eafffffe    b   8 <_start+0x8>

Okay, that was an accident, I swear. First off, the addressing starts at zero because this is an object file that is not linked yet. Because notmain and hello are unresolved externals at this point, the assembler does what it can: it makes a bl instruction but has no offset to use, so gas basically chooses to encode a branch to self. Next, the dot on the last line means "here", so b . means branch to self. I could have put a label in front and branched to that label

here: b here

and ended up with cleaner, easier-to-read code. The GNU assembler has other interesting things you can do:

1:
 b 1f
 b 1b
1:

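The 1 in 1f and 1b comes from the label 1. The f means branch to the nearest label numbered 1 looking forward; the b (as in 1b) means the nearest label numbered 1 looking backward. So the first instruction branches forward to the second 1: and the second instruction branches backward to the first 1:, as the disassembly shows (here they sit at offsets 0xc and 0x10):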

   c:   ea000000    b   14 <_start+0x14>
  10:   eafffffd    b   c <_start+0xc>

You can go crazy with assembler-specific nuances that make the code easier to type but harder to read and less portable; it is up to you.

arm-none-eabi-objdump -D two.o

00000000 <hello>:
   0:   ebfffffe    bl  0 <notmain>
   4:   eb000000    bl  c <there>
   8:   e12fff1e    bx  lr

0000000c <there>:
   c:   e12fff1e    bx  lr

In two.s I declared notmain external just to show it; it did not hurt and is arguably cleaner, although gas does not require it. I also declared and used a local label, there. The assembler can find this label and produce the proper bl with an offset to it, so the linker does not have to. For the notmain label, though, the assembler still has to fill in a placeholder and leave it for the linker to fix later.

00000000 <notmain>:
   0:   e3a02005    mov r2, #5
   4:   e59f3004    ldr r3, [pc, #4]    ; 10 <notmain+0x10>
   8:   e5832000    str r2, [r3]
   c:   e12fff1e    bx  lr
  10:   00000000    andeq   r0, r0, r0

three.o comes from a C program. The label/function notmain is automatically global because I did not put static in front of it; likewise x is a global variable.

But x is in the .bss section, which is separate from the .text section where code goes, so at this point the compiler does not know how far away that label is. It therefore generates code that is somewhat specific to this instruction set: a pc-relative load of the address from a nearby word. Others like x86 might just use a far mov rather than pc relative. The 0x00000000 at address/offset 0x10 is a location the linker will fill in with the address of x; the code loads that address and then stores through it to do the assignment.
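The pc-relative arithmetic can be checked by hand. The sketch below is only an illustration, not part of any toolchain: the helper name ldr_literal_address is made up, and it handles only the immediate-offset form of ldr seen in this listing. It relies on the ARM-state rule that the pc reads as the instruction's own address plus 8.

```c
#include <stdint.h>

/* Hypothetical helper: compute the address that an ARM-state
   "ldr rX, [pc, #imm]" reads from, given where the instruction sits
   and its 32-bit encoding. Only the immediate-offset form is handled. */
uint32_t ldr_literal_address(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm12 = insn & 0xfffu;   /* low 12 bits: byte offset        */
    uint32_t pc = insn_addr + 8u;     /* pc reads two instructions ahead */

    /* Bit 23 (U) selects whether the offset is added or subtracted. */
    return (insn & (1u << 23)) ? pc + imm12 : pc - imm12;
}
```

For the instruction at offset 4 in three.o, ldr_literal_address(0x4, 0xe59f3004) gives 0x10, which is the word holding the placeholder 0x00000000.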

MEMORY
{
    bob : ORIGIN = 0x08000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > bob
    .rodata : { *(.rodata*) } > bob
    .bss : { *(.bss*) } > ted
}

Linker scripts and/or command lines are very specific to the linker from a vendor's toolchain. Like compiler pragmas and assembler directives, linker scripts are tool specific and are not expected to be portable to or compatible with other tools, or even other versions of the same tool.

It does not get much simpler than this for GNU ld (the linker). I avoided words like rom and ram to show that they have no special meaning; they are just names that connect the dots between the description of the memory space and the sections I want in those memory spaces.

Put all of this together with the linker whose job it is to...link all this stuff together.

Disassembly of section .text:

08000000 <_start>:
 8000000:   eb000005    bl  800001c <notmain>
 8000004:   eb000000    bl  800000c <hello>
 8000008:   eafffffe    b   8000008 <_start+0x8>

0800000c <hello>:
 800000c:   eb000002    bl  800001c <notmain>
 8000010:   eb000000    bl  8000018 <there>
 8000014:   e12fff1e    bx  lr

08000018 <there>:
 8000018:   e12fff1e    bx  lr

0800001c <notmain>:
 800001c:   e3a02005    mov r2, #5
 8000020:   e59f3004    ldr r3, [pc, #4]    ; 800002c <notmain+0x10>
 8000024:   e5832000    str r2, [r3]
 8000028:   e12fff1e    bx  lr
 800002c:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <x>:
20000000:   00000000    andeq   r0, r0, r0

I said I wanted .text to be at 0x08000000 and that is where it is. I put one.o on the command line first, so its code came first, then two.o, then three.o. You can do linker script things to change this, but otherwise it goes by command-line order, in my experience with the GNU linker.
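As a quick cross-check of the listing against so.ld, the two MEMORY regions can be modelled as a small table. This is just a sketch for reasoning about the addresses; the struct and the region_of helper are hypothetical names, not anything the linker provides.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per MEMORY line in so.ld. */
struct region {
    const char *name;
    uint32_t origin;
    uint32_t length;
};

static const struct region regions[] = {
    { "bob", 0x08000000u, 0x1000u },   /* .text and .rodata land here */
    { "ted", 0x20000000u, 0x1000u },   /* .bss lands here             */
};

/* Hypothetical helper: name of the region containing addr, or NULL. */
const char *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        /* Unsigned subtraction wraps for addr < origin, so one compare
           covers both the lower and the upper bound. */
        if (addr - regions[i].origin < regions[i].length)
            return regions[i].name;
    }
    return NULL;
}
```

With this table, the address of notmain (0x0800001c) falls in bob and the address of x (0x20000000) falls in ted, matching the > bob and > ted assignments in the SECTIONS block.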

00000000 <_start>:
   0:   ebfffffe    bl  0 <notmain>
   4:   ebfffffe    bl  0 <hello>
   8:   eafffffe    b   8 <_start+0x8>

08000000 <_start>:
 8000000:   eb000005    bl  800001c <notmain>
 8000004:   eb000000    bl  800000c <hello>
 8000008:   eafffffe    b   8000008 <_start+0x8>

Comparing before and after linking: the linker has found objects specified on the command line that contain the external labels (notmain and hello) and modified the instructions to reach them properly.
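To see what the linker actually changed, you can decode the branch encodings by hand. The sketch below assumes the standard ARM-state B/BL layout (a signed 24-bit word offset in the low bits, relative to the instruction address plus 8); the function name branch_target is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: compute where an ARM-state B or BL at insn_addr
   with the given 32-bit encoding will branch to. */
uint32_t branch_target(uint32_t insn_addr, uint32_t insn)
{
    /* The low 24 bits are a signed count of 4-byte words. */
    int32_t imm24 = (int32_t)(insn & 0x00ffffffu);
    if (imm24 & 0x00800000)
        imm24 -= 0x01000000;          /* sign-extend from 24 bits */

    /* The pc reads as the instruction address plus 8 in ARM state;
       unsigned arithmetic makes negative offsets wrap correctly. */
    return insn_addr + 8u + (uint32_t)(imm24 * 4);
}
```

The placeholder 0xebfffffe carries an offset of -2 words, which cancels the +8 and gives a branch to self: branch_target(0x0, 0xebfffffe) is 0x0. After linking, branch_target(0x8000000, 0xeb000005) is 0x800001c (notmain) and branch_target(0x8000004, 0xeb000000) is 0x800000c (hello), which matches the listing.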

Likewise, the hello function from two.s has its call to notmain resolved. The call to there did not change, as it was resolved by the assembler.

Lastly, notmain has the address of x filled in (the word at 0x800002c is now 0x20000000) so that it can modify x as the software intended.

So for as simple as this code appears on the surface, there is a lot of generic toolchain as well as toolchain-specific stuff going on.

In your case, for the linker to succeed, a library or other object containing the fopen label (function) has to be linked in, and the call to that function is then resolved.

Whenever someone asks "how do I do X in assembly?", the answer is usually: the same way you do it in some other language. In this case, to open a file in assembly language you first have to have an operating system and/or lots of code to deal with file systems and hardware. So the answer is simply to call the fopen library function, which deals with the operating-system-specific stuff, and the operating system deals with the file and hardware stuff...
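For reference, here is roughly what the snippet in the question does, written in C. This is a sketch under assumptions: the question does not show what the r label contains, so it is assumed here to be the mode string "r", and fin is assumed to be a pointer-sized variable. The function name open_input is made up.

```c
#include <stdio.h>

/* Counterpart of the 'fin' label: storage for the returned FILE pointer. */
FILE *fin;

/* Rough C equivalent of the assembly in the question:
     LDR R0, [R1,#0x04]   first argument  = argv[1]
     LDR R1, =r           second argument = mode (ASSUMED to be "r")
     BL  fopen            call the C library function
     STR R0, [fin]        save the return value
   Returns 1 if the file was opened, 0 otherwise. */
int open_input(char **argv)
{
    fin = fopen(argv[1], "r");
    return fin != NULL;
}
```

The assembly and the C version both end up as a call to the same fopen symbol, and in both cases it is the linker that connects the call to the C library.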