32
votes

This is my assembly level code ...

section .text
global _start
_start: mov eax, 4
        mov ebx, 1
        mov ecx, mesg
        mov edx, size
        int 0x80
exit:   mov eax, 1
        int 0x80
section .data
mesg    db      'KingKong',0xa
size    equ     $-mesg

Output:

root@bt:~/Arena# nasm -f elf a.asm -o a.o
root@bt:~/Arena# ld -o out a.o
root@bt:~/Arena# ./out 
KingKong

My question is: what is global _start used for? I searched and found that it tells the linker the starting point of my program. Why can't we just use the _start label on its own to mark where the program starts, as in the version below, which produces a warning on the screen?

section .text
_start: mov eax, 4
        mov ebx, 1
        mov ecx, mesg
        mov edx, size
        int 0x80
exit:   mov eax, 1
        int 0x80
section .data
mesg    db      'KingKong',0xa
size    equ     $-mesg

root@bt:~/Arena# nasm -f elf a.asm
root@bt:~/Arena# ld -e _start -o out a.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048080
root@bt:~/Arena# ld -o out a.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048080
4
possible duplicate of "global main" in Assembly - Jens Björnhager

4 Answers

45
votes

The global directive is NASM-specific. It exports a symbol from your code, together with the address it points to, into the generated object code. Here you mark the _start symbol as global, so its name is added to the object file (a.o). The linker (ld) can then read that symbol and its value from the object file, so it knows which address to mark as the entry point in the output executable. When you run the executable, it starts at the address labeled _start in the code.

If the global directive is missing for a symbol, that symbol is not exported from the object file (it stays local to it), so the linker has no way of finding it by name.

If you want to use an entry point name other than _start (which is the default), you can pass the -e option to ld:

ld -e my_entry_point -o out a.o
5
votes

A label is not global until you declare it to be global, so you have to use the global directive.

The global label "_start" is needed by the linker. If there is no global _start address, the linker will complain because it can't find one. You didn't declare _start as global, so it is not visible outside that module/object of code, and therefore not visible to the linker.

This is the opposite of C, where things are implied to be global unless you declare them to be local:

unsigned int hello;
int fun ( int a )
{
  return(a+1);
}

hello and fun are global, visible outside the object, but this

static unsigned int hello;
static int fun ( int a )
{
  return(a+1);
}

makes them local, not visible outside the object.
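To check the C side of this comparison yourself, here is a minimal sketch that puts both kinds of symbol in one file. hello and fun come from the snippets above; hidden and add_hidden are made-up names for illustration.

```c
/* linkage.c (hypothetical file name): external vs. internal linkage. */

unsigned int hello = 41;          /* external linkage, comparable to `global hello` */
static unsigned int hidden = 1;   /* internal linkage, comparable to a plain label  */

/* Internal linkage: other object files cannot refer to this name. */
static int add_hidden(int a)
{
    return a + (int)hidden;
}

/* External linkage: the linker can resolve `fun` from other objects. */
int fun(int a)
{
    return add_hidden(a);
}
```

Assuming the file is saved as linkage.c, running gcc -c linkage.c and then nm linkage.o should list hello and fun with uppercase type letters (global) and hidden and add_hidden with lowercase ones (local). That holds for an unoptimized build; an optimizing build may inline or drop the static symbols.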

In assembly, these labels are all local:

_start:
hello:
fun:
more_fun:

These are now global, available to the linker and other objects:

global _start
_start:
global hello
hello:
...
5
votes

_start is used by the default Binutils' ld linker script as the entry point

We can see the relevant part of that linker script with:

 ld -verbose a.o | grep ENTRY

which outputs:

ENTRY(_start)

The ELF file format (and other object formats, I suppose) explicitly says which address the program will start running at, through the e_entry header field.

ENTRY(_start) tells the linker to set that field to the address of the symbol _start when generating the ELF file from object files.

Then when the OS starts running the program (exec system call on Linux), it parses the ELF file, loads the executable code into memory, and sets the instruction pointer to the specified address.
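The e_entry lookup described above can be sketched in C. This is only an illustration, assuming a Linux system that provides <elf.h> and an ELF file whose byte order matches the host. read_entry is a hypothetical helper name, not part of any library.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Reads the ELF header of `path` and stores its e_entry field in *entry.
 * Returns 0 on success, -1 if the file cannot be read or is not ELF.
 * Assumption: the file uses the same byte order as the host.
 */
int read_entry(const char *path, unsigned long long *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
    } else if (buf[EI_CLASS] == ELFCLASS64 && n == sizeof buf) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
    } else {
        return -1;
    }
    return 0;
}
```

Calling read_entry("./out", &entry) on the executable from the question should give the same address that readelf -h reports as the entry point. For position-independent executables the value is an offset relative to the load base rather than an absolute address.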

The -e flag mentioned by Sedat overrides the default _start symbol.

You can also replace the entire default linker script with the -T <script> option, here is a concrete example that sets up some bare metal assembly stuff.

.global is an assembler directive that marks the symbol as global in the ELF file

The ELF file contains some metadata for every symbol, indicating its visibility.

The easiest way to observe this is with the nm tool.

For example in a Linux x86_64 GAS freestanding hello world:

main.S

.text
.global _start
_start:
asm_main_after_prologue:
    /* write */
    mov $1, %rax   /* syscall number */
    mov $1, %rdi   /* stdout */
    lea msg(%rip), %rsi  /* buffer */
    mov $len, %rdx /* len */
    syscall

    /* exit */
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
    len = . - msg

GitHub upstream

compile and run:

gcc -ffreestanding -static -nostdlib -o main.out main.S
./main.out

nm gives:

00000000006000ac T __bss_start
00000000006000ac T _edata
00000000006000b0 T _end
0000000000400078 T _start
0000000000400078 t asm_main_after_prologue
0000000000000006 a len
00000000004000a6 t msg

and man nm tells us that:

If lowercase, the symbol is usually local; if uppercase, the symbol is global (external).

so we see that _start is visible externally (upper case T), but msg, which we didn't mark as .global, isn't (lower case t).

The linker then knows how to blow up if multiple global symbols with the same name are seen, or do smarter things if more exotic symbol types are seen.

If we don't mark _start as global, ld becomes sad and says:

cannot find entry symbol _start

1
votes

_start is just a label that points to a memory address, and global makes it visible to the linker. For ELF binaries, _start is the default label that marks the address where the program starts.

There is also main (or _main, or main_), which is known to the C language and is called by "startup code" that is "usually" linked in if you're using C.

Hope this helps.