I wrote this program to do the same thing as hi.c, but without the C library call. I then followed a suggestion to run gcc with the -S option on hi.c and dissect the resulting hi.s.
$ cat hiasm.asm
section .text
global _start
_start:
mov dl, 5
mov esi, msg
xor di,di
xor al,al
inc di
inc al
syscall
xor rdi,rdi
mov al,60
syscall
msg: db "Hello"
$ nasm -f elf64 hiasm.asm && ld -m elf_x86_64 hiasm.o -o hiasm && ./hiasm
Hello
$ echo $?
0
So this works fine
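For comparison, the two system calls the NASM program makes (write, then exit) can be sketched in C through the generic syscall() wrapper. This is only an illustration, not part of the original programs: the helper names hello_via_syscall and exit_via_syscall are made up here, and the sketch assumes Linux with glibc, so it still links against libc.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write, SYS_exit */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: write "Hello" to fd with the write system call.
 * Returns the number of bytes written, or -1 on error. */
long hello_via_syscall(int fd)
{
    static const char msg[] = "Hello";
    return syscall(SYS_write, fd, msg, sizeof msg - 1);  /* 5 bytes, no NUL */
}

/* Hypothetical helper: end the process with the exit system call,
 * like the final "mov al,60 / syscall" in the assembly version. */
void exit_via_syscall(int status)
{
    syscall(SYS_exit, status);
}
```

Calling hello_via_syscall(1) followed by exit_via_syscall(0) from main would roughly mirror what hiasm does, with the kernel interface hidden behind the wrapper.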
Again, here's the simple hi.c:
$ cat hi.c
#include <stdio.h>
int main(void)
{
puts("Hello");
return 0;
}
$ gcc -S hi.c && cat hi.s
.file "hi.c"
.section .rodata
.LC0:
.string "Hello"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516"
.section .note.GNU-stack,"",@progbits
$ gcc hi.s -o hi && ./hi
Hello
The labels .LFB0 and .LFE0 do not appear to be referenced anywhere in the .s file.
After removing both, the program still works as expected.
From the 'as' assembler docs:
https://sourceware.org/binutils/docs/as/index.html
Local symbols are defined and used within the assembler, but they are
normally not saved in object files. Thus, they are not visible when
debugging. You may use the `-L' option (see Include Local Symbols) to
retain the local symbols in the object files.
So for a plain executable with no need for bells and whistles, they can be chopped.
So I got rid of the easy ones.
Next, the startup code expects to call main. There's not much use for that here, so I'll rename the entry point to _start.
For ELF targets, the .size directive is used like this:
.size name , expression
This directive sets the size associated with a symbol name. The size
in bytes is computed from expression which can make use of label
arithmetic. This directive is typically used to set the size of
function symbols.
I don't need function symbol sizes, so I got rid of the .size at the bottom that references main.
$ cat hi.s
.file "hi.c" ##tells 'as' that we are about to start a new logical file
.section .rodata ##assembles the following code into section '.rodata'
.LC0: ##.LC0, .LFB0, .LFE0 are just local labels; symbols that
## are guaranteed to be unique over the source code
## that allow the compiler to use names/simple notation
## to reference sections of code
##But here, only .LC0 is actually referenced in the code
.string "Hello" ##
.text
.globl _start
_start:
.cfi_startproc ##used at the beginning of each function that should have an
##entry in .eh_frame. It initializes some internal data
##structures. Don't forget to close by .cfi_endproc
pushq %rbp ##push base pointer onto stack
.cfi_def_cfa_offset 16 ##modifies a rule for computing CFA. Register remains the
##same, but offset is new. Note that it is the absolute
##offset that will be added to a defined register to
##compute CFA address
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc ##close of .cfi_startproc
.ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516"
.section .note.GNU-stack,"",@progbits
Trying that:
$ gcc -o hi hi.s
/tmp/ccLxG1jh.o: In function `_start':
hi.c:(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
$ ldd hi
linux-vdso.so.1 (0x00007fffb6569000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe7456e7000)
/lib64/ld-linux-x86-64.so.2 (0x000055edc8bc8000)
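It's definitely linking against libc, and gcc also links the C runtime startup object (Scrt1.o), which already defines _start and expects a main. That explains the multiple definition of _start.
So I'll try dropping the standard libraries and startup files with gcc's -nostdlib option.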
$ gcc -nostdlib -o hi hi.s
/tmp/ccV5QYaT.o: In function `_start':
hi.c:(.text+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status
Right, puts still needs the C library, so I'm replacing the puts call with a direct write system call:
.file "hi.c" ##tells 'as' that we are about to start a new logical file
.section .rodata ##assembles the following code into section '.rodata'
.LC0: ##.LC0, .LFB0, .LFE0 are just local labels; symbols that
## are guaranteed to be unique over the source code
## that allow the compiler to use names/simple notation
## to reference sections of code
##But here, only .LC0 is actually referenced in the code
.string "Hello" ##
.text
.globl _start
_start:
.cfi_startproc ##used at the beginning of each function that should have an
##entry in .eh_frame. It initializes some internal data
##structures. Don't forget to close by .cfi_endproc
pushq %rbp ##push base pointer onto stack
.cfi_def_cfa_offset 16 ##modifies a rule for computing CFA. Register remains the
##same, but offset is new. Note that it is the absolute
##offset that will be added to a defined register to
##compute CFA address
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rsi ##this reg value and others were changed for write call
movq $1, %rax
movq $1, %rdi
movq $5, %rdx
syscall
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc ##close of .cfi_startproc
$ gcc -nostdlib -o hi hi.s && ./hi
HelloSegmentation fault
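Promising: the write works. The crash comes from the ret at the end. Nothing called _start, so there is no return address on the stack to go back to. The program has to end with the exit system call instead: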
.file "hi.c" ##tells 'as' that we are about to start a new logical file
.section .rodata ##assembles the following code into section '.rodata'
.LC0: ##.LC0, .LFB0, .LFE0 are just local labels; symbols that
## are guaranteed to be unique over the source code
## that allow the compiler to use names/simple notation
## to reference sections of code
##But here, only .LC0 is actually referenced in the code
.string "Hello"
.text
.globl _start
_start:
.cfi_startproc ##used at the beginning of each function that should have an
##entry in .eh_frame. It initializes some internal data
##structures. Don't forget to close by .cfi_endproc
##deleted the base pointer push and pops from stack, don't need stack
.cfi_def_cfa_offset 16 ##modifies a rule for computing CFA. Register remains the
##same, but offset is new. Note that it is the absolute
##offset that will be added to a defined register to
##compute CFA address
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rsi
movq $1, %rax
movq $1, %rdi
movq $5, %rdx
syscall
xor %rdi,%rdi
mov $60, %rax
.cfi_def_cfa 7, 8
syscall
.cfi_endproc ##close of .cfi_startproc
$ gcc -g -nostdlib -o hi hi.s && ./hi
Hello
Got it!
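The register setup in the listing above follows the x86-64 Linux system call convention: the call number goes in rax, the arguments in rdi, rsi and rdx, and the result comes back in rax. As a rough sketch of the same thing from C, here is a small inline-assembly wrapper. The name raw_write is made up for this illustration, and the code assumes GCC or Clang on x86-64 Linux.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): issue the write system
 * call directly, with the same register assignment as the listing:
 * rax = 1 (write), rdi = fd, rsi = buffer, rdx = length. */
long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                         /* return value arrives in rax */
        : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
        : "rcx", "r11", "memory");          /* syscall overwrites rcx and r11 */
    return ret;
}
```

The exit call at the end of the listing has the same shape with rax = 60 and the status in rdi. Unlike write, it never returns, which is why it can stand in for the ret that crashed.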
Trying to figure out what a CFA is
http://dwarfstd.org/doc/DWARF4.pdf
Section 6.4
An area of memory that is allocated on a stack called a “call frame.”
The call frame is identified by an address on the stack. We refer to
this address as the Canonical Frame Address or CFA. Typically, the
CFA is defined to be the value of the stack pointer at the call
site in the previous frame (which may be different from its value on
entry to the current frame)
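So .cfi_def_cfa_offset, .cfi_offset and .cfi_def_cfa_register don't generate any instructions. They only record, for a debugger or unwinder, how to compute the CFA and where the saved registers live at each point in the function.
This program never needs to be unwound, and it no longer uses the stack at all, so I might as well delete those too (along with the leftover movq %rsp, %rbp).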
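To get a feel for what the CFA is, here is a small C sketch that asks the compiler for it. It relies on __builtin_dwarf_cfa(), a GCC/Clang builtin that is not mentioned in the original post, and the helper name cfa_minus_local is invented for this example. Treat it as an illustration under those assumptions rather than a portable technique.

```c
#include <stdint.h>

/* Hypothetical helper: distance in bytes from a local variable in this
 * frame up to the frame's CFA. On targets where the stack grows downward
 * (x86-64 included), locals sit below the CFA, so the result is positive. */
__attribute__((noinline))
long cfa_minus_local(void)
{
    volatile char local = 0;
    void *cfa = __builtin_dwarf_cfa();   /* CFA of the current frame */
    return (long)((uintptr_t)cfa - (uintptr_t)&local);
}
```

The value returned depends on the compiler and optimization level. The point is only that the CFA is an address used to identify the frame, not something the program itself has to compute at run time.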
$ cat hi.s
.file "hi.c" ##tells 'as' that we are about to start a new logical file
.section .rodata ##assembles the following code into section '.rodata'
.LC0: ##.LC0, .LFB0, .LFE0 are just local labels; symbols that
## are guaranteed to be unique over the source code
## that allow the compiler to use names/simple notation
## to reference sections of code
##But here, only .LC0 is actually referenced in the code
.string "Hello"
.text
.globl _start
_start:
.cfi_startproc ##used at the beginning of each function that should have an
##entry in .eh_frame. It initializes some internal data
##structures. Don't forget to close by .cfi_endproc
leaq .LC0(%rip), %rsi
movq $1, %rax
movq $1, %rdi
movq $5, %rdx
syscall
xor %rdi,%rdi
mov $60, %rax
syscall
.cfi_endproc ##close of .cfi_startproc
.cfi_startproc :
Used at the beginning of each function that should have an entry in
the .eh_frame
What is .eh_frame?
"When using languages that support exceptions, such as C++, additional information must be provided to the runtime environment that describes the call frames that must be unwound during the processing of an exception. This information is contained in the special sections .eh_frame and .eh_framehdr."
Don't need exception handling, not using C++
$ cat hi.s
.section .rodata
.LC0:
.string "Hello"
.text
.globl _start
_start:
leaq .LC0(%rip), %rsi
movq $1, %rax
movq $1, %rdi
movq $5, %rdx
syscall
xor %rdi,%rdi
mov $60, %rax
syscall
[...] -static option – Michael Petch
[...] 'wrt ..plt' on the end. So it would look like 'call puts wrt ..plt'. I suspect you are on a more recent Ubuntu or Debian based system that defaults to compiling position independent executables. – Michael Petch
[...] /usr/bin/ld on the GCC command line. /usr/bin/ld is an executable that links code. It should look like 'nasm -f elf64 -l hola.lst hola.asm && gcc -m64 -o hola hola.o' – Michael Petch
[...] -S option: 'gcc -S hi.c -o hi.s'. When I do this, it's clear that GCC is using 'call puts'. Starting with GCC's assembler, you could remove stuff you don't understand (or learn why it's there) until you have a small assembly file that does what you want. For linking, if you use 'gcc hi.s -o hi', then GCC will ensure that the C library is included properly. – Dave M.