You don't need to make it that complicated:
int main ( void )
{
return(555);
}
gcc -O2 -c so.c -o so.o
objdump -D so.o
0000000000000000 <main>:
0: b8 2b 02 00 00 mov $0x22b,%eax
5: c3 retq
Or you could look at the generated assembly; I prefer to disassemble. Either way, now I can write the function directly in assembly:
.globl main
main:
mov $0x22b,%eax
retq
as so.s -o so.o
gcc so.o -o so
./so
And of course nothing is printed, but you can rename the function and call it from C to see the value:
so.s
.globl fun
fun:
mov $0x22b,%eax
retq
so.c
#include <stdio.h>
int fun ( void );
int main ( void )
{
printf("%u\n",fun());
return(0);
}
as so.s -o fun.o
gcc so.c fun.o -o so
./so
555
And of course you can then complicate it as much as you like beyond that.
gcc outputs GNU assembler source, so you can look at what it generates:
int fun ( void )
{
return(333);
}
gcc -O2 -save-temps -c so.c -o so.o
cat so.s
.file "so.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl fun
.type fun, @function
fun:
.LFB0:
.cfi_startproc
movl $333, %eax
ret
.cfi_endproc
.LFE0:
.size fun, .-fun
.section .text.unlikely
.LCOLDE0:
.text
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
Compiler output often carries an excess of directives (useful for debuggers and other things, but not always used or required). Even so, you can use it to help, to some extent, to learn GNU assembler for this target (x86-64). You still need the documentation from the processor vendor (Intel in this case), with the understanding that the syntax in that document is not necessarily the syntax used by any particular toolchain that you have or will use. You have to be multilingual there, but the vendor documentation shows what the instructions are, what they do, their limits, and so on.
MARS and similar environments are quite useful for teaching and are often designed for that purpose, leaving out a lot of the traps that you can fall into. The goal is to learn the instruction set by playing with a simulator and to get your feet wet in assembly language. I am not a fan of an assembly interface: for educational purposes I think the student should generate and see the machine code. Perhaps within that simulator you can; I have only used it for SO questions. I use real or simulated MIPS processors if I want to play with MIPS.
Assembly language is specific to the tool, not the target. Assume that each assembler for any target has its own assembly language; if there happens to be overlap, then so be it.
global fun
fun:
mov eax, 333
ret
nasm so.s -felf64 -o so.o
gcc so.c so.o -o so
./so
333
There is the well-known Intel vs AT&T thing, but those are not so much syntaxes as a swapping of source and destination relative to the Intel standard. nasm doesn't like .globl (try it); it likes global, without the dot.
.globl fun
fun:
movl %eax, $333
ret
so.s:1: error: attempt to define a local label before any non-local labels
so.s:1: error: parser: instruction expected
so.s:3: error: parser: instruction expected
globl fun
fun:
movl %eax, $333
ret
nasm so.s -felf64 -o so.o
so.s:1: error: parser: instruction expected
so.s:3: error: parser: instruction expected
globl fun <-- note this is line 1
fun:
mov %eax, $333 <--- this is line 3
ret
nasm so.s -felf64 -o so.o
so.s:1: error: parser: instruction expected
so.s:3: error: expression syntax error
globl fun
fun:
mov eax, 333
ret
nasm so.s -felf64 -o so.o
so.s:1: error: parser: instruction expected
global fun
fun:
mov eax, 333
ret
And nasm is happy, but gas is not when given that same file:
as so.s -o so.o
so.s: Assembler messages:
so.s:1: Error: no such instruction: `global fun'
so.s:3: Error: too many memory references for `mov'
.global fun
fun:
mov 333, eax
ret
so.s: Assembler messages:
so.s:3: Error: too many memory references for `mov'
.global fun
fun:
mov $333, eax
ret
so.s: Assembler messages:
so.s:3: Error: no instruction mnemonic suffix given and no register operands; can't size instruction
.global fun
fun:
movl $333, eax
ret
And as is happy, BUT this is broken: it thinks eax is a label to be filled in later.
0000000000000000 <fun>:
0: c7 04 25 00 00 00 00 movl $0x14d,0x0
7: 4d 01 00 00
b: c3 retq
.global fun
fun:
movl $333, %eax
ret
0000000000000000 <fun>:
0: b8 4d 01 00 00 mov $0x14d,%eax
5: c3 retq
.global fun
fun:
movl $333, %eax
retq
0000000000000000 <fun>:
0: b8 4d 01 00 00 mov $0x14d,%eax
5: c3 retq
.global fun
fun:
mov $333, %eax
retq
0000000000000000 <fun>:
0: b8 4d 01 00 00 mov $0x14d,%eax
5: c3 retq
nasm:
global fun
fun:
mov eax, 333
ret
0000000000000000 <fun>:
0: b8 4d 01 00 00 mov $0x14d,%eax
5: c3 retq
Same machine code, different assembly language, in more ways than just reversing the source and destination. (I used objdump to disassemble, which is why you see that syntax.)
gas takes .globl or .global. Since the size of the mov is obvious from the eax register, which is 32 bits, the suffix isn't needed: movl and mov both work with the binutils I have. Likewise, ret vs retq produced the same instruction.
Such are the joys of assembly language, especially with a painful target like x86 (the last instruction set you want to learn; there is a list of more useful/better ones).
But you can see that assembly language can and does differ for the same target and the same instructions, based on the tool used. And something like MARS starts to make even more sense for that use case.
You won't go wrong learning the gcc/binutils (GNU) tools, as you can use them on Windows, Mac, Linux, BSD, etc., and everything but the system calls and possibly the binary file formats is going to be the same experience (okay, linker scripts and OS-specific stuff will differ).
Depending on the target there may be other good choices too. nasm is popular with the folks who learned Intel syntax in the old days, and I suppose with others as well. And with code that has been lying about for a while that gas pukes on, you might have half a chance with nasm.
And one or the other or both have command-line options for the Intel vs AT&T source/destination swapping.
.text is just a shortcut for .section .text. See the GAS manual: sourceware.org/binutils/docs/as. Re: "macro": sort of, yes; translating asm source to binary machine language is a pretty simple process. You assemble bytes into a file, link it into an executable, then run it so the OS loads / maps that file into memory. Then the hardware runs it. Other than directives to control what goes where, yes, asm as a text language is just a convenient way to define what bytes of machine code you want in the output. Each line (instruction) assembles independently. – Peter Cordes