I have difficulty understanding the assembly output that gcc generates for a simple C program.

Here's the C code of the program:

#include <stdio.h>
#include <stdlib.h>

int sum1=1;
int sum2=1;

int add(int s1, int s2){
    return s1+s2;
}

int main(int argc,char** agrv){
    int res=sum1+sum2;
    return 0;
}

Here's the assembly code created by gcc:

    .file   "main.c"
    .globl  sum1
    .data
    .align 4
sum1:
    .long   1
    .globl  sum2
    .align 4
sum2:
    .long   1
    .text
    .globl  add
    .def    add;    .scl    2;  .type   32; .endef
    .seh_proc   add
add:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movl    %edx, 24(%rbp)
    movl    16(%rbp), %edx
    movl    24(%rbp), %eax
    addl    %edx, %eax
    popq    %rbp
    ret
    .seh_endproc
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movq    %rdx, 24(%rbp)
    call    __main
    movl    sum1(%rip), %edx
    movl    sum2(%rip), %eax
    addl    %edx, %eax
    movl    %eax, -4(%rbp)
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 7.1.0"

I don't understand the operand order of some of the instructions in the assembly code (see also the memory layout picture for reference). First, there is the instruction

    pushq   %rbp

which pushes the base pointer of the caller onto the stack. After this instruction comes the following instruction:

    movq    %rsp, %rbp

This instruction should set the base pointer of the callee to the value of the current stack pointer. However, shouldn't the order of the two operands be the opposite (e.g. movq %rbp, %rsp)?

A similar "problem" occurs at the instruction:

    addl    %edx, %eax

Here, going by what I have read, the result would be stored in the register %edx rather than in %eax (which is used to hold the function's return value).

Pretty much all sources I have consulted on the Internet so far claim that the result of an instruction is stored in its first operand.

By default GCC outputs AT&T syntax. -masm=intel for Intel syntax (I think). The operands are inverted between the two conventions. - Mat
If you enable optimizations, the code will actually be reduced to xor eax, eax / ret, or in your convention: xorl %eax, %eax / ret. - 0___________
@P__J__: gcc will also have to emit a standalone definition of add, because it's not static or inline. Like lea eax, [rdi+rsi] / ret, because those inputs are function args, not globals. - Peter Cordes
@PeterCordes Yes, it will be emitted to the object file. But in the executable this function will not be linked, so at the end of the day it will end up as those two instructions (plus all the startup code, prologue, etc.). - 0___________
@P__J__: It won't be executed, but it will be present in the final executable. I just tried it, and there's a 0000000000000610 <add>: in the objdump output. (On Arch Linux, gcc7.3 plus standard ld from Binutils 2.29). Anyway, the OP is looking at the compiler's asm output, so that's what they'd see. (They could use __attribute__((noinline)) to see the asm for a function call without the noise of -O0. See: How to remove "noise" from GCC/clang assembly output?) - Peter Cordes

1 Answer

The GNU compiler generates assembly in "AT&T syntax" rather than Intel syntax, as explained here:

The GNU Assembler, gas, uses a different syntax from what you will likely find in any x86 reference manual, and the two-operand instructions have the source and destinations in the opposite order. Here are the types of the gas instructions:

opcode                    (e.g., pushal)
opcode operand            (e.g., pushl %edx)
opcode source,dest        (e.g., movl %edx,%eax) (e.g., addl %edx,%eax)

Where there are two operands, the rightmost one is the destination. The leftmost one is the source.