4
votes

I am trying to learn more about assembly and which optimizations compilers can and cannot do.

I have a test piece of code for which I have some questions.

See it in action here: https://godbolt.org/z/pRztTT, or check the code and assembly below.

#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
        for (int j = 0; j < 100; j++) {
                if (argc == 2 && argv[1][0] == '5') {
                        printf("yes\n");
                }
                else {
                        printf("no\n");
                }
        }

        return 0;
}

The assembly produced by GCC 10.1 with -O3:

.LC0:
        .string "no"
.LC1:
        .string "yes"
main:
        push    rbp
        mov     rbp, rsi
        push    rbx
        mov     ebx, 100
        sub     rsp, 8
        cmp     edi, 2
        je      .L2
        jmp     .L3
.L5:
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        je      .L4
.L2:
        mov     rax, QWORD PTR [rbp+8]
        cmp     BYTE PTR [rax], 53
        jne     .L5
        mov     edi, OFFSET FLAT:.LC1
        call    puts
        sub     ebx, 1
        jne     .L2
.L4:
        add     rsp, 8
        xor     eax, eax
        pop     rbx
        pop     rbp
        ret
.L3:
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        je      .L4
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        jne     .L3
        jmp     .L4

It seems like GCC produces two versions of the loop: one with the argv[1][0] == '5' condition but without the argc == 2 condition, and one without any condition.

My questions:

  • What is preventing GCC from splitting away the full condition? It is similar to this question, but there is no chance for the code to get a pointer into argv here.
  • In the loop without any condition (L3 in assembly), why is the loop body duplicated? Is it to reduce the number of jumps while still fitting in some sort of cache?
1
I'd guess that GCC doesn't know that printf won't modify memory pointed-to by argv. It would need special rules for main and printf / puts to know that that char ** arg won't ever point directly or indirectly point to memory that some non-inline function call named puts might modify. Re: unrolling: that's odd, -funroll-loops isn't on by default for GCC at -O3, only with -O3 -fprofile-usePeter Cordes
@PeterCordes: thanks for the information. When I modify the program to copy argv[1][0] into a local char variable first, GCC does move the full condition outside the loop. Would (theoretically) compiling puts() together with this main() allow the compiler to see puts() isn't touching argv and optimize the loop fully?Tomas Creemers
Yes, e.g. if you'd written your own write function that uses an inline asm statement around a syscall instruction, with a memory input operand (and no "memory" clobber) then it could inline. (Or maybe do inter-procedural optimization without inlining.)Peter Cordes
FYI I found what causes the duplicated loop body: -freorder-blocks-algorithm=stc: stc’, the “software trace cache” algorithm, which tries to put all often executed code together, minimizing the number of branches executed by making extra copies of code.Tomas Creemers

1 Answers

2
votes

GCC doesn't know that printf won't modify memory pointed-to by argv, so it can't hoist that check out of the loop.

argc is a local variable (that can't be pointed-to by any pointer global variable), so it knows that calling an opaque function can't modify it. Proving that a local variable is truly private is part of Escape Analysis.

The OP tested this by copying argv[1][0] into a local char variable first: that let GCC hoist the full condition out of the loop.


In practice argv[1] won't be pointing to memory that printf can modify. But we only know that because printf is a C standard library function, and we assume that main is only called by the CRT startup code with the actual command line args. Not by some other function in this program that passes its own args. In C (unlike C++), main is re-entrant and can be called from within the program.

Also, in GNU C, printf can have custom format-string handling functions registered with it. Although in this case, the compiler built-in printf looks at the format string and optimizes it to a puts call.

So printf is already partly special, but I don't think GCC bothers to look for optimizations based on it not modifying any other globally-reachable memory. With a custom stdio output buffer, that might not even be true. printf is slow; saving some spill / reloads around it is generally not a big deal.


Would (theoretically) compiling puts() together with this main() allow the compiler to see puts() isn't touching argv and optimize the loop fully?

Yes, e.g. if you'd written your own write function that uses an inline asm statement around a syscall instruction (with a memory input-only operand to make it safe while avoiding a "memory" clobber) then it could inline and assume that argv[1][0] wasn't changed by the asm statement and hoist a check based on it. Even if you were outputting argv[1].

Or maybe do inter-procedural optimization without inlining.


Re: unrolling: that's odd, -funroll-loops isn't on by default for GCC at -O3, only with -O3 -fprofile-use. Or if enabled manually.