When viewing the assembly output of the following code (compiled without optimizations; -O2 and -O3 produce very similar results):
#include <stdio.h>

int main(int argc, char **argv)
{
    volatile float f1 = 1.0f;
    volatile float f2 = 2.0f;
    if(f1 > f2)
    {
        puts("+");
    }
    else if(f1 < f2)
    {
        puts("-");
    }
    return 0;
}
GCC does something that I have a hard time following:
.LC2:
.string "+"
.LC3:
.string "-"
.text
.globl main
.type main, @function
main:
.LFB2:
pushq %rbp
.LCFI0:
movq %rsp, %rbp
.LCFI1:
subq $32, %rsp
.LCFI2:
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $0x3f800000, %eax
movl %eax, -4(%rbp)
movl $0x40000000, %eax
movl %eax, -8(%rbp)
movss -4(%rbp), %xmm1
movss -8(%rbp), %xmm0
ucomiss %xmm0, %xmm1
jbe .L9
.L7:
movl $.LC2, %edi
call puts
jmp .L4
.L9:
movss -4(%rbp), %xmm1
movss -8(%rbp), %xmm0
ucomiss %xmm1, %xmm0
jbe .L4
.L8:
movl $.LC3, %edi
call puts
.L4:
movl $0, %eax
leave
ret
Why does GCC move the float values into xmm0 and xmm1 twice and also run ucomiss twice?
Wouldn't it be faster to do the following?
.LC2:
.string "+"
.LC3:
.string "-"
.text
.globl main
.type main, @function
main:
.LFB2:
pushq %rbp
.LCFI0:
movq %rsp, %rbp
.LCFI1:
subq $32, %rsp
.LCFI2:
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $0x3f800000, %eax
movl %eax, -4(%rbp)
movl $0x40000000, %eax
movl %eax, -8(%rbp)
movss -4(%rbp), %xmm1
movss -8(%rbp), %xmm0
ucomiss %xmm0, %xmm1
jb .L8 # jump if less than
je .L4 # jump if equal
.L7:
movl $.LC2, %edi
call puts
jmp .L4
.L8:
movl $.LC3, %edi
call puts
.L4:
movl $0, %eax
leave
ret
I'm not at all a real assembly programmer, but it just seemed odd to me to have duplicate instructions running. Is there a problem with my version of the code?
Update
If you remove the volatile which I had originally and replace it with scanf(), you get the same results:
#include <stdio.h>

int main(int argc, char **argv)
{
    float f1;
    float f2;
    scanf("%f", &f1);
    scanf("%f", &f2);
    if(f1 > f2)
    {
        puts("+");
    }
    else if(f1 < f2)
    {
        puts("-");
    }
    return 0;
}
And the corresponding assembler:
.LCFI2:
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
leaq -4(%rbp), %rsi
movl $.LC0, %edi
movl $0, %eax
call scanf
leaq -8(%rbp), %rsi
movl $.LC0, %edi
movl $0, %eax
call scanf
movss -4(%rbp), %xmm1
movss -8(%rbp), %xmm0
ucomiss %xmm0, %xmm1
jbe .L9
.L7:
movl $.LC1, %edi
call puts
jmp .L4
.L9:
movss -4(%rbp), %xmm1
movss -8(%rbp), %xmm0
ucomiss %xmm1, %xmm0
jbe .L4
.L8:
movl $.LC2, %edi
call puts
.L4:
movl $0, %eax
leave
ret
Final Update
After reviewing some of the follow-up comments, it seems han (who commented under Jonathan Leffler's post) nailed this problem. GCC skips the optimization not because it can't do it, but because I hadn't told it to. It all comes down to IEEE floating-point rules: under strict conformance, GCC can't simply do a jump-if-above or jump-if-below after the first UCOMISS, because it has to handle the special cases of floating-point numbers, such as NaN operands, which compare as unordered. With han's recommendation of the -ffast-math option (none of the numbered -O levels enable -ffast-math, as it can break some programs), GCC does exactly what I was looking for:
The following assembly was produced using GCC 4.3.2 with "gcc -S -O3 -ffast-math test.c":
.LC0:
.string "%f"
.LC1:
.string "+"
.LC2:
.string "-"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB25:
subq $24, %rsp
.LCFI0:
movl $.LC0, %edi
xorl %eax, %eax
leaq 20(%rsp), %rsi
call scanf
leaq 16(%rsp), %rsi
xorl %eax, %eax
movl $.LC0, %edi
call scanf
movss 20(%rsp), %xmm0
comiss 16(%rsp), %xmm0
ja .L11
jb .L12
xorl %eax, %eax
addq $24, %rsp
.p2align 4,,1
.p2align 3
ret
.p2align 4,,10
.p2align 3
.L12:
movl $.LC2, %edi
call puts
xorl %eax, %eax
addq $24, %rsp
ret
.p2align 4,,10
.p2align 3
.L11:
movl $.LC1, %edi
call puts
xorl %eax, %eax
addq $24, %rsp
ret
Notice that the two UCOMISS instructions are now replaced with a single COMISS, directly followed by JA (jump if above) and JB (jump if below). GCC is able to nail this optimization if you let it, using -ffast-math!
UCOMISS vs COMISS (http://www.softeng.rl.ac.uk/st/archive/SoftEng/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc315.htm): "The UCOMISS instruction differs from the COMISS instruction in that it signals an invalid SIMD floating-point exception only when a source operand is an SNaN. The COMISS instruction signals invalid if a source operand is either a QNaN or an SNaN."
Thanks again everyone for the helpful discussion.
volatile qualifier. Remove it and compare. – Kerrek SB