59
votes

Sometimes compilers generate code with weird instruction duplications that can safely be removed. Consider the following piece of code:

int gcd(unsigned x, unsigned y) {
  return x == 0 ? y : gcd(y % x, x);
}

Here is the assembly code (generated by clang 5.0 with optimizations enabled):

gcd(unsigned int, unsigned int): # @gcd(unsigned int, unsigned int)
  mov eax, esi
  mov edx, edi
  test edx, edx
  je .LBB0_1
.LBB0_2: # =>This Inner Loop Header: Depth=1
  mov ecx, edx
  xor edx, edx
  div ecx
  test edx, edx
  mov eax, ecx
  jne .LBB0_2
  mov eax, ecx
  ret
.LBB0_1:
  ret

In the following snippet:

  mov eax, ecx
  jne .LBB0_2
  mov eax, ecx

If the jump doesn't happen, eax is reassigned for no obvious reason.

The other example is two ret's at the end of the function: one would perfectly work as well.

Is the compiler simply not intelligent enough or there's a reason to not remove the duplications?

2
clang, c or c++? - Khalil Khalaf
Interesting that gcc doesn't do that: godbolt.org/g/MxTiaY. - lisyarus
@lisyarus gcc does do it - Justin
How would adding a mov instruction affect the branch predictor? - Oliver Charlesworth
Please feel free to report this missed optimization to both gcc and llvm. The 2 'mov' are in different basic blocks, which makes the optimization a bit harder, but still desirable. - Marc Glisse

2 Answers

42
votes

Compilers can perform optimisations that are not obvious to people and removing instructions does not always make things faster.

A small amount of searching shows that various AMD processors have branch prediction problems when a RET is immediately after a conditional branch. By filling that slot with what is essentially a no-op, the performance problem is avoided.

Update:

Example reference, section 6.2 of the "Software Optimization Guide for AMD64 Processors" (see http://support.amd.com/TechDocs/25112.PDF) says:

Specifically, avoid the following two situations:

  • Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET instruction as its target. See “Examples.”

  • A conditional branch that occurs in the code directly before the single-byte near-return RET instruction.

It also goes into detail on why jump targets should have alignment which is also likely to explain the duplicate RETs at the end of the function.

5
votes

Any compiler will have a bunch of transformations for register renaming, unrolling, hoisting, and so on. Combining their outputs can lead to suboptimal cases such as what you have shown. Marc Glisse offers good advice: it's worth a bug report. You are describing an opportunity for a peephole optimizer to discard instructions that either

  • don't affect the state of registers & memory at all, or
  • don't affect state that matters for a function's post-conditions, won't matter for its public API.

Sounds like an opportunity for symbolic execution techniques. If the constraint solver finds no branch points for a given MOV, perhaps it is really a NOP.