Why do compilers duplicate some instructions?

Question

Sometimes compilers generate code with weird instruction duplications that can safely be removed. Consider the following piece of code:

int gcd(unsigned x, unsigned y) {
  return x == 0 ? y : gcd(y % x, x);
}

Here is the assembly code (generated by clang 5.0 with optimizations enabled):

gcd(unsigned int, unsigned int): # @gcd(unsigned int, unsigned int)
  mov eax, esi
  mov edx, edi
  test edx, edx
  je .LBB0_1
.LBB0_2: # =>This Inner Loop Header: Depth=1
  mov ecx, edx
  xor edx, edx
  div ecx
  test edx, edx
  mov eax, ecx
  jne .LBB0_2
  mov eax, ecx
  ret
.LBB0_1:
  ret

In the following snippet:

  mov eax, ecx
  jne .LBB0_2
  mov eax, ecx

If the jump doesn't happen, eax is reassigned for no obvious reason.

The other example is two ret's at the end of the function: one would perfectly work as well.

Is the compiler simply not intelligent enough or there's a reason to not remove the duplications?

How would adding a mov instruction affect the branch predictor? — Oliver Charlesworth
Please feel free to report this missed optimization to both gcc and llvm. The 2 'mov' are in different basic blocks, which makes the optimization a bit harder, but still desirable. — Marc Glisse

janm janm · Accepted Answer · 2017-12-29T04:48:09

Compilers can perform optimisations that are not obvious to people and removing instructions does not always make things faster.

A small amount of searching shows that various AMD processors have branch prediction problems when a RET is immediately after a conditional branch. By filling that slot with what is essentially a no-op, the performance problem is avoided.

Update:

Example reference, section 6.2 of the "Software Optimization Guide for AMD64 Processors" (see http://support.amd.com/TechDocs/25112.PDF) says:

Specifically, avoid the following two situations:

Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET instruction as its target. See “Examples.”
A conditional branch that occurs in the code directly before the single-byte near-return RET instruction.

It also goes into detail on why jump targets should have alignment which is also likely to explain the duplicate RETs at the end of the function.

Why do compilers duplicate some instructions?

2 Answers