Why is a conditional move not vulnerable to Branch Prediction Failure?

Question

After reading this post (an answer on StackOverflow), specifically its optimization section, I was wondering why conditional moves are not vulnerable to Branch Prediction Failure. I found an article on conditional moves here (PDF by AMD). It also claims a performance advantage for conditional moves. But why is this? I don't see it. At the moment that ASM instruction is evaluated, the result of the preceding CMP instruction is not known yet.

By the way, you might like to know that in my experience on Intel Core2 and Core-i7 CPUs, cmov is not always a performance win. In my tests the branch itself was better as long as the prediction rate was above approximately 99%. That might sound high, but it is pretty common on Intel's branch predictors. In particular this happens with branches inside loops: say a loop that iterates 1000 times, and on the 999th iteration the branch does something different. Such a case would always be more efficient using a conditional jump rather than cmov. — jstine
@NikolaiTrandafil: That would totally depend on the compiler you chose, which compilation flags you enabled and the target ISA. — Martijn Courteaux
Related: Is CMOVcc considered a branching instruction? - no, it's an ALU select operation. Answer includes some links to details on the performance tradeoff. — Peter Cordes

Pascal Cuoq · Accepted Answer · 2013-01-03T00:15:43

Mis-predicted branches are expensive

A modern processor generally executes between one and three instructions each cycle if things go well (if it does not stall waiting for data dependencies for these instructions to arrive from previous instructions or from memory).

The statement above holds surprisingly well for tight loops, but this shouldn't blind you to one additional dependency that can prevent an instruction from being executed when its cycle comes: for an instruction to be executed, the processor must have started to fetch and decode it 15-20 cycles earlier.

What should the processor do when it encounters a branch? Fetching and decoding both targets does not scale (if more branches follow, an exponential number of paths would have to be fetched in parallel). So the processor only fetches and decodes one of the two branches, speculatively.

This is why mis-predicted branches are expensive: they cost the 15-20 cycles that are usually invisible because of an efficient instruction pipeline.
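To put rough numbers on this, here is a minimal sketch of a toy cost model in C. It assumes that every misprediction costs a fixed refill penalty and that correctly predicted branches cost nothing extra; real processors are more complicated. The function name expected_branch_penalty and its parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Toy model (an assumption, not a measurement): average number of cycles
   lost per executed branch because of mispredictions.
   hit_rate       - fraction of branches predicted correctly, in [0, 1]
   penalty_cycles - pipeline refill cost of one misprediction
                    (15-20 cycles according to the text above) */
double expected_branch_penalty(double hit_rate, double penalty_cycles)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    assert(penalty_cycles >= 0.0);
    return (1.0 - hit_rate) * penalty_cycles;
}
```

Under this model, with an assumed 20-cycle penalty, a branch predicted correctly half of the time loses 10 cycles per execution on average, while one predicted correctly 99% of the time loses about 0.2 cycles.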

Conditional move is never very expensive

Conditional move does not require prediction, so it can never incur this penalty. It has data dependencies, the same as ordinary instructions. In fact, a conditional move has more data dependencies than ordinary instructions, because the data dependencies include both the “condition true” and the “condition false” cases. After an instruction that conditionally moves r1 to r2, the contents of r2 seem to depend on both the previous value of r2 and on r1. A well-predicted conditional branch allows the processor to infer more accurate dependencies. But data dependencies typically take one or two cycles to arrive, if they need time to arrive at all.
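The difference in dependencies can be sketched in C. This is only an illustration: the function names select_branch and select_mask are hypothetical, and whether a compiler emits a conditional jump, a cmov, or something else for either function depends on the compiler, its flags, and the target. The mask version spells out in source code what a conditional move does in hardware: the result is computed from the condition and from both candidate values, with no control flow.

```c
#include <assert.h>
#include <stdint.h>

/* Control dependency: only one of the two values is used on a given
   execution. A compiler may translate this into a conditional jump. */
uint32_t select_branch(int cond, uint32_t if_true, uint32_t if_false)
{
    if (cond)
        return if_true;
    return if_false;
}

/* Data dependency: the result depends on cond, if_true AND if_false,
   whichever way the condition goes, as with a conditional move. */
uint32_t select_mask(int cond, uint32_t if_true, uint32_t if_false)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise
       (unsigned arithmetic wraps around, so 0 - 1 gives all ones) */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (if_true & mask) | (if_false & ~mask);
}
```

Both functions return the same value for the same arguments. The second one cannot be mispredicted because there is nothing to predict, but it always has to wait for both inputs.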

Note that a conditional move from memory to register would sometimes be a dangerous bet: if the condition is such that the value read from memory is not assigned to the register, you have waited on memory for nothing. But the conditional move instructions offered in instruction sets are typically register to register, preventing this mistake on the part of the programmer.
