Intel: serializing instructions and branch prediction

2

votes

The Intel Architecture's Developer's Manual (Vol3A, Section 8-26), says:

The Pentium processor and more recent processor families use branch-prediction techniques to improve performance by prefetching the destination of a branch instruction before the branch instruction is executed. Consequently, instruction execution is not deterministically serialized when a branch instruction is executed.

What does this mean?

It sounds really, really bad. It sounds like a serializing instruction like CPUID breaks branch prediction (or vice-versa), but that seems unlikely. Can any ASM folks help me understand what "non-deterministic" means in this context.

*Edited for clarity

assemblyintelcpu-architecturebranch-prediction

Why do you care? Super-scalar and out of order executions makes your processor much faster! – Basile Starynkevitch

It depends what you mean by "breaks branch prediction"; it does not produce incorrect results (which you seem to be worrying about); it just potentially slows things down by losing the speculatively executed results. – antlersoft

breaks = makes it branch to the wrong point. I inferred this from the phrase "non-deterministic". – Justicle

3

votes

It's very confusingly worded, but I believe that its actual meaning is simple: "branches do not (necessarily) serialize execution". We take this for granted today, but it was not always so.

2

votes

I suspect you've misunderstood that statement, but I can't tell exactly how. What connection do you see between serializing instructions and branch prediction? When it says "instruction execution is not deterministically serialized" what it means is that the prefetch and decode of instructions will be determined based on the branch prediction logic, and therefore it won't work the same way every time. But the point of this whole thing is to make things faster -- if the branch prediction is good, most of the time the correct next instructions will be fetched, decoded, and ready to go.

1

votes

A mispredicted branch is a serializing instruction, a correctly predicted branch is not.

Because you do not know if a branch is predicted correctly or not until it is executed, you can not know beforehand if it will serialize the instruction stream. The behaviour is non-deterministic because it depends on the branch prediction.

You could probably construct a corner case with memory accesses before and after a conditional branch, where the behaviour of the code depends on whether the branch was predicted correctly or not. (i.e. whether the branch was serializing or not.)

1

votes

Modern superscalar processors generally make instruction execution appear to be completely deterministic and in-order from the perspective of something outside of the CPU. Internally, it's fetching instructions ahead, executing instructions speculatively, and executing them in whatever order is most efficient. But anything that shouldn't have been executed (e.g. mispredicted branch) doesn't get committed, and memory accesses are generally put back into the right order before leaving the CPU. The tail-end of a CPU pipeline is called the "re-order buffer" because it's job is to track completing instructions and only permanently commit their results in program order. This is important for proper program behavior, particularly in the face of things like branch misprediction and exceptions; if an exception occurs (e.g. divide by zero), subsequent instructions may have been decoded and executed, and these have to be purged from the ROB and program counter reset properly before handing the exception off to the OS.

With regard to memory ordering, there are some exceptions to the illusion of program ordering, where reads can get reordered arbitrarily, and there can be some (possibly speculative) reordering between reads and writes, but you only care about that when talking to memory-mapped I/O hardware. There are instructions that ensure a particular ordering, and CPUs are very careful with the ordering of accesses to un-cached memory, because that's assumed to mean it's I/O.

Intel: serializing instructions and branch prediction

4 Answers