Return address prediction stack buffer vs stack-stored return address?

Question

Have been reading Agner Fog's "The microarchitecture of Intel, AMD and VIA CPUs" and on page 34 he describes "return address prediction":

http://www.agner.org/optimize/microarchitecture.pdf

3.15 Returns (all processors except P1)

A better method is used for returns. A Last-In-First-Out buffer, called the return stack buffer,remembers the return address every time a call instruction is executed, and it uses this for predicting where the corresponding return will go. This mechanism makes sure that return instructions are correctly predicted when the same subroutine is called from several different locations.

I am a little unclear what the need for this is, given that the return addresses are stored on the stack anyway?

So what is the purpose of storing return addresses on the stack if there is also this technique? Is the stack-stored value only used if this prediction technique doesnt work?

You cannot assume that the processor can predict exactly where in the stack the return address is stored. The ESP register is very often restored just before a return as part of the epilogue of a function. — Hans Passant
@HansPassant ah so we're trying to predict the return address, say 15 CPU cycles before the ret instruction is due to be called because 15 CPU cycles before its called we have no idea what could happen to ESP? — user997112

Brian Brian · Accepted Answer · 2014-03-17T00:05:01

Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, and therefore doesn't even know with certainty that a branch instruction exists. Like all predictors, the intent of the return address predictor is to get the direction / target of the branch faster. A return instruction is a branch, and so it would normally have a branch predictor entry to determine whether it is taken and where the target is. The return address predictor is consulted in lieu of the normal branch target buffer.

So perhaps 50 instructions before the return statement is actually "executed", the fetch stage predicts a return instruction and the instruction address to fetch next. Later, when the return is executed, the address is read from the stack and compared with where the return was predicted to go. If they are the same, execution continues, else execution is rolled back to use the proper return address.

Why store on the stack? First, the processor does not know if the predictor has worked without comparing against the address stored on the stack. Second, the stack is the "official" return address, which might be changed for legitimate reasons. Third, the return address predictor has a limited number of entries. The stack is needed for the return instructions for which there was not room to store the addresses in the predictor.

Return address prediction stack buffer vs stack-stored return address?

2 Answers