12
votes

Have been reading Agner Fog's "The microarchitecture of Intel, AMD and VIA CPUs" and on page 34 he describes "return address prediction":

http://www.agner.org/optimize/microarchitecture.pdf

3.15 Returns (all processors except P1)

A better method is used for returns. A Last-In-First-Out buffer, called the return stack buffer,remembers the return address every time a call instruction is executed, and it uses this for predicting where the corresponding return will go. This mechanism makes sure that return instructions are correctly predicted when the same subroutine is called from several different locations.

I am a little unclear what the need for this is, given that the return addresses are stored on the stack anyway?

So what is the purpose of storing return addresses on the stack if there is also this technique? Is the stack-stored value only used if this prediction technique doesnt work?

2
You cannot assume that the processor can predict exactly where in the stack the return address is stored. The ESP register is very often restored just before a return as part of the epilogue of a function.Hans Passant
@HansPassant ah so we're trying to predict the return address, say 15 CPU cycles before the ret instruction is due to be called because 15 CPU cycles before its called we have no idea what could happen to ESP?user997112

2 Answers

15
votes

Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, and therefore doesn't even know with certainty that a branch instruction exists. Like all predictors, the intent of the return address predictor is to get the direction / target of the branch faster. A return instruction is a branch, and so it would normally have a branch predictor entry to determine whether it is taken and where the target is. The return address predictor is consulted in lieu of the normal branch target buffer.

So perhaps 50 instructions before the return statement is actually "executed", the fetch stage predicts a return instruction and the instruction address to fetch next. Later, when the return is executed, the address is read from the stack and compared with where the return was predicted to go. If they are the same, execution continues, else execution is rolled back to use the proper return address.

Why store on the stack? First, the processor does not know if the predictor has worked without comparing against the address stored on the stack. Second, the stack is the "official" return address, which might be changed for legitimate reasons. Third, the return address predictor has a limited number of entries. The stack is needed for the return instructions for which there was not room to store the addresses in the predictor.

3
votes

On top of Brians' great explanation, there's the fact that the stack resides in memory. You don't want to have to go to the memory unit and do a memory lookup (not to mention address translation into physical), from some stack address everytime you want to predict the outcome of a branch. The branch prediction wants to be self-sufficient. You can also view the RSB as another form of caching data.