Branch prediction and performance

Question

I'm reading a book about computer architecture and I'm on this chapter talking about branch prediction. There is this little exercise that I'm having a hard time wrapping my head around it.

Consider the following inner for loop

for (j = 0; j < 2; j++)
{
    for (i = 10; i > 0; i = i-1)
        x[i] = x[i] + s
}

-------> Inner loop:

L.D       F0, 0(R1)
ADD.D     F4, F0, F2
S.D       F4, 0(R1)
DADDUI    R1, R1, -8
BNE       R1, R3, Loop

Assume register F2 holds the scalar s, R1 holds the address of x[10], and R3 is pre-computed to end the loop when i == 0;

a) How would a predictor that alternates between taken/not taken perform?

---- Since the loop is only executed 2 times, I think that the alternate prediction would harm the performance in this case (?) with 1 miss prediction.

b) Would a 1-bit branch prediction buffer improve performance (compare to a)? Assume the first prediction is "not taken", and no other branches map to this entry.

---- Assuming the first prediction is "not taken", and 1-bit predictor invert the bit if the prediction is wrong. So it will be NT/T/T. Does that make it have the same performance as problem a) ? with 1 miss prediction.

c) Would a 2-bit branch prediction buffer improve performance (compare to a)? Assume the first prediction is "not taken", and no other branches map to this entry.

---- 2-bit branch prediction starting with "not taken". As I remember 2 bit prediction change after it misses twice. So this prediction will go like NT/NT/T/T. Therefore its performance will be worse compare to a). 1 miss prediction

That was my attempt to answer the problems. Can anyone explain to me if my answer is right/wrong in more detail please? Thanks.

Peter Cordes Peter Cordes · Accepted Answer · 2016-05-18T09:57:30

Since the loop is only executed 2 times

You mean the outer-loop conditional, the one you didn't show asm for? I'm only answering part of the question for now, in case this confusion was your main issue. Leave a comment if this wasn't what had you confused.

The conditional branch at the bottom of the inner loop is executed 20 times, with this patter: 9xT, 1xNT, 9xT, 1xNT. An alternating predictor there would be wrong about 50% of the time, +/- 20% depending on whether it started right or wrong.

It's the outer loop that only runs twice: T,NT. The whole inner loop runs twice.

The outer loop branch would either be predicted perfectly or terribly, depending on whether the alternating prediction started with T or with NT.

Branch prediction and performance

1 Answers