I'm reading a book about computer architecture and I'm on this chapter talking about branch prediction. There is this little exercise that I'm having a hard time wrapping my head around it.
Consider the following inner for loop
for (j = 0; j < 2; j++)
{
for (i = 10; i > 0; i = i-1)
x[i] = x[i] + s
}
-------> Inner loop:
L.D F0, 0(R1)
ADD.D F4, F0, F2
S.D F4, 0(R1)
DADDUI R1, R1, -8
BNE R1, R3, Loop
Assume register F2 holds the scalar s, R1 holds the address of x[10], and R3 is pre-computed to end the loop when i == 0;
a) How would a predictor that alternates between taken/not taken perform?
---- Since the loop is only executed 2 times, I think that the alternate prediction would harm the performance in this case (?) with 1 miss prediction.
b) Would a 1-bit branch prediction buffer improve performance (compare to a)? Assume the first prediction is "not taken", and no other branches map to this entry.
---- Assuming the first prediction is "not taken", and 1-bit predictor invert the bit if the prediction is wrong. So it will be NT/T/T. Does that make it have the same performance as problem a) ? with 1 miss prediction.
c) Would a 2-bit branch prediction buffer improve performance (compare to a)? Assume the first prediction is "not taken", and no other branches map to this entry.
---- 2-bit branch prediction starting with "not taken". As I remember 2 bit prediction change after it misses twice. So this prediction will go like NT/NT/T/T. Therefore its performance will be worse compare to a). 1 miss prediction
That was my attempt to answer the problems. Can anyone explain to me if my answer is right/wrong in more detail please? Thanks.