2
votes

We recently discussed NOPs in our circuits class when talking about single-cycle and pipelined processors. Suppose we have the following code:

add $t1, $t2, $t3
sub $t4, $t1, $t0

There is a data hazard because of the $t1. In a pipelined processor without a data hazard detection unit, the sub instruction reads $t1 before the updated value has been written back to the register, so it uses the old value, hence the data hazard. If we add 2 NOPs then we can solve this issue or, if there is a data hazard detection unit, we can forward the result for $t1 after the execution phase.

However what if we have a branch instruction? For example:

add $t1, $t2, $t3
beq $t0, $t1, Label

Do we also add 2 NOPs here, if we can't use forwarding?

2
Real MIPS I / R2000 evaluates branch conditions in the first half of the EX stage, so yes, forwarding to branches works the same as forwarding to any ALU instruction. On a hypothetical MIPS without forwarding, who knows? Other possibilities include evaluating branch conditions in the ID stage (as I used to think real MIPS did). The Q&A "How does MIPS I forward from EX to ID for branches without stalling?" has the full details on what MIPS actually does. – Peter Cordes

2 Answers

2
votes

In the standard pipelined MIPS processor without branch prediction, the equality test of the beq is performed in the ALU, i.e. in the EX stage — which means that it is subject to the same ALU -> ALU hazard and the same bypass would mitigate that hazard.

(This says nothing about the stalls for pipeline refill that might happen after the branch instruction, from taken or mispredicted branches; it only covers the delay for the data dependency you're showing.)

In the case of a theoretical processor that was pipelined but had no hazard protection (neither bypassing nor stalls), the same 2 NOPs would be required as in your first scenario.
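
The two-NOP count can be checked with a small timing sketch. It assumes the textbook five-stage pipeline (IF, ID, EX, MEM, WB), operands that are read from the register file in ID even when the comparison itself happens in EX, and a register file that writes in the first half of a cycle and reads in the second half. The function name `nops_needed` and its parameters are hypothetical, chosen only for illustration; this is a model of the reasoning, not a description of any particular MIPS implementation.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def nops_needed(write_stage="WB", read_stage="ID", split_cycle_regfile=True):
    """NOPs to place between a producer and the next instruction reading its result.

    Assumes no forwarding and no interlocks: the consumer can only see the
    value through the register file.
    """
    # Producer issues at cycle 0, so it writes at the index of its write stage.
    write_cycle = STAGES.index(write_stage)
    # Consumer issues at cycle 1 + n and reads at cycle 1 + n + read_offset.
    read_offset = STAGES.index(read_stage)
    # With a split-cycle register file the read may share the write's cycle;
    # otherwise it must happen at least one cycle later.
    slack = 0 if split_cycle_regfile else 1
    return max(0, write_cycle + slack - (1 + read_offset))

print(nops_needed())                           # add -> sub, or add -> beq: 2
print(nops_needed(split_cycle_regfile=False))  # same-cycle write/read not allowed: 3
```

Under these assumptions the answer is the same for `sub` and for `beq`, because both read `$t1` from the register file in ID.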

1
votes

It is very difficult to give a definite answer without extra details on the architecture, and there are many versions of the MIPS architecture around.

But first, let's have a look at your claims.

add $t1, $t2, $t3
sub $t4, $t1, $t0

There is a data hazard because of the $t1...

Right

If we add 2 NOPs then we can solve this issue

Not really. Without any data forwarding mechanism, after one NOP the new $t1 will be in the MEM/WR pipeline registers, and after a second NOP it will be written back to the register bank, but not to the ID/EX pipeline register. So to get proper behavior with only two NOPs, you need either a way to forward the data being written to the register bank into the ID/EX registers, or a trick such as writing the register bank on the falling edge of the clock and reading it during the second half of the cycle.

We will assume that your claim holds, i.e. that there is some kind of forwarding between the input and the output of the register bank.
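
To make the role of the register bank concrete, here is a rough sketch that lays out the timeline of the add/sub example with two NOPs. It assumes the textbook five-stage pipeline with stages named IF, ID, EX, MEM, WB (WB being the write-back stage, "WR" in the register names used here), one instruction issued per cycle, and no stalls; `schedule` and `render` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(program):
    """Map each instruction to {stage: cycle}; one issue per cycle, no stalls."""
    return [
        (text, {stage: issue + k + 1 for k, stage in enumerate(STAGES)})
        for issue, text in enumerate(program)
    ]

def render(program):
    """Return a text pipeline diagram, one row per instruction."""
    last_cycle = len(program) + len(STAGES) - 1
    cycles = range(1, last_cycle + 1)
    lines = [f"{'cycle':<20}" + "".join(f"{c:<5}" for c in cycles)]
    for text, stage_cycles in schedule(program):
        by_cycle = {c: s for s, c in stage_cycles.items()}
        lines.append(f"{text:<20}" + "".join(f"{by_cycle.get(c, ''):<5}" for c in cycles))
    return "\n".join(lines)

program = ["add $t1, $t2, $t3", "nop", "nop", "sub $t4, $t1, $t0"]
print(render(program))

timeline = schedule(program)
add_wb = timeline[0][1]["WB"]  # cycle in which add writes $t1
sub_id = timeline[3][1]["ID"]  # cycle in which sub reads $t1
print(add_wb, sub_id)          # both are cycle 5
```

In this model the write of `$t1` and the read by `sub` land in the same cycle, so two NOPs only work if the register bank lets a value written in a cycle be read in that same cycle.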

Concerning the branch instructions, there are several ways to implement them.

The most obvious way is to use the EX stage to compute the condition ($t0=?$t1) with the ALU and, at the same time, the branch address with an additional adder. But it has a major drawback: while this computation is being done, the IF stage is fetching a new instruction (and another one is already in the decode stage), which leads to a 2-cycle branch penalty.

What was done in the classic MIPS pipeline is that branches were processed in the decode stage. An adder computes the branch address as PC+immediate in this stage, and a dedicated comparator was added to directly compare the outputs of the register bank (BTW, this is why branch instructions only offer eq/neq comparisons: it keeps this comparator simple, while the ALU comparator can do other kinds of comparison). This way, the branch penalty is only one cycle.

If we assume that this is your actual architecture, and that there is no forwarding except through the register bank, then one NOP is sufficient. After one NOP, the new value of $t1 is in the MEM/WR pipeline registers. In the next cycle, it is written back to the register bank during the first half of the cycle and can be used for the branch comparison during the second half.

Of course, if you assume that the branch is processed in the EX stage (and that you have a 2-cycle branch penalty), then a second NOP is required.