0 votes

I'm having a bit of difficulty understanding pipeline hazards, especially data hazards and how we can use stall cycles and data forwarding to resolve them. Here's an example of MIPS assembly code with data and load-use hazards pointed out:

B0:  sll  $t2, $a0, 2
B1:  add  $t1, $a0, $a1
B2:  lw   $t0, 4($t1)      # data hazard: $t1
B3:  slt  $v0, $t1, $a1    # data hazard: $t1
B4:  sub  $v1, $v0, $t3    # data hazard: $v0
B5:  addi $t0, $t0, 4
B6:  lw   $t1, 8($t0)      # data hazard: $t0
B7:  sll  $t1, $a1, 1
B8:  add  $t2, $t1, $v0    # data hazard: $t1
B9:  lw   $v0, 0($t2)      # data hazard: $t2
B10: lw   $t3, 0($v0)      # load-use hazard: $v0
B11: and  $v0, $t3, $a2    # data hazard: $t3
B12: sll  $a1, $t3, 1      # data hazard: $t3

For this code we were given a pipeline "diagram" in which we have to fill in the stall cycles and data forwards. Since the diagram has 20 cycles, I will only show the first 5 cycles and the data forwards in those 5 cycles.

[Diagram: pipeline cycles, first 5 cycles with data forwarding]

In the diagram you can see that from B0 we forward from EX to EX in B1, but why? Since neither of B0's registers is needed in B1 or B2, there is no data hazard regarding $t2 or $a0, and thus no need for a data forward.

Additionally, how do I know whether I have to forward from EX or from MEM? At times I'm not sure whether the forward should come out of EX or MEM and whether it should go into EX or MEM (EX to EX, MEM to EX, and so on). I would like some clarification on this.

I don't think that diagram and the code sequence correspond. – Erik Eidt

1 Answer

0 votes

The nature of these pipelining hazards is whether or not the pipeline would otherwise try to work with stale values. Pipelining spreads the execution of a single instruction across stages (each stage occurs in a succeeding cycle), allowing each stage to have a shorter cycle time. In order not to be a wash on performance, pipelined processors overlap execution of successive instructions. It is this overlap that both speeds performance and introduces potential hazards.


During instruction decode, the register file is consulted for operands for the instruction being executed — in a single cycle (non-pipelined) processor, those values are always 100% architecturally correct, even if they were most recently updated by immediately preceding instructions.

Similarly, in a pipelined processor, the register file is consulted for operands for the instruction being executed: this happens specifically in the ID stage.

For any two back-to-back instructions, however, the WB stage of the 1st instruction will happen after the ID stage of the 2nd instruction, and this is the source of the hazard.

When back-to-back instructions have a register dependency, there is a hazard: the ID stage has already read stale value(s), because the most up-to-date value(s) are not actually ready yet. When back-to-back instructions don't have a dependency, there is no hazard, as the ID stage already read the proper, most up-to-date values.
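To make that timing concrete, here is a small Python sketch (just an illustration, assuming the classic 5-stage IF/ID/EX/MEM/WB pipeline with one instruction entering per cycle and no stalls); B1 and B2 from the question are used only as an example pair:

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def stage_cycle(instr_index, stage):
        # instruction 0 enters IF in cycle 1, instruction 1 in cycle 2, ...
        return instr_index + 1 + STAGES.index(stage)

    producer_wb = stage_cycle(0, "WB")   # e.g. B1: add $t1, $a0, $a1 -> cycle 5
    consumer_id = stage_cycle(1, "ID")   # e.g. B2: lw  $t0, 4($t1)   -> cycle 3
    print(consumer_id, producer_wb)      # 3 5: the register read happens two
                                         # cycles before the write-back

The second instruction reads $t1 in cycle 3, but the first doesn't write $t1 back until cycle 5, so whatever ID read is stale.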

A forward (aka bypass) is used to obtain the proper, most up-to-date values, generally without stalling the processor (as would be required if we were to wait for the WB stage of the 1st instruction to complete before doing the ID stage of the 2nd instruction; that would be a two-cycle delay).

The point being that we have three options:

  • forgo overlapped execution — which is not an option for performance
  • delay when there is a dependency — also not great for performance
  • use a forward or bypass

The forward/bypass works with the observation that the earliest time the proper value is available is at the end of the producing instruction's EX stage (for ALU hazards), and that the latest time we can supply the most up-to-date value is at the beginning of the consuming instruction's EX stage. Thus, if we can get values to go from the ALU output to the ALU input in between cycles, we can avoid the hazard without introducing stalls.
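Under the same 5-stage assumptions as the sketch above, those two cycles land exactly one apart, which is why the EX-to-EX forward works without any stall:

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def stage_cycle(instr_index, stage):
        return instr_index + 1 + STAGES.index(stage)

    producer_ex = stage_cycle(0, "EX")   # cycle 3: ALU result ready at the end of this cycle
    consumer_ex = stage_cycle(1, "EX")   # cycle 4: ALU inputs needed at the start of this cycle
    print(producer_ex, consumer_ex)      # 3 4: adjacent cycles, so latching the result
                                         # across the EX/MEM pipeline register is enough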

A forward (aka bypass) has two mechanisms:

  • deciding when it is necessary to engage
  • how to engage

Most of the complexity is in determining whether or not there is a dependency between two back-to-back instructions. We would need this complexity whether doing a stall or a bypass, so we might as well do the latter, since stalls hurt performance.

Roughly speaking, the logic to determine when to engage is to identify when there is a register/operand dependency between the target of one instruction and a source of the next, so the hardware is constantly looking for this between any two back-to-back instructions.
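As a rough Python sketch of that decision (the field names rs/rd/reg_write are illustrative, not any particular textbook's signal names): a value just computed in EX sits in the EX/MEM pipeline register, while a value from one instruction earlier (e.g. something a lw brought in from memory) sits in MEM/WB, and that distinction is also what separates "forward from EX" from "forward from MEM":

    def forward_select(src_reg, ex_mem, mem_wb):
        """Pick the source for one ALU input (illustrative names, not a real API)."""
        # Prefer the newest value: the instruction one ahead (in EX/MEM) beats
        # the instruction two ahead (in MEM/WB).
        if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
            return "EX/MEM"    # e.g. an ALU result produced last cycle
        if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
            return "MEM/WB"    # e.g. a loaded value, or an older ALU result
        return "ID/EX"         # no dependency: the register-file value read in ID is fine

    # Example: an add that writes $t1 (register 9) is one instruction ahead of
    # an instruction that reads $t1, mirroring B1 -> B2 from the question.
    ex_mem = {"reg_write": True,  "rd": 9}
    mem_wb = {"reg_write": False, "rd": 0}
    print(forward_select(9, ex_mem, mem_wb))   # -> EX/MEM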

When such a dependency is found, the stale values that were read by the ID stage are overridden with proper values; in the case of the ALU hazard, those come from the ALU output. So, when the bypass mechanism kicks in, it ignores the operand values provided by the ID stage and overrides one (or the other, or both) with the ALU output value.
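And a matching sketch of the "how to engage" side (same illustrative names as above): a selector in front of each ALU input that substitutes the forwarded value for whatever ID read out of the register file.

    def alu_input(select, id_ex_value, ex_mem_alu_result, mem_wb_value):
        # 'select' is the output of the decision logic sketched above
        if select == "EX/MEM":
            return ex_mem_alu_result   # override with the ALU output of the prior instruction
        if select == "MEM/WB":
            return mem_wb_value        # override with the value about to be written back
        return id_ex_value             # no hazard: keep the register-file value read in ID

    # The stale value read in ID (here 0) is simply ignored when the forward engages.
    print(alu_input("EX/MEM", id_ex_value=0, ex_mem_alu_result=42, mem_wb_value=7))   # -> 42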