
I understand that all MIPS instructions are 4 bytes long. Since the PC contains the address of the next instruction, the lowest 2 bits of the PC would always be 0. Keeping these 2 bits in the PC seems to waste them and reduce the range of the PC, so why is the PC implemented this way?

I am new to computer architecture, so please do point out any gaps in my understanding of the concept.


The 32-bit MIPS architecture uses 32-bit addresses, which gives a 4 GB address space. A 32-bit PC that holds a byte address can therefore reach any location in that space.

What you're suggesting is this: instead of holding a byte address, whose rightmost two bits are always zero (instructions must be 4-byte/word aligned) and so seem "wasted", the PC would hold a word address that is left-shifted by two bits to form a 34-bit byte address. That would span 16 GB.
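
To put numbers on the two layouts, here is a small Python sketch (Python is used only for illustration; the helper name span_bytes is hypothetical). It assumes a 32-bit PC register and compares treating its value as a byte address with treating it as a word address that is shifted left by two bits.

```python
GIB = 2**30

def span_bytes(pc_bits: int, shift: int) -> int:
    """Hypothetical helper: number of bytes reachable by a pc_bits-wide
    register whose value is left-shifted by `shift` bits to form a byte address."""
    return 2 ** (pc_bits + shift)

byte_pc = span_bytes(32, 0)   # PC holds a byte address (actual MIPS)
word_pc = span_bytes(32, 2)   # hypothetical PC holding a word address

print(byte_pc // GIB, "GiB")  # 4 GiB
print(word_pc // GIB, "GiB")  # 16 GiB
# The largest shifted word address needs 34 bits, more than a 32-bit address bus carries.
print(((2**32 - 1) << 2).bit_length())  # 34
```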

But that exceeds what the MIPS memory system can address. Nothing is gained, because the wider address can't be used: it goes beyond the addressability of the architecture. So with byte addresses, nothing is really wasted.

All address calculations for the entire 32-bit/4 GB address space fit in 32-bit registers. On 64-bit architectures, the registers are 64 bits wide and can span a much larger range.

So the PC itself holds byte addresses. Where your idea can be used, and is used, is in encoding the target offsets of branch instructions. They have this form:

```
00000000     beqz    $t0,XXXX
00000004     nop
```

MIPS differs from some other architectures here:

XXXX is a signed 16-bit word offset relative to PC + 4, where PC is the address of the branch instruction itself. In this case, PC + 4 is 0x00000004. The branch hardware sign-extends XXXX to 32 bits, shifts it left by two bits, and adds the result to PC + 4 to get the final target address of the branch.
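
The target computation just described can be sketched in Python as below. This is a simplified model of the address arithmetic only (it ignores the branch delay slot and exceptions), and the function names are hypothetical.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(branch_pc: int, imm16: int) -> int:
    """Target of a taken branch at address branch_pc whose 16-bit offset
    field is imm16 (a word offset relative to PC + 4)."""
    offset_bytes = sign_extend_16(imm16) << 2           # sign extend, then shift left by 2
    return (branch_pc + 4 + offset_bytes) & 0xFFFFFFFF  # add to PC + 4, wrap to 32 bits

print(hex(branch_target(0x00000000, 0x0003)))  # 0x10
print(hex(branch_target(0x00000014, 0xFFFC)))  # 0x8
```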

Now consider the reverse, where the assembler must compute XXXX from a label. Take the following program fragment:

```
00000000            nop
00000004            nop
00000008    loop:   nop
0000000C            nop
00000010            nop
00000014            beqz    $t0,loop
00000018            nop
```

To arrive at the correct value for XXXX in the branch instruction, the assembler takes the address of the label loop and subtracts PC + 4 from it to produce the relative byte offset. Here, the address of loop is 0x00000008, and since the branch is at 0x00000014, PC + 4 is 0x00000018. So we have 0x08 - 0x18, which is -0x10, or 0xFFFFFFF0. This is a byte offset, so we shift it right by two bits (an arithmetic shift, which preserves the sign) to produce a word offset: 0xFFFFFFFC. We use the lower 16 bits of this for XXXX, which gives FFFC.
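
The assembler side of that calculation can be sketched as follows. The function name and the error checks are assumptions for illustration, not the behavior of any particular assembler.

```python
def encode_branch_offset(branch_pc: int, target: int) -> int:
    """Hypothetical helper: compute the 16-bit offset field (XXXX) for a
    branch at branch_pc that jumps to target."""
    byte_offset = target - (branch_pc + 4)   # relative to PC + 4
    if byte_offset % 4 != 0:
        raise ValueError("branch target must be word aligned")
    word_offset = byte_offset >> 2           # Python's >> on ints is arithmetic: keeps the sign
    if not -0x8000 <= word_offset <= 0x7FFF:
        raise ValueError("branch target out of range")
    return word_offset & 0xFFFF              # keep the low 16 bits

print(hex(encode_branch_offset(0x14, 0x08)))  # 0xfffc
```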

Because branch instructions use word offsets instead of byte offsets, they don't "waste" the two must-be-zero bits. They take advantage of this to extend the branch range, measured in bytes relative to PC + 4, from -32768..32767 to -131072..131068.
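
The two ranges follow directly from the 16-bit field width, as this short Python sketch shows (the variable names are illustrative only):

```python
OFFSET_BITS = 16  # width of the branch offset field

min_words = -(1 << (OFFSET_BITS - 1))     # most negative signed 16-bit value
max_words = (1 << (OFFSET_BITS - 1)) - 1  # most positive signed 16-bit value

# If the field held a byte offset, this would be the reachable range.
print(min_words, max_words)               # -32768 32767
# Because it holds a word offset, the hardware multiplies it by 4.
print(min_words * 4, max_words * 4)       # -131072 131068
```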