This is one of those cases where the ABI starts to bleed into the ISA. You'll find a handful of these floating around in RISC-V. Because we had a pretty significant software stack ported by the time we standardized the ISA, we got to fine-tune the ISA to match real code. An explicit goal of the base RISC-V ISAs was also to keep a lot of encoding space available for future expansion, and that constraint shapes the decisions discussed below.
In this case, the ABI design decision answers the question: for a type that, when stored in a register, does not need every bit pattern the register provides in order to represent every value of the type, is there a canonical representation? In the case of RISC-V we chose to mandate a canonical representation for all types. There's a feedback loop here with some ISA design decisions, and I think the best way to see it is to work through an example of the ISA that would have co-evolved with an ABI that didn't mandate a canonical representation.
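To make that concrete, here is a minimal sketch (the helper name is mine, not from the ABI documents, and it leans on the usual two's-complement conversions) of the property in question for int on RV64: a 64-bit register value is a canonical int exactly when its upper 32 bits are copies of bit 31, i.e. it survives a round trip through sign extension of its low 32 bits.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if reg holds the canonical (sign-extended)
 * representation of a 32-bit int in a 64-bit X register. */
static inline bool is_canonical_int(uint64_t reg) {
    /* truncate to 32 bits, sign-extend back to 64, compare with the original */
    return (uint64_t)(int64_t)(int32_t)(uint32_t)reg == reg;
}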
As a thought exercise, let's assume that the RISC-V ABI did not mandate a canonical representation for the high bits of an int when stored in an X register on RV64I. The result is that the existing W family of instructions wouldn't be particularly useful: you can use addiw t0, t0, 0 as a sign extension so the compiler can then rely on what's in the high-order bits, but that adds an additional instruction to many common patterns like compare+branch. The correct ISA design decision to make here would be to have a different set of W instructions, something like "compare on the low 32 bits and branch". If you run the numbers, you end up with about the same number of additional instructions (branch and set, as opposed to add, sub, and shift). The issue is that the branch instructions are much more expensive in terms of encoding space because they have much longer offsets. Since encoding space is considered an important resource in RISC-V, when there is no clear performance advantage we tend to choose the design decision that conserves more of it. In this case there's no meaningful performance distinction, as long as the ABI matches the ISA.
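As a rough C-level sketch of that cost (the function names are mine), the re-normalization a compiler would have to insert before every full-width signed compare is the equivalent of the addiw t0, t0, 0 / sext.w above, one extra cast per operand:

#include <stdint.h>

/* Models the extra sext.w a compiler would emit per operand if the high
 * bits of an int in a register could be garbage. */
static inline int64_t renormalize(int64_t raw) {
    return (int64_t)(int32_t)raw;   /* re-derive the high 32 bits from bit 31 */
}

int less_than(int64_t a_raw, int64_t b_raw) {
    /* canonical ABI: one blt; non-canonical ABI: two extra sext.w first */
    return renormalize(a_raw) < renormalize(b_raw);
}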
There's a second-order design decision to be made here: should the canonical representation sign extend or zero extend? There's a trade-off: sign extension results in faster software (for the same amount of encoding space used), but more complicated hardware. Specifically, the common C fragment
long func_pos();
long func_neg();
long neg_or_pos(int a) {
    if (a > 0) return func_pos();
    return func_neg();
}
compiles very efficiently when sign extension is used:

neg_or_pos:
        bgtz    a0, .L4
        tail    func_neg
.L4:
        tail    func_pos
but is slower when zero extension is used (again, assuming we're unwilling to blow a lot of encoding space on word-sized compare+branch instructions):

neg_or_pos:
        addiw   a0, a0, 0
        bgtz    a0, .L4
        tail    func_neg
.L4:
        tail    func_pos
When we balanced things out, it appeared that the software cost of zero extension was higher than the hardware cost of sign extension: for the smallest possible design (i.e., a microcoded implementation) you still need an arithmetic right shift, so you don't lose any datapath, and for the biggest possible design (i.e., a wide out-of-order core) the code would just end up shuffling bits around before branching. Oddly enough, the one place you pay a meaningful cost for sign extension is in short-pipeline, in-order machines: there, dropping the sign extension could shave a MUX delay off the ALU path, which is critical in some designs. In practice there are a lot of other places where sign extension is the right decision to make, so just changing this one wouldn't result in the removal of that datapath.
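For the microcoded end of that spectrum, the arithmetic-right-shift point can be sketched like this (my example, not from the original): the canonical form can be produced with the shifter such a design already needs, so mandating sign extension doesn't add a new datapath element.

#include <stdint.h>

/* Sign-extend the low 32 bits using only shifts (slli/srai by 32 on RV64),
 * assuming the usual two's-complement arithmetic right shift. */
static inline int64_t sext32_with_shifts(uint64_t x) {
    return (int64_t)(x << 32) >> 32;
}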
Anything narrower than int (32 bits wide on RISC-V) gets promoted to int when doing any arithmetic on it. So, no C-family compiler needs any more support for 8- or 16-bit math than LB/LBU and LH/LHU. – Davislor
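A small example of the promotion rule that comment describes (the function name is mine): both operands are promoted to int before the add, so beyond the sub-word loads the generated code is ordinary 32-bit arithmetic.

#include <stdint.h>

/* The loads are lh; the add happens as int; the result is truncated back. */
int16_t add_halves(const int16_t *a, const int16_t *b) {
    return (int16_t)(*a + *b);
}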