TL;DR: no, AFAIK there are no RISC ISAs with flag-setting partial-register ops narrower than 32 bits. But many 64-bit RISC ISAs (like AArch64) that have FLAGS at all can set them from the result of a 32-bit op.
See the last section: this is because of a general lack of demand for software integer overflow checking, or a chicken/egg problem. Usually you just need to compare/branch on 16-bit values, and you can do that just fine with them zero or sign extended to 32 or 64 bit.
Only a RISC where the register width is 8 or 16 bits can set flags from that operand-size, e.g. AVR: an 8-bit RISC with 32 registers and 16-bit instruction words. It needs extended-precision add/adc just to implement 16-bit `int`.
This is mostly a historical thing: x86 has 16-bit operand-size for everything because of the way it evolved from the 16-bit-only 8086 (and 286). When 80386 was designed, it was important that it be able to run 16-bit-only code at full speed, and they provided ways to incrementally add 32-bit ops to 16-bit code. And used the same mechanism to allow 16-bit ops in 32-bit code.
The x86 8-bit low/high register stuff (AX=AH:AL) is again partly due to how 8086 was designed as a successor to 8080, to make porting easy (and even possible to automate). See Why are first four x86 GPRs named in such unintuitive order? (And also because it was just plain useful to have eight 1-byte registers and four 2-byte registers at the same time.)
Related: Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted? For many calculations, you don't have to re-zero the high bits after each operation to get the same result. So lack of 8-bit / 16-bit operand size is not an obstacle to efficient implementation of most code that logically wraps its results to 8 or 16 bits.
64-bit RISC machines often have a 32-bit version of at least some important instructions like `add`, so you can get a zero-extended `add` result for free without having to separately truncate it, e.g. to make code like `array[i++]` efficient with `uint32_t i` and 64-bit pointers. But never partial-register operand sizes narrower than 32-bit, on any RISC I've heard of.
DEC Alpha is interesting because it was a new design, 64-bit from the ground up, not a 64-bit extension to an existing ISA the way MIPS64 is. This table of Alpha mnemonics shows that add/sub/mul/div were all available in 32 and 64-bit forms, but shifts and compares weren't. (There are also byte-manipulation instructions that are basically SIMD shuffle/mask/insert/extract inside 64-bit integer registers, and a SIMD packed-compare for efficient string stuff.)
According to this official MIPS64 ISA doc (section 4.3 CPU Registers):

> A MIPS64 processor always produces a 64-bit result, even for those instructions that are architecturally defined to operate on 32 bits. Such instructions typically sign-extend their 32-bit result into 64 bits. In so doing, 32-bit programs work as expected, though the registers are actually 64 bits wide rather than 32 bits.
(You use special instructions for full 64-bit registers, like `DADDU` (doubleword add unsigned) instead of `ADDU`.) Note that the non-U versions of `add` and `dadd` trap on 2's complement signed overflow (with 32-bit or 64-bit operand size), so you have to use the U version for wrapping signed math. (ISA reference links on mips.com.) Anyway, MIPS doesn't have a special mode for 32-bit, but an OS would need to care about 32-bit programs vs. 64-bit, because 32-bit code will assume all pointers are in the low 32 bits of virtual address space.
On a RISC load/store machine, you'd usually just use zero-extending (or sign-extending) byte/halfword loads. When you're done, you'd use a byte / halfword store to write the truncated result. (Truncation is typically what you want, for unsigned base 2 or 2's complement signed.) This is how a compiler (or human) would implement C source that used `short` or `uint8_t`.
Semi-related: C's integer promotion rules automatically promote everything narrower than `int` up to `int` when used as an operand to a binary operator like `+`, so it mostly maps nicely to this way of computing. (i.e. `unsigned result = (a+b) * c` in C doesn't have to truncate the `a+b` result back to `uint8_t` before the multiply, if a, b, and c are all `uint8_t`. But it's pretty bad that `uint16_t` promotes to signed `int`, so `uint16_t a,b; unsigned c = a * b` risks signed-overflow UB from promoting to signed `int` for the multiply.) Anyway, C's promotion rules sort of look like they're designed for machines without full support for narrow operand sizes, because that's common for a lot of hardware.
But you're asking about overflow checking / flag-setting from narrow ops.
Not all RISC machines even have a FLAGS register. ARM does, but for example MIPS and Alpha don't. ARM doesn't set flags on every instruction: you have to explicitly use the flag-setting form of an instruction.
CPUs without FLAGS typically have some simple compare-and-branch instructions (often against zero, like MIPS `bltz`), and others that compare two inputs and write a 0 / 1 result to another integer register (e.g. MIPS `SLTIU` -- Set on Less Than Immediate Unsigned). You can use the Set instructions plus a `bne` against zero to create more complex branch conditions.
Hardware and software support for efficient overflow-checking is a problem in general. Putting a `jcc` after every x86 instruction sucks quite a lot, too.
But partly because most languages don't make it easy to write code that needs overflow checking after every instruction, CPU architects don't provide it in hardware, especially not for narrow operand sizes.
MIPS is interesting with its trapping `add` for signed overflow.
Ways to implement it efficiently might include having a "sticky" flag, the way FPU exception flags are sticky: the Invalid flag stays set after dividing by zero (and producing NaN); other FP instructions don't clear it. So you can check for exception flags at the end of a series of computations, or after a loop. This makes it cheap enough to actually use in practice, if there was a software framework for it.
With FP code, usually you don't need to look at flags because NaN itself is "sticky" or "infectious". Most binary operators produce NaN if either input is NaN. But unsigned and 2's complement integer representations don't have any spare bit patterns: they all represent specific numbers. (1's complement has negative zero...)
For more about ISA design that would make overflow checking possible, have a look at discussion on Agner Fog's proposal for a new ISA that combines the best features of x86 (code density, lots of work per instruction) and RISC (easy to decode) for a high performance paper architecture. Some interesting SIMD ideas, including making future extensions to vector width transparent, so you don't have to recompile to run faster with wider vectors.