1 vote

It is possible to store a pair of 32-bit single-precision floating-point numbers in the same space taken by one 64-bit double-precision number. For example, the XMM registers of the SSE2 instruction set can store four single-precision numbers or two double-precision numbers.
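
As a minimal sketch with SSE2 intrinsics (assuming an x86 compiler that provides <emmintrin.h>), the same 128-bit register size holds either layout:

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stdio.h>

    int main(void) {
        /* One 128-bit XMM-sized value viewed as four binary32 numbers... */
        __m128  four_singles = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        /* ...or as two binary64 numbers; both occupy 16 bytes. */
        __m128d two_doubles  = _mm_set_pd(2.0, 1.0);

        float  s[4];
        double d[2];
        _mm_storeu_ps(s, four_singles);
        _mm_storeu_pd(d, two_doubles);
        printf("%zu bytes hold %g %g %g %g, or %g %g\n",
               sizeof four_singles, s[0], s[1], s[2], s[3], d[0], d[1]);
        return 0;
    }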

By the IEEE 754 standard, the difference between single and double precision is not only the precision per se but also the available range: 8 and 11 exponent bits, respectively.
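
Concretely, binary32 is 1 sign + 8 exponent + 23 mantissa bits (bias 127), and binary64 is 1 + 11 + 52 (bias 1023). A small sketch extracting the exponent fields:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float  f = 1.0f;
        double d = 1.0;
        uint32_t fb; uint64_t db;
        memcpy(&fb, &f, sizeof fb);   /* bit-exact view without aliasing UB */
        memcpy(&db, &d, sizeof db);

        /* binary32: 1 sign | 8 exponent | 23 mantissa bits */
        unsigned fexp = (fb >> 23) & 0xFF;
        /* binary64: 1 sign | 11 exponent | 52 mantissa bits */
        unsigned dexp = (unsigned)((db >> 52) & 0x7FF);
        printf("exponent fields: %u of 255 max, %u of 2047 max\n", fexp, dexp);
        return 0;
    }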

Intuitively, it seems to me that if you were designing an FPU to process either 2N single-precision numbers or N double-precision numbers in parallel, the circuit design should be simpler if you deviate from the IEEE standard and make both use the same number of exponent bits. For example, the bfloat16 half-precision format trades away some mantissa bits to keep the same number of exponent bits as single precision; part of the justification given for this is that it's easier to convert between bfloat16 and single precision.
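
To illustrate why that conversion is cheap: since bfloat16 keeps the binary32 sign and exponent layout, narrowing is just dropping the low 16 bits and widening is a 16-bit shift. A sketch with hypothetical helper names (real hardware typically rounds to nearest-even rather than truncating):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helpers; truncation only, for simplicity. */
    static uint16_t float_to_bf16(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        return (uint16_t)(bits >> 16);  /* sign + 8 exp + top 7 mantissa bits */
    }

    static float bf16_to_float(uint16_t b) {
        uint32_t bits = (uint32_t)b << 16;  /* low mantissa bits become zero */
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }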

Do any actual vector instruction sets use the same number of exponent bits for single and double precision? If so, do they stick closer to the 8 bits typical for single precision, or 11 bits typical for double precision?

For scalar processing, the DEC VAX initially used this approach with its F (single precision) and D (double precision) formats; both used an 8-bit exponent field. However, the small exponent range caused numerical issues for double-precision computation in some contexts, so a G format (basically IEEE-754 double precision) was added later. – njuffa
@njuffa: Interesting! Worth posting as an answer if you want, even though the question for some reason limited itself to SIMD. It does make more sense for scalar FPUs back when transistor budgets were smaller; if you never need that wider exponent, you don't have to build it at all. – Peter Cordes
@Peter Cordes: The approach used by the VAX was not uncommon in older computers. E.g., the IBM System/360 used a radix-16 floating-point format with a 7-bit binary exponent for both single and double precision. This question is focused on newer SIMD-based architectures; that is why I did not post that little tidbit of information as an answer. – njuffa
@njuffa: It didn't occur to me that this would also have cropped up with scalar FPUs, which is why I only talked about SIMD, but you're right, that is an interesting example! – rwallace

2 Answers

2 votes

AFAIK, nobody does this. Sign-extending and zero-extending are pretty trivial in hardware compared to the transistor cost of building an FPU execution unit overall.

Routing the exponent vs. mantissa bits where they need to go is not a big deal compared to building a multiplier you can use as one 53-bit significand multiplier or two separate 24-bit ones (52 and 23 stored mantissa bits, plus the implicit leading 1). That way the same transistors can be used for the mantissas of packed-single and packed-double multiplies / FMAs; that's a large fraction of the die area of an FMA/multiplier unit.
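
As a toy software model of that sharing (not how real hardware is wired, just to show that the same shift-and-add structure can be gated either way):

    #include <stdint.h>

    /* One wide 53x53 significand product (GCC/Clang __int128 assumed). */
    static unsigned __int128 mul_wide(uint64_t a, uint64_t b) {
        unsigned __int128 acc = 0;
        for (int i = 0; i < 53; i++)
            if ((b >> i) & 1)
                acc += (unsigned __int128)a << i;  /* partial-product row i */
        return acc;
    }

    /* Two independent 24x24 products from the same loop structure,
       with the rows gated per lane. */
    static void mul_two_narrow(uint32_t a0, uint32_t b0,
                               uint32_t a1, uint32_t b1,
                               uint64_t *p0, uint64_t *p1) {
        uint64_t acc0 = 0, acc1 = 0;
        for (int i = 0; i < 24; i++) {
            if ((b0 >> i) & 1) acc0 += (uint64_t)a0 << i;
            if ((b1 >> i) & 1) acc1 += (uint64_t)a1 << i;
        }
        *p0 = acc0; *p1 = acc1;
    }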


AFAIK, all CPUs modern enough to have SIMD at all use IEEE-754 formats because that's what people want, and there's no compelling reason to do otherwise. Certainly the vast majority of them use the standard formats.

ARM NEON, for example, initially didn't support full IEEE 754, but what it left out was gradual underflow (subnormals were flushed to zero). It still used the IEEE binary32 and binary64 (standard float and double) data formats.
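
For what gradual underflow buys you, a small sketch; with subnormals flushed to zero, as early NEON did, the difference below collapses to 0:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* Two distinct tiny normal numbers whose difference is subnormal. */
        float x = 1.5f * FLT_MIN, y = FLT_MIN;
        float d = x - y;   /* 0.5 * FLT_MIN: representable only as a subnormal */
        printf("%g\n", d); /* nonzero with gradual underflow; 0 when flushed */
        return 0;
    }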

1 vote

Do any actual vector instruction sets use the same number of exponent bits for single and double precision?

Not that I'm aware of. However, if you don't strictly need vector instructions, x87 hardware does exactly that. Its internal format even has more bits than double precision: 80 bits in total, with 15 exponent bits and 64 mantissa bits, shared by single- and double-precision operands alike.

The FPU has a control word whose precision-control field selects one of three precisions: 24-bit, 53-bit, or 64-bit significands, corresponding to the 32-, 64-, and 80-bit formats. When set to 24-bit, every arithmetic instruction rounds the significand to single precision. (The exponent range is not reduced while values stay in registers; out-of-range values only become ±INF or zero when stored to a 32-bit memory location.)
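
A sketch of poking that control word directly, assuming GCC/Clang inline assembly on x86 and that long double math actually goes through x87:

    #include <stdint.h>
    #include <stdio.h>

    /* Precision-control field = bits 8-9 of the x87 control word:
       00 = 24-bit, 10 = 53-bit, 11 = 64-bit significand. */
    static void set_x87_precision(uint16_t pc_bits) {
        uint16_t cw;
        __asm__ __volatile__("fnstcw %0" : "=m"(cw));  /* read control word */
        cw = (uint16_t)((cw & ~0x0300u) | (pc_bits << 8));
        __asm__ __volatile__("fldcw %0" : : "m"(cw));  /* write it back */
    }

    int main(void) {
        set_x87_precision(0x0);       /* 24-bit significand, as for float */
        volatile long double x = 1.0L, y = 3.0L;
        printf("%.20Lf\n", x / y);    /* only ~7 significant digits survive */
        set_x87_precision(0x3);       /* restore 64-bit extended precision */
        return 0;
    }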

Modern compilers no longer emit these instructions; instead, they use the lowest lane of SSE vector registers for scalar math.

the circuit design should be simpler if you deviate from the IEEE standard and make both use the same number of exponent bits.

Yes indeed. That's precisely how Intel was able to launch their 8087 FPU in 1980: the whole chip had only about 45,000 transistors.

However, modern CPUs have budgets of billions of transistors. Simplicity of the design is no longer the priority; performance and power consumption are.

Speaking of performance, the 8087 spends up to about 200 cycles to divide two single-precision numbers. My current CPU (AMD Zen 2) spends up to 10 cycles to divide 32-bit floats (8 of them at once), and up to 13 cycles to divide 64-bit floats (4 of them at once). That's a huge improvement over 200 cycles, but the price is complexity and transistor count.
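
For reference, each of those is a single instruction (vdivps / vdivpd); with AVX intrinsics (compile with -mavx):

    #include <immintrin.h>

    /* Eight binary32 divisions in one vdivps... */
    __m256 div8_floats(__m256 a, __m256 b) { return _mm256_div_ps(a, b); }

    /* ...or four binary64 divisions in one vdivpd. */
    __m256d div4_doubles(__m256d a, __m256d b) { return _mm256_div_pd(a, b); }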