0 votes

I want to get familiar and comfortable with floating-point numbers, so I'm doing a project that will hopefully help: creating dynamically allocated, arbitrarily sized floating-point numbers in C++. I've looked through the IEEE-754 specifications for the standard floating-point definitions, but I could not find a common correlation between them (I used references from Wikipedia on 32-, 64-, and 128-bit floating-point numbers). So my question is: is there a common pattern between floating-point numbers that can be applied to any arbitrarily sized floating-point number?

If not, from a programming perspective, would it be easier to define my own floating-point representation that does have a pattern?

EDIT: By pattern I mean the number of bits in the mantissa and exponent.

By "pattern" do you mean the relationship between the number of mantissa and exponent bits?Oliver Charlesworth
@OliverCharlesworth Yes, I should have clarified that. The number of bits in the mantissa and exponent.Max
Ok. I guess that each was tuned individually to satisfy a trade-off between a number of potential use cases, rather than to fit a mathematical pattern.Oliver Charlesworth

3 Answers

2 votes

There is no mandated mathematical rule for the numbers of bits in the significand¹ or the exponent. IEEE 754-2008 does show a formula that describes its listed interchange formats for certain sizes, but this is in a non-normative note:

  • For a storage width of k bits, the number of bits in the significand (the mathematical significand with the leading bit, not the field that encodes it without the leading bit), p, is k − round(4 × log2(k)) + 13.
  • The number of bits in the exponent field, w, is k − p.

The formula does not hold for 16 or 32 bits; it is only said to hold for 64 bits and for widths that are multiples of 32 and at least 128 (so neither 32 nor 96). I suppose you can consider it a suggestion for larger sizes, but it is not binding.
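To make the note concrete, here is a minimal C++ sketch (mine, not text from the standard) that evaluates the formula and checks it against the published binary64 and binary128 parameters:

#include <cmath>
#include <cstdio>

// p = k - round(4 * log2(k)) + 13: significand width including the leading bit.
int significand_bits(int k) {
    return k - (int) std::lround(4.0 * std::log2((double) k)) + 13;
}

int main() {
    // The formula is said to hold for 64 and for multiples of 32 from 128 up.
    for (int k : {64, 128, 160, 256}) {
        int p = significand_bits(k);
        std::printf("k=%3d  p=%3d  w=%2d\n", k, p, k - p);  // exponent width w = k - p
    }
    // Expected: k=64 -> p=53, w=11; k=128 -> p=113, w=15; k=256 -> p=237, w=19.
}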

As far as I know, the parameters specified in table 3.5 of clause 3.6 of IEEE 754-2008 arise from striking balances and from historic usage. You can define formats with other parameters as described in clause 3.7, which gives recommendations for defining extended precisions in terms of the precision (digits in the significand) and the maximum exponent, or of the precision alone. Or you can disregard IEEE 754 and define your own formats. The standards are not mandatory, and what your design should be is a function of what your goals are.
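Clause 3.7's parameterization fits in a couple of lines; the descriptor below is a hypothetical sketch of my own, not a type from the standard or any library:

// An extended precision per clause 3.7 is characterized by its precision p and
// maximum exponent emax; IEEE 754-2008 then fixes the minimum exponent at 1 - emax.
struct ExtendedFormat {
    int precision;                          // p: significand digits (bits, if binary)
    int emax;                               // largest exponent of a finite number
    int emin() const { return 1 - emax; }
};

// Example: binary64 expressed through these two parameters.
constexpr ExtendedFormat binary64_like{53, 1023};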

Note

¹ “Significand” is the preferred term for the fraction part of a floating-point number. “Mantissa” is a term for the fraction part of a logarithm. Significands are linear (if the number increases by a factor of 1.2, the significand increases by a factor of 1.2, unless an exponent threshold is crossed); mantissas are logarithmic.

3 votes

The 2008 version of IEEE 754 specifies that interchange formats wider than 128 bits shall follow a common scheme.

For binary formats, the full width k shall be a multiple of 32 bits, and the number of exponent field bits shall be round(4 × log2(k)) − 13. One can verify that this formula also gives the proper values for the 64- and 128-bit formats, but not for the 16- or 32-bit ones (their exponents are wider).

For decimal formats, the full width k shall be a multiple of 32 bits, and the number of combination field bits shall be k / 16 + 9. This formula also reproduces the actual 32-, 64-, and 128-bit formats.
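A quick sketch checking that rule (my own verification code, not normative text):

#include <cstdio>

int main() {
    for (int k : {32, 64, 128}) {
        int combination = k / 16 + 9;        // combination field bits
        int trailing = k - 1 - combination;  // the rest, minus the sign bit
        std::printf("decimal%d: combination=%d, trailing significand=%d\n",
                    k, combination, trailing);
    }
    // Prints 11/20, 13/50, 17/110, matching decimal32/64/128.
}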

All other properties of the formats, and of the operations on them, remain unchanged: significand interpretation, exponent bias and interpretation, rounding, and so on. If you

could not find a common correlation between them

you were likely thrown by the lack of visible logic in the choice of field widths. Yes, they are empirical: adapted to accumulated experience in number handling and to the need to fit more data into limited space, rather than to a common mathematical rule.

On the other hand, you are not limited to these standard restrictions. Moreover, since IEEE is mainly targeted at hardware design and the IEEE 754 standard is shaped for ease of hardwired implementation, you needn't follow its restrictions and can use any software implementation (such as GMP or MPFR). One advantage of a software implementation is that no time is spent unpacking numbers for calculation and packing them back.
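For instance, with MPFR the significand precision is chosen per variable, with no requirement that it match a hardware width. A minimal sketch (link with -lmpfr -lgmp):

#include <mpfr.h>

int main() {
    mpfr_t x;
    mpfr_init2(x, 200);             // 200-bit significand, an arbitrary choice
    mpfr_set_ui(x, 2, MPFR_RNDN);
    mpfr_sqrt(x, x, MPFR_RNDN);     // sqrt(2), correctly rounded to 200 bits
    mpfr_printf("%.60Rf\n", x);     // about 60 decimal digits
    mpfr_clear(x);
    return 0;
}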

2 votes

IEEE-754 binary types specify the exponent bit width as below.

FP bit size  Expo bit size
16           5
32           8
64           11
128          15
256          19

The remainder of the type holds 1 sign bit and the significand.

Per @Netch's good answer above, the exponent bit width is round(4 * log2(k)) − 13 for bit widths that are multiples of 32, from 64 up.

An empirical answer to "Is there a common pattern between floating point numbers that can be applied to any arbitrarily sized floating point number?" could use the function below to stay consistent with the existing IEEE-754 standard and extend it to other bit sizes fp_size >= 8 (or >= 6 if you want to push it).

#include <math.h>  /* lrint, log2 */

int expo_width(int fp_size) {
  /* widths above 32 fit round(4*log2(k)) - 13; smaller ones fit round(3*log2(k)) - 7 */
  return (int) lrint(fp_size > 32 ? 4*log2(fp_size) - 13 : 3*log2(fp_size) - 7);
}
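A quick harness (my own) to check it against the table above:

#include <stdio.h>

int expo_width(int fp_size);  /* defined above */

int main(void) {
    for (int k = 8; k <= 256; k *= 2)
        printf("%3d-bit float -> %2d exponent bits\n", k, expo_width(k));
    /* Prints 2, 5, 8, 11, 15, 19; the last five match the IEEE-754 table. */
    return 0;
}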