Conversion of a number from Single precision floating point representation to a Half precision floating point

Question

I have code that must work with half-precision floating-point numbers. To do that, I wrote my own C++ class fp16 and overloaded all of its operators (arithmetic, logical, relational) with custom functions, including the cases where a single-precision number is used together with a half-precision one.

Half precision floating point = 1 sign bit, 5 exponent bits, 10 significand bits = 16 bits

Single precision floating point = 1 Sign bit, 8 exponent bits, 23 significand bits = 32 bits

This is what I do to convert a single-precision floating-point number to a half-precision one:

For the significand bits, I use truncation, i.e. I lose the low 13 of the 23 significand bits to get the 10-bit significand of the half-precision float.

What should I do to handle the exponent bits? How do I go from 8 exponent bits to 5 exponent bits?

Any good reading material would help.

If the exponent is not representable with 5 bits, then you're in an overflow condition. If you use an IEEE 754-like representation, you might want to give inf as the result. I think all reading on double->float conversion is relevant. — eudoxos
(And, oh, did you notice that the Wikipedia article on half precision references C/C++ code (for MATLAB) that does the conversion both ways? That could be a good inspiration.) — eudoxos
@eudoxos - thanks for the MATLAB link. It explains nicely what could be done. — goldenmean
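
The exponent handling discussed in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread or from any library: `float_to_half_bits` is a hypothetical helper name, and it assumes IEEE 754 layouts (exponent bias 127 for float, 15 for half), truncation of the significand as described in the question, infinity on overflow, and flushing to signed zero on underflow instead of producing half subnormals.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the 16-bit half-precision pattern for f.
// Assumptions: significand is truncated (no rounding) and values below the
// half normal range are flushed to signed zero.
std::uint16_t float_to_half_bits(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);              // read the float's bits safely

    std::uint16_t sign = static_cast<std::uint16_t>((x >> 16) & 0x8000u);
    std::uint32_t exp  = (x >> 23) & 0xFFu;     // 8-bit biased exponent (bias 127)
    std::uint32_t man  = x & 0x007FFFFFu;       // 23-bit significand field

    if (exp == 0xFFu) {                         // source is inf or NaN
        // Force a nonzero significand so that a NaN stays a NaN.
        std::uint32_t m = man ? (0x0200u | (man >> 13)) : 0u;
        return static_cast<std::uint16_t>(sign | 0x7C00u | m);
    }

    int e = static_cast<int>(exp) - 127 + 15;   // rebias from 127 to 15
    if (e >= 31) {                              // does not fit in 5 bits
        return static_cast<std::uint16_t>(sign | 0x7C00u);  // overflow -> inf
    }
    if (e <= 0) {                               // too small for a normal half
        return sign;                            // underflow -> signed zero
    }
    // Normal case: 5-bit exponent plus the top 10 significand bits.
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(e) << 10) | (man >> 13));
}
```

A production conversion would usually round to nearest instead of truncating and would generate subnormal halves for small inputs; both are left out here to keep the exponent rebiasing visible.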

goldenmean · Accepted Answer · 2011-10-01T09:20:21

I found a solution in the library developed by OpenEXR. Basically there are two options:

a) Use a 16-bit unsigned short to store the half-precision float, together with a lookup table of precomputed values that is used to convert float to half and half to float. This is the option OpenEXR uses.

b) Simply lose precision from the single-precision float to get a half-precision float, and store the result in a native "float". The exponent is left untouched, since a float (single precision) still holds the reduced-precision value. This is the way I used.
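
Option b) can be sketched as a bit mask on the float's representation. This is a minimal illustration under the assumption of an IEEE 754 single-precision `float`; `truncate_to_half_significand` is a hypothetical name, not part of the fp16 class from the question.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: keeps the sign, the full 8-bit exponent and only the
// top 10 significand bits of f, and returns the result as a float.
// Note: a NaN whose payload lies only in the low 13 bits would become inf.
float truncate_to_half_significand(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);   // read the float's bits safely
    x &= 0xFFFFE000u;                // clear the low 13 significand bits
    float r;
    std::memcpy(&r, &x, sizeof r);   // write the masked bits back
    return r;
}
```

Because the exponent is untouched, the result has half-precision significand accuracy but still the exponent range of a float, so values such as 1.0e6f do not overflow to infinity as they would in a true 16-bit half.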

Thanks @eudoxos for the Matlab link explaining some details about this whole thing.
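
For option a), the half-to-float direction of a lookup table could look something like the sketch below. This is not OpenEXR's code; the names `decode_half`, `half_table` and `half_to_float` are made up for the illustration, and the table is simply filled once by decoding every possible 16-bit pattern, assuming the IEEE 754 half layout given in the question.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical decoder used only to fill the table: turns one 16-bit
// half-precision pattern into the float with the same value.
float decode_half(std::uint16_t h) {
    int exp = (h >> 10) & 0x1F;      // 5-bit biased exponent (bias 15)
    int man = h & 0x3FF;             // 10-bit significand field
    float v;
    if (exp == 0) {                  // zero or subnormal: man * 2^-24
        v = std::ldexp(static_cast<float>(man), -24);
    } else if (exp == 31) {          // inf or NaN
        v = man ? std::numeric_limits<float>::quiet_NaN()
                : std::numeric_limits<float>::infinity();
    } else {                         // normal: (1024 + man) * 2^(exp - 25)
        v = std::ldexp(static_cast<float>(man + 1024), exp - 25);
    }
    return (h & 0x8000) ? -v : v;
}

// Table of all 65536 half values, built once on first use.
const std::vector<float>& half_table() {
    static const std::vector<float> table = [] {
        std::vector<float> t(65536);
        for (std::uint32_t i = 0; i < 65536; ++i) {
            t[i] = decode_half(static_cast<std::uint16_t>(i));
        }
        return t;
    }();
    return table;
}

// Conversion is then a single indexed load.
inline float half_to_float(std::uint16_t h) {
    return half_table()[h];
}
```

The table costs 256 KiB of memory in exchange for a conversion with no branches; whether that trade is worthwhile depends on how often conversions happen.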
