In OpenCL, I want to store a 3D vector using a "shared exponent" representation for compact storage. Typically, if you store a 3D floating point vector, you simply store 3 separate float values (or 4 when aligned properly). This requires 12 (16) bytes of storage for single precision; if you don't need that accuracy, you can use "half" precision floats and shrink it down to 6 (8) bytes.
When using half precision and 3 separate values, the memory looks like this (no alignment considered):
- x coordinate: 1 bit sign, 5 bits exponent, 10 bits mantissa
- y coordinate: 1 bit sign, 5 bits exponent, 10 bits mantissa
- z coordinate: 1 bit sign, 5 bits exponent, 10 bits mantissa
I'd like to shrink this down to 4 bytes by using a shared exponent, as OpenGL does in one of its internal texture formats ("RGB9_E5"). This means that the component with the largest absolute value determines the exponent of the whole vector. This exponent is then used implicitly for each component. Tricks such as "normalized" storage with an implicit "1." in front of the mantissa don't work in this case. Such a representation works like this (the actual parameters could be tweaked, so this is just an example):
- x coordinate: 1 bit sign, 8 bits mantissa
- y coordinate: 1 bit sign, 8 bits mantissa
- z coordinate: 1 bit sign, 8 bits mantissa
- 5 bits shared exponent
I'd like to store this in an OpenCL `uint` type (32 bits) or something equivalent (e.g. `uchar4`). The question now is:
How can I convert from and into this representation to and from `float3` as fast as possible?
My idea is like this, but I'm sure there is some "bit hacking" trick which uses the bit representation of IEEE floats to circumvent the floating point ALU:
- Use `uchar4` as the representative type. Store the x, y, z mantissas in the x, y, z components of this `uchar4`. The w component is split up into the 5 least significant bits (`w & 0x1F`) for the shared exponent and the three most significant bits: `(w >> 5) & 1`, `(w >> 6) & 1` and `(w >> 7) & 1` are the signs for x, y and z, respectively.
- Note that the exponent is "biased" by 16, i.e. a stored value of 16 means that the represented numbers are up to (not including) 1.0, a stored value of 19 means values up to (not including) 8.0 and so on.
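The layout of the w byte described above can be sketched in plain C as follows. The helper names (`shared_exponent`, `sign_bit`, `make_w`) are hypothetical and only illustrate the bit positions and the bias of 16; `uint8_t` stands in for OpenCL's `uchar`.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the layout described in the text. */
enum { EXP_MASK = 0x1F, EXP_BIAS = 16 };

/* Unbiased shared exponent: stored 16 -> 0, stored 19 -> 3. */
static int shared_exponent(uint8_t w) {
    return (int)(w & EXP_MASK) - EXP_BIAS;
}

/* Sign bit (0 or 1) of component i, where 0 = x, 1 = y, 2 = z. */
static int sign_bit(uint8_t w, int i) {
    return (w >> (5 + i)) & 1;
}

/* Assemble w from an unbiased exponent and three sign bits (each 0 or 1). */
static uint8_t make_w(int exp, int sx, int sy, int sz) {
    assert(exp >= -16 && exp <= 15);  /* what 5 bits with bias 16 can hold */
    return (uint8_t)(((exp + EXP_BIAS) & EXP_MASK) | (sx << 5) | (sy << 6) | (sz << 7));
}
```

For example, `make_w(3, 1, 0, 1)` stores the exponent field 19 together with the sign bits for x and z.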
"Unpacking" this representation into a
`float3` could be done using this code:

    float3 unpackCompactVector(uchar4 packed) {
        // The low 5 bits of w hold the shared exponent, biased by 16.
        float exp = (float)(packed.w & 0x1F) - 16.0f;
        // The mantissas are 8-bit fractions, hence the division by 256.
        float factor = exp2(exp) / 256.0f;
        // Bits 5, 6 and 7 of w are the signs of x, y and z.
        float x = (float)(packed.x) * factor * (packed.w & 0x20 ? -1.0f : 1.0f);
        float y = (float)(packed.y) * factor * (packed.w & 0x40 ? -1.0f : 1.0f);
        float z = (float)(packed.z) * factor * (packed.w & 0x80 ? -1.0f : 1.0f);
        float3 result = { x, y, z };
        return result;
    }
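As a possible starting point for the "bit hacking" idea, the scale factor used when unpacking is always an exact power of two, so it can be assembled directly from an IEEE 754 bit pattern instead of calling `exp2`. The following is a minimal sketch in plain C, not a tuned implementation; `pow2_from_bits` and `unpack_component` are hypothetical names, and `memcpy` stands in for OpenCL's `as_float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the float 2^e from its bit pattern: sign 0, biased exponent, mantissa 0. */
static float pow2_from_bits(int e) {
    assert(e >= -126 && e <= 127);  /* normal single-precision range only */
    uint32_t bits = (uint32_t)(e + 127) << 23;
    float f;
    memcpy(&f, &bits, sizeof f);    /* reinterpret the bits, like as_float(bits) */
    return f;
}

/* Decode one component. mant is the 8-bit mantissa, w the exponent/sign byte,
   sign_mask is 0x20, 0x40 or 0x80 for x, y or z. */
static float unpack_component(uint8_t mant, uint8_t w, uint8_t sign_mask) {
    int exp = (int)(w & 0x1F) - 16;          /* remove the bias of 16 */
    float factor = pow2_from_bits(exp - 8);  /* equals 2^exp / 256 */
    float v = (float)mant * factor;
    return (w & sign_mask) ? -v : v;
}
```

With this layout, `exp - 8` stays between -24 and 7, so the constructed factor is always a normal float. A mantissa of 255 with stored exponent 19 decodes to 7.96875, and a mantissa of 128 with stored exponent 20 decodes to 8.0.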
"Packing" a
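`float3` into this representation could be done using this code:

    uchar4 packCompactVector(float3 vec) {
        // fabs is the floating point absolute value; abs is meant for integers.
        float xAbs = fabs(vec.x); uchar xSign = vec.x < 0.0f ? 0x20 : 0;
        float yAbs = fabs(vec.y); uchar ySign = vec.y < 0.0f ? 0x40 : 0;
        float zAbs = fabs(vec.z); uchar zSign = vec.z < 0.0f ? 0x80 : 0;
        float maxAbs = max(max(xAbs, yAbs), zAbs);
        // Smallest exponent with maxAbs < 2^exp.
        // Not meaningful for maxAbs == 0, where log2 returns -infinity.
        int exp = (int)floor(log2(maxAbs)) + 1;
        float factor = exp2((float)exp);
        // Scale each component to an 8-bit fraction of 2^exp.
        uchar xMant = (uchar)floor(xAbs / factor * 256.0f);
        uchar yMant = (uchar)floor(yAbs / factor * 256.0f);
        uchar zMant = (uchar)floor(zAbs / factor * 256.0f);
        // Biased exponent in the low 5 bits, signs in the top 3 bits.
        uchar w = ((exp + 16) & 0x1F) + xSign + ySign + zSign;
        uchar4 result = { xMant, yMant, zMant, w };
        return result;
    }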
I've put an equivalent implementation in C++ online on ideone. The test cases show the transition from `exp = 3` to `exp = 4` (with the bias of 16 this is encoded as 19 and 20, respectively) by encoding numbers around `8.0`.
This implementation seems to work at first sight. But:
- There are some corner cases I didn't cover, in particular over- and underflow (of the exponent).
- I don't want to use floating point math functions like `log2` because they are slow.
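For the uncovered corner cases, one possible policy (an assumption, not something the format itself dictates) is to clamp the exponent to the range that 5 bits with a bias of 16 can hold, and to saturate the mantissas at 255. A minimal sketch in plain C, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Map an unbiased exponent to the stored 5-bit field, clamping to [-16, 15]. */
static uint8_t exponent_field(int exp) {
    if (exp < -16) exp = -16;  /* underflow: tiny vectors quantize toward zero */
    if (exp > 15) exp = 15;    /* overflow: mantissas must then saturate */
    uint8_t field = (uint8_t)(exp + 16);
    assert(field <= 0x1F);     /* sanity check: the field fits in 5 bits */
    return field;
}

/* Quantize a scaled value (abs / 2^exp * 256) to 8 bits, saturating at 255. */
static uint8_t saturate_mantissa(float scaled) {
    if (!(scaled > 0.0f)) return 0;    /* zero, negative and NaN all map to 0 */
    if (scaled >= 255.0f) return 255;  /* too large for the clamped exponent */
    return (uint8_t)scaled;            /* truncation equals floor for positive values */
}
```

Under this policy, vectors whose largest component is 32768.0 or more collapse to the largest representable magnitude, and very small vectors lose precision and eventually become zero.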
Can you suggest a better way to achieve my goal?
Note that I only need OpenCL "device code" for this; I don't need to convert between the representations in the host program. But I added the C tag since a solution is most probably independent of the OpenCL language features (OpenCL is almost C and it also uses IEEE 754 floats, bit manipulation works the same, etc.).
`floor(log2())` can be replaced by a bit of bit-twiddling and integer arithmetic to extract and re-size/re-bias the exponent of `maxAbs` without having to calculate the fractional part of the logarithm. It doesn't look applicable here, but when you've got an integer, you can also use `clz` (count leading zeros), which will often be a single machine instruction. – user57368
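The exponent extraction suggested in this comment might look roughly like the sketch below. It is written in plain C under the assumption that `maxAbs` is a normal, finite, non-zero single-precision float. `exponent_plus_one` is a hypothetical name, and `memcpy` stands in for OpenCL's `as_uint`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns floor(log2(|x|)) + 1 by reading the IEEE 754 exponent field. */
static int exponent_plus_one(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret the bits, like as_uint(x) */
    int biased = (int)((bits >> 23) & 0xFF);  /* 8-bit exponent field, sign masked off */
    /* Zero and denormals (0) and infinity/NaN (255) need separate handling. */
    assert(biased != 0 && biased != 255);
    return biased - 127 + 1;                  /* remove the IEEE bias, then add 1 */
}
```

For instance, `exponent_plus_one(7.99f)` is 3 and `exponent_plus_one(8.0f)` is 4, which with the bias of 16 gives the stored values 19 and 20.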