Encoding

As part of a graphical application I'm currently working on, I need to store three signed floats for each pixel of a 32-bit texture. At the moment I'm using the following C++ function:

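#include <cmath>

// Maps three floats in [-4, 4] to one byte each; alpha is left fully opaque.
void encode_full(const float* rgba, unsigned char* c) {
    const float range = 8.0f;
    for (int i = 0; i < 3; i++) {
        // Shift to [0, range], normalize to [0, 1], then scale to [0, 255].
        // A local copy is used so the caller's input is not modified.
        float v = (rgba[i] + range / 2.0f) / range * 255.0f;
        c[i] = static_cast<unsigned char>(std::floor(v));
    }
    c[3] = 255;
}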

Although this encoding function brings along a considerable loss in precision, things are made better by the fact that the range of considered values is limited to the interval (-4,4).

Nonetheless, even though the function yields decent results, I think I could do considerably better by using the alpha channel (currently unused) for additional precision. In particular, I was thinking of using 11 bits for the first float, 11 bits for the second, and 10 bits for the last, or 10 - 10 - 10 - 2 (unused). OpenGL has a similar format, called R11F_G11F_B10F.

However, I'm having some difficulties coming up with an encoding function for this particular format. Does anyone know how to write such a function in C++?

Decoding

On the decoding side, this is the function I'm using within my shader.

float3 decode(float4 color) { 
    int range = 8;
    return color.xyz * range - range / 2; 
}

Please note that the shader is written in Cg and used within the Unity engine. Also note that Unity's implementation of Cg shaders handles only a subset of the Cg language (for instance, pack/unpack functions are not supported).

If possible, along with the encoding function, a bit of help for the decoding function would be highly appreciated. Thanks!

Edit

I mentioned R11F_G11F_B10F only as a frame of reference for how the bits are to be split among the color channels. I don't want a float representation, since this would actually imply a loss of precision for the given range, as pointed out in some of the answers.

Can you please clarify if you want to pack them as 11 bit ints or 11 bit floats? In your example code you convert the floats to ints, but R11F_G11F_B10F stores packed floats. - samgak

2 Answers

0
votes

"10 bits" translates to an integer between 0 and 1023, so the mapping from [-4.0, +4.0] is simply floor((x + 4.0) * (1023.0 / 8.0)). For 11 bits, substitute 2047.

Decoding is the other way around, (y*8.0/1023.0) - 4.0.
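As a rough illustration of the two formulas above, here is a minimal C++ sketch that packs three values as 11 + 11 + 10 bits into one 32-bit word and unpacks them again on the CPU. The names (quantize, dequantize, encode_11_11_10, decode_11_11_10, kMin, kRange) and the bit layout are assumptions made for this example, not anything prescribed by the question, and inputs are clamped to [-4, 4]. It does not cover reading the packed word back in a Cg shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed input range, taken from the question: values lie in [-4, 4].
constexpr float kMin = -4.0f;
constexpr float kRange = 8.0f;

// Map x in [kMin, kMin + kRange] to an unsigned integer code of `bits` bits.
inline std::uint32_t quantize(float x, int bits) {
    assert(bits > 0 && bits < 32);
    const float maxCode = static_cast<float>((1u << bits) - 1u);
    // Clamp so out-of-range inputs cannot spill into neighbouring fields.
    const float t = std::min(std::max((x - kMin) / kRange, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(std::floor(t * maxCode));
}

// Inverse mapping: integer code back to a float in [kMin, kMin + kRange].
inline float dequantize(std::uint32_t code, int bits) {
    assert(bits > 0 && bits < 32);
    const float maxCode = static_cast<float>((1u << bits) - 1u);
    return static_cast<float>(code) / maxCode * kRange + kMin;
}

// Hypothetical layout for this sketch:
// v[0] in bits 0-10, v[1] in bits 11-21, v[2] in bits 22-31.
inline std::uint32_t encode_11_11_10(const float v[3]) {
    return quantize(v[0], 11)
         | (quantize(v[1], 11) << 11)
         | (quantize(v[2], 10) << 22);
}

inline void decode_11_11_10(std::uint32_t packed, float out[3]) {
    out[0] = dequantize(packed & 0x7FFu, 11);
    out[1] = dequantize((packed >> 11) & 0x7FFu, 11);
    out[2] = dequantize(packed >> 22, 10);
}
```

Because floor is used, a decoded value is never above the original, and the error stays below one step (8/2047 for the 11-bit fields, 8/1023 for the 10-bit field). Rounding to nearest instead of flooring would halve the worst-case error.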

0
votes

I think using GL_R11F_G11F_B10F is not going to help in your case. As the format name suggests, the components here are 11-bit and 10-bit float numbers, meaning that they are stored as a mantissa and exponent. More specifically, from the spec:

An unsigned 11-bit floating-point number has no sign bit, a 5-bit exponent (E), and a 6-bit mantissa (M).

An unsigned 10-bit floating-point number has no sign bit, a 5-bit exponent (E), and a 5-bit mantissa (M).

In both cases, as common for floating point formats, there is an implicit leading 1 bit for the mantissa. So effectively, the mantissa has 7 bits of precision for the 11-bit case, 6 bits for the 10-bit case.

This is less than the 8-bit precision you're currently using. Now, it's important to understand that the precision for the float case is non-uniform, and relative to the size of the number. So very small numbers would actually have better precision than an 8-bit fixed point number, while numbers towards the top of the range would have worse precision. If you use the natural mapping of your [-4.0, 4.0] range to positive floats, for example by simply adding 4.0 before converting to the 11/10-bit unsigned float, you would get better precision for values close to -4.0, but worse precision for values close to 4.0.
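To put numbers on that comparison, here is a small C++ sketch that computes the spacing between adjacent representable values. The helper names float_step and fixed_step are made up for this example, and the float model is simplified: it only looks at the stored mantissa width and ignores denormals and the exponent limits of the real formats.

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent representable values near v (v > 0) for an
// unsigned float with `mantissaBits` explicitly stored mantissa bits.
// Simplified model: no denormals, no exponent range limits.
inline double float_step(double v, int mantissaBits) {
    assert(v > 0.0 && mantissaBits > 0);
    const int exponent = static_cast<int>(std::floor(std::log2(v)));
    return std::ldexp(1.0, exponent - mantissaBits);
}

// Uniform spacing when `range` is spread over an unsigned integer
// with `bits` bits.
inline double fixed_step(double range, int bits) {
    assert(bits > 0 && bits < 31);
    return range / static_cast<double>((1 << bits) - 1);
}
```

With the "add 4.0" mapping, an input close to 4.0 lands just below 8.0, where float_step(7.9, 6) is 0.0625 for the 11-bit format and float_step(7.9, 5) is 0.125 for the 10-bit format. The current 8-bit encoding has a uniform step of fixed_step(8.0, 8), about 0.031, and a 10-bit fixed-point encoding has fixed_step(8.0, 10), about 0.0078. Near the bottom of the range the float wins: float_step(0.01, 6) is 2^-13, about 0.00012.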

The main advantage of float formats is really that they can store a much wider range of values, while still maintaining good relative precision.

As long as you want to keep memory use at 4 bytes/pixel, a much better choice for you would be a format like GL_RGB10, giving you an actual precision of 10 bits for each component. This is very similar to GL_RGB10_A2 (and its unsigned integer sibling GL_RGB10_A2UI), except that it does not expose the alpha component you are not using.

If you're willing to increase memory usage beyond 4 bytes/pixel, you have numerous options. For example, GL_RGBA16 will give you 16 bits of fixed point precision per component. GL_RGB16F gives you 16-bit floats (with 11 bits relative precision). Or you can go all out with GL_RGB32F, which gives you 32-bit float for each component.