Store 3 signed floats (from -4 to 4) for each pixel of a 32 bit texture (R11F_G11F_B10F)

Question

Encoding

As part of a graphical application I'm currently working on, I need to store three signed floats in each pixel of a 32 bit texture. At the moment I'm using the following C++ function to do this:

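// Requires <cmath> for std::floor.
// Maps each of the first three floats from [-4, 4] to one byte in [0, 255].
// Note: the input array is modified in place.
void encode_full(float* rgba, unsigned char* c) {
    const float range = 8.0f;
    for (int i = 0; i < 3; i++) {
        rgba[i] += range / 2;      // shift [-4, 4] to [0, 8]
        rgba[i] /= range;          // normalize to [0, 1]
        rgba[i] *= 255.0f;         // scale to [0, 255]
        c[i] = static_cast<unsigned char>(std::floor(rgba[i]));
    }
    c[3] = 255;                    // alpha channel is currently unused
}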

Although this encoding function brings along a considerable loss in precision, things are made better by the fact that the range of considered values is limited to the interval (-4,4).

Nonetheless, even though the function yields decent results, I think I could do considerably better by using the alpha channel (currently unused) to get additional precision. In particular, I was thinking of using 11 bits for the first float, 11 bits for the second, and 10 bits for the last, or alternatively 10 - 10 - 10 - 2 (with the last 2 bits unused). OpenGL has a format with a similar bit layout, called R11F_G11F_B10F.

However, I'm having some difficulties coming up with an encoding function for this particular format. Does anyone know how to write such a function in C++?

Decoding

On the decoding side, this is the function I'm using within my shader.

float3 decode(float4 color) { 
    int range = 8;
    return color.xyz * range - range / 2; 
}

Please note that the shader is written in Cg and used within the Unity engine. Also note that Unity's implementation of Cg shaders handles only a subset of the Cg language (for instance, pack/unpack functions are not supported).

If possible, along with the encoding function, a bit of help for the decoding function would be highly appreciated. Thanks!

Edit

I mentioned R11F_G11F_B10F only as a frame of reference for how the bits are to be split among the color channels. I don't want a float representation, since that would actually imply a loss of precision for the given range, as pointed out in some of the answers.

Can you please clarify if you want to pack them as 11 bit ints or 11 bit floats? In your example code you convert the floats to ints, but R11F_G11F_B10F stores packed floats. — samgak

MSalters · Accepted Answer · 2016-08-03T12:08:16

"10 bits" translates to an integer between 0 and 1023, so the mapping from [-4.0,+4.0] is simply floor((x+4.0) * (1023.0/8.0)). For 11 bits, substitute 2047.

Decoding is the other way around, (y*8.0/1023.0) - 4.0.
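As a rough illustration of the two formulas above, here is a minimal C++ sketch that packs three values into one 32 bit word using an 11 - 11 - 10 split and unpacks them again. The function names (quantize, dequantize, pack_11_11_10, unpack_11_11_10), the bit order (first value in the lowest bits) and the clamping of out-of-range inputs are my own assumptions, not something fixed by the question.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map x from [-4, 4] to an integer in [0, maxValue].
// Inputs outside the range are clamped (an assumption of this sketch).
static std::uint32_t quantize(float x, std::uint32_t maxValue) {
    float clamped = std::min(std::max(x, -4.0f), 4.0f);
    return static_cast<std::uint32_t>(
        std::floor((clamped + 4.0f) * (maxValue / 8.0f)));
}

// Hypothetical helper: inverse mapping, from [0, maxValue] back to [-4, 4].
static float dequantize(std::uint32_t q, std::uint32_t maxValue) {
    return q * 8.0f / maxValue - 4.0f;
}

// Assumed layout: bits 0-10 hold v[0], bits 11-21 hold v[1], bits 22-31 hold v[2].
std::uint32_t pack_11_11_10(const float v[3]) {
    return quantize(v[0], 2047u)
         | (quantize(v[1], 2047u) << 11)
         | (quantize(v[2], 1023u) << 22);
}

// Reverses pack_11_11_10 using the same assumed layout.
void unpack_11_11_10(std::uint32_t packed, float out[3]) {
    out[0] = dequantize(packed & 0x7FFu, 2047u);
    out[1] = dequantize((packed >> 11) & 0x7FFu, 2047u);
    out[2] = dequantize(packed >> 22, 1023u);
}
```

Because the encoder uses floor, the decoded value is never above the original and is off by less than one step (8/2047 for the 11 bit fields, 8/1023 for the 10 bit field). How the four bytes of the packed word end up in the texture's R, G, B and A channels depends on the texture format and byte order, which this sketch does not cover.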
