More efficient way to blend pixels (semi-transparency)?

Question

I'm working on drawing semi-transparent images on top of other images for a small 2D game. Currently, to blend the images, I'm using the formula found here: https://en.wikipedia.org/wiki/Alpha_compositing#Alpha_blending

My implementation of this is as follows:

private static int blend(int source, int dest, int trans)
{
    double alpha = ((double) trans / 255.0);
    int sourceRed = (source >> 16 & 0xff);
    int sourceGreen = (source >> 8 & 0xff);
    int sourceBlue = (source & 0xff);
    int destRed = (dest >> 16 & 0xff);
    int destGreen = (dest >> 8 & 0xff);
    int destBlue = (dest & 0xff);

    int blendedRed = (int) (alpha * sourceRed + (1.0 - alpha) * destRed);
    int blendedGreen = (int) (alpha * sourceGreen + (1.0 - alpha) * destGreen);
    int blendedBlue = (int) (alpha * sourceBlue + (1.0 - alpha) * destBlue);

    return (blendedRed << 16) + (blendedGreen << 8) + blendedBlue;
}

Now, it works fine, but it has a pretty high overhead since it's being called for every single pixel every single frame. I get a performance drop of around 30% FPS as opposed to simply rendering the image without blending.

I just wanted to know if anyone can think of a better way to optimise this code as I'm probably doing too many bit operations.

Spektre · Accepted Answer · 2020-06-12T06:24:08

I'm not a Java coder (so read with prejudice), but you are doing some things really wrong (from my C++ and low-level gfx perspective):

mixing integers and floating point

That requires conversions, which are sometimes really costly. It's much better to use integer weights (alpha) in the range <0..255> and then just divide by 255 or bit-shift by 8. That will most likely be much faster.
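
A minimal Java sketch of the integer-weight idea, assuming the same packed 0xRRGGBB layout as in the question and an alpha that is already in <0..255>. The class name IntBlend is made up for this example. It divides by 255 rather than shifting, so that alpha 255 gives back the source exactly:

```java
final class IntBlend {
    private IntBlend() {}

    /**
     * Blends source over dest using integer math only.
     * alpha is 0..255: 0 returns dest, 255 returns source.
     * Pixels are packed as 0xRRGGBB (assumption taken from the question's code).
     */
    static int blend(int source, int dest, int alpha) {
        int inv = 255 - alpha; // weight of the destination pixel

        int r = (((source >> 16) & 0xff) * alpha + ((dest >> 16) & 0xff) * inv) / 255;
        int g = (((source >> 8) & 0xff) * alpha + ((dest >> 8) & 0xff) * inv) / 255;
        int b = ((source & 0xff) * alpha + (dest & 0xff) * inv) / 255;

        return (r << 16) | (g << 8) | b;
    }
}
```

Since trans in the question is already an int, this can be called with the same arguments as the original blend; only the conversion to double goes away.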

bitshifting/masking to obtain bytes

Yes, it's fine, but there are simpler and faster methods, for example using a union:
```
enum{
    _b=0,   // db
    _g=1,
    _r=2,
    _a=3,
    };

union color
    {
    DWORD dd;    // 1x32 bit unsigned int
    BYTE db[4];  // 4x8 bit unsigned int
    };

color col;
col.dd=some_rgba_color;
BYTE r = col.db[_r]; // get red channel (byte access goes through db, not dd)
col.db[_b]=5;        // set blue channel
```
Decent compilers could optimize some parts of your code to this on their own, but I doubt they can do it everywhere...

You can also use pointers instead of a union in the same way...

function overhead

You have a function that blends a single pixel, which means it will be called a lot. It's usually much faster to blend a region (rectangle) per call than to call something on a per-pixel basis, because you trash the stack this way. To limit this, you can try the following (for functions that are called massively):

Recode your app so you can blend regions instead of pixels, resulting in far fewer function calls.

Reduce stack trashing by cutting down the operands, return values and internal variables of the called function, to limit the amount of RAM being allocated/freed/overwritten/copied on each call. For example, use static or global variables, since the alpha will most likely not change much. Or you can use the alpha encoded in the color directly instead of passing alpha as an operand.

Use inline or macros like #define to place the source code directly into the calling code instead of making a function call.
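
The first point could be sketched in Java roughly like this, assuming the images are kept as int[] arrays of packed 0xRRGGBB pixels stored row by row. RegionBlend, blendRegion and all parameter names are hypothetical, and there is no clipping, so the caller has to keep the rectangle inside dst:

```java
final class RegionBlend {
    private RegionBlend() {}

    /**
     * Blends a w x h block from the top-left corner of src onto dst at (dx, dy)
     * in a single call. alpha is 0..255 and is shared by the whole block.
     * No clipping is done: the block must lie fully inside both images.
     */
    static void blendRegion(int[] src, int srcWidth, int[] dst, int dstWidth,
                            int dx, int dy, int w, int h, int alpha) {
        int inv = 255 - alpha; // computed once per region, not once per pixel
        for (int y = 0; y < h; y++) {
            int s = y * srcWidth;             // start of this row in src
            int d = (dy + y) * dstWidth + dx; // start of this row in dst
            for (int x = 0; x < w; x++, s++, d++) {
                int sp = src[s];
                int dp = dst[d];
                int r = (((sp >> 16) & 0xff) * alpha + ((dp >> 16) & 0xff) * inv) / 255;
                int g = (((sp >> 8) & 0xff) * alpha + ((dp >> 8) & 0xff) * inv) / 255;
                int b = ((sp & 0xff) * alpha + (dp & 0xff) * inv) / 255;
                dst[d] = (r << 16) | (g << 8) | b;
            }
        }
    }
}
```

The per-pixel work is the same as before; what changes is that there is one call per sprite instead of one call per pixel, and the loop-invariant values are computed only once.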

For starters I would try to recode your function body to something like this:

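enum{
    _b=0,   // index of each channel inside db[] (assumes little-endian 0xAARRGGBB)
    _g=1,
    _r=2,
    _a=3,
    };

union color
    {
    unsigned int dd;      // 1x32 bit unsigned int
    unsigned char db[4];  // 4x8 bit unsigned int
    };

// alpha is in range <0..255>: 0 returns dst, 255 returns (almost) src
static unsigned int blend(unsigned int src, unsigned int dst, unsigned int alpha)
    {
    unsigned int i,a,_alpha=255-alpha;
    color s,d;
    // reading db[] after writing dd is union type punning:
    // works on the usual compilers, but is not guaranteed by ISO C++
    s.dd=src;
    d.dd=dst;
    for (i=0;i<3;i++)   // blend b,g,r; the alpha byte of dst is left untouched
        {
        a=(((unsigned int)(s.db[i]))*alpha) + (((unsigned int)(d.db[i]))*_alpha);
        a>>=8;          // divides by 256 instead of 255: fast, but slightly darker
        d.db[i]=a;
        }
    return d.dd;
    }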
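
One detail of the a>>=8 step above: it divides by 256 while the two weights add up to 255, so blending white over white gives 254 instead of 255. If that matters, an exact division by 255 is still possible with shifts and adds only, for sums up to 255*255. A small Java sketch (Div255 and its method names are made up for this example):

```java
final class Div255 {
    private Div255() {}

    /** Exact floor(x / 255) for 0 <= x <= 255 * 255, using only shifts and adds. */
    static int div255(int x) {
        return (x + 1 + (x >> 8)) >> 8;
    }

    /** Blends one 8-bit channel; alpha is 0..255 (0 = only d, 255 = only s). */
    static int blendChannel(int s, int d, int alpha) {
        return div255(s * alpha + d * (255 - alpha));
    }
}
```

Whether this beats a plain / 255 in Java depends on the JIT, which may already turn a division by a constant into a multiply and shift, so it is worth measuring before adopting it.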

However, if you want true speed, use the GPU (OpenGL blending).
