Can someone tell me a fast function to find the gaussian blur of an image using a 5x5 mask. I need it for iOS app dev. I am working directly on the memory of the image defined as
unsigned char *image_sqr_Baseaaddr = (unsigned char *) malloc(noOfPixels);
for (row = 2; row < H-2; row++)
{
for (col = 2; col < W-2; col++)
{
newPixel = 0;
for (rowOffset=-2; rowOffset<=2; rowOffset++)
{
for (colOffset=-2; colOffset<=2; colOffset++)
{
rowTotal = row + rowOffset;
colTotal = col + colOffset;
iOffset = (unsigned long)(rowTotal*W + colTotal);
newPixel += (*(imgData + iOffset)) * gaussianMask[2 + rowOffset][2 + colOffset];
}
}
i = (unsigned long)(row*W + col);
*(imgData + i) = newPixel / 159;
}
}
This is obviously the slowest function possible. I heard that ARM Neon intrinsics on the iOS can be used to make several operations in 1 cycle. Maybe that's the way to go ?
The problem is that I am not very familiar and don't have enough time to learn assembly language at the moment. So it would be great if anyone can post a Neon intrinsics code for the problem mentioned above or any other fast implementation in C/C++.