
I'm trying to implement a Gaussian Blur from scratch (using C++). In the code below I've hard-coded the Gaussian kernel I'm using. I only kept one dimension as I'm trying to use the optimization I've read about where you can do a horizontal convolution pass and a vertical one over that to make your blur more efficient. Unfortunately, I'm running into some issues. Here is my code:

float gKern[5] = {0.05448868, 0.24420134, 0.40261995, 0.24420134, 0.05448868};
int** gaussianBlur(int** image, int height, int width) {
    int **ret  = new int*[height];
    for(int i = 0; i < height; i++) {
        ret[i] = new int[width];
    }

    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            if (i == 0) {
                ret[i][j] = (gKern[0] * image[2][j]) + (gKern[1] * image[1][j]) + (gKern[2] * image[0][j]) + (gKern[3] * image[1][j]) + (gKern[4] * image[2][j]);
            } else if (i == 1) {
                ret[i][j] = (gKern[0] * image[1][j]) + (gKern[1] * image[0][j]) + (gKern[2] * image[1][j]) + (gKern[3] * image[2][j]) + (gKern[4] * image[3][j]);
            } else if (i == (height - 2)) {
                ret[i][j] = (gKern[0] * image[i - 2][j]) + (gKern[1] * image[i - 1][j]) + (gKern[2] * image[i][j]) + (gKern[3] * image[i + 1][j]) + (gKern[4] * image[i][j]);
            } else if (i == (height - 1)) {
                ret[i][j] = (gKern[0] * image[i - 2][j]) + (gKern[1] * image[i - 1][j]) + (gKern[2] * image[i][j]) + (gKern[3] * image[i - 1][j]) + (gKern[4] * image[i - 2][j]);
            } else {
                ret[i][j] = (gKern[0] * image[i - 2][j]) + (gKern[1] * image[i - 1][j]) + (gKern[2] * image[i][j]) + (gKern[3] * image[i + 1][j]) + (gKern[4] * image[i + 2][j]);
            }
        }
    }

    int** temp = image;
    image = ret;

    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            if (j == 0) {
                ret[i][j] = (gKern[0] * image[i][2]) + (gKern[1] * image[i][1]) + (gKern[2] * image[i][0]) + (gKern[3] * image[i][1]) + (gKern[4] * image[i][2]);
            } else if (j == 1) {
                ret[i][j] = (gKern[0] * image[i][1]) + (gKern[1] * image[i][0]) + (gKern[2] * image[i][1]) + (gKern[3] * image[i][2]) + (gKern[4] * image[i][3]);
            } else if (j == (width - 2)) {
                ret[i][j] = (gKern[0] * image[i][j - 2]) + (gKern[1] * image[i][j - 1]) + (gKern[2] * image[i][j]) + (gKern[3] * image[i][j + 1]) + (gKern[4] * image[i][j]);
            } else if (j == (width - 1)) {
                ret[i][j] = (gKern[0] * image[i][j - 2]) + (gKern[1] * image[i][j - 1]) + (gKern[2] * image[i][j]) + (gKern[3] * image[i][j - 1]) + (gKern[4] * image[i][j - 2]);
            } else {
                ret[i][j] = (gKern[0] * image[i][j - 2]) + (gKern[1] * image[i][j - 1]) + (gKern[2] * image[i][j]) + (gKern[3] * image[i][j + 1]) + (gKern[4] * image[i][j + 2]);
            }
        }
    }

    image = temp;

    return ret;
}

The first pass (the first for block) seems to work fine: when I comment out the second block, I get a slightly blurred image. But when I use both, I get a choppy "weird" image, as shown below (the first image is my grayscale input, the second is the choppy output):


1 Answer


The problem is with the pointers you use.

The function starts with image as input and ret as the intermediate result of the first step.

The second step must use ret as input, and write either to the original input (overwrite the input image) or to a new image. Instead, you do:

int** temp = image;
image = ret;
// read from image and write to ret
image = temp;
return ret;

That is, going into the second pass, both image and ret point to the same data, so you read from and write to the same buffer. The pointer assignment after the loop has no effect (image is never used again), and you return the intermediate buffer.

If you want to write to the input image, simply swap the image and ret pointers before the second pass:

std::swap(image, ret);

If you don’t want to overwrite the input, you’ll have to allocate (new) another image to write into.
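Here is a minimal sketch of the swap approach, using the question’s variable names; the actual convolution is replaced by a stand-in +1 so the data flow is easy to follow:

```cpp
#include <algorithm>
#include <cassert>

// After the first pass has written into ret, swap the two pointers:
// the second pass then reads the intermediate result through image
// and writes its output through ret, which now points at the caller's
// original buffer. The input image is overwritten, as described above.
int** runSecondPass(int** image, int** ret, int height, int width) {
    std::swap(image, ret);  // ret now aliases the caller's input buffer
    for (int i = 0; i < height; i++)
        for (int j = 0; j < width; j++)
            ret[i][j] = image[i][j] + 1;  // stand-in for the 1D convolution
    return ret;  // the caller's original buffer, now holding the result
}
```

The key point is that after the swap the reads and writes go to two distinct buffers, so the second pass no longer filters in place.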


It is bad practice to use an array of arrays to store an image. If you look into the source code of any image processing library, you’ll see they allocate a single large memory block for the image, which stores all image rows concatenated. Knowing the width of the image, you know how to index: image[x + y*width].

This not only simplifies the code (no loops just to allocate an image), it also speeds it up considerably: there is no per-row pointer lookup any more, and the data is contiguous, which makes much better use of the cache.
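As a sketch, single-block storage might look like this (the struct and its names are illustrative, not from any particular library):

```cpp
#include <cassert>
#include <vector>

// One contiguous allocation for the whole image, indexed as
// data[x + y*width]. std::vector gives automatic cleanup; a plain
// new[] block works the same way.
struct Image {
    int width, height;
    std::vector<int> data;
    Image(int w, int h) : width(w), height(h), data(w * h, 0) {}
    int& at(int x, int y) { return data[x + y * width]; }
};
```

Row y starts at offset y*width, so stepping through a column is just a stride of width through the same block.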

This whole code can be simplified significantly by following the advice above: the two passes can share the same code. Write a function that filters one line of the image. It takes a pointer to the first pixel, a line length, and a step (which is 1 for horizontal lines and width for vertical lines). Call this 1D function in a loop over the lines in a second function, and call that second function once for the horizontal pass and once for the vertical pass.

In this situation, it is easy to avoid intermediate images by using a buffer of the size of a single image line. Write into that buffer, then copy the whole line back into the input image after it is filtered. This means you have a single buffer of size max(width,height) rather than a buffer of size width*height.

The 1D filter function can also be simplified. Its loop should not contain any if statements; they significantly slow down execution. Instead, special-case the first two and last two pixels, and loop only over the bulk of the pixels, where you don’t have to worry about the image edge.
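Putting all of the above together, a sketch might look like this (all names are mine; the mirrored edge handling matches the question’s boundary cases, and lines are assumed to be at least four pixels long):

```cpp
#include <cassert>
#include <vector>

// Filter one line in place. step == 1 walks a row, step == width walks
// a column. Results go into a small line buffer and are copied back,
// so the blur needs only an O(max(width, height)) scratch buffer.
void filterLine(int* src, int n, int step, const float k[5]) {
    std::vector<float> buf(n);
    // Special-case the first two and last two pixels (mirrored edges)...
    buf[0]   = k[0]*src[2*step] + k[1]*src[step] + k[2]*src[0]
             + k[3]*src[step]   + k[4]*src[2*step];
    buf[1]   = k[0]*src[step]   + k[1]*src[0]    + k[2]*src[step]
             + k[3]*src[2*step] + k[4]*src[3*step];
    buf[n-2] = k[0]*src[(n-4)*step] + k[1]*src[(n-3)*step] + k[2]*src[(n-2)*step]
             + k[3]*src[(n-1)*step] + k[4]*src[(n-2)*step];
    buf[n-1] = k[0]*src[(n-3)*step] + k[1]*src[(n-2)*step] + k[2]*src[(n-1)*step]
             + k[3]*src[(n-2)*step] + k[4]*src[(n-3)*step];
    // ...so the bulk loop contains no branches at all.
    for (int i = 2; i < n - 2; i++)
        buf[i] = k[0]*src[(i-2)*step] + k[1]*src[(i-1)*step] + k[2]*src[i*step]
               + k[3]*src[(i+1)*step] + k[4]*src[(i+2)*step];
    for (int i = 0; i < n; i++)
        src[i*step] = static_cast<int>(buf[i] + 0.5f);  // round, copy back
}

// Horizontal pass over every row, then vertical pass over every column,
// both in place on a single flat image buffer.
void gaussianBlur(int* image, int width, int height, const float k[5]) {
    for (int y = 0; y < height; y++) filterLine(image + y*width, width, 1, k);
    for (int x = 0; x < width; x++)  filterLine(image + x, height, width, k);
}
```

The + 0.5f rounds to nearest instead of truncating, which avoids a slow darkening of the image over repeated blurs.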