Image processing convolution: Why do I slide my kernel from np.arange(pad, imgWidth+pad)?

Question

I am trying to learn kernel convolution for image processing. Now, I understand the concept of kernel convolution, but I am a bit confused about code that I have found for it at https://www.pyimagesearch.com/2016/07/25/convolutions-with-opencv-and-python/

Specifically, I am confused about the bounds in the for loops and the location of the convolution output.

import cv2
import numpy as np

def convolve(image, kernel):
    # grab the spatial dimensions of the image, along with
    # the spatial dimensions of the kernel
    (iH, iW) = image.shape[:2]
    (kH, kW) = kernel.shape[:2]

    # allocate memory for the output image, taking care to
    # "pad" the borders of the input image so the spatial
    # size (i.e., width and height) are not reduced
    pad = (kW - 1) // 2
    image = cv2.copyMakeBorder(image, pad, pad, pad, pad,
        cv2.BORDER_REPLICATE)
    output = np.zeros((iH, iW), dtype="float32")

    # loop over the input image, "sliding" the kernel across
    # each (x, y)-coordinate from left-to-right and top to
    # bottom
#QUESTION 1 SECTION BEGIN
    for y in np.arange(pad, iH + pad):
        for x in np.arange(pad, iW + pad):
            # extract the ROI of the image by extracting the
            # *center* region of the current (x, y)-coordinates
            # dimensions
            roi = image[y - pad:y + pad + 1, x - pad:x + pad + 1]

#QUESTION 1 SECTION END

            # perform the actual convolution by taking the
            # element-wise multiplication between the ROI and
            # the kernel, then summing the matrix
            k = (roi * kernel).sum()

#QUESTION 2 SECTION BEGIN

            # store the convolved value in the output (x,y)-
            # coordinate of the output image
            output[y - pad, x - pad] = k

#QUESTION 2 SECTION END

Question 1: Why does np.arange run from pad to iH + pad, and not from pad to iH - pad? I assume that we start from pad so that the center pixel in the region of interest is never on the edge of the image. However, I would think that going to iH + pad would overshoot and put the center pixel outside the image dimensions.

Question 2: This code stores the output pixel at a location up and to the left of where I centered my convolution ROI, doesn't it? If so, could someone explain the logic behind doing this?

Thank you!

Cris Luengo · Accepted Answer · 2021-08-24T18:54:09

np.arange(pad, iH + pad) runs over iH values, which is the height of the original input image (the x loop does the same over the width, iW). The padded image has a height of iH + 2*pad, so y runs from pad pixels after the start to pad pixels before the end of an image column, such that one can index up to pad pixels in both directions without exiting the padded image.
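
The bounds can be checked with a few lines of NumPy. This is only a sketch: the sizes iH = 5 and kH = 3 are made-up values for illustration, and the names ys, first_row, and last_row are not from the original code.

```python
import numpy as np

iH, kH = 5, 3              # hypothetical image height and kernel height
pad = (kH - 1) // 2        # 1 for a 3x3 kernel
padded_H = iH + 2 * pad    # height of the padded image: 7

# centre rows visited by the outer loop (the end point is excluded)
ys = np.arange(pad, iH + pad)

# first and last rows touched by the slice image[y - pad:y + pad + 1]
first_row = ys[0] - pad
last_row = ys[-1] + pad

print(ys.tolist())                        # [1, 2, 3, 4, 5]
print(first_row, last_row, padded_H - 1)  # 0 6 6
```

The loop visits exactly iH centre rows, and the ROI spans rows 0 through padded_H - 1, so it never leaves the padded image. Stopping at iH - pad instead would skip the last 2*pad rows of the original image.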

Regarding your second question: the input image was padded, and the indexing is into the padded image. image[pad, pad] obtains the top-left pixel of the original image before padding, and corresponds to output[0, 0]. output is not padded, so subtracting pad only converts padded coordinates back to original ones; the result is not shifted.
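
One way to convince yourself of this mapping is to convolve with an identity kernel (a single 1 in the centre): if the output were shifted, it would no longer match the input. The sketch below is an assumption-laden stand-in for the code in the question, not the original: it uses np.pad with mode="edge" in place of cv2.copyMakeBorder with cv2.BORDER_REPLICATE, assumes a 2-D grayscale image and a square, odd-sized kernel, and the name convolve_sketch is hypothetical.

```python
import numpy as np

def convolve_sketch(image, kernel):
    # Assumes a 2-D image and a square kernel with odd side length.
    iH, iW = image.shape[:2]
    kH, kW = kernel.shape[:2]
    pad = (kW - 1) // 2
    # Replicate the border pixels (stand-in for cv2.BORDER_REPLICATE).
    padded = np.pad(image, pad, mode="edge")
    output = np.zeros((iH, iW), dtype="float32")
    for y in range(pad, iH + pad):
        for x in range(pad, iW + pad):
            # (y, x) are coordinates in the padded image
            roi = padded[y - pad:y + pad + 1, x - pad:x + pad + 1]
            # (y - pad, x - pad) are the same pixel in original coordinates
            output[y - pad, x - pad] = (roi * kernel).sum()
    return output

image = np.arange(12, dtype="float32").reshape(3, 4)
identity = np.zeros((3, 3), dtype="float32")
identity[1, 1] = 1.0

result = convolve_sketch(image, identity)
print(np.array_equal(result, image))  # True
```

Because the identity kernel reproduces the input exactly, padded[pad, pad] must land in output[0, 0], which is the correspondence described above.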
