I want to perform a simple 2D image convolution, but my kernel is even-sized. Which indices should I pick for the kernel center? I tried googling for an answer and looking at existing code. People usually center their kernel so that there is one more sample before the new zero, so for a 4x4 kernel the centered indices would be -2 -1 0 +1. Is that correct? And if it is, why is that so? Can someone explain why -2 -1 0 +1 is correct while -1 0 +1 +2 is not? Keep in mind that I want to perform the convolution without using FFT.
3 Answers
If I understand your question correctly, then for even-sized kernels you are correct that the convention is to centre the kernel so that there is one more sample before the new zero.
So, for a kernel of width 4, the centred indices will be -2 -1 0 +1, as you say above.
However, this really is just a convention: an asymmetric convolution is very rarely used anyway, and the exact nature of the asymmetry (to the left, to the right, etc.) has no bearing on which result is "correct". I would imagine most implementations behave this way simply so that they give comparable results for the same inputs.
When performing the convolution in the frequency domain, the kernel is padded to match the image size anyway, and you've already stated that you are performing the convolution in the spatial domain.
I'm much more intrigued as to why you need to use an even sized kernel in the first place.
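To make the point about the convention concrete, here is a minimal spatial-domain sketch (a plain correlation loop with no kernel flip; the function name and the center parameter are mine, not from any library) in which the chosen centre simply decides which offsets the loop visits:

import numpy as np

def conv2d_with_center(image, kernel, center):
    """Direct spatial-domain filtering of a 2D array.
    center=(cy, cx) is the kernel sample treated as offset 0, so for a 4x4
    kernel center=(2, 2) gives offsets -2..+1 and center=(1, 1) gives -1..+2.
    Out-of-range pixels are treated as zero."""
    kh, kw = kernel.shape
    cy, cx = center
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    iy, ix = y + ky - cy, x + kx - cx
                    if 0 <= iy < h and 0 <= ix < w:
                        acc += image[iy, ix] * kernel[ky, kx]
            out[y, x] = acc
    return out

Running it with center=(2, 2) and then center=(1, 1) on the same input gives results that differ only by a one-pixel shift (apart from the borders), which is why the choice is a convention rather than a correctness issue.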
The correct answer is to return the result pixel at the upper-left corner of the kernel window, regardless of whether your matrix is even-sized or not. Then you can perform the operation as a simple scanline pass that requires no extra memory.
private static void applyBlur(int[] pixels, int stride) {
    int v0, v1, v2, r, g, b;
    int pos;

    // Horizontal pass: replace each pixel with the average of itself and the
    // two pixels to its right (kernel anchored at the upper-left sample).
    // The scan runs until it walks off the end of the array and the
    // ArrayIndexOutOfBoundsException terminates the loop.
    pos = 0;
    try {
        while (true) {
            v0 = pixels[pos];
            v1 = pixels[pos + 1];
            v2 = pixels[pos + 2];
            r = ((v0 >> 16) & 0xFF) + ((v1 >> 16) & 0xFF) + ((v2 >> 16) & 0xFF);
            g = ((v0 >> 8)  & 0xFF) + ((v1 >> 8)  & 0xFF) + ((v2 >> 8)  & 0xFF);
            b = ((v0)       & 0xFF) + ((v1)       & 0xFF) + ((v2)       & 0xFF);
            r /= 3;
            g /= 3;
            b /= 3;
            pixels[pos++] = r << 16 | g << 8 | b;
        }
    } catch (ArrayIndexOutOfBoundsException e) { }

    // Vertical pass: average each pixel with the two pixels below it,
    // operating in place on the horizontally blurred result.
    pos = 0;
    try {
        while (true) {
            v0 = pixels[pos];
            v1 = pixels[pos + stride];
            v2 = pixels[pos + stride + stride];
            r = ((v0 >> 16) & 0xFF) + ((v1 >> 16) & 0xFF) + ((v2 >> 16) & 0xFF);
            g = ((v0 >> 8)  & 0xFF) + ((v1 >> 8)  & 0xFF) + ((v2 >> 8)  & 0xFF);
            b = ((v0)       & 0xFF) + ((v1)       & 0xFF) + ((v2)       & 0xFF);
            r /= 3;
            g /= 3;
            b /= 3;
            pixels[pos++] = r << 16 | g << 8 | b;
        }
    } catch (ArrayIndexOutOfBoundsException e) { }
}
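For comparison, here is a rough NumPy sketch of the same upper-left anchoring (the function name is mine; it works on grayscale floats rather than packed RGB ints, and it reads from the unmodified input in each pass, so it is not a drop-in equivalent of the Java code above):

import numpy as np

def box_blur_upper_left(img):
    """3-wide box blur with the kernel anchored at the upper-left sample
    (offsets 0, +1, +2 in each direction), applied to a 2D float array.
    The last two rows/columns are left unchanged instead of being clipped."""
    out = img.astype(float)
    # horizontal pass: each pixel becomes the mean of itself and its two right-hand neighbours
    out[:, :-2] = (img[:, :-2] + img[:, 1:-1] + img[:, 2:]) / 3.0
    # vertical pass on the horizontally blurred result
    res = out.copy()
    res[:-2, :] = (out[:-2, :] + out[1:-1, :] + out[2:, :]) / 3.0
    return res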
After some thought about even-sized convolutions and their use in Temporal Convolutional Networks, I decided that the following experiment would show how TensorFlow/Keras centres an even-sized kernel:
import numpy as np
import keras.backend as K
from keras.layers import Conv2D, Input
from keras.initializers import Constant

if __name__ == '__main__':
    # 1D signal treated as a (length, 1, 1) image so the (4, 1) kernel acts along one axis
    inputs = Input(shape=(None, 1, 1))
    even_conv = Conv2D(1, (4, 1), padding="same",
                       kernel_initializer=Constant(value=1.), use_bias=False)(inputs)
    f = K.function(inputs=[inputs], outputs=[even_conv])

    test_input = np.arange(10)[np.newaxis, ..., np.newaxis, np.newaxis].astype(float)
    result = f(inputs=[test_input])[0]

    print(np.squeeze(test_input))
    # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
    print(np.squeeze(result))
    # [ 3.  6. 10. 14. 18. 22. 26. 30. 24. 17.]
As you can see, with "same" padding the input array is padded with one zero at the beginning and two zeros at the end: [0. 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 0. 0.]. So TensorFlow centres an even-sized kernel as -1 0 +1 +2 for a kernel of width 4, and in general, for a kernel of size 2*n, as -(n-1), -(n-2), ..., -1, 0, +1, ..., +(n-1), +n.
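If you want to double-check that centring without going through Keras, the same numbers fall out of a plain NumPy loop with that padding (just a sanity check reproducing the printed result above):

import numpy as np

x = np.arange(10, dtype=float)
# "same" padding for a width-4 kernel in TensorFlow: one zero before, two zeros after
padded = np.concatenate(([0.0], x, [0.0, 0.0]))
manual = np.array([padded[i:i + 4].sum() for i in range(len(x))])
print(manual)
# [ 3.  6. 10. 14. 18. 22. 26. 30. 24. 17.]  -- matches the Conv2D output above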