I read about convolutional neural networks from here. Then I started playing with torch7. I am having confusion with the convolutional layer of a CNN.
From the tutorial,
1
The neurons in a layer will only be connected to a small region of the layer before it, instead of all of the neurons in a fully-connected manner.
2
For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field is of size 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights.
3
if the input layer is [32x32x3], CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the region they are connected to in the input volume. This may result in volume such as [32x32x12].
I started playing with what a CONV layer might do to an image. I did that in torch7. Here is my implementation,
require 'image'
require 'nn'
i = image.lena()
model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(3, 10, 5, 5)) --depth = 3, #output layer = 10, filter = 5x5
res = model:forward(i)
itorch.image(res)
print(#i)
print(#res)
3
512
512
[torch.LongStorage of size 3]
10
508
508
[torch.LongStorage of size 3]
Now lets see the structure of a CNN
So, my questions are,
Question 1
Is the convolution done like this - lets say we take an image 32x32x3. And there is 5x5 filter. Then the 5x5 filter will pass through the whole 32x32 image and produce the convoluted images? Okay, so sliding 5x5 filter across the whole image, we get one image, if there are 10 output layers, we get 10 images(as you see from the output). How do we get these? (see the image for clarification if required)
Question 2
What is the number of neurons in the conv layer? Is it the number of output layers? In the code I've written above, model:add(nn.SpatialConvolutionMM(3, 10, 5, 5))
. Is it 10? (no. of output layers?)
If so the point number 2 does not make any sense. According to that If the receptive field is of size 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights.
So what will be the weight here? I am greatly confused in this. In the model defined in torch, there is no weight. So how the weight is playing a role here?
Can someone explain what is going on?