2
votes

http://i60.tinypic.com/no7tye.png Fig. 1 Convolutional Neural Network (LeNet5)

In the Convolutional Neural Network LeNet-5 (Fig. 1), the convolution layer (C1) and the max pooling (subsampling) layers (S2, S4) are computed in an iterative manner. But I do not understand how to correctly compute the C3 (convolution) layer.

http://tinypic.com/r/fvzp86/8 Fig. 2 Proceeding C1 layer

First, as input we receive an MNIST 32*32 grayscale image of a digit, treated as an array of bytes of size 32*32. In the C1 layer we have 6 distinct kernels filled with small random values. Each kernel from 1 to 6 is used to build one of 6 feature maps (one kernel per feature map). We move a 5*5 receptive field with a stride of 1 pixel from left to right and top to bottom; at each position we multiply the image values by the kernel values element-wise, sum the products, add the bias, and pass the result through a sigmoid function. The result is element (i, j) of the feature map currently being built. Once we reach the end of the image array, that feature map is complete.
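The C1 step described above can be sketched in NumPy as follows. This is only a minimal illustration under the assumptions stated in the question (sigmoid activation, stride 1, one 5x5 kernel and one bias per feature map); the function name `conv_layer_c1` and the random initialisation range are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_layer_c1(image, kernels, biases):
    """Slide each kernel over a square image with stride 1, add bias, apply sigmoid.

    image:   (32, 32) array
    kernels: (6, 5, 5) array, one kernel per output feature map
    biases:  (6,) array, one bias per output feature map
    returns: (6, 28, 28) array
    """
    n_maps, k, _ = kernels.shape
    out_size = image.shape[0] - k + 1  # 32 - 5 + 1 = 28
    out = np.zeros((n_maps, out_size, out_size))
    for m in range(n_maps):
        for i in range(out_size):
            for j in range(out_size):
                patch = image[i:i + k, j:j + k]  # 5x5 receptive field
                out[m, i, j] = np.sum(patch * kernels[m]) + biases[m]
    return sigmoid(out)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                      # stand-in for an MNIST digit
kernels = rng.uniform(-0.1, 0.1, size=(6, 5, 5))  # assumed init range
biases = np.zeros(6)
c1 = conv_layer_c1(image, kernels, biases)
print(c1.shape)  # (6, 28, 28)
```

Each output map is 28x28 because a 5x5 window fits in 32 - 5 + 1 = 28 positions along each axis.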

http://i57.tinypic.com/rk0jk9.jpg Fig. 3 Proceeding S2 layer

Next we produce the S2 layer. Again there are 6 feature maps, because we apply a 2*2 receptive field to each of the 6 feature maps of the C1 layer individually (max pooling, i.e. selecting the maximum value in each 2*2 receptive field). So C1, S2 and S4 are all computed in this iterative manner.
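A minimal sketch of the S2 step as described here, assuming non-overlapping 2x2 windows and plain max pooling. Note that the 1998 LeNet-5 paper describes S2 differently (the four inputs are summed, scaled by a trainable coefficient, offset by a bias and passed through a sigmoid), so this follows the question's description rather than the paper. The helper name `max_pool_2x2` is made up.

```python
import numpy as np

def max_pool_2x2(feature_maps):
    """Non-overlapping 2x2 max pooling applied to each feature map separately.

    feature_maps: (n, h, w) array with even h and w
    returns:      (n, h // 2, w // 2) array
    """
    n, h, w = feature_maps.shape
    # Split each map into 2x2 blocks, then take the maximum inside each block.
    blocks = feature_maps.reshape(n, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(2, 4))

c1 = np.arange(6 * 28 * 28, dtype=float).reshape(6, 28, 28)  # dummy C1 output
s2 = max_pool_2x2(c1)
print(s2.shape)  # (6, 14, 14)
```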

http://i58.tinypic.com/ifsidu.png Fig. 4 Connection list of C3 layer

But next we need to compute the C3 layer. According to various papers there is a connection map. Could you please explain what is meant by the connection list? Does it mean that we still use a 5*5 receptive field as in the C1 layer? For example, in the first row the marked feature maps correspond to columns (0,4,5,6,9,10,11,12,14,15). Does this mean that to construct feature maps 0,4,5,6,9,10,11,12,14,15 of the C3 layer we perform the convolution on the first feature map of the S2 layer with a 5*5 receptive field? Which concrete kernel is used during the convolution, or do we again need to randomly generate 16 kernels filled with small numbers as we did in the C1 layer? If so, we see that feature maps 0,4,5,6,9,10,11,12,14,15 of C3 are colored light grey, light grey, dark grey, light grey, dark grey, light grey, dark grey, light grey, light grey, dark grey. It can be clearly seen that the first feature map of S2 is light grey, but only 0,4,6,10,12,14 are colored light grey. So maybe the 16 feature maps in C3 are built in a different way. Could you also explain how to produce the C5 layer? Does it have its own connection list?


2 Answers

2
votes

Disclaimer: I have just started with this topic, so please do point out any mistakes in my understanding!

  1. In the original LeNet paper, on page 8, you can find a connection map that links the feature maps of S2 to the feature maps of C3. This connection list tells us which feature maps of S2 are convolved with the kernel (details coming up) to produce each feature map of C3.

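  2. You will notice that each feature map of S2 is involved in producing exactly 10 (not all 16) feature maps of C3. With 6 S2 maps this gives 6 x 10 = 60 connections, so C3 uses 60 separate 5x5 kernels rather than the 6 x 16 = 96 that a fully connected layer would need.

  3. In C1 we had a (5x5) x 6 kernel, i.e. six 5x5 kernels, one per feature map. This is 2D convolution. In C3 each of the 16 feature maps has its own "kernel-box" of size 5x5xk, where k is the number of S2 maps it is connected to (3, 4 or 6 according to the table). Each output value is the sum of the k 2D convolutions plus a bias. Equivalently, you can think of a (5x5x6) x 16 kernel in which the slices for unconnected S2 maps are fixed at zero.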

  4. Regarding the generation of kernel weights, it depends on the algorithm. They can be random, pre-defined, or initialised with some scheme, e.g. xavier in caffe.
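To make the connection list concrete, here is a minimal NumPy sketch. The table `C3_INPUTS` is transcribed from the connection table in the paper (row 0 matches the columns 0,4,5,6,9,10,11,12,14,15 quoted in the question); the function name `c3_forward`, the sigmoid activation (taken from the question, not the paper) and the random initialisation are assumptions for illustration only.

```python
import numpy as np

# C3_INPUTS[m] lists the S2 feature maps that feed C3 feature map m.
C3_INPUTS = [
    [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [0, 4, 5], [0, 1, 5],
    [0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [0, 3, 4, 5],
    [0, 1, 4, 5], [0, 1, 2, 5],
    [0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5],
    [0, 1, 2, 3, 4, 5],
]

def c3_forward(s2, kernels, biases):
    """s2: (6, 14, 14); kernels: (16, 6, 5, 5); biases: (16,). Returns (16, 10, 10)."""
    out = np.zeros((16, 10, 10))
    for m, inputs in enumerate(C3_INPUTS):
        for i in range(10):
            for j in range(10):
                acc = biases[m]
                for s in inputs:  # only the connected S2 maps contribute
                    acc += np.sum(s2[s, i:i + 5, j:j + 5] * kernels[m, s])
                out[m, i, j] = acc
    return 1.0 / (1.0 + np.exp(-out))

# Boolean mask: mask[m, s] is True when S2 map s feeds C3 map m.
mask = np.zeros((16, 6), dtype=bool)
for m, inputs in enumerate(C3_INPUTS):
    mask[m, inputs] = True

uses_per_s2 = mask.sum(axis=0)       # how many C3 maps each S2 map feeds
n_kernels = int(mask.sum())          # number of 5x5 kernels actually used
n_params = 25 * n_kernels + 16       # weights plus one bias per C3 map

rng = np.random.default_rng(0)
s2 = rng.random((6, 14, 14))
kernels = rng.uniform(-0.1, 0.1, size=(16, 6, 5, 5))  # unconnected slices are never read
c3 = c3_forward(s2, kernels, np.zeros(16))
print(uses_per_s2, n_kernels, n_params, c3.shape)
```

With this table every S2 map feeds 10 C3 maps, 60 kernels are used, and the layer has 1516 trainable parameters.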

What confused me is that the kernel details are not well defined and have to be derived from the given information.

Update: How is C5 Produced?

Layer C5 is a convolutional layer with 120 feature maps. The C5 feature maps have size 1x1 because the S4 feature maps are 5x5 and a 5x5 kernel is applied to them. So for a 32x32 input we can also say that S4 and C5 are fully connected. The size of the kernel applied to S4 to get C5 is (5x5x16) x 120 (bias not shown). The paper gives no connection table for C5 like the one for C3; it only states that S4 and C5 are fully connected, i.e. each of the 120 kernel-boxes spans all 16 feature maps of S4.
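A small NumPy check of the "convolution equals full connection" claim, as a sketch only: the random values stand in for real S4 activations and trained weights, and the activation function is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
s4 = rng.random((16, 5, 5))                            # 16 feature maps of size 5x5
kernels = rng.uniform(-0.1, 0.1, size=(120, 16, 5, 5)) # 120 kernel-boxes of size 5x5x16
biases = rng.uniform(-0.1, 0.1, size=120)

# Convolution view: each kernel-box covers all of S4, so there is exactly
# one valid position and each C5 feature map is a single value.
c5_conv = np.array([np.sum(s4 * kernels[m]) + biases[m] for m in range(120)])

# Fully connected view: flatten S4 to 400 inputs and use a 120x400 weight matrix.
c5_fc = kernels.reshape(120, 400) @ s4.reshape(400) + biases

print(np.allclose(c5_conv, c5_fc))
n_params = kernels.size + biases.size  # (25 * 16 + 1) * 120
print(n_params)
```

Both views give the same 120 values, and the parameter count is (25 x 16 + 1) x 120 = 48120.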

0
votes

The key point in the paper concerning "C5" seems to be that the 5x5 kernel is applied to ALL 16 of S4's feature maps, i.e. it is a fully connected layer.

"Each unit is connected to a 5x5 neighborhood on all 16 of S4's feature maps".

Since we have 120 output units, we should have 120 bias connections, one per unit (or else the architecture details don't tally).

We then connect all 25x16 input units to each of the 120 feature map outputs.

So in total we have

num_connections = (25x16+1)x120 = 48000+120 = 48120