4
votes

I've been studying machine learning for 4 months, and I understand the concepts behind the MLP. The problem came when I started reading about Convolutional Neural Networks. Let me tell you what I know and then ask what I'm having trouble with.

The core parts of a CNN are:

  • Convolutional Layer: you have n filters, which you use to generate n feature maps.
  • ReLU Layer: you use it for normalizing the output of the convolutional layer.
  • Sub-sampling Layer: used for "generating" a new, smaller feature map that represents more abstract concepts.

These three layers are repeated several times, and the last part is an ordinary classifier, such as an MLP.

My questions are the following:

  1. How do I create the filters used in the Convolutional Layer? Do I have to create a filter, train it, and then put it in the Conv Layer, or do I train it with the backpropagation algorithm?
  2. Imagine I have a conv layer with 3 filters, so it outputs 3 feature maps. After applying the ReLU and sub-sampling layers, I still have 3 feature maps (smaller ones). When passing them through the next conv layer, how do I calculate the output? Do I apply the filter to each feature map separately, or do I perform some operation over the 3 feature maps and then sum the results? I have no idea how to calculate the output of this second conv layer, or how many feature maps it will output.
  3. How do I pass the data from the Conv layers to the MLP (for classification in the last part of the NN)?

If someone knows of a simple implementation of a CNN that doesn't use a framework, I would appreciate it. I think the best way to learn how something works is to build it yourself. Later, once you know how it works, you can use frameworks, because they save you a lot of time.


1 Answer

1
votes
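  1. You train the filters with the backpropagation algorithm, the same way you train an MLP. The filter weights are just initialized (typically randomly) and then learned together with the rest of the network; you don't design or pre-train them separately.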
  2. You apply each filter separately, and each filter covers all of the input feature maps. For example, if you have 10 feature maps in the first layer and a filter in the second layer has shape 3*3, then you apply a 3*3 window to each of the ten feature maps. The weights for each feature map are different, so one filter has 3*3*10 weights. The ten results are summed into a single value, so each filter produces one output feature map, and the second layer outputs as many feature maps as it has filters. To understand it more easily, keep in mind that a pixel of a non-grayscale image has three values (red, green and blue), so if you're passing images to a convolutional neural network, you already have 3 feature maps (for RGB) in the input layer, and one value in the next layer is connected to all 3 feature maps in the first layer. The following image demonstrates it.
  3. You should flatten the convolutional feature maps. For example, if you have 10 feature maps of size 5*5, you get a layer with 250 values. From there it is no different from an MLP: you connect all of these artificial neurons to all of the artificial neurons in the next layer by weights.
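To make points 2 and 3 concrete, here is a minimal framework-free sketch using only plain Python lists. The names `conv_layer` and `flatten` are made up for illustration, and it assumes stride 1, no padding, and a sliding-window sum (cross-correlation, which is what most CNN code calls "convolution"). It is meant to show the shapes and the summation over input feature maps, not to be an efficient implementation.

```python
def conv_layer(inputs, filters, biases):
    """Forward pass of one conv layer (stride 1, no padding).

    inputs:  list of C feature maps, each an H x W nested list
    filters: list of F filters, each with shape C x k x k
    biases:  list of F numbers, one per filter
    returns: list of F feature maps, each (H-k+1) x (W-k+1)
    """
    channels = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    outputs = []
    for filt, bias in zip(filters, biases):
        k = len(filt[0])
        out_map = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                total = bias
                # One output value sums over ALL input feature maps,
                # each with its own k x k slice of weights.
                for c in range(channels):
                    for di in range(k):
                        for dj in range(k):
                            total += filt[c][di][dj] * inputs[c][i + di][j + dj]
                row.append(total)
            out_map.append(row)
        outputs.append(out_map)  # one output feature map per filter
    return outputs


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat vector for the MLP."""
    return [v for fmap in feature_maps for row in fmap for v in row]


# Toy example: 3 input maps of size 4x4 (think RGB), 2 filters of shape 3x3x3.
inputs = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
maps = conv_layer(inputs, filters, biases=[0.0, 0.5])
features = flatten(maps)
print(len(maps), len(maps[0]), len(maps[0][0]), len(features))  # 2 2 2 8
```

Each filter here has 3*3*3 = 27 weights, and the layer returns 2 feature maps because it has 2 filters, regardless of how many maps went in. The flattened `features` list is what you would feed to the first fully connected layer.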

Here someone has implemented a convolutional neural network without frameworks.
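In the same framework-free spirit, this is a rough sketch of how a filter gets trained by backpropagation (point 1). It is a deliberately tiny setup and not taken from the linked implementation: a single-channel 2x2 filter, no bias, stride 1, a squared-error loss directly on the conv output, and hypothetical names `conv2d` and `train_step`. The filter starts at zero and is pulled toward a "true" filter purely by gradient descent.

```python
def conv2d(image, kernel):
    """Single-channel sliding-window sum (stride 1, no padding)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(kernel[di][dj] * image[i + di][j + dj]
                 for di in range(k) for dj in range(k))
             for j in range(cols)] for i in range(rows)]


def train_step(image, kernel, target, lr):
    """One gradient-descent step on loss = 0.5 * sum((out - target)^2).

    Returns the updated kernel and the loss before the update.
    """
    out = conv2d(image, kernel)
    k = len(kernel)
    # Gradient of the loss w.r.t. each output value is just the error.
    err = [[out[i][j] - target[i][j] for j in range(len(out[0]))]
           for i in range(len(out))]
    loss = 0.5 * sum(e * e for row in err for e in row)
    # Chain rule: weight (di, dj) touches input pixel (i+di, j+dj) for
    # every output position (i, j), so its gradient sums over all of them.
    new_kernel = [[kernel[di][dj] - lr * sum(
                       err[i][j] * image[i + di][j + dj]
                       for i in range(len(err)) for j in range(len(err[0])))
                   for dj in range(k)] for di in range(k)]
    return new_kernel, loss


# Toy data: a fixed 4x4 "image" and the output of a known 2x2 filter.
image = [[((i * 3 + j * 5) % 7) / 7.0 for j in range(4)] for i in range(4)]
true_kernel = [[1.0, 0.0], [0.0, -1.0]]
target = conv2d(image, true_kernel)

kernel = [[0.0, 0.0], [0.0, 0.0]]  # learned from scratch
losses = []
for _ in range(200):
    kernel, loss = train_step(image, kernel, target, lr=0.02)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss should shrink over the steps
```

In a real CNN the error at the conv output comes from the layers above (pooling, ReLU, the MLP) rather than from a direct target, but the weight update for the filter has this same form.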

I would also recommend these lectures.