
I was reading this interesting article on convolutional neural networks. It showed the image below, explaining that for every receptive field of 5x5 pixels/neurons, a value for one hidden neuron is calculated.

[image: receptive field to neuron]

We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.

So max-pooling is applied.


With multiple convolutional layers, it looks something like this:

[image]

But my question is: this whole architecture could be built with perceptrons, right?

For every convolutional layer, one perceptron is needed, with layers:

input_size = 5x5;
hidden_size = 10 (e.g.);
output_size = 1;

Then for every receptive field in the original image, the 5x5 area is fed into the perceptron to output the value of one neuron in the hidden layer. So basically this is done for every receptive field.

So the same perceptron is used 24×24 times to construct the hidden layer, because:

we're going to use the same weights and bias for each of the 24×24 hidden neurons.
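As a concrete toy sketch of this idea (hypothetical NumPy code, not taken from the article), the same small perceptron with made-up weights is slid over every 5x5 receptive field of a 28x28 image to build the 24x24 feature map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 28x28 input image and one shared "perceptron"
# with layers 25 (5x5 receptive field) -> 10 hidden -> 1 output.
image = rng.random((28, 28))
W1 = rng.standard_normal((25, 10))   # shared weights: receptive field -> hidden
b1 = rng.standard_normal(10)
W2 = rng.standard_normal((10, 1))    # shared weights: hidden -> single output value
b2 = rng.standard_normal(1)

def shared_perceptron(patch):
    """Apply the same weights and bias to any 5x5 receptive field."""
    h = np.maximum(0.0, patch.reshape(-1) @ W1 + b1)  # hidden layer (ReLU assumed)
    return (h @ W2 + b2).item()                       # one value per receptive field

# Slide over all 24x24 receptive fields to build one feature map.
feature_map = np.array([[shared_perceptron(image[i:i + 5, j:j + 5])
                         for j in range(24)]
                        for i in range(24)])
print(feature_map.shape)   # (24, 24)
```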

And this works from the hidden layer to the pooling layer as well, with input_size = 2x2 and output_size = 1. In the case of a max-pool layer, it's just a max() function over each 2x2 array.
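Continuing the sketch above, the max-pool layer really is just max() over each 2x2 field of the feature map:

```python
# Max-pool the 24x24 feature map from the snippet above with non-overlapping
# 2x2 fields: each block collapses to its maximum, giving a 12x12 pooled layer.
pooled = np.array([[feature_map[i:i + 2, j:j + 2].max()
                    for j in range(0, 24, 2)]
                   for i in range(0, 24, 2)])
print(pooled.shape)   # (12, 12)
```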

And then finally:

The final layer of connections in the network is a fully-connected layer. That is, this layer connects every neuron from the max-pooled layer to every one of the 10 output neurons.

which is a perceptron again.

So my final architecture looks like this:

-> 1 perceptron for every convolutional layer / feature map
-> run this perceptron over every receptive field to create the feature map
-> 1 perceptron for every pooling layer
-> run this perceptron over every field in the feature map to create the pooling layer
-> finally, feed the values of the pooling layer into a regular all-to-all (fully connected) perceptron
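Put together with the snippets above (still a hypothetical sketch with a single feature map and made-up weights), the last step is the all-to-all layer:

```python
# Fully-connected (all-to-all) layer: every pooled neuron connects to each of
# the 10 output neurons, as in the quoted article.
W_fc = rng.standard_normal((12 * 12, 10))
b_fc = rng.standard_normal(10)

output = pooled.reshape(-1) @ W_fc + b_fc   # 10 output values
print(output.shape)   # (10,)
```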

Or am I overlooking something? Or is this already how they are programmed?

The only correction, I think, is that the 5*5 receptive field is not a single perceptron; it is a grid of 25 perceptrons. – Krishna Kishore Andhavarapu
@KrishnaKishoreAndhavarapu That comment makes no sense at all. The receptive fields consist of inputs/neurons; they get fed into a perceptron. – Thomas Wagenaar
Ohh... yes, my mistake. It's the other way round: there should be one perceptron for each 5*5 grid in the input image. – Krishna Kishore Andhavarapu
@KrishnaKishoreAndhavarapu But the article says "we're going to use the same weights and bias for each of the 24×24 hidden neurons." – Thomas Wagenaar
I think my previous comment was badly written. I meant one perceptron for each 5*5 kernel applied to the input image, which is basically what you have posted. I was confused when I wrote my first comment. Thanks. – Krishna Kishore Andhavarapu

1 Answer


The answer very much depends on what exactly you call a Perceptron. Common options are:

  1. The complete architecture. Then no, simply because a CNN is by definition a different NN.

  2. A model of a single neuron, specifically y = 1 if (w.x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters, and w.x denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN (see the sketch after this list). You'll find variants of this idea being used in binary neural networks.

  3. A training algorithm, typically associated with the Perceptron architecture. This reading doesn't really fit the question, because the learning algorithm is in principle orthogonal to the architecture. That said, you cannot really use the Perceptron algorithm for anything with hidden layers, which would suggest no as the answer in this case.

  4. The loss function associated with the original Perceptron. This notion of Perceptron is orthogonal to the problem at hand: your loss function with a CNN is given by whatever you are trying to do with your whole model. You could eventually use it, but it is non-differentiable, so good luck :-)
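For concreteness, here is a minimal sketch of option 2 (hypothetical NumPy code, names made up): a single threshold neuron y = 1 if (w.x + b) > 0 else 0, reused with shared weights over every 5x5 patch, which yields a binary-valued feature map:

```python
import numpy as np

rng = np.random.default_rng(1)

def perceptron_neuron(x, w, b):
    """The single-neuron Perceptron model: y = 1 if (w . x + b) > 0 else 0."""
    return 1.0 if (w @ x + b) > 0 else 0.0

# Forcing many such neurons to share w and b across all 5x5 patches of a
# 28x28 image yields a binary-valued convolutional feature map.
image = rng.random((28, 28))
w = rng.standard_normal(25)
b = rng.standard_normal()

feature_map = np.array([[perceptron_neuron(image[i:i + 5, j:j + 5].reshape(-1), w, b)
                         for j in range(24)]
                        for i in range(24)])
print(feature_map.shape)   # (24, 24)
```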

A sidenote rant: you can see people refer to feed-forward, fully-connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer: there are no Perceptrons in MLPs (see e.g. this discussion on Wikipedia), unless you go explore some really weird ideas. It would make more sense to call these networks Multilayer Linear Logistic Regression, because that's what they used to be composed of, up until around six years ago.