
I learned from several articles that to compute the gradients for the filters, you just do a convolution with the input volume as the input and the error matrix as the kernel. After that, you subtract the gradients (multiplied by the learning rate) from the filter weights. I implemented this process, but it's not working.
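
Roughly, this is what I mean (a minimal NumPy/SciPy sketch of my understanding, just for illustration; it is not my actual implementation, and part of my confusion is whether this should be a true convolution or a cross-correlation):

    import numpy as np
    from scipy.signal import convolve2d

    def update_filter(x, w, delta_out, lr):
        # my understanding: the filter gradient is the "convolution" of the
        # layer input x with the error delta_out at the layer output ...
        dw = convolve2d(x, delta_out, mode='valid')   # same shape as w
        # ... and the filter is then updated by plain gradient descent
        return w - lr * dw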

I even tried doing the backpropagation process myself with pen and paper, but the gradients I calculated don't make the filters perform any better. So am I understanding the whole process wrong?

Edit: I will provide an example of my understanding of backpropagation in CNNs, and the problem with it.

Consider a randomised input matrix for a convolutional layer:

1, 0, 1
0, 0, 1
1, 0, 0

And a randomised weight matrix:

1, 0
0, 1

The output would be (after applying the ReLU activation):

1, 1
0, 0

The target for this layer is a 2x2 matrix filled with zeros. This way, we know the weight matrix should end up filled with zeros as well.

Error:

-1, -1
0, 0

By applying the process as stated above, the gradients are:

-1, -1
1, 0

So the new weight matrix is:

2, 1
-1, 1

This is not getting anywhere: if I repeat the process, the filter weights just grow to extremely large values. I must have made a mistake somewhere, so what is it that I'm doing wrong?

"am I understanding the whole process wrong?" without a detailed example of how exactly you have understood it does not make much sense, and certainly does not make a valid SO question. Since, as you say, you have indeed implemented the process, please share the implementation here; otherwise your question is way too broad and vague. – desertnaut

1 Answer


I'll give you a full example. It's not going to be short, but hopefully you will get it. I'm omitting both the bias and the activation function for simplicity, but once you get the idea it's simple enough to add those too. Remember, backpropagation is essentially the SAME in a CNN as in a simple MLP; you just have convolutions instead of multiplications. So, here's my sample:

Input:

.7 -.3 -.7 .5
.9 -.5 -.2 .9
-.1 .8 -.3 -.5
0 .2 -.1 .6

Kernel:

.1 -.3
-.5 .7

Doing the convolution yields (Result of 1st convolutional layer, and input for the 2nd convolutional layer):

.32 .27 -.59
.99 -.52 -.55
-.45 .64 .13

L2 Kernel:

-.5 .1
.3 .9

L2 activation:

.73 .29
.37 -.63
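
If you want to check these numbers, here is the forward pass as a small NumPy/SciPy sketch (the library is just my choice for illustration; note that scipy.signal.convolve2d flips the kernel, i.e. it performs the true convolution used in this example):

    import numpy as np
    from scipy.signal import convolve2d

    x = np.array([[ 0.7, -0.3, -0.7,  0.5],
                  [ 0.9, -0.5, -0.2,  0.9],
                  [-0.1,  0.8, -0.3, -0.5],
                  [ 0.0,  0.2, -0.1,  0.6]])
    k1 = np.array([[ 0.1, -0.3],
                   [-0.5,  0.7]])
    k2 = np.array([[-0.5,  0.1],
                   [ 0.3,  0.9]])

    # forward pass: two valid convolutions, no bias, no activation
    a1 = convolve2d(x, k1, mode='valid')   # ~ [[.32, .27, -.59], [.99, -.52, -.55], [-.45, .64, .13]]
    a2 = convolve2d(a1, k2, mode='valid')  # ~ [[.73, .29], [.37, -.63]]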

Here you would have a flatten layer and a standard MLP or SVM to do the actual classification. During backpropagation you'll receive a delta, which for fun let's assume is the following:

-.07 .15
-.09 .02

This will always be the same size as your activation before the flatten layer. Now, to calculate the kernel's delta for the current layer (L2), you convolve L1's activation with the above delta. I'm not writing this down step by step again, but the result will be:

.17 .02
-.05 .13
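
In the same sketch, this step is just another valid convolution, with delta2 being the incoming delta shown above:

    delta2 = np.array([[-0.07, 0.15],
                       [-0.09, 0.02]])

    # kernel gradient for L2: convolve the layer's input (a1) with the delta
    dk2 = convolve2d(a1, delta2, mode='valid')  # ~ [[.17, .02], [-.05, .13]]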

Updating the kernel is done as L2.Kernel -= LR * ROT180(dL2.K), meaning you first rotate the above 2x2 matrix by 180 degrees and then update the kernel. For our toy example this turns out to be:

-.51 .11
.3  .9
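
In the sketch the update is a one-liner; np.rot90(..., 2) plays the role of ROT180:

    lr = 0.1
    k2_new = k2 - lr * np.rot90(dk2, 2)  # compare with the updated L2 kernel above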

Now, to calculate the delta for the first convolutional layer, recall that in an MLP you had: current_delta * current_weight_matrix. In a conv layer you have pretty much the same thing: you convolve the original kernel (before the update) of the L2 layer with your delta for the current layer, except that this convolution is a full convolution. The result turns out to be:

.04 -.08 .02
.02 -.13 .14
-.03 -.08 .01
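
In the sketch this is a full-mode convolution of the (pre-update) L2 kernel with the delta:

    # delta passed back to L1: FULL convolution of the old kernel with the delta
    delta1 = convolve2d(k2, delta2, mode='full')  # 3x3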

With this you go back to the 1st convolutional layer and convolve the original input with this 3x3 delta:

.16 .03
-.09 .16

And update your L1 kernel the same way as above:

.08 -.29
-.5 .68
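
These last two steps follow the same pattern in the sketch:

    # kernel gradient for L1: convolve the original input with the 3x3 delta
    dk1 = convolve2d(x, delta1, mode='valid')    # 2x2
    k1_new = k1 - lr * np.rot90(dk1, 2)          # updated L1 kernel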

Then you can start over from the feed-forward pass. The calculations above were rounded to 2 decimal places, and a learning rate of .1 was used for computing the new kernel values.

TLDR:

  • You get a delta

  • You calculate the next delta, which will be used for the next (earlier) layer, as: FullConvolution(Li.W, delta)

  • Calculate the kernel delta that is used to update the kernel: Convolution(Li.Input, delta)

  • Go to the next layer and repeat (a compact sketch of this loop follows below).
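
Putting the recipe together, here is a minimal sketch of one backward pass through a stack of single-channel, stride-1 conv layers (no bias, no activation; the function and variable names are illustrative, not from any particular library):

    import numpy as np
    from scipy.signal import convolve2d

    def conv_backward(inputs, kernels, delta, lr=0.1):
        # inputs[i]  : the input fed to layer i during the forward pass
        # kernels[i] : the kernel of layer i
        # delta      : gradient arriving at the output of the last layer
        new_kernels = list(kernels)
        for i in reversed(range(len(kernels))):
            # kernel gradient: Convolution(Li.Input, delta)
            dk = convolve2d(inputs[i], delta, mode='valid')
            # delta for the previous layer: FullConvolution(Li.W, delta),
            # computed with the kernel *before* it is updated
            delta = convolve2d(kernels[i], delta, mode='full')
            # gradient step with the ROT180 used above
            new_kernels[i] = kernels[i] - lr * np.rot90(dk, 2)
        return new_kernels

    # usage with the toy example above: conv_backward([x, a1], [k1, k2], delta2)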